Tag: Sameer Mhaisekar

Autogrowth on SCOM Operational DB?

This is another hot topic on which I find the experts’ opinions differ.

The other one we discussed was Windows Agents and Failover – Debunking the Myth!

Should you enable autogrowth on SCOM Operational Database?

I did some research online, consulted some of the best SCOM experts I know, and put together an article that explains why you would NOT want to autogrow your SCOM DB.

The short version is:

DO NOT autogrow your SCOM Operational DB unless you absolutely need to. Autogrowing the DB comes with its own set of disadvantages and might hurt the performance of the DB.

So, choose the size of your DB very carefully while you are designing your Management Group!

The longer and more detailed version is here:

Should You Enable Autogrowth on SCOM Operations Database?


PS. Special thanks to Stoyan Chalakov and “SCOM Bob” Cornelissen for reviewing the article and suggesting edits! 🙂

Management Server Frequently Greying Out?

I have seen this issue happen a number of times now. It can have several causes, but while troubleshooting I’ve found an approach that works almost every time, when it applies.

Problem :

All of a sudden, the management server(s) grey out. You check the services, and all of them are running. Still, for good measure, you restart the services, but to no avail. You then also try flushing the health state folder cache on the affected MS. And sure enough, the MS becomes healthy again.

But again after some time you notice that the MS has greyed out. You repeat the process of flushing the cache, it becomes green, and after some time becomes grey again. This cycle continues.

In the event log you may see several events and not be sure where to start. Any of these events may be the actual cause of the problem, or merely a consequence of it. That’s why you need to read carefully through each of them and work out which event is the real problem and which ones are the consequences.

The event we’re discussing here is one particular event, 4502. This event ID is logged for a number of different reasons and with different descriptions. The one we’re looking for goes something like this (sample only, your descriptions will vary accordingly):

A module of type "Microsoft.EnterpriseManagement.Mom.Modules.SubscriptionDataSource.InstanceSpaceSubscriptionDataSource" reported an exception System.ArgumentNullException: Value cannot be null.

Parameter name: value

   at System.Collections.CollectionBase.OnValidate(Object value)

   at System.Collections.CollectionBase.System.Collections.IList.Add(Object value)

   at Microsoft.EnterpriseManagement.Mom.Modules.SubscriptionDataSource.HttpRESTClient.PostDataAsync(Byte[] data, Object context)

   at Microsoft.EnterpriseManagement.Mom.Modules.SubscriptionDataSource.SubscriptionDataSource`2.WriteToCloud(List`1 items, DateTime firstTryDateTime)

   at Microsoft.EnterpriseManagement.Mom.Modules.SubscriptionDataSource.SubscriptionDataSource`2.PostAsync(List`1 items, DateTime firstTryDateTime) which was running as part of rule "Microsoft.SystemCenter.CollectInstanceSpace" running for instance "All Management Servers Resource Pool" with id:"{4932D8F0-C8E2-2F4B-288E-3ED98A340B9F}" in management group "MG".

These events may come in conjunction with several others, but I like to fix this one first, as it solves the problem most of the time.
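If there are a lot of these events, a quick way to see which workflows they point at is to pull them with PowerShell instead of clicking through each one. The snippet below is only a sketch: it assumes you run it on the affected management server, that the events are in the "Operations Manager" log, and that the description contains the `part of rule "..."` text seen in the sample above (the regular expression is my own guess at that format).

```powershell
# Sketch: count which rules are named in recent 4502 events on this management server.
# Assumption: the event description contains the text: part of rule "<RuleName>"
$events = Get-WinEvent -FilterHashtable @{ LogName = 'Operations Manager'; Id = 4502 } `
                       -MaxEvents 200 -ErrorAction SilentlyContinue

$events |
    ForEach-Object {
        # Pull the rule name out of the description, if the pattern is present
        if ($_.Message -match 'part of rule "([^"]+)"') { $Matches[1] }
    } |
    Group-Object |
    Sort-Object Count -Descending |
    Select-Object Count, Name
```

The rule that shows up most often is usually a sensible place to start.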

Analysis : 

The event might seem cryptic at first, especially if you aren’t used to troubleshooting, but it provides a valuable piece of information. Note the last line of the description. It says,

which was running as part of rule "Microsoft.SystemCenter.CollectInstanceSpace" running for instance "All Management Servers Resource Pool" with id:"{4932D8F0-C8E2-2F4B-288E-3ED98A340B9F}" in management group "MG".

This tells you exactly which rule or monitor is failing, and for which instance it is running.

Ok, so we have the rule ID and the target. The rule is “Microsoft.SystemCenter.CollectInstanceSpace”. A quick glance at the System Center Wiki tells me that the display name of this rule is “Send Instancespace to the Cloud” and that it is a “System rule that sends instancespace up to the cloud.”

So what happens here is that the rule runs at its scheduled interval and fails. This causes the MS it’s running on to go grey. When you re-initialize the cache on the MS, everything is reset, and the MS becomes green. Then the rule runs at its interval and fails again, the MS goes grey again, and the cycle goes on.

Resolution :

Ok, so now we have some solid information to work on. Grab the rule name and find it under Rules in your console. Once you do, take a look at its properties. They tell you exactly what it is doing, whether it has any overrides, which MP it comes from, etc.

Now that you’ve found the rule that is the root of the problem, disable it. Now, go back and flush the cache on the MS again. As it is downloading the configuration again, keep an eye on the event log for any errors.
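The same steps can also be done from PowerShell. Treat the following as a rough sketch rather than a tested procedure: `Custom.Overrides` is a hypothetical unsealed management pack for storing the override, and the Health Service State path is an assumption that differs between SCOM versions, so check it on your own server first. The folder is renamed rather than deleted so you can still look at it later.

```powershell
# Sketch only. Assumptions: the OperationsManager module is installed,
# 'Custom.Overrides' is a hypothetical unsealed MP, and the path below matches your install.
Import-Module OperationsManager

$rule  = Get-SCOMRule -Name 'Microsoft.SystemCenter.CollectInstanceSpace'
$class = Get-SCOMClass -Id $rule.Target.Id
$mp    = Get-SCOMManagementPack -Name 'Custom.Overrides'

# Create a "disable" override for the failing rule
Disable-SCOMRule -Rule $rule -Class $class -ManagementPack $mp -Enforce

# Flush the health service cache on the management server (run locally, elevated)
$stateDir = 'C:\Program Files\Microsoft System Center\Operations Manager\Server\Health Service State'
Stop-Service -Name HealthService
Rename-Item -Path $stateDir -NewName ('Health Service State.old.' + (Get-Date -Format 'yyyyMMddHHmmss'))
Start-Service -Name HealthService
```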

If the MS becomes and remains green, we’re done! If it goes back to grey, follow the process all over again, until there are no more failing workflows from rules/monitors causing the MS to go grey.

One step further, if you notice that all these rules/monitors are from the same MP, chances are that MP has been corrupted and you may want to remove or update the MP.

Note that although this might solve your problem, it may not be the only one causing the issue. E.g., bad performance of your databases can also result in this problem. So if you find the problem is still persisting, look for other relevant events that might give you a hint. 🙂

You can refer to these threads from the Technet forums for further reading:

SCOM Health Service greyed out on Management Server

Management server getting greyed out again and again

Hope this helps someone out there with similar issues.



SCOM Event Based Monitoring – Part 2 – Rules

In the last post we discussed the event based monitoring options SCOM provides with Monitors. You can find it here:

SCOM Event Based Monitoring – Part 1 – Monitors

In this post we are going to discuss the event based monitoring options using SCOM Rules. Basically the highlighted options in the image below:


As we can see, we have 2 kinds of rules for monitoring events. “Alert Generating Rules” and “Collection Rules”. Let’s walk through them one by one.

Alert Generating Rules:

As the name suggests, this type of rule raises an alert when it detects the event ID.

As you’re going through the Rule Creation wizard you will notice that there are several options in “Rule category”, as shown in the pic below. In my experience, it makes no difference which one you choose. They are more of a “logical grouping” of these rules, which you can maybe make use of when working on them through PowerShell. Since we’re here to create an “alert generating” rule, the most obvious option here would be “Alert”.


Now, this step is pretty important and if you’re new to creating these, you’re very likely to miss it. As you reach the final window, at the bottom of “Alert Description” (which you can modify btw), is the box for “Alert Suppression”. This is used to tell SCOM which events should be considered “duplicates”, so that it does not raise a new alert if there’s already one open in active alerts. “Auto-suppression” happens automatically for monitors – it only increases the repeat count for every new detection – but for rules, you’ll have to do this manually. If you don’t do this, you’re gonna get a new alert every time this event is detected. In this demo, I’m telling SCOM to consider the event a duplicate if it has the same Event ID AND the same Event Source.

I HIGHLY recommend using this, since I learned this the hard way some time back. I missed configuring alert suppression for some rule and a couple of nights later, woke up to a call from our Command Center guys. They said, “Dude! We’ve received THOUSANDS of emails in our mailbox in the last 15 minutes…and they’re all the same. Help!”

I rushed and turned it off. Further investigation showed that something had gone wrong on the server and it was writing the same event over and over again, thousands of times, in the event log. Since I had not configured the suppression criteria, SCOM created a new alert every time and sent a mail for each one, as set up in the subscription. Now I check the suppression criteria at least 3 times 🙂
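To make the difference concrete, here is a toy illustration in plain PowerShell. It does not touch SCOM at all, and the sample events are made up; it only mimics the idea that events sharing the chosen suppression fields collapse into a single alert with a repeat count.

```powershell
# Toy illustration only: made-up events, no SCOM involved.
$events = @(
    [pscustomobject]@{ EventId = 1000; Source = 'custom' },
    [pscustomobject]@{ EventId = 1000; Source = 'custom' },
    [pscustomobject]@{ EventId = 1000; Source = 'custom' },
    [pscustomobject]@{ EventId = 1000; Source = 'other'  }
)

# Without suppression: every detected event becomes its own alert
$alertsWithout = $events.Count

# With suppression on Event ID AND Event Source: one alert per unique pair,
# and every further match only bumps the repeat count
$groups     = $events | Group-Object -Property EventId, Source
$alertsWith = @($groups).Count

"Without suppression: $alertsWithout alerts"
"With suppression:    $alertsWith alerts"
$groups | Select-Object Name, @{ Name = 'RepeatCount'; Expression = { $_.Count - 1 } }
```

Now scale the first group up to a few thousand events and you have my night call.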


Now that this is set up, I created the event in the log with a simple PowerShell one-liner:

Write-EventLog -Logname Application -ID 1000 -Source custom -Message "This is a test message"

And as you can see, the alert was raised.


Collection Rules:

Collection rules are used only to collect the events in the SCOM database. They do not create any alert. In fact, you’ll hardly even notice their presence at all. Why create these rules then? Most commonly for reporting/auditing purposes. You can also create a dashboard showing the detection of these events.



Notice in the left side of the wizard that there’s no “Configure Alerts” tab as we had in the “Alert Generating” rule.

So what this rule is going to do is detect the occurrence of event 500 with source “custom” and, if it detects it, just save it in the database.

I wrote this event in the logs a bunch of times, now let’s see if SCOM is really collecting them or not. We’ll create a simple “Event View” dashboard for this.


And yup, we can indeed see that the event is actually being collected by SCOM.


Here’s another thing that might be a bit tricky. After you create these 2 types of rules, if you open their properties, you’ll see they’re almost identical. If you’ve named them properly, kudos to you, but if you (or someone else) haven’t, how will you find out whether the rule is generating alerts or just collecting events? Take a close look at the content in the yellow boxes below. See the difference?

Alert Generating Rule Properties:


Event Collection Rule Properties:


You rightly noticed that the response for the alert generating rule is “Alert”, which means this rule is going to respond to the event detection by generating an alert. If you click on the “Edit” option just beside that, you’ll see wizard for alert name change, format change, suppression criteria, etc.

On the other hand, the yellow box in the collection rule is “Event Data Collection Write Action” and “Event data publisher”, which indicates that it is only writing it in the databases. You can also verify this with Powershell:
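A minimal sketch of such a check could look like the one below. The rule display names are hypothetical placeholders for whatever you called your two rules, and it assumes the OperationsManager module is available on the machine where you run it.

```powershell
# Sketch: show what each rule does with the event once it is detected.
# The display names below are hypothetical; substitute your own rule names.
Import-Module OperationsManager

$names = 'Test - Alert Generating Rule', 'Test - Event Collection Rule'
foreach ($name in $names) {
    $rule = Get-SCOMRule -DisplayName $name
    "Rule: $($rule.DisplayName)"
    # The write actions are the "response" part you see in the rule properties
    $rule.WriteActionCollection | Format-List Name, TypeID
}
```

An alert generating rule should list a single alert write action, while a collection rule should list the write actions that store the event in the databases.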


You can also fetch a report for the event collection rule for auditing purposes, but unfortunately I do not have the Reporting feature installed in my lab so can’t show a demo for that. 🙂

That concludes the event monitoring options SCOM offers with monitors as well as rules. Now that you know what abilities SCOM has, it’s up to you to decide which option you wanna choose 🙂



SCOM Event Based Monitoring – Part 1 – Monitors

In this post I’m discussing the possibilities SCOM provides for event detection monitoring using monitors.

I’ve written a similar blog for creating services, which you can see here:


Alright, so just go to Authoring -> Expand Management Pack Objects -> Monitors -> Create a Monitor -> Unit monitor. This is the screen that you should have got:


The options enclosed in the box are what we’re concerned with at this time. So let’s go through them, one by one. The three “Reset” options, “Manual Reset”, “Timer Reset” and “Windows Event Reset”, exist for all the monitors (even though I’ve expanded only the first 2 in the pic above).

  • Manual Reset: Choose this option when you want the alert to stay in the console unless you close/resolve it manually.
  • Timer Reset: Choose this option when you want the alert to close itself automatically after a given period of time.
  • Windows Event Reset: With this option you can choose to automatically close the alert only when a second healthy event is detected in a given time period. So, one bad event raises the alert, and the second good event resolves it. If the healthy event is not detected in the given time, the alert stays in the console until you close it manually.

Simple Event Detection:

This is the option that you may know best. It’s the simplest and does exactly what the name suggests – simply detects the occurrence of an event in the specified Event Log and raises an alert.


Manual Reset –




Now that we have the monitor set up, let’s test it.

We’ll create a custom event with Powershell and try to detect that. Here’s a simple Posh:

#create a custom source
New-EventLog -LogName Application -Source "Custom"
#write event
Write-EventLog -LogName Application -Source "Custom" -EventId 100 -Message "This is a test event"

Just making sure the event was created:


Right, looks good. Now onto the Ops Console:


As we can see, the alert has been raised. The alert will be resolved when the monitor producing it becomes healthy. Since this is a manual reset monitor, it’ll only turn back to healthy when you manually reset it.

There’s a good side to this and a bad one.

Good side:

You will always notice when the alert has been raised, and you can take any responsive measures as applicable. After you’re done, reset the monitor to make sure some action has been taken on this.

Bad side:

Until you manually reset the monitor, there won’t be a new alert. As the monitor is critical already, it can’t go critical again and so won’t generate a new alert. It’ll only increase the repeat count, which may or may not be what you want. The work-around for this is to run a scheduled script that resets the monitors periodically, turning them back to healthy to make way for a new alert.
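Such a reset script could be sketched roughly as follows. This is an untested outline, not a finished tool: the monitor display name is a hypothetical placeholder, and the health filter looks at the overall health of each object rather than at this one monitor, so it may reset the monitor on a few objects that are unhealthy for other reasons.

```powershell
# Sketch of the work-around. Assumptions: the OperationsManager module is available and
# 'Test - Simple Event Detection - Manual Reset' is a hypothetical monitor display name.
Import-Module OperationsManager

$monitor = Get-SCOMMonitor -DisplayName 'Test - Simple Event Detection - Manual Reset'
$class   = Get-SCOMClass -Id $monitor.Target.Id

Get-SCOMClassInstance -Class $class |
    Where-Object { $_.HealthState -ne 'Success' } |   # overall object health, not just this monitor
    ForEach-Object {
        # Ask SCOM to set this monitor back to healthy on the object
        $null = $_.ResetMonitoringState($monitor)
    }
```

Run it from a scheduled task on a management server at whatever interval suits you.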

Timer Reset –

The only extra option you have here is to specify the wait time for the reset. I’ve created this monitor to detect event 101 in the Application log.


With tests similar to the previous one, I get an alert for this.


You will have to take my word for it, the alert disappeared after 15 minutes 😉

Windows Event Reset –

Pay attention to the Wizard options here. You have to configure 2 event expressions, one for unhealthy and other for healthy. I set up the unhealthy event as event 102 with source “custom” in Application log while the healthy event is event 102 with source “custom1”.

Unhealthy event:


Healthy event:


As soon as I created the unhealthy event, I received an alert which was automatically resolved when I triggered the healthy event.

Repeated Event Detection:

Choose this monitor when you want to raise an alert if a specific event is raised repeatedly, with given settings. Here’s where things get a little tricky.


You have a bunch of different (and confusing) options to set up here. Luckily, it’s all very well documented here on Technet : Repeating Events

What I’m doing is to configure the monitor to raise an alert when the event 103 is raised 3 times within 15 seconds. And sure enough, I do get an alert.
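For the test itself, something along these lines writes the event three times well within the 15 second window. It assumes the "Custom" source already exists in the Application log (created with New-EventLog as shown earlier in this post); the message text and the 2 second pause are arbitrary.

```powershell
# Write event 103 three times within a few seconds to trip the repeated event monitor.
# Assumes the "Custom" source was already registered with New-EventLog.
1..3 | ForEach-Object {
    Write-EventLog -LogName Application -Source 'Custom' -EventId 103 -Message "Repeated test event $_"
    Start-Sleep -Seconds 2
}
```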

Missing Event Detection:

Choose this monitor when you’re expecting some event to be written in the Event Log at a given time, maybe due to some kind of scheduled activity like a backup, maintenance, scripted events, etc. If the monitor doesn’t detect it, it generates an alert.


So what I’m basically telling SCOM is, “I’m expecting the event ID 104 from source “custom” in the Application event log every 15 minutes, let me know if it doesn’t show up, will ya? Thanks!”

To test this, I did NOT create an event with ID 104, and sure enough, I got the alert.


(Do not worry about the mismatch between the alert name and the monitor name, I made a typo in the alert name. It should say “anaops – missing event detection – manual reset” instead of “repeated”, as the name of the monitor at the bottom suggests.)

Correlated Event Detection:

Choose this option if you want an alert based on some correlation between two event ID’s. “Some correlation” can vary, as you can see in the wizard.



This can be a bit confusing. In this demo, what I’m telling SCOM is, “Hey, let me know if event 105 from source “custom” is raised AND within 5 minutes of its occurrence, event ID 105 from source “custom1” is also raised (in that order). Cool?”

SCOM said “Cool!”, so I tested it with writing these two events mentioned above within the interval of 5 minutes. And yup, I got an alert.

Correlated Missing Event Detection:

Choose this one when you need an alert when you have “some correlation” between two events – first one occurs, we’re expecting the other within 5 minutes, but it isn’t raised.

For testing this, I created event 106 from source “custom” in the Application log but did NOT create the other event 106 from source “custom1” within the next 5 minutes. Sure enough, here’s the alert I got:


As you can imagine the other two monitor reset strategies “timer reset” and “windows event reset” will have slightly different wizards, but I’m sure you guys can figure it out 😉

Also, as you may have noticed, unlike many other monitors, there’s no “interval” at which the event detection monitors run. Meaning, they are looking for the events in the log “all the time”. So the event monitoring you get is almost real-time.

This concludes this fairly long blog, but I hope it gives you some clarity about what options you have for event detection monitoring and helps you in choosing the right one. 🙂

We’ll talk about the event monitoring options with rules in the next post.


SCOM Basic Service Monitor Vs. Windows Service Template

Every now and then I’ve seen questions regarding this on the Technet forums. The most usual question is “A service XXX failure alert is being generated by a server where this service isn’t even present! What’s going on?”

The Basic Service Monitor:

This is a simple monitor that puts an instance of the monitor you create on EVERY server where it’s targeted. Most of the time the class you select is “Windows Server”, and so the monitor is delivered to every Windows server – regardless of whether the service is actually present there or not.



I suggest you create a brand new MP and save this monitor in it. Now, export it and analyze what you see in the XML. You’ll notice that there isn’t a lot in there, just a single simple monitor.

So what this basically does is put an instance of the monitor on EVERY instance of the target class. It does not bother to check whether the service even exists on that server or not. And this, more often than not, causes false alerts stating that the service is “down” on servers where it isn’t even present! This is why I call this the “dumb” service monitor.

If you want to apply this only to a group of servers, you need to go through the additional step of disabling this monitor and enabling it explicitly, through an override, for the group you’ve created.
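If you prefer to do that disable-then-enable step from PowerShell, it might look roughly like the sketch below. All three names (the monitor, the group and the unsealed override MP) are hypothetical placeholders, and you should double-check the resulting overrides in the console afterwards.

```powershell
# Sketch only; every display name and MP name here is a hypothetical placeholder.
Import-Module OperationsManager

$monitor = Get-SCOMMonitor -DisplayName 'Test - Basic Service Monitor'
$class   = Get-SCOMClass -Id $monitor.Target.Id
$group   = Get-SCOMGroup -DisplayName 'Servers Running My Service'
$mp      = Get-SCOMManagementPack -Name 'Custom.Overrides'   # must be an unsealed MP

# Step 1: turn the monitor off for the whole target class
Disable-SCOMMonitor -Monitor $monitor -Class $class -ManagementPack $mp

# Step 2: turn it back on only for the members of the group
Enable-SCOMMonitor -Monitor $monitor -Group $group -ManagementPack $mp -Enforce
```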

Windows Service Template:

Now let’s create a service monitor using the Windows Service template. As you’ll notice while creating the monitor, this wizard offers much more than just simple service availability monitoring. You can also specify to get alerts based on how much CPU and memory the service is actually using.

While setting up the target for this monitor, you’d also notice that you need a group to target this against (instead of the whole classes as we did in case of basic service monitor). What this does is to provide the precise targeting for this monitor to where you want to run this. If you want to target this to all Windows servers in your environment, just select the “All Windows Computers” group.


Now, let’s do the same thing – save this in a separate test MP and export it for our analysis. You’ll see some interesting stuff in the XML.

This MP will be considerably larger than the previous one and the first thing you’ll notice is the discovery. This monitor creates its own discovery for the service. And when you have a discovery, you also have a class. As you create this monitor, SCOM automatically detects the presence of this service on the servers (in the group provided earlier) and populates the class. Once the class is populated, the monitoring is targeted only at the instances of this class, saving SCOM and you the trouble of narrowing down the scope later. Pretty neat, eh? 🙂 This is why I like to call this the “intelligent” service monitor.

You’ll also see that when you create this one monitor, under the hood SCOM creates several monitors as well as rules:

Type | Description | Enabled?
Monitor | Running state of the service | Enabled.
Monitor | CPU utilization of the service | Enabled if CPU Usage monitoring is selected in the wizard.
Monitor | Memory usage of the service | Enabled if Memory Usage monitoring is selected in the wizard.
Collection Rule | Collection of events indicating a change in service’s running states | Enabled.
Collection Rule | Collection of CPU utilization for the service | Enabled if CPU Usage monitoring is selected in the wizard.
Collection Rule | Collection of memory usage for the service | Enabled if Memory Usage monitoring is selected in the wizard.
Collection Rule | Collection of Handle Count for the service | Disabled. Can be enabled with an override.
Collection Rule | Collection of Thread Count for the service | Disabled. Can be enabled with an override.
Collection Rule | Collection of Working Set for the service | Disabled. Can be enabled with an override.

So you see, this one wizard is actually creating THREE different monitors and SIX different collection rules (one for events and five for performance data). Another upside is that, since the class has also been created, you can target this class with any rules or monitors that you may want to create for this particular sub-set of servers where the service is running.

Another great thing about this is that since you have a class for this, you can even pull an availability report against this object to measure the uptime of your service.

OK then, which one should I choose?

After all is said and done, the obvious question you have in mind is probably one of the below:

  1. Cool, so the Windows Service template option looks pretty awesome. I should be using this one all the time, right?
  2. Wow, I never knew that. I’ve never created a service monitor with the Windows Service template option. Did I make a mistake?
  3. Why would anyone even create the basic service monitor then?

These are all legit questions, and you might be surprised to know my answer if you ask me my preferred way of monitoring a service. Yes, I (and many others) would still prefer the basic service monitor. Why? There are several reasons to do that.

  1. You only want to monitor the availability of the service. You are not concerned about the amount of CPU or memory it is consuming. In fact, this is the case most of the time. You’re mainly focused only on the up/down status of the service. And in case you’re worried about the CPU and memory being consumed, you do have special dedicated monitors for them anyway.
  2. As the Windows Service template creates a lot of things along with the service availability monitoring (1 class, 3 monitors and 6 rules), if you don’t actually need them, they’re just unnecessary overhead for SCOM. Now imagine creating (1+3+6) 10 objects in SCOM for EACH service, 9 of which are not being used, and how much litter that leaves in SCOM. The basic service monitor, on the other hand, only creates 1 object (the actual availability monitor).
  3. It is much more work to disable the Windows Service template monitor than the basic service monitor. As you can imagine, if you no longer want to monitor the service, you’ll have to disable all 10 objects related to this monitor as opposed to just one in basic service monitor.

Hence, always first decide whether you REALLY want all this additional functionality that the Windows Service template provides, and if the answer is “Yes”, go for this way. Else, the good old basic service monitor is your friend 😉

Hopefully this clears up some things for you. 🙂



Run Powershell scripts in Console Tasks

I am working on a project and as a part of it I needed to create some console tasks that would run a PowerShell script to do the stuff I want. I knew that it was no problem for a script a line or two long, but any more than that and it is a real pain to pass it as a parameter in the console task. The other way I was aware of is to point to the path of your script in the parameters, if you have it saved locally on your management servers (each and every one of them, at the exact same path). This didn’t really serve my purpose, as I wanted to embed the script in a re-usable XML, so I decided to do something on my own.

After a bit of researching the Internet and a lot of trial-and-error, I finally got it working. The key points to remember here are:

1. CDATA to parse it through xml

2. -command “cmd1;cmd2” to pass the block as a single input

3. Using “;” to break the cmdlets

4. Use the escape character “\” before every double quote (“) to skip the character otherwise the compiler misunderstands it for -command syntax and throws errors.

Here’s an example of the XML, with a simple Powershell code that creates an event in the custom event source:


<ManagementPackFragment SchemaVersion="2.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Categories>
    <Category ID="Cat.CustomTasks.CreateEvent" Target="CustomTasks.CreateEvent" Value="System!System.Internal.ManagementPack.ConsoleTasks.MonitoringObject"/>
  </Categories>
  <Presentation>
    <ConsoleTasks>
      <ConsoleTask ID="CustomTasks.CreateEvent" Accessibility="Internal" Enabled="true" Target="Windows!Microsoft.Windows.Computer" RequireOutput="true">
        <Assembly>CustomTasks.CTCreateEventAssembly</Assembly>
        <Handler>ShellHandler</Handler>
        <Parameters>
          <Argument Name="WorkingDirectory"/>
          <Argument Name="Application">%windir%\system32\windowsPowershell\v1.0\powershell.exe</Argument>
          <Argument><![CDATA[ -command 
#Create a custom source;
New-EventLog -Source 'Task Source Name' -LogName 'Operations Manager';
#Write the event;
Write-EventLog -LogName 'Operations Manager' -Source 'Task Source Name' -EntryType Warning -EventId 1010 -Message 'This is a test event created by task'
]]></Argument>
        </Parameters>
      </ConsoleTask>
    </ConsoleTasks>
  </Presentation>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="CustomTasks.CreateEvent">
          <Name>CT - Create Test Event</Name>
          <Description>Creates a Warning test event</Description>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
  <Resources>
    <Assembly ID="CustomTasks.CTCreateEventAssembly" Accessibility="Public" FileName="Res.CustomTasks.CTCreateEventAssembly" HasNullStream="true" QualifiedName="Res.CustomTasks.CTCreateEventAssembly"/>
  </Resources>
</ManagementPackFragment>

Here’s the output:


Hope this helps someone out there with similar need.

For further reading, you can go through this thread:

Powershell script in a console task?

Keep SCOMing 🙂


NEW Author Announcement – Sameer Mhaisekar (a talented blogger, a SCOM expert, and an ambitious guy)

This is an exciting period for AnalyticOps Insights. It is growing, and it is turning from a personal blog into a community blog. Our new author Sameer Mhaisekar is a very talented blogger, a young ambitious SCOM expert and a community contributor. Let’s get to know him a little better.

Who is Sameer?

Hello, I am a young addition to the SCOM community from India. I’ve been working with SCOM for the last couple of years and fell in love with it. After being blessed by the awesome community for a long time, a few months ago I started contributing my little share. I’m serving the community mainly in the Operations Manager forums. I aim to be a capable SCOM admin and MP author. Apart from SCOM, I also take a keen interest in PowerShell, SCCM, Azure, and OMS (which I am still learning). When I’m not working I enjoy reading, blogging, traveling, sports, online gaming, etc.

I have read your Linkedin and Microsoft Tech profiles, and the first impression I got is that you are a very ambitious guy. In two years you have done so much. Your progress is very impressive. And therefore it makes sense to me that you have great ideas and goals that you want to achieve. So, first of all, am I right? If I am, what are they?
You are right, I am pretty ambitious and willing to work hard for it. I aim to be a capable IT professional all-around and to serve the community as much as I can. My goal is to be a person who can get your work done, whenever you ask me to.

What was the biggest challenge in your workplace that you accomplished?
The biggest challenge I faced (which I still face very often) is just keeping up with the sheer vastness of IT. Having come from a non-IT background, this was pretty tough for me in the beginning. However, after a while, I got used to it, and now I actually love that there are always more and new things to learn!
Do you think there is a future for SCOM?
Definitely. Apart from the fact that there is a vast majority of organizations who are highly dependent on SCOM environments, it is simply not possible or feasible to move everything to cloud and achieve the same level of competency. Not to mention SCOM is becoming better and better, just look at the latest version SCOM 1801!
Do you think the OMS will replace SCOM?
Not in the near future, no, it won’t. I believe SCOM and OMS work best hand-in-hand, and they complement each other very well. I think the advantages that on-premises software provides are not matched by cloud solutions (yet, at least). However, let’s not pretend that OMS will never replace SCOM in the future, but for now, SCOM is here to stay.
And finally a traditional question, Star Wars or Star Trek?

Well, please don’t hate me for saying this, but I haven’t watched either and, to be honest, I’m not a fan of them either.
For me though, the better question would be “Who’s better, Messi or Ronaldo?”

Thank you.

Chat with Sameer in SCOM Community chat room.

Or contact on LinkedIn for professional advice.

Or find him on Technet.