Author: Sam

An IT enthusiast in Microsoft technologies focusing on SCOM and Azure.

SCOM Event Based Monitoring – Part 1 – Monitors

In this post I discuss the possibilities SCOM provides for event detection monitoring using monitors.

I’ve written a similar post on creating service monitors, which you can see here:

SCOM BASIC SERVICE MONITOR VS. WINDOWS SERVICE TEMPLATE

Alright, so just go to Authoring -> Expand Management Pack Objects -> Monitors -> Create a Monitor -> Unit monitor. This is the screen you should see:

monitors1

The options enclosed in the box are what we’re concerned with for now, so let’s go through them one by one. The three reset options, “Manual Reset”, “Timer Reset” and “Windows Event Reset”, exist for all the monitor types (even though I’ve expanded only the first two in the pic above).

  • Manual Reset: Choose this option when you want the alert to stay in the console unless you close/resolve it manually.
  • Timer Reset: Choose this option when you want the alert to close itself automatically after a given period of time.
  • Windows Event Reset: With this option you can choose to automatically close the alert only when a second healthy event is detected in a given time period. So, one bad event raises the alert, and the second good event resolves it. If the healthy event is not detected in the given time, the alert stays in the console until you close it manually.

Simple Event Detection:

This is the option that you probably know best. It’s the simplest and does exactly what the name suggests: it detects the occurrence of an event in the specified Event Log and raises an alert.

Examples:

Manual Reset –

monitors2

monitors3

monitors5

Now that we have the monitor set up, let’s test it.

We’ll create a custom event with Powershell and try to detect that. Here’s a simple Posh:

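# Create the custom source (New-EventLog throws an error if it already exists)
if (-not [System.Diagnostics.EventLog]::SourceExists("Custom")) {
    New-EventLog -LogName Application -Source "Custom"
}
# Write the test event
Write-EventLog -LogName Application -Source "Custom" -EventId 100 -Message "This is a test event"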

Just making sure the event was created:

new event
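If you’d rather check from the shell than from Event Viewer, something like this should do. It’s only a small sketch: it assumes the “Custom” source and event ID 100 from the test above, and Windows PowerShell, where Get-EventLog is available.

```powershell
# Look through the latest entries from the "Custom" source in the Application log
# and show the most recent one with event ID 100 (the test event written above).
Get-EventLog -LogName Application -Source "Custom" -Newest 20 |
    Where-Object { $_.EventID -eq 100 } |
    Select-Object -First 1 |
    Format-List TimeGenerated, EntryType, Source, EventID, Message
```

If nothing comes back, the event was not written (or more than 20 newer events from the same source have been logged since).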

Right, looks good. Now onto the Ops Console:

cons

As we can see, the alert has been raised. The alert will be resolved when the monitor that produced it returns to healthy. Since this is a manual reset monitor, it’ll only turn healthy again when you manually reset it.

There’s a good side to this and a bad one.

Good side:

You will always notice when the alert has been raised, and you can take any responsive measures as applicable. After you’re done, reset the monitor to make sure some action has been taken on this.

Bad side:

Until you manually reset the monitor, there won’t be a new alert. As the monitor is already critical, it can’t turn critical again and so won’t generate a new alert. It’ll only increase the repeat count, which may or may not be what you want. The work-around is to run a scheduled script that periodically resets the monitors back to healthy to make way for a new alert.
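Here’s a rough sketch of what such a reset script could look like. Treat it as a starting point, not a finished solution: the monitor display name is a made-up placeholder, it assumes that name matches exactly one monitor, and it needs the Operations Manager shell with a connection to your management group.

```powershell
# Sketch only: run where the OperationsManager module is installed and connected.
Import-Module OperationsManager

# Hypothetical display name; replace it with the name of your own monitor.
$monitorName = "Custom - Simple Event Detection - Manual Reset"

$monitor = Get-SCOMMonitor -DisplayName $monitorName

# Find the class the monitor targets, then every instance that is not healthy.
$class = Get-SCOMClass -Id $monitor.Target.Id
$unhealthy = Get-SCOMClassInstance -Class $class |
    Where-Object { $_.HealthState -ne 'Success' }

foreach ($instance in $unhealthy) {
    # Ask SCOM to reset this monitor on the instance back to healthy.
    $instance.ResetMonitoringState($monitor) | Out-Null
}
```

Note that HealthState is the rolled-up state of the object, so this also sends a reset to instances that are unhealthy because of some other monitor. Schedule it with Task Scheduler at whatever interval suits you.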

Timer Reset –

The only extra option you have here is the wait time before the reset. I’ve created this monitor to detect event 101 in the Application log.

1

With tests similar to the previous one, I get an alert for this.

1

You will have to take my word for it, the alert disappeared after 15 minutes 😉

Windows Event Reset –

Pay attention to the wizard options here. You have to configure two event expressions, one for the unhealthy state and the other for the healthy state. I set up the unhealthy event as event 102 with source “custom” in the Application log, while the healthy event is event 102 with source “custom1”.

Unhealthy event:

1

Healthy event:

2

As soon as I created the unhealthy event, I received an alert which was automatically resolved when I triggered the healthy event.

Repeated Event Detection:

Choose this monitor when you want to raise an alert if a specific event is raised repeatedly, within the settings you specify. Here’s where things get a little tricky.

1

You have a bunch of different (and confusing) options to set up here. Luckily, it’s all very well documented here on Technet : Repeating Events

I’ve configured the monitor to raise an alert when event 103 is raised 3 times within 15 seconds. And sure enough, I do get an alert.
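To reproduce this test, a quick loop is enough. This is just a sketch that assumes the “Custom” source from earlier already exists; the 2-second pause is an arbitrary value chosen to keep all three events inside the 15-second window.

```powershell
# Write event 103 three times, a couple of seconds apart, so that all three
# events land within the 15-second window configured in the monitor.
1..3 | ForEach-Object {
    Write-EventLog -LogName Application -Source "Custom" -EventId 103 -Message "Repeated test event $_ of 3"
    Start-Sleep -Seconds 2
}
```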

1

Missing Event Detection:

Choose this monitor when you’re expecting some event to be written to the Event Log at a given time, maybe due to some kind of scheduled activity like a backup, maintenance, scripted events, etc. If the monitor doesn’t detect it, it generates an alert.

1

So what I’m basically telling SCOM is, “I’m expecting the event ID 104 from source “custom” in the Application event log every 15 minutes, let me know if it doesn’t show up, will ya? Thanks!”

To test this, I did NOT create an event with ID 104, and sure enough, I got the alert.

Capture5

(Do not worry about the mismatch between the alert name and the monitor name; I made a typo in the alert name. It should say “anaops – missing event detection – manual reset” instead of “repeated”, as the name of the monitor at the bottom suggests.)

Correlated Event Detection:

Choose this option if you want an alert based on some correlation between two events. “Some correlation” can vary, as you can see in the wizard.

3

4

This can be a bit confusing. In this demo, what I’m telling SCOM is, “Hey, let me know if event 105 from source “custom” is raised AND, within 5 minutes of its occurrence, event ID 105 from source “custom1” is also raised (in that order). Cool?”

SCOM said “Cool!”, so I tested it by writing the two events mentioned above within 5 minutes of each other. And yup, I got an alert.
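For reference, the test can be sketched like this. It assumes the “Custom” source already exists; the 30-second gap is an arbitrary value of mine, anything under the 5-minute window will do.

```powershell
# Create the second source if it is not there yet.
if (-not [System.Diagnostics.EventLog]::SourceExists("Custom1")) {
    New-EventLog -LogName Application -Source "Custom1"
}

# First event of the pair.
Write-EventLog -LogName Application -Source "Custom" -EventId 105 -Message "Correlation test: first event"

# Wait a little, but stay well inside the 5-minute correlation window.
Start-Sleep -Seconds 30

# Second event of the pair, from the other source and in the expected order.
Write-EventLog -LogName Application -Source "Custom1" -EventId 105 -Message "Correlation test: second event"
```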

1

Correlated Missing Event Detection:

Choose this one when you need an alert because an expected correlation between two events is broken: the first one occurs and we’re expecting the other within 5 minutes, but it isn’t raised.

For testing this, I created event 106 from source “custom” in the Application log but did NOT create the other event 106 from source “custom1” within the next 5 minutes. Sure enough, here’s the alert I got:

1

As you can imagine, the other two monitor reset strategies, “timer reset” and “windows event reset”, will have slightly different wizards, but I’m sure you guys can figure it out 😉

Also, as you may have noticed, unlike many other monitors, there’s no “interval” at which the event detection monitors run. That means they are watching the log for events “all the time”, so the event monitoring you get is almost real-time.

This concludes this fairly long blog, but I hope it gives you some clarity about the options you have for event detection monitoring and helps you choose the right one. 🙂

We’ll talk about the event monitoring options with rules in the next post.

Cheers!

SCOM Basic Service Monitor Vs. Windows Service Template

Every now and then I’ve seen questions regarding this on the Technet forums. The most usual question is “A service XXX failure alert is being generated by a server where this service isn’t even present! What’s going on?”

The Basic Service Monitor:

This is a simple monitor that puts an instance of itself on EVERY server it’s targeted at. Most of the time the class you select is “Windows Server”, so the monitor is delivered to every Windows server, regardless of whether the service is actually present there or not.

 

BSM

I suggest you create a brand new MP and save this monitor in it. Now, export it and analyze what you see in the XML. You’ll notice that there isn’t much in it, just a single simple monitor.

So what this basically does is put an instance of the monitor on EVERY instance of the target class. It does not bother to check whether the service even exists on that server. This quite often causes false alerts stating that the service is “down” on servers where it isn’t even present! This is why I call this the “dumb” service monitor.

If you want to apply this only to a group of servers, you need to go through the additional step of disabling this monitor and then explicitly enabling it through an override for the group you’ve created.

Windows Service Template:

Now let’s create a service monitor using the Windows Service template. As you’ll notice while creating the monitor, this wizard offers much more than just simple service availability monitoring. You can also specify to get alerts based on how much CPU and memory the service is actually using.

While setting up the target for this monitor, you’ll also notice that you need a group to target it against (instead of a whole class, as we did for the basic service monitor). This gives you precise control over where the monitor runs. If you want to target all Windows servers in your environment, just select the “All Windows Computers” group.

WST

Now, let’s do the same thing – save this in a separate test MP and export it for our analysis. You’ll see some interesting stuff in the XML.

This MP will be considerably larger than the previous one, and the first thing you’ll notice is the discovery. This monitor creates its own discovery for the service. And when you have a discovery, you also have a class. As you create this monitor, SCOM automatically detects the presence of this service on the servers (in the group provided earlier) and populates the class. Once the class is populated, the monitoring is targeted only at the instances of this class, saving SCOM and you the trouble of narrowing down the scope later. Pretty neat, eh? 🙂 This is why I like to call this the “intelligent” service monitor.

You’ll also see that when you create this one monitor, under the hood SCOM creates several monitors as well as rules:

| Type | Description | Enabled? |
| --- | --- | --- |
| Monitors | Running state of the service | Enabled. |
| Monitors | CPU utilization of the service | Enabled if CPU Usage monitoring is selected in the wizard. |
| Monitors | Memory usage of the service | Enabled if Memory Usage monitoring is selected in the wizard. |
| Collection Rules | Collection of events indicating a change in service’s running states | Enabled. |
| Collection Rules | Collection of CPU utilization for the service | Enabled if CPU Usage monitoring is selected in the wizard. |
| Collection Rules | Collection of memory usage for the service | Enabled if Memory Usage monitoring is selected in the wizard. |
| Collection Rules | Collection of Handle Count for the service | Disabled. Can be enabled with an override. |
| Collection Rules | Collection of Thread Count for the service | Disabled. Can be enabled with an override. |
| Collection Rules | Collection of Working Set for the service | Disabled. Can be enabled with an override. |

So you see, this one wizard actually creates THREE different monitors and SIX different collection rules (one for events and five for performance data). Another upside is that, since the class has also been created, you can target this class with any rules or monitors that you may want to create for this particular subset of servers where the service is running.

Another great thing about this is that since you have a class for this, you can even pull an availability report against this object to measure the uptime of your service.

OK then, which one should I choose?

After all that’s been said and seen, the obvious question you have in mind is probably one of the below:

  1. Cool, so the Windows Service template option looks pretty awesome. I should be using this one all the time, right?
  2. Wow, I never knew that. I’ve never created a service monitor with the Windows Service template option. Did I make a mistake?
  3. Why would anyone even create the basic service monitor then?

These are all legit questions, and you might be surprised to know my answer if you ask me my preferred way of monitoring a service. Yes, I (and many others) would still prefer the basic service monitor. Why? There are several reasons to do that.

  1. You only want to monitor the availability of the service. You are not concerned about the amount of CPU or memory it is consuming. In fact, this is the case most of the time: you’re mainly focused on the up/down status of the service. And in case you’re worried about CPU and memory utilization, you have dedicated monitors for them anyway.
  2. As the Windows Service template creates a lot of things along with the service availability monitoring (1 class, 3 monitors and 6 rules), if you don’t actually need them, they’re just unnecessary overhead for SCOM. Now imagine creating (1+3+6) 10 objects in SCOM for EACH service, 9 of which are not being used, and how much litter that leaves in SCOM. The basic service monitor, on the other hand, creates only 1 object (the actual availability monitor).
  3. It is much more work to disable the Windows Service template monitor than the basic service monitor. As you can imagine, if you no longer want to monitor the service, you’ll have to disable all 10 objects related to this monitor as opposed to just one in basic service monitor.

Hence, always first decide whether you REALLY want all this additional functionality that the Windows Service template provides, and if the answer is “Yes”, go for this way. Else, the good old basic service monitor is your friend 😉

Hopefully this clears up some things for you. 🙂

Cheers!

 

Run Powershell scripts in Console Tasks

I am working on a project, and as part of it I needed to create some console tasks that would run a Powershell script to do the stuff I want. I knew this was no problem for a script a line or two long, but any more than that and it is a real pain to pass it as the parameter in the console task. The other way I was aware of is to point to the path of your script in the parameters, if you have it saved locally on your management servers (each and every one of them, at the exact same path). This didn’t really serve my purpose, as I wanted to embed the script in a re-usable XML, so I decided to do something on my own.

After a bit of researching the Internet and a lot of trial-and-error, I finally got it working. The key points to remember here are:

1. CDATA to parse it through xml

2. -command “cmd1;cmd2” to pass the block as a single input

3. Using “;” to break the cmdlets

4. Use the escape character “\” before every double quote (“) inside the script; otherwise the quote is mistaken for part of the -command syntax and errors are thrown.
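Points 2 to 4 can be sketched as a small helper that turns a multi-line script into a single -command argument. ConvertTo-CommandArgument is a hypothetical name of my own, not a built-in cmdlet, and the approach is deliberately naive: it drops comment lines and joins the rest with “;”, so statements that continue onto the next line (for example a line ending in a pipe) need to be put on one line first.

```powershell
function ConvertTo-CommandArgument {
    param([Parameter(Mandatory)][string[]]$ScriptLines)

    # Trim each line, then drop empty lines and comment-only lines.
    $statements = $ScriptLines |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -and -not $_.StartsWith('#') }

    # Point 3: separate the cmdlets with ";".
    # Point 4: put a backslash before every double quote.
    $joined = ($statements -join '; ') -replace '"', '\"'

    # Point 2: wrap the whole block as a single -command input.
    return "-command `"$joined`""
}

# Example: two statements, one of them containing double quotes.
ConvertTo-CommandArgument -ScriptLines @(
    '# say hello',
    'Write-Output "hello"',
    'Get-Date'
)
# -command "Write-Output \"hello\"; Get-Date"
```

The resulting string is what you would place inside the CDATA section of the task argument.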

Here’s an example of the XML, with a simple Powershell code that creates an event in the custom event source:

 

<ManagementPackFragment SchemaVersion="2.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
	<Categories>
		<Category ID ="Cat.CustomTasks.CreateEvent"  Target ="CustomTasks.CreateEvent" Value ="System!System.Internal.ManagementPack.ConsoleTasks.MonitoringObject"/>
	</Categories>
	<Presentation>

		<ConsoleTasks>

			<ConsoleTask ID="CustomTasks.CreateEvent" Accessibility="Internal" Enabled="true" Target="Windows!Microsoft.Windows.Computer" RequireOutput="true">
				<Assembly>CustomTasks.CTCreateEventAssembly</Assembly>
				<Handler>ShellHandler</Handler>

				<Parameters>
					<Argument Name ="WorkingDirectory"/>
					<Argument Name ="Application">%windir%\system32\windowsPowershell\v1.0\powershell.exe</Argument>
					<Argument>
						<![CDATA[ -command 
						"
#Create a custom source;
New-EventLog -Source 'Task Source Name' -LogName 'Operations Manager';
#Write the event;
Write-EventLog -LogName 'Operations Manager' -Source 'Task Source Name' -EntryType Warning -EventId 1010 -Message 'This is a test event created by task'
"
						]]>
						
					</Argument>
				</Parameters>

			</ConsoleTask>
		</ConsoleTasks>
	</Presentation>

	<LanguagePacks>
		<LanguagePack  ID ="ENU" IsDefault ="true">
			<DisplayStrings>
				<DisplayString  ElementID ="CustomTasks.CreateEvent">
					<Name>CT - Create Test Event</Name>
					<Description>Creates a Warning test event</Description>
				</DisplayString>
			</DisplayStrings>
		</LanguagePack>
	</LanguagePacks>
	<Resources>
		<Assembly ID ="CustomTasks.CTCreateEventAssembly" Accessibility ="Public" FileName ="Res.CustomTasks.CTCreateEventAssembly" HasNullStream ="true" QualifiedName ="Res.CustomTasks.CTCreateEventAssembly" />
	</Resources>  
</ManagementPackFragment>

Here’s the output:

createeventoutput

Hope this helps someone out there with similar need.

For further reading, you can go through this thread:

Powershell script in a console task?

Keep SCOMing 🙂

Cheers

SCOM Management Group

SCOM MG

Here we’re going to talk about something very important and basically the root of everything. Without this your SCOM does not exist. Essentially, this is what your whole SCOM setup is called – The Management Group. Yet, this is not something you’d work with daily. In fact, in most environments it’s just an install-and-forget kinda thing. You may sometimes run into wizards that’d ask you to specify the name of your MG, like manual agent install wizards, but that’s mostly all. However, there are some cases where you’d want to give some special attention to your MG, let’s discuss those here.

So the technical definition of an MG according to Technet goes like this:

Installing Operations Manager creates a management group. The management group is the basic unit of functionality. At a minimum, a management group consists of a management server, the operational database, and the reporting data warehouse database.

…and that’s really all about it. You have to specify the name of your MG when you install SCOM. Once everything is up and running, all your components (MS, GW, DBs, Agents, etc) would eventually be connected to this MG.

Where can I see the name of my MG?

You can view the name of your MG at the very top of your Ops Console. Like here:

The highlighted text is the name of your MG.

scom MG.PNG

(random screenshot from Internet)

You can also retrieve the name of the MG using Powershell:

Get-SCOMManagementGroup | select Name

The name of the Management Group cannot be changed later, so plan ahead.

Speaking of planning, you must go through this document to decide at a high level what the structure of your MG (or MGs) should look like; it also gives great insight into the components that make up an MG:

Planning a Management Group Design

There really isn’t a one-size-fits-all criterion for how big your MG should be, but there are certainly some constraints on how much data or load the underlying components can take. There’s an excellent tool that makes it easier to plan your deployment; it’s called the Operations Manager Sizing Helper Tool.

The Operations Manager 2012 Sizing Helper is an interactive document designed to assist you with planning & sizing deployments of Operations Manager 2012. It helps you plan the correct amount of infrastructure needed for a new OpsMgr 2012 deployment, removing the uncertainties in making IT hardware purchases and optimizes cost. A typical recommendation will include the recommended hardware specification for each server role, topology diagram and storage requirement.

You can download it HERE

The general recommendations from Microsoft go like this:

| Monitored item | Recommended limit |
| --- | --- |
| Simultaneous Operations consoles | 50 |
| Agent-monitored computers reporting to a management server | 3,000 |
| Agent-monitored computers reporting to a gateway server | 2,000 |
| Agentless Exception Monitored (AEM)-computers per dedicated management server | 25,000 |
| Agentless Exception Monitored (AEM)-computers per management group | 100,000 |
| Collective client monitored computers per management server | 2,500 |
| Management servers per agent for multihoming | 4 |
| Agentless-managed computers per management server | 10 |
| Agentless-managed computers per management group | 60 |
| Agent-managed and UNIX or Linux computers per management group | 6,000 (with 50 open consoles); 15,000 (with 25 open consoles) |
| UNIX or Linux computers per dedicated management server | 500 |
| UNIX or Linux computers monitored per dedicated gateway server | 100 |
| Network devices managed by a resource pool with three or more management servers | 1,000 |
| Network devices managed by two resource pools | 2,000 |
| Agents for Application Performance Monitoring (APM) | 700 |
| Applications for Application Performance Monitoring (APM) | 400 |
| URLs monitored per dedicated management server | 3,000 |
| URLs monitored per dedicated management group | 12,000 |
| URLs monitored per agent | 50 |

As you can see, it really depends on the size of your infrastructure and also on your hardware. I’d still keep a little margin on these numbers as well, just in case. Thorough and careful planning in the beginning will make your (and many others’) life easier later 🙂
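To put a number on that margin, here’s a bit of back-of-the-envelope arithmetic as a sketch. Get-MinimumServerCount is a hypothetical helper of mine and the 20% default headroom is only an assumption, not a Microsoft figure; for real planning, use the Sizing Helper.

```powershell
function Get-MinimumServerCount {
    param(
        [Parameter(Mandatory)][int]$ItemCount,                                   # e.g. number of agents
        [Parameter(Mandatory)][ValidateRange(1, 1000000)][int]$LimitPerServer,   # limit taken from the table
        [ValidateRange(0, 99)][int]$HeadroomPercent = 20                         # assumed safety margin
    )

    # Shrink the published limit by the headroom (never below 1),
    # then round the resulting server count up.
    $effectiveLimit = [math]::Max(1, [math]::Floor($LimitPerServer * (100 - $HeadroomPercent) / 100.0))
    [int][math]::Ceiling($ItemCount / $effectiveLimit)
}

# 5,000 Windows agents against the limit of 3,000 agents per management server:
# with 20% headroom each server takes 2,400 agents, so 3 servers are needed.
Get-MinimumServerCount -ItemCount 5000 -LimitPerServer 3000   # 3
```

With no headroom at all (-HeadroomPercent 0), the same 5,000 agents would fit on 2 management servers, which leaves very little room to grow.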

Now, what if your environment is so huge and spread across the globe that just one MG isn’t enough?

No worries, you can connect multiple MG’s AND manage them centrally too 🙂

Connected Management Groups

Consider this: your company Contoso has data centers all over the world, and the total number of devices (servers, network devices, etc.) to be monitored is around 50k. Let us say you have offices in the US, Europe, India, Australia, etc., and you have dedicated teams that handle the servers belonging to the data centers in their region. These environments are different and need to be monitored differently, independent of other regions. You do not want admins from one region to interfere with the operations of other regions, but you also want a universal console at your HQ (let’s say in the US) from where you can keep an eye on the monitoring operations of all your regions.

This is the best example of when you’d want to hand over separate, dedicated MGs to each of your regions and then consolidate them all in a central MG at your HQ.

The MGs that you consolidate are called the Connected Management Groups and the MG that you consolidate them under is called the Local Management Group. All the connected MGs are peers and have no visibility into the other MGs. Functionally, all these MGs (connected and local) can work completely independently of each other, with different MPs, different infra, different admins, different monitoring standards, different everything. In fact, the peer MGs are pretty much unaware of the other MGs. Once you connect multiple MGs to a single local MG, you can view the alerts coming from all the connected MGs in a single console.

To apply this to the scenario we described earlier, it’d be something like this:

The Ind_MG, the Eur_MG and the Aus_MG would be peers and would be the “Connected MGs”, while

the Us_MG would be the “Local MG”, where you can view alerts from all the other MGs in addition to its own.

There are some excellent walk-throughs on how to do this, like,

Connecting Management Groups in Operations Manager

SCOM Connected Management Groups–2016 and 2012

so let’s not go through the same thing again 😉

Just a note, if you’re thinking, “I wonder if I can connect my SCOM 07 MG to my SCOM 12 or 16 MG…”: nope, can’t do that. All the MGs involved must be on the same SCOM build version 🙂

Ok, that’s all for today!

Happy SCOM-ing 🙂

Cheers!