Category: SCOM

Why does this computer still exist in the SCOM console?

Why does this computer still exist in the SCOM console? I swear I deleted it!

If you are an SCOM administrator, you have probably heard this before. If not, trust me, you will.

This is a known issue in SCOM, and a very mysterious one. I call it the “orphaned object”: an object that you try to delete but that, for some reason, won’t go away.

There are many articles written about this. Most of them explain how to remove the “orphaned object” with an SQL query. Although I don’t agree with that solution, I still think you should get to know it.

There is another great solution by Jan Van Meirvenne. He wrote a great post that explains it – How stuff is discovered by Operations Manager, and how you can remove it. The post contains a good explanation and a PowerShell script solution.

Before you click on the link, I will share a very simple way to figure out why the object still exists:

  1. Open the SCOM console.
  2. Right-click the “orphaned object” and choose the Operations Manager Shell option.
  3. Wait for the shell to open and type the command $context.GetDiscoverySources(). The result is a list of discovery sources.
  4. The next step depends on the source type property. There are three different source types:
  • User – e.g. a manually added computer.
    • Delete the agent from SCOM.
  • Connector – e.g. the SCVMM connector.
    • Type the command Get-SCOMConnector -Id [ConnectorId]
  • Workflow – all other discoveries.
    • Type the command Get-SCOMDiscovery -Id [WorkflowId]
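The steps above can be condensed into one short loop. This is only a sketch: the DiscoverySourceType, ConnectorId, and WorkflowId property names are my assumptions about the objects that $context.GetDiscoverySources() returns, so verify them with Get-Member before relying on them.

```powershell
# Sketch only - run inside an Operations Manager Shell opened from the
# orphaned object's context menu, so $context points at the object.
# Property names below (DiscoverySourceType, ConnectorId, WorkflowId)
# are assumptions; verify them with: $sources | Get-Member
$sources = $context.GetDiscoverySources()

foreach ($source in $sources) {
    switch ("$($source.DiscoverySourceType)") {
        'User'      { 'Added manually - delete the agent from SCOM.' }
        'Connector' { Get-SCOMConnector -Id $source.ConnectorId }
        'Workflow'  { Get-SCOMDiscovery -Id $source.WorkflowId }
    }
}
```

The point of the loop is simply to resolve each discovery source to the connector or discovery workflow that keeps re-creating the object.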

Example:

[Screenshot: DiscoveryBy]

Now, for more details, click here.

Many thanks to Jan Van Meirvenne for sharing his experience and knowledge.

A Challenge to All SCOM Experts

If you’re just like me, a big fan of SCOM and of open source projects on GitHub, then you are probably already involved in an open source project. How about a new SCOM challenge?

Five months ago I published a project on GitHub – Monitor Applications Using SQL Queries. I challenged my counterparts to write a guide, maintain the code, and/or develop new features. For me, this project commitment is the result of thinking about ways to help the community grow, and I’d be more than glad if you joined. The goal is to contribute to the SCOM community as much as possible. Since I started the challenge, I have gained experience and knowledge that I want to share with you:

  1. Being involved in an open source project forces you to be creative.
  2. It helps with your work/life balance, since you need to build a new habit rather than complete a one-week challenge 🙂
  3. It’s rewarding to know that you are helping someone, especially given the reach of GitHub.

Having you aboard would be even more awesome for a few more reasons. First, it’s always great to make connections and work with people. Second, telling the world that you’re involved in a challenge is a great way to stay motivated. And finally, we can contribute to each other’s projects. The more of us there are, the more ways there will be to make daily contributions.

If you want to take part in the challenge, feel free to submit a pull request or an issue. I’ve also created a Gitter channel so we can chat about cool ideas, projects, suggestions, or whatever.

Did You Ever Think of Disabling Performance Collection in SCOM?

We need to disable all performance data collection rules and instead collect those metrics with other, more suitable systems, such as Elasticsearch or Splunk.

Don’t get me wrong: SCOM is a fantastic monitoring platform. It has a very sophisticated discovery engine, great self-monitoring agent capabilities, operational tasks, alerting, and monitoring. They are all great, but what about the performance data? It seems to me that the performance data structure does not fit SCOM’s object-oriented model. I have two main reasons why SCOM and performance data collection are not a good match:

  1. It’s difficult to build performance reports – you need to know exactly which target class the data is attached to, since every performance collection in SCOM is targeted at a specific class.
  2. Different intervals – each performance collection rule has its own sample interval. This makes deep investigation harder.

In addition, in my experience, many organizations use only a few performance metrics even though they collect a huge amount of performance data, and this has a direct impact on SCOM’s performance.
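If you want to experiment with this idea, the rules can be disabled in bulk with overrides. Here is a minimal sketch, assuming an empty unsealed override management pack with the display name 'Overrides.DisablePerfCollection' already exists (that name is my invention). Try it in a lab first – disabling collection is easy, but the data you skipped cannot be collected retroactively.

```powershell
Import-Module OperationsManager

# Store the overrides in a dedicated unsealed MP (assumed to exist).
$overrideMp = Get-SCOMManagementPack -DisplayName 'Overrides.DisablePerfCollection'

# Every performance collection rule in the management group.
$perfRules = Get-SCOMRule | Where-Object { $_.Category -eq 'PerformanceCollection' }

foreach ($rule in $perfRules) {
    # Override each rule on its own target class.
    $target = Get-SCOMClass -Id $rule.Target.Id
    Disable-SCOMRule -Rule $rule -Class $target -ManagementPack $overrideMp -Enforce
}
```

A dedicated override MP keeps the change reversible: delete that one MP and all the collection rules return to their defaults.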

What is your opinion? Do you agree with me or not? Have you already done this in your company? Or maybe you know someone who has?

Monitor SQL Queries Management Pack, Then and Now

Two years ago, one of my project managers came to me with a request to monitor a project he was responsible for. The project provides a service to several different complex applications. The request was unique because he wanted to make sure that none of the applications using the service had a problem. So I had one of two options: 1) develop a management pack for each application, or 2) monitor the repository with an SQL query. You guessed right: I chose the second option. At first I created a dedicated Management Pack, and after a while we found this idea of monitoring an application with an SQL query very useful.

I decided to create a template so that my teammates and I could create such monitors more easily.

This is how my new management pack was born.

This Management Pack doesn’t add monitors directly to SCOM. Instead, it adds a new template to the Authoring pane of the SCOM console, so you can easily take any SQL query (note: the return value has to be a single numeric value) and use it as the basis for SCOM monitoring.

Now you can monitor the behavior of your application, such as the number of active users on your website or the number of minutes without any interaction with your application. You can configure it to alert when the number dips below or rises above a certain threshold. Moreover, the query result is collected as performance data, so you can use it for graphing and analysis.
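For example, a query like the following could feed the template. The table and column names here are made up for illustration; the only hard requirement is that the query returns a single numeric value.

```sql
-- Hypothetical schema: count the users active in the last 5 minutes.
-- The single numeric result becomes the monitored/collected value.
SELECT COUNT(*) AS ActiveUsers
FROM dbo.UserSessions
WHERE LastActivityTime > DATEADD(MINUTE, -5, GETUTCDATE());
```

You would then set the monitor’s threshold against that single number, e.g. alert when ActiveUsers drops below an expected minimum.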

This Management Pack includes two files: QueryOleDbMonitorLibrary.mp, which contains all the definitions, and QueryOleDbMonitorTemplate.mpb, which contains the template definition. Import both files into your SCOM environment.

The next chapter will be much more technical: I will explain how I developed the template wizard in the Visual Studio IDE.

Challenge Accepted

Once in a while a disaster happens, and when it does, the first question will be: which servers were affected?

Unfortunately, the answer to this question is not accurate. You are probably thinking that I am wrong, since the “Failed to connect to computer” alert pops up when the agent misses 3 consecutive heartbeats (the default value) and the management server gets no ping answer. This is a great monitor; however, the issue comes up when the agent itself has a problem. Let’s say the agent had a problem before the disaster, such as its service being shut down, its data being corrupted, or the agent somehow being deleted from the server. In these scenarios, the “Failed to connect to computer” alert will not pop up. I’m sure many SCOM administrators are familiar with this problem.

This problem motivated me to create a new management pack.

My solution is based on SQL query and ping check from a management server.

The following steps explain how the management pack solves the problem:

First, get all the unhealthy agents that are not in maintenance mode; the unhealthy agents are the only agents that could have been affected by the disaster. Second, check the ping status from the agent’s management server (note: the management server itself may be down, so if necessary we check the ping from a secondary management server and, if needed, from the RMS server). The output is stored as an event in the Operations Manager database. The third step is to run a simple SQL query that returns a list of servers affected by the disaster (I added a sample query at the end of this page).
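The first two steps could be sketched in PowerShell roughly as follows. This is a simplified illustration, not the management pack’s actual workflow: the event source name matches the PublisherName the sample query at the end of this page filters on, but the event ID, the InMaintenanceMode property name, and the event-source registration are my assumptions.

```powershell
Import-Module OperationsManager

# Step 1: unhealthy agents that are not in maintenance mode.
$suspects = Get-SCOMAgent | Where-Object {
    $_.HealthState -ne 'Success' -and -not $_.InMaintenanceMode  # property name assumed
}

# Step 2: ping each suspect and store the result as an event.
foreach ($agent in $suspects) {
    $alive  = Test-Connection -ComputerName $agent.DisplayName -Count 2 -Quiet
    $status = if ($alive) { 'Success' } else { 'Failed' }

    # 'ServerConnectivityCheck' must be registered as an event source beforehand.
    Write-EventLog -LogName 'Operations Manager' -Source 'ServerConnectivityCheck' `
        -EventId 9999 -EntryType Information `
        -Message "Status=$status Agent=$($agent.DisplayName)"
}
```

Step 3 is then just a matter of querying those events from the OperationsManager database, which is what the SQL example below does.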

While I was working on this management pack, I had another idea: repair the unhealthy agent automatically. Let me explain. Since we are already collecting unhealthy agents that answer ping, we can conclude that the problem is not the server itself, so trying to repair the agent makes sense. To repair the agent, all we do is run a task that restarts the agent. For agents that have a problem collecting performance data, we run the flush agent task instead.

Link to the SCOM Administration Add-Ons.

SQL Query Example:

DECLARE @StartTime DATETIME
DECLARE @EndTime DATETIME

SET @StartTime = DATEADD(MINUTE, ((DATEPART(MINUTE, GETUTCDATE()) / 5) * 5) - 30, DATEADD(HOUR, DATEDIFF(HOUR, 0, GETUTCDATE()), 0))
SET @EndTime   = DATEADD(MINUTE, ((DATEPART(MINUTE, GETUTCDATE()) / 5) * 5), DATEADD(HOUR, DATEDIFF(HOUR, 0, GETUTCDATE()), 0))

SELECT
       v.TimeGenerated
       ,DATEADD(MINUTE, ((DATEPART(MINUTE, v.TimeGenerated) / 5) * 5), DATEADD(HOUR, DATEDIFF(HOUR, 0, v.TimeGenerated), 0)) AS TimeGeneratedFixed
       ,EventParametersXML
       ,y.Status
       ,y.StatusCode
       ,y.ResponseTime
       ,y.AgentServerName
       ,y.ManagementServerName
INTO #EventAllView
FROM EventAllView v
OUTER APPLY (SELECT CAST(v.EventParameters AS XML) AS EventParametersXML) x
OUTER APPLY (SELECT
              x.value('Param[1]', 'VARCHAR(80)') AS Status
              ,x.value('Param[2]', 'VARCHAR(80)') AS StatusCode
              ,x.value('Param[3]', 'VARCHAR(80)') AS ResponseTime
              ,x.value('Param[4]', 'VARCHAR(80)') AS AgentServerName
              ,x.value('Param[5]', 'VARCHAR(80)') AS ManagementServerName
       FROM x.EventParametersXML.nodes('/') AS NodeValues (x)) y
WHERE PublisherName = 'ServerConnectivityCheck'
AND TimeGenerated >= @StartTime
AND DATEADD(MINUTE, DATEDIFF(MINUTE, 0, TimeGenerated), 0) <= @EndTime

;WITH TimesCTE ([Date])
AS
(
       SELECT @StartTime
       UNION ALL
       SELECT DATEADD(MINUTE, 5, [Date]) FROM TimesCTE WHERE [Date] < @EndTime
)
SELECT
       res.AgentServerName,
       res.[Date],
       v.TimeGenerated,
       v.Status,
       v.StatusCode,
       v.ResponseTime,
       v.ManagementServerName
INTO #tmpPivot
FROM
(
       SELECT srv.AgentServerName, c.[Date]
       FROM (SELECT DISTINCT AgentServerName FROM #EventAllView) srv
       CROSS APPLY (SELECT [Date] FROM TimesCTE) c
) res
JOIN #EventAllView v
       ON DATEADD(MINUTE, ((DATEPART(MINUTE, v.TimeGenerated) / 5) * 5), DATEADD(HOUR, DATEDIFF(HOUR, 0, v.TimeGenerated), 0)) = res.[Date]
       AND v.AgentServerName = res.AgentServerName
ORDER BY res.AgentServerName, res.[Date]

DECLARE @CMD VARCHAR(MAX)
SET @CMD = 'SELECT * FROM (SELECT AgentServerName, StatusCode, CONVERT(VARCHAR(80), DATEADD(HOUR, (DATEDIFF(HOUR, GETUTCDATE(), GETDATE())), [Date]), 108) AS [Date] FROM #tmpPivot) AS SourceTable PIVOT (MAX(StatusCode) FOR [Date] IN ('

SET @CMD = @CMD + (
       SELECT LEFT(txt.grouped, LEN(txt.grouped) - 1)
       FROM
       (
              SELECT '[' + CONVERT(VARCHAR(80), DATEADD(HOUR, (DATEDIFF(HOUR, GETUTCDATE(), GETDATE())), [Date]), 108) + '],' AS [text()]
              FROM #tmpPivot
              GROUP BY CONVERT(VARCHAR(80), DATEADD(HOUR, (DATEDIFF(HOUR, GETUTCDATE(), GETDATE())), [Date]), 108)
              ORDER BY CONVERT(VARCHAR(80), DATEADD(HOUR, (DATEDIFF(HOUR, GETUTCDATE(), GETDATE())), [Date]), 108)
              FOR XML PATH('')
       ) txt(grouped))

SET @CMD = @CMD + ')) AS PivotTable ORDER BY AgentServerName'

EXEC (@CMD)

DROP TABLE #EventAllView
DROP TABLE #tmpPivot