SCOM can monitor many kinds of devices besides servers and workstations. As long as SNMP and/or ICMP is enabled and not blocked by firewalls, you can extract useful monitoring data from these devices. This blog does not cover how to set up that monitoring: importing the right management packs, configuring accounts and profiles, and finally configuring the discovery rule. There is already plenty of documentation about that configuration process.
What I would like to talk about is a specific problem we had with the discovery of these devices.
What was the problem?
We have an explicit discovery rule configured with 162 devices, with the correct discovery server and resource pool selected. Multiple Run As accounts are defined for the public and secret SNMP v1 and v2 community strings.
Every run of this rule gave us different results. One run found 96 devices, another found 113 and yet another found 41. Which devices were discovered seemed totally random. We found that if we manipulated the rule over several runs, changing the access mode from ICMP and SNMP to ICMP only, re-running the rule and then changing it back to ICMP and SNMP, it found nearly all the devices. Some runs also took a very long time to process, sometimes more than 12 hours.
But that is of course not how it is supposed to work. It is unreliable. SCOM should find all the devices in one run, either by ICMP only, SNMP only or both.
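To put numbers on this kind of randomness, you can compare the device list in the rule with what each run actually discovered. The snippet below is only a sketch in Python: the device names and the run data are made up for illustration, and it assumes you have the discovered names available as plain lists (for example copied from the console).

```python
def missing_devices(expected, discovered):
    """Return the expected devices that a discovery run did not find, sorted."""
    # Compare case-insensitively and ignore surrounding whitespace.
    found = {name.strip().lower() for name in discovered}
    return sorted(name for name in expected if name.strip().lower() not in found)


# Hypothetical device names; the real rule contained 162 devices.
expected = ["sw-core-01", "sw-core-02", "fw-edge-01", "ups-01"]

# Hypothetical results of two discovery runs.
runs = {
    "run 1": ["sw-core-01", "FW-EDGE-01"],
    "run 2": ["sw-core-02", "ups-01", "sw-core-01"],
}

for label, discovered in runs.items():
    missing = missing_devices(expected, discovered)
    found_count = len(expected) - len(missing)
    print(f"{label}: {found_count}/{len(expected)} found, missing: {missing}")
```

Keeping such a list per run makes it easy to see whether the same devices are missed every time or whether the set really changes from run to run.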
We tried every solution we could find on the internet. There is not much useful information to be found about discovery issues, and our particular problem (random discovery results) was never mentioned. Finally, we opened a support call with Microsoft. A nice guy in Romania was assigned to our case and he asked me to collect all the debug logging from the SCOM server for analysis. Weeks passed, several tests were done and logs were sent to Microsoft.
And then, almost 6 weeks later, the solution arrived in my mailbox!
The directory C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\NetworkMonitoring\local\repos\icf (depending on your install location) contains an .rps file, a repository file that stores all the discovered devices. This file can somehow become corrupted, yet it is still used by the discovery process.
To restore this file and get a clean discovery, you should do the following:
1. Delete all the devices that have already been discovered (Administration > Network Management > Network Devices and Network Devices Pending Management).
2. Delete the .rps file in the directory mentioned earlier.
3. Restart the Health Service by entering the following command in CMD: net stop healthservice && net start healthservice
4. Edit the discovery rule and add the devices to discover.
5. Run the discovery.
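The file and service steps above could be scripted roughly as follows. This is a minimal sketch in Python, not something supplied by Microsoft support. The function names are hypothetical, and moving the .rps file to a backup folder instead of deleting it outright is an assumption made here so that the file can be put back if needed. Removing the devices and editing the discovery rule still have to be done in the console.

```python
import shutil
import subprocess
from pathlib import Path

# Directory quoted in this post; adjust it to your own install location.
ICF_DIR = Path(
    r"C:\Program Files\Microsoft System Center 2012 R2\Operations Manager"
    r"\Server\NetworkMonitoring\local\repos\icf"
)


def move_rps_files(icf_dir, backup_dir):
    """Move every *.rps file from icf_dir into backup_dir and return the new paths.

    Assumption: moving is used instead of deleting, so a copy is kept.
    """
    icf_dir, backup_dir = Path(icf_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for rps in sorted(icf_dir.glob("*.rps")):
        target = backup_dir / rps.name
        shutil.move(str(rps), str(target))
        moved.append(target)
    return moved


def restart_health_service():
    """Run the same two commands as in the post (Windows only, needs admin rights)."""
    subprocess.run(["net", "stop", "healthservice"], check=True)
    subprocess.run(["net", "start", "healthservice"], check=True)
```

On the management server you would call move_rps_files(ICF_DIR, r"C:\Temp\rps-backup") followed by restart_health_service(), where the backup folder is just an example path.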
I hope this helps others solve discovery problems in SCOM. If you encounter any problems, please do not hesitate to contact ConoScenza.