A while ago I wrote a simple but useful script which I’m sharing here to detect upstream provider HSRP failover events via traceroute. It can be used for all kinds of virtual IP routing failover like VRRP, Check Point Cluster XL, actual routing protocols like BGP/OSPF or similar technologies where IP packets can be routed across multiple hops.
The script executes traceroutes to a given destination and checks whether the path is being routed over a certain hop, with the ability to send mail notifications if this is not the case.
You can get the most recent version of this script on my Github here. If you have any suggestions or improvements (which I’m sure there is plenty of room for), feel free to drop a comment or an issue or a pull-request on Github.
Besides good ol’ SNMP, the open Common Information Model (CIM) interface on an ESXi host provides a useful way of remotely monitoring the hardware health of your hosts via the Web-Based Enterprise Management (WBEM) protocol. Pretty much every major hardware management solution and agent today supports using WBEM to monitor hosts of various OSes.
Unlike SNMP (except for the painful to implement version 3), it builds on a standard HTTP(S) API, allowing secure SSL/TLS protected authentication and communication between the host and the management stations. Of course you can also use SNMP and WBEM independently at the same time too.
On ESXi, the CIM interface to is implemented through the open Small Footprint CIM Broker (SFCB) service.
Seems great, right? To manage your hosts via CIM/WBEM with for example the HP Systems Insight Management (SIM) pictured above, you just need to provide a local user on the ESXi host which SIM can use to authenticate against the host.
You can use the standard root user for example, but is that a good idea? I certainly disagree about that, even more so in environments of administrative disparity where you still have strict separation of virtualization admins and hardware admins (I agree this separation makes no sense in this day and age and causes all sorts of problems besides just this one, but this is the daily reality I’m facing).