According to Jason’s comment, a new iLO 4 update from HP apparently fixes an issue related to booting from SD cards. Whether it is the same issue is unclear, though, since the original KB article I linked to has not been updated.
Important note: The Red Screen of Death symptom described here is NOT specific to ESXi or to booting from SD cards in general. It can occur with Windows, Linux or any other OS, and with other boot media such as regular disks/RAID arrays, whenever the server has a problem booting from the device (broken boot sector/partition/boot loader etc.).
A couple of weeks ago I was updating a few HP Proliant DL360p Gen8 servers (running ESXi from a local SD card) with ESXi patches via VUM, so business as usual. Almost, because on one of the servers I ran into the following issue:
After rebooting the host, the BIOS POST completed fine and the Proliant DL360p Gen8 server should have booted the ESXi OS from its attached USB SD card where ESXi was installed; instead it displayed this unsightly screen, telling us that something had gone very, very wrong:
I reset the server several times via iLO, but the issue persisted and I had no idea what exactly had gone bonkers here. I then booted a Linux live image, which worked fine, narrowing the issue down to the OS installation (device) itself. I initially thought the updates had corrupted the installation, but that actually wasn’t the case.
When attempting to mount the SD card USB drive from within the live Linux environment, I noticed it was completely absent from the system. The USB bus itself was still fine, but lsusb showed no SD card reader device in the system at all!
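A quick way to script this check from a live Linux environment is to grep the lsusb output for the reader. This is just a sketch; the match pattern below is an assumption, so compare it against the lsusb output of a healthy, identically equipped server to find the right device string for your hardware:

```shell
#!/bin/sh
# Check whether a USB SD card reader is visible on the USB bus at all.
check_sd_reader() {
    # $1: output of `lsusb`, passed in explicitly to keep the check testable
    if echo "$1" | grep -qiE 'card reader|mass storage'; then
        echo "SD card reader visible on the USB bus"
    else
        echo "SD card reader MISSING from the USB bus"
    fi
}

# Only query the real bus if lsusb is actually available
if command -v lsusb >/dev/null 2>&1; then
    check_sd_reader "$(lsusb)"
fi
```

If the reader is missing here, the boot failure is a hardware/controller problem rather than a corrupted OS installation.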
Besides good ol’ SNMP, the open Common Information Model (CIM) interface on an ESXi host provides a useful way of remotely monitoring the hardware health of your hosts via the Web-Based Enterprise Management (WBEM) protocol. Pretty much every major hardware management solution and agent today supports WBEM for monitoring hosts running various OSes.
Unlike SNMP (except for its painful-to-implement version 3), it builds on a standard HTTP(S) API, allowing secure SSL/TLS-protected authentication and communication between the host and the management stations. Of course you can also use SNMP and WBEM independently at the same time.
On ESXi, the CIM interface is implemented through the open Small Footprint CIM Broker (SFCB) service.
Seems great, right? To manage your hosts via CIM/WBEM with, for example, HP Systems Insight Manager (SIM) pictured above, you just need to provide a local user on the ESXi host which SIM can use to authenticate against the host.
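For a quick sanity check of the CIM interface outside of SIM, you can enumerate instances with the sblim wbemcli client. This is a sketch under a few assumptions: SFCB listens on its default HTTPS port 5989, the hostname and credentials are placeholders, and `-noverify` (skipping certificate validation) depends on your wbemcli build:

```shell
#!/bin/sh
# Sketch: enumerate a CIM class on an ESXi host via WBEM with the sblim
# wbemcli client. Host, user and password below are placeholders.
ESXI_HOST="esxi01"
CIM_USER="root"
CIM_PASS="secret"

query_cim_class() {
    # $1: CIM class name to enumerate, e.g. CIM_Chassis
    # -noverify skips server certificate validation (lab use only)
    wbemcli ei -noverify \
        "https://${CIM_USER}:${CIM_PASS}@${ESXI_HOST}:5989/root/cimv2:$1"
}

if command -v wbemcli >/dev/null 2>&1; then
    query_cim_class CIM_Chassis || true
fi
```

If this returns instances, the SFCB service is up and the credentials work, independently of any management station configuration.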
You could use the standard root user, but is that a good idea? I certainly don’t think so, even more so in environments of administrative disparity where there is still a strict separation between virtualization admins and hardware admins. (I agree this separation makes no sense in this day and age and causes all sorts of problems besides this one, but it is the daily reality I’m facing.)
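On ESXi 6.0 and later you can create such a dedicated, non-root account straight from the shell (older versions need the vSphere Client). A sketch, where the account name and password are examples only:

```shell
#!/bin/sh
# Sketch: create a dedicated, read-only local account for hardware
# monitoring instead of handing out root (ESXi 6.0+). Account name,
# password and role below are examples, not a recommendation.
create_cim_user() {
    # $1: account name, $2: password
    esxcli system account add \
        --id "$1" --password "$2" --password-confirmation "$2" \
    && esxcli system permission set --id "$1" --role ReadOnly
    # ReadOnly is sufficient for health monitoring via CIM/WBEM
}

if command -v esxcli >/dev/null 2>&1; then
    create_cim_user cimuser 'S0me-Secret!'
else
    echo "esxcli not found - run this on an ESXi host"
fi
```

The management station then authenticates with this limited account, so the hardware admins never need the root password.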
We recently replaced the iLO local account logins with LDAP authentication against our AD. That is cool, but it raised an issue: logins with my AD account sometimes worked and sometimes did not, because not every system was configured properly for LDAP authentication.
Instead of manually checking logins on dozens of servers (made all the more tedious by iLO’s failed-login delay), I took a stab at analyzing the login procedures and scripting the logins myself.
So I came up with this horrible piece of bash script doing exactly that. I tested it against all known iLO versions (1, 2, 3 and 4), and it worked with all of them (the login procedures for versions 1/2 and 3/4 are identical). Running it requires an argument pointing to a file containing the iLO hostnames or IPs to connect to.
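As an aside: on iLO 4 with reasonably recent firmware there is a much simpler alternative to scraping the web login, namely the Redfish/REST API. The sketch below is NOT the script above and will not work on iLO 1-3; it just checks whether a credential pair is accepted, based on the standard Redfish behavior of returning 401 for bad credentials:

```shell
#!/bin/sh
# Sketch: verify iLO credentials against a list of hosts via the
# Redfish API (iLO 4 with Redfish-capable firmware only).
check_ilo_login() {
    # $1: iLO hostname/IP, $2: user, $3: password
    code=$(curl -ks -o /dev/null -w '%{http_code}' \
        -u "$2:$3" "https://$1/redfish/v1/Systems/")
    if [ "$code" = "200" ]; then
        echo "$1: login OK"
    else
        echo "$1: login FAILED (HTTP $code)"
    fi
}

# Usage: ilo-check.sh <hostfile> <user> <password>
if [ $# -ge 3 ]; then
    while read -r host; do
        check_ilo_login "$host" "$2" "$3"
    done < "$1"
fi
```

Like the original script, it takes a file of iLO hostnames/IPs, but it additionally expects the user and password as arguments.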
Here’s the script on pastebin with formatting: http://pastebin.com/i2Y0xSTQ:
Update: I have moved my scripts to GitHub and updated some of them a bit. You can find the current version of this particular script here.
Attention: [Update 16.01.2013]
HP actually pulled the updates (which were titled “February” updates) from their VIBs Depot site and purged the references from the depot metadata indexes as well. I’m not sure what’s going on, but you won’t be able to apply these updates (via Update Manager) unless you already downloaded them. Even if you did, you should refrain from using these bundles for now. Unfortunately there seems to be no way to properly remove them from Update Manager once it has pulled the metadata.
HP re-released the VIBs available at http://vibsdepot.hp.com/hpq/feb2013/
(Thanks to milanod for the hint in the comments)
HP actually removed the re-released updates from the vibsdepot yet again?!
The updated bundles are still listed on the software/support/drivers lists for Proliant Servers though:
I’m speechless in the face of this unprecedented fail.
Uh-oh, the updates SEEM to be back at http://vibsdepot.hp.com/hpq/feb2013/. File dates are from Jan 4th, and the bundles’ md5sums exactly match those from the initial mid-January release (which this post was about). So if there really was a bug in the release, it must still be there.
Taking bets on how long it’ll take HP to offline them again.
(Thanks to Wu in the comments)
The issue with the SmartArray warning which this bundle brought us has been fixed in a recent update.
After some very minor updates back in October that did not come with release notes, it’s time for another round of updates to the ESXi HP extensions and other components. Unfortunately, we don’t seem to be getting release notes or general information this time either.
But these updates are already publicly available at http://vibsdepot.hp.com/hpq/feb2013/, and your VMware Update Manager should have picked them up if you configured it to use the HP VIB depot.
Since HP is so kind as to not provide release notes, we can only guess at the actual fixes or improvements, but we can at least check which of the VIBs contained in the offline bundles really do provide updates (spoiler: not that many).
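One way to do that check yourself is to list what the depot offers next to what is installed on a host. A sketch; the depot index URL is HP's generic vibsdepot index and the awk/diff post-processing is just one way to compare the name/version columns:

```shell
#!/bin/sh
# Sketch: list VIBs offered by a depot next to what is installed, to see
# which bundles actually contain newer versions.
DEPOT="http://vibsdepot.hp.com/index.xml"

list_depot_vibs() {
    # Every VIB the depot offers, with versions
    esxcli software sources vib list --depot="$DEPOT"
}

list_installed_vibs() {
    # Every VIB installed on this host, with versions
    esxcli software vib list
}

if command -v esxcli >/dev/null 2>&1; then
    # Reduce both tables to "name version" (skipping the two header
    # lines) and diff them; lines only in the depot list are candidates
    # for real updates
    list_depot_vibs     | awk 'NR>2 {print $1, $2}' | sort > /tmp/depot.txt
    list_installed_vibs | awk 'NR>2 {print $1, $2}' | sort > /tmp/installed.txt
    diff /tmp/installed.txt /tmp/depot.txt
else
    echo "esxcli not found - run this on an ESXi host"
fi
```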
You may not always have the convenient option of installing vendor-specific hardware management agents/extensions on ESXi hosts or physical servers, for example with appliance-ish OSes like the Check Point SPLAT/Gaia platform (which is just a customized RHEL descendant), or you may simply run into a server without these tools installed. So how can you still query firmware information on such systems directly from the command line? I will outline a couple of ways to obtain that information.
The example information captured here is from HP Proliant servers (G5 and later), but most of it should work in similar ways on other hardware platforms too. Unless noted otherwise, the example commands work regardless of whether CIM providers or hardware management agents are installed.
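On Linux-based systems (including RHEL descendants like SPLAT/Gaia), the simplest agent-less starting point is reading the SMBIOS/DMI tables with dmidecode, which needs no vendor tools at all (it does need root). A minimal sketch using dmidecode's `-s` string keywords:

```shell
#!/bin/sh
# Sketch: read basic system/BIOS firmware information straight from the
# SMBIOS/DMI tables with dmidecode - no vendor agents required.
firmware_summary() {
    for key in bios-vendor bios-version bios-release-date \
               system-manufacturer system-product-name; do
        printf '%-22s %s\n' "$key:" "$(dmidecode -s "$key")"
    done
}

if command -v dmidecode >/dev/null 2>&1; then
    firmware_summary
fi
```

On a Proliant this typically yields the system ROM family (e.g. a "P7x"-style BIOS version string) and its release date; other firmware (iLO, controllers, NICs) needs the device-specific tools covered below.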