Renamed VMware Tools components and automatic installation

With the release of vSphere 6.0 and a recent update for the 5.5 VMware Tools on May 8th 2015 (9.4.12 build 2627939), VMware also changed the Windows VMware Tools installer slightly by renaming some vShield-related components:
VMware ESXi 5.5, Patch ESXi550-201505402-BG: Updates tools-light

The vShield Endpoint drivers are renamed as Guest Introspection Drivers and two of these drivers, NSX File Introspection Driver (vsepflt.sys) and NSX Network Introspection Driver (vnetflt.sys), can be installed separately now. This allows you to install the file driver without installing the network driver.

If you’ve been using custom automated installations of the VMware Tools like me then you might have to adjust the installer command for newer tools versions.

Continue reading

Illegal OpCode Red Screen of Death while booting a HP Proliant server from an USB SD card

[Update]
As per Jason’s comment, with a new ILO4 update HP apparently has fixed an issue related to booting from SD cards. Whether this is the same issue is unclear though since the original KB article I linked to has not been updated.
[/Update]

Important note: The general symptom of such a Red Screen of Death described here is NOT specific to ESXi or booting from SD cards in general. It can happen with Windows, Linux or any other OS as well as other boot media such as normal disks/RAID arrays, if the server has a problem booting from this device (broken boot sector/partition/boot loader etc).

A couple of weeks ago I was updating a few HP Proliant DL360p Gen8 servers running ESXi on a local SD card with ESXi patches via VUM, so business as usual. Almost, because on one of the servers I ran into the following issue:
After rebooting the host, the BIOS POST completed fine and the Proliant DL360p Gen8 server should now boot the ESXi OS from it’s attached USB SD card where ESXi was installed; but instead it displayed this unsightly screen telling  something went very, very wrong:

iloillegalopcodeI reset the server several times via iLO but the issue persisted and I had no idea what exactly went bonkers here. Then I decided to boot a Linux live image, which worked fine, narrowing down the issue to the OS installation (device) itself. I thought the updates corrupted the installation but that actually wasn’t the case.
When attempting to mount the SD card USB drive from within the live Linux I noticed it was actually completely absent from the system. The USB bus was still ok, but lsusb showed no SD card reader device in the system at all!

Continue reading

[Script] Poor man’s vSphere network health check for standard vSwitches

Among many other new features, the vSphere 5.1 distributed vSwitch brought us the Network Health Check feature. The main purpose of this feature is to ensure that all ESXi hosts attached to a particular distributed vSwitch can access all VLANs of the configured port groups and with the same MTU. This is really useful in situations where you’re dealing with many VLANs and the roles of virtualization and network admin are strongly separated.

Unfortunately, like pretty much all newer networking features, the health check is only included in the distributed vSwitch which requires vSphere Enterprise+ licenses.

There are a couple other (cumbersome) options you have though:
If you’re have ESXi 5.5 you can use the pktcap-uw utility on the ESXi shell to check if your host receives frames for a specific VLAN on an uplink port:
The following example command will capture receive-side frames tagged with VLAN 100 on uplink vmnic3:
# pktcap-uw –uplink vmnic3 –dir 0 –capture UplinkRcv –vlan 100
If systems are active in this VLAN you should see a few broadcasts or multicasts already and meaning the host is able to receive frames on this NIC for this VLAN. Repeat that for every physical vmnic uplink and VLAN.

Another way to check connectivity is to create a vmkernel interface for testing on this VLAN and using vmkping. Since manually configuring a vmkernel interface for every VLAN seems like a huge PITA, I came up with a short ESXi shell script to automate that task.
Check out the script below. It uses a CSV-style list to configure vmkernel interfaces with certain VLAN and IP settings, pinging a specified IP on that network with the given payload size to account for MTU configuration. This should at least take care of initial network-side configuration errors when building a new infrastructure or adding new hosts.
This script was tested successfully on ESXi 5.5 and should work on ESXi 5.1 as well. I’m not entirely sure about 5.0 but that should be ok too. (Please leave a comment in case you can confirm/refute that).

Introducing the ghetto-vSwitchHealthCheck.sh:

Update: I have moved my scripts to GitHub and updated some of them a bit. You can find the current version of this particular script here.

Continue reading

openssl heartbleed attack – The cryptocalypitc judgement day has arrived

The internet is now (no really, this time it is, pinky-promise) officially broken™. Or at least a large part of it that is responsible for securely handing your shopping cart and credit card information, e-mails, passwords, router configuration or whatever flows through your tubes over encrypted SSL/TLS connections.

That escalated quickly.Since yesterday there’s a big fuss on the internet over the so-called openssl heartbleed bug that was disclosed yesterday. It exploits the TLS heartbeat extensions and allows an attacker to read up to 64KiB of memory managed by openssl from the target system. This memory could contain sensitive data such as passwords, session-cookies, credit card information, or in one of the worst cases even the server’s x509 certificate private key. Since the attack is basically untraceable (no webserver logs or anything), it’s entirely possible that certificate private keys have been compromised en masse already without anyone noticing and never being able to find out.
You can find some interesting background information about the discovery and disclosure in this article.

To be on the really safe side, you have to replace your certificates, revoking the old ones, changing passwords and generally being PITA’d. Note that this has nothing to do with the openssl version a certificate was generated with, but is about the runtime application using openssl for it’s SSL/TLS traffic. Your private key can be just as compromised if you generated the certificate with Windows CA tools or gnutls and use it with an openssl-based application. Continue reading

Storage performance testing: Not all IOPS are created equal

Every once in a while I come across postings of astonishingly awesome IOPS numbers achieved with a relatively moderate setup. Often this is due to the fact that the benchmark used ran a “maximum throughput” IO pattern, which doesn’t say a lot about actual storage performance. This is because these kinds of patterns issue idealized, sequential IO (often with small IO sizes) to measure the maximum, theoretical throughput of the storage subsystem. Unfortunately this kind of IO pattern practically never occurs with real world applications.

An IO stream hitting a storage device:

  • Is Random to a certain degree
  • Has a certain read/write ratio
  • Can have variable IO sizes between 512 Byte and several MiB (though that is hardly real-world relevant, up to 128KiB is normal)
  • Is stuffed into queues at various levels in the storage stack with various queue depths

All these factors have a huge influence on what everyone casually and generalizing calls “IO Operations per Second”, or IOPS. The quite basic rule of thumb is:
The smaller the IO size and the more sequential the workload, the higher the IOPS number will be.
It should be noted that randomness of IO does not affect flash-based storage like SSDs, but impacts traditional spinning disks to a very large degree. Also, taken a single (spinning) disk, write IOs are not necessarily more expensive than read IOs, considering the local cache on every disk has is a pure read-cache (and don’t you dare to enable write-caching on the disks themselves). Of course this is where RAID-penalties come in, but I’m trying to focus on IO itself regardless of physical storage properties in this article.

When vendors claim their system or disk delivers (up to, hehe) [insert arbitrary number here] IOPS, they usually refer to a workload that is completely random, mainly consists of writes and has a 4-32KiB IO block size.
If you conduct your own storage IO testing, you should use similar IO patterns if you want to compare real-world storage
performance. Otherwise you may end up with unrealistic huge numbers like in the maximum throughput cases.

A great resource for comparing storage IO performance is this thread on the VMware community forums, where hundreds of results using a common IOmeter configuration are posted.
You can get the IOmeter config file here and also paste your resulting csv files there for summarizing numbers.

There is also the VMware IO analyzer virtual appliance, providing various pre-defined IO pattern configs for applications like Exchange or SQL server, but last time I tested it behaved a bit “funky”. (It runs Windows IOmeter in Linux via wine – urks).

Last but not least, a tip if you want to analyze just what kind of IO workload your specific VM/application is issuing: vscsistats is your friend and provides tremendous in-depth information on this.

Further recommended reading:
http://www.symantec.com/connect/articles/getting-hang-iops-v13

Configuring and securing local ESXi users for hardware monitoring via WBEM

Besides good ol’ SNMP, the open Common Information Model (CIM) interface on an ESXi host provides a useful way of remotely monitoring the hardware health of your hosts via the Web-Based Enterprise Management (WBEM) protocol. Pretty much every major hardware management solution and agent today supports using WBEM to monitor hosts of various OSes.
Unlike SNMP (except for the painful to implement version 3), it builds on a standard HTTP(S) API, allowing secure SSL/TLS protected authentication and communication between the host and the management stations. Of course you can also use SNMP and WBEM independently at the same time too.
On ESXi, the CIM interface to is implemented through the open Small Footprint CIM Broker (SFCB) service.

sim1 sim2








Seems great, right? To manage your hosts via CIM/WBEM with for example the HP Systems Insight Management (SIM) pictured above, you just need to provide a local user on the ESXi host which SIM can use to authenticate against the host.
You can use the standard root user for example, but is that a good idea? I certainly disagree about that, even more so in environments of administrative disparity where you still have strict separation of virtualization admins and hardware admins (I agree this separation makes no sense in this day and age and causes all sorts of problems besides just this one, but this is the daily reality I’m facing).

Continue reading

[Script] PowerCLI – find out the host on which VMs are running when vCenter is down

Many moons ago, when I started playing with the wonderfulness that is PowerCLI, one of the first things I wrote with a particular problem in mind was a small script to quickly locate the host running our vCenter server in case anything went wrong and I lost access to vCenter directly.
So instead of trying to connect to every possible host of the cluster manually with the vSphere Client, why not just connect to all of them via PowerCLI and query them quickly?

Where’s Waldo?

This resulted in the small, simple script posted below. For this script, you can provide either a list of hosts to connect to, an alias for a cluster which member hosts you pre-populated in the script, along with one or more search strings. This search is matched against the VM names and outputs the list of found VMs with their current power state and most importantly, the host running the VM. This way you can get a VM-Host mapping of not only your vCenter VM, but other VMs as well.

I remembered this script while reading a cool article on v-front.de about various other ways to keep track on which host your vCenter VM is running.
I “polished” the old, simple code a bit but yeah, I’m still pretty horrible when it comes to scripting. Anyways, here it is in case anyone finds it useful:

Continue reading