vSphere 5.1 is here – time for info spoonfeeding

(More or less.) Be sure to lurk on planetv12n for more complete and better-organized info on everything concerning the still smoking-hot news of vSphere 5.1. As had already leaked in some places, vSphere 5.1 was just officially announced at the VMworld keynote. I hope you haven’t been spoiled too much beforehand, so you’re just as excited as I am. To me this seems like an even more important and better change than the 4.1 to 5.0 step. Release will be mid-September.

Update: vSphere 5.1 was released today on September 11.

There’s a ton of new stuff. First, to confirm one of the best pieces of news:

 

vRAM licensing died the death it deserved

Funny how they openly admitted during the presentation that the whole idea of vRAM was bollocks, asking the audience for a standing ovation for this indeed welcome change. Check the details of the licensing changes here:

>vRAM Entitlement Unlimited     Unlimited     Unlimited
Oh boy, they really showed vRAM who’s boss:
Q: Does the new VMware vSphere 5 licensing model – per-CPU without limitation on the number of VMs, cores or amount of physical RAM – apply to both vSphere 5.0 and vSphere 5.1?
A: Yes. The VMware vSphere 5 licensing model applies to both vSphere 5.0 and 5.1 customers.
This effectively means that the vRAM shenanigans end now, even if you stay on 5.0. It was never enforced or properly tracked anyway, except for the Essentials kits.

Loads of new features and enhancements

The new vSphere Replication feature and Storage vMotion are now part of the Standard edition. It also seems as if the prices for each edition went down a bit, which is nice too. Licensing is now purely per socket: no vRAM, no core limits, no 25-VM packs.

There are tons of really awesome improvements on the distributed vSwitch front, and none of them seem to require the Nexus 1000V! Highlights include dvSwitch configuration backup and restore, SR-IOV, LACP support and BPDU filtering. That’s exactly the kind of enhancements I was looking for, and probably not only me. Chris Wahl already has a series of must-read articles presenting the dvSwitch changes over at:
http://wahlnetwork.com/2012/08/27/new-5-1-distributed-switch-features-part-1-network-health-check/
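As I understand it, the point of the BPDU filter is to stop Spanning Tree BPDUs sent by a guest from ever reaching the physical switch, where BPDU Guard would otherwise shut down the host’s uplink port. Here’s a hypothetical Python sketch of that idea, not VMware’s implementation: it simply drops frames addressed to the IEEE 802.1D bridge group address 01:80:C2:00:00:00 (Cisco PVST+ BPDUs use a different address and are ignored here).

```python
# Hypothetical sketch of a guest BPDU filter: drop any frame a VM sends to the
# IEEE 802.1D Spanning Tree group address before it reaches the uplink.
# Illustration only, not how ESXi implements the feature.
from typing import Iterable, Iterator

STP_GROUP_MAC = bytes.fromhex("0180c2000000")  # 01:80:C2:00:00:00
ETH_HEADER_LEN = 14


def is_stp_bpdu(frame: bytes) -> bool:
    """True if the frame's destination MAC is the STP group address."""
    return len(frame) >= ETH_HEADER_LEN and frame[:6] == STP_GROUP_MAC


def filter_guest_bpdus(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Pass through every frame except STP BPDUs."""
    for frame in frames:
        if not is_stp_bpdu(frame):
            yield frame


if __name__ == "__main__":
    vm_mac = bytes.fromhex("005056840378")
    # 802.3 length field (38) + LLC header 42 42 03, then a zeroed 35-byte BPDU body
    bpdu = STP_GROUP_MAC + vm_mac + (38).to_bytes(2, "big") + b"\x42\x42\x03" + bytes(35)
    arp = b"\xff" * 6 + vm_mac + b"\x08\x06" + bytes(28)
    passed = list(filter_guest_bpdus([bpdu, arp]))
    print(len(passed))  # 1: only the ARP frame gets through
```

Matching on the destination MAC alone keeps the sketch simple; a real filter would likely also look at the LLC header.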

Furthermore, there’s a new virtual hardware version 9, supporting up to 64 vCPUs. Not that we (or most others) will ever get anywhere close to that, but it’s good we never really bothered upgrading the virtual hardware from version 7 to 8 with the vSphere 5 update: now we can skip directly to 9.

vSphere now supports shared-nothing live migration, called enhanced vMotion. This is based on the new replication mechanism, or rather, the host-based replication stuff that was present in 5.0 already but only used in conjunction with SRM. The crazy guys also achieved over one million IOPS on a single VM on vSphere 5.1.

Here’s a list of other random facts you should be aware of, or can read up on in the documents listed below:

  • New zero-downtime upgrades for VMware Tools (I wonder how that’s supposed to work for driver updates on Windows)
  • The Web Client is now the primary administrative GUI: no more fat, 400MB C# client?
  • EMC Avamar-based backup (vSphere Data Protection) replacing the wobbly vDR backup solution
  • Flexible, space-efficient storage for virtual desktop infrastructure (VDI) (whatever that really means)
  • Support for locally administered MAC addresses
  • ESXi hosts now support SNMPv3
  • Stateless Caching and Stateful Install modes for Auto Deploy
  • ESXi hosts now support up to 256 pCPUs, as opposed to 160 before
  • A special Latency Sensitivity setting that automatically makes low-level changes in the VMkernel to reduce latency for the virtual machine
  • Improved graphics: hardware-accelerated 3D graphics support
  • Storage DRS enhancements: correlation improvements
  • New VAAI SAN-snapshot offloading, but only with vCloud Director
  • Space reclamation/wipe/shrink via SCSI UNMAP, hopefully less buggy now than under 5.0
  • New APD and PDL handling, for when your storage breaks
  • vShield Endpoint now included with the base vSphere 5.1 license
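For reference, a “locally administered” MAC is one with the U/L bit (0x02 in the first octet) set, as opposed to a vendor-assigned address like VMware’s 00:50:56 range. A small Python sketch to tell them apart; it only illustrates the IEEE 802 bit layout and is not how vSphere validates addresses.

```python
# Classify MAC addresses by the two low-order bits of the first octet
# (IEEE 802): 0x01 = I/G (group/multicast), 0x02 = U/L (locally administered).


def parse_mac(mac: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' into 6 bytes."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    # bytes() raises ValueError for any octet outside 0..255
    return bytes(int(octet, 16) for octet in octets)


def is_locally_administered(mac: str) -> bool:
    return bool(parse_mac(mac)[0] & 0x02)


def is_multicast(mac: str) -> bool:
    return bool(parse_mac(mac)[0] & 0x01)


if __name__ == "__main__":
    for mac in ("00:50:56:84:00:a2", "02:00:00:00:00:01", "ff:ff:ff:ff:ff:ff"):
        print(mac, is_locally_administered(mac), is_multicast(mac))
```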

Detailed reading material

I recommend checking out some more What’s New documents for your reading pleasure in the meantime until it’s released:

The Interoperability Matrix hasn’t been updated yet, and neither has the HCL.

There’s a whole lot of news on the vCloud-and-whatnot side too, but I won’t go into detail here. The new vCloud Suite bundles licenses for the base vSphere Ent+ with vCloud Director, vCenter Operations, SRM and more. Existing Ent+ licenses get a free upgrade to vCloud Suite Standard (which includes vCD). Check the website for more details.

And as if all that weren’t enough, I just noticed a few more new whitepapers in the technical papers section that you should pick up while you’re at it:

Keep ’em coming, Jim.


Running Check Point ClusterXL member interfaces on different subnets than the virtual interfaces

We have a couple of 802.1Q VLAN interfaces on our Check Point firewalls for a number of small public DMZ networks, providing full layer 2 isolation for what is usually only one or two related servers per network. With a /29 subnet per VLAN, subtracting the network and broadcast addresses leaves 6 assignable (public) IP addresses. We also need one IP for the virtual ClusterXL interface, which the servers use as their gateway. Until recently, we also used 2 more IPs from each subnet for the members’ VLAN interfaces, so they could communicate via Check Point’s Cluster Control Protocol (CCP). This left us with merely 3 public IPv4 addresses per /29 subnet for the actual servers in those VLANs.
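The address budget is easy to double-check with Python’s ipaddress module. This is a minimal sketch; 203.0.113.8/29 is a hypothetical documentation prefix standing in for one of our DMZ subnets.

```python
# Address budget of one public DMZ VLAN behind a ClusterXL cluster.
# 203.0.113.8/29 is a hypothetical documentation prefix, not our real range.
import ipaddress


def server_ips_left(subnet: str, cluster_vip: int = 1, member_ips: int = 2) -> int:
    """Usable host addresses minus those used up by the cluster.

    Assumes a prefix of /30 or shorter, where the network and broadcast
    addresses are not assignable.
    """
    net = ipaddress.ip_network(subnet)
    usable = net.num_addresses - 2  # minus network and broadcast address
    return usable - cluster_vip - member_ips


if __name__ == "__main__":
    net = ipaddress.ip_network("203.0.113.8/29")
    print([str(ip) for ip in net.hosts()])       # the 6 assignable addresses
    print(server_ips_left("203.0.113.8/29"))     # VIP + 2 member IPs -> 3
```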
Wasting public IPs on interfaces that are transparent to your actual services isn’t really practical in times of scarce IPv4 address space, though. So during our upgrade to R75.40 and Gaia on new systems, we stopped using addresses from the public subnet for each cluster member’s VLAN interfaces. This configuration has been fully supported for a long time, as per the ClusterXL Admin Guide:

Configuring Cluster Addresses on Different Subnets
Only one routable IP address is required in a ClusterXL cluster, for the virtual cluster interface that faces the Internet. All cluster member physical IP addresses can be non-routable. Configuring different subnets for the cluster IP addresses and the member addresses is useful in order to:
– Enable a multi-machine cluster to replace a single-machine gateway in a pre-configured network, without the need to allocate new addresses to the cluster members.
– Allow organizations to use only one routable address for the ClusterXL Gateway Cluster. This saves routable addresses.
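In practice, only the VIP needs a public address. Here’s a hypothetical Python sketch that models one such VLAN and checks the layout. The class, its field names and every address are made up for illustration (they are not Check Point configuration objects), but it shows that with the members on a private subnet, 5 of the 6 public IPs in a /29 remain for servers.

```python
# Hypothetical sketch of the "cluster addresses on different subnets" layout:
# the VIP stays in the public /29, member interfaces move to a private subnet.
# All addresses below are made-up examples.
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterVlan:
    public_net: ipaddress.IPv4Network   # routable subnet the servers live in
    vip: ipaddress.IPv4Address          # virtual ClusterXL IP, the servers' gateway
    member_net: ipaddress.IPv4Network   # non-routable subnet for member interfaces
    members: List[ipaddress.IPv4Address]

    def validate(self) -> None:
        if self.vip not in self.public_net:
            raise ValueError(f"VIP {self.vip} is outside {self.public_net}")
        for ip in self.members:
            if ip not in self.member_net:
                raise ValueError(f"member IP {ip} is outside {self.member_net}")
            if ip in self.public_net:
                raise ValueError(f"member IP {ip} wastes a public address")

    def free_public_ips(self) -> List[ipaddress.IPv4Address]:
        return [ip for ip in self.public_net.hosts() if ip != self.vip]


example = ClusterVlan(
    public_net=ipaddress.ip_network("203.0.113.8/29"),
    vip=ipaddress.ip_address("203.0.113.9"),
    member_net=ipaddress.ip_network("192.168.250.0/29"),
    members=[ipaddress.ip_address("192.168.250.1"), ipaddress.ip_address("192.168.250.2")],
)
example.validate()
print(len(example.free_public_ips()))  # 5 public IPs left for servers
```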


July 2012 ESXi and vCenter updates

Yesterday and today, VMware released an unusual update for vCenter (and the VCSA) titled “5.0 Update 1a”, as well as a couple of regular bug fixes and security patches for ESXi 5 hosts.
Update Aug 20 2012: VMware recently released vCenter 5.0 Update 1b, which replaces 1a due to some issues with Oracle databases. Apart from that, there seem to be no new fixes that weren’t already present in 1a. So if you’re already on 1a with a non-Oracle database, there is no real need to upgrade to 1b.

VMware vCenter Server 5.0 Update 1a – a as in accident? Update 1b – b as in bummer

Grab the official release notes here:
https://www.vmware.com/support/vsphere5/doc/vsp_vc50_u1a_rel_notes.html
https://www.vmware.com/support/vsphere5/doc/vsp_vc50_u1b_rel_notes.html

It’s the first time VMware has released such an official out-of-band update for vCenter with a small set of fixes and enhancements, and I was quite surprised too. As expected of such a minor update, it doesn’t provide any significant new features. Support for using vCenter with some very specific Oracle version, or switching from DB2 to PostgreSQL for the VCSA? OK, next please.
Much more interesting, and probably the reason they made this move in the first place, are a couple of bugs this update fixes. One of them is the infamous Storage vMotion dvSwitch issue, for which the workaround until now meant some manual labor or running scripts after each Storage vMotion (including those triggered by SDRS).
The other fixes don’t seem particularly interesting, but I’m glad VMware finally fixed the Storage vMotion issue.

Oh joy, installing another complete vSphere Client of a whopping 350MB for just a few vCenter-side fixes! Not to mention the actual vCenter “update” process, which is essentially a complete reinstall of the new version every time.
Will VMware ever offer a proper patching method for this? Praying seems to be the only hope left.

New set of ESXi 5 patches – (a as in) adhering to a patchday policy?

Note: You do not need the above mentioned 5.0 U1a vCenter for these ESXi patches.
Less unusual is the release of a series of new ESXi 5 patches for both security and bug-fixing reasons. It’s been almost exactly a month (hello, patchday) since the last security patch, which was a really important one. Luckily, these two new security patches, for a libxml2 third-party component and a stability issue in VMware Tools, don’t seem critical enough to require your immediate attention (VMware rates them important).
I will highlight a few fixes and points from the patch notes I personally deem important:

Security patches:
VMware ESXi 5.0, Patch ESXi500-201207101-SG: Updates esx-base
>This patch updates the esx-base VIB to incorporate an important libxml2 security update.
VMware ESXi 5.0, Patch ESXi500-201207102-SG: Updates tools-light
>This patch updates the tools-light VIB to resolve a stability issue in VMware Tools.
The full advisory that came with the first patch is available here: http://www.vmware.com/security/advisories/VMSA-2012-0012.html

Other bug fixing patches:
VMware ESXi 5.0, Patch ESXi500-201207401-BG: Updates esx-base
>PR 831801: The default value of the FIN_WAIT_2 timer was erroneously set to TCPTV_KEEPINTVL * TCP_KEEPCNT = 75 * 0x400. This discrepancy causes sockets in FIN_WAIT_2 state to exist for a much longer time, and if multiple such sockets accumulate, they might impact new socket creation.
>PR 835040: An ESXi host might not respond or get disconnected due to the esx.conf file being locked. This might happen because when updating the ESX configuration (esx.conf configuration file), a lock file /etc/vmware/esx.conf.LOCK is created and linked to the process attempting to lock the file. If the link (known as a symlink) is not valid, it prevents esx.conf from being unlocked.
>PR 838922: An ESXi host might not restart UDP logging after a temporary interruption, which might be caused by a target server reboot or network UDP packets being lost.
>PR 838946: During the installation of a Windows Server 2012 or Windows 8 64-bit virtual machine, the virtual machine displays a black screen with a loading icon and stops responding during the start-up process.
>PR 848382: An ESXi host might become unresponsive with a purple diagnostic screen (PSOD) that displays messages similar to the following if any changes are made to BufferCache.HardMaxDirty.
>PR 866810: On a dvPortgroup, a promiscuous port might not work in promiscuous mode if the DVMirror sessions are reconfigured.
>PR 855177: If you configure the loghost of syslogd incorrectly via esxcli, or edit the /etc/vmsyslog.conf file directly followed by a syslog reload, syslogd might terminate abruptly. You might notice the following symptoms with this issue:
VMware ESXi 5.0, Patch ESXi500-201207402-BG: Updates tools-light
>PR 751508: Windows Vista and later versions of Windows do not recommend allowing services to be interactive; however, VMware Tools Service is installed as an interactive service.
VMware ESXi 5.0, Patch ESXi500-201207403-BG: Updates scsi-mptsas
VMware ESXi 5.0, Patch ESXi500-201207405-BG: Updates misc-drivers
VMware ESXi 5.0, Patch ESXi500-201207406-BG: Updates net-e1000
The last three are just your usual driver updates, with no detailed info in the notes.
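To put the number in PR 831801 into perspective, here’s the arithmetic. This assumes the 75 is the BSD-style keepalive interval in seconds, and that the intended timeout follows classic BSD, where an idle FIN_WAIT_2 socket is dropped after TCPTV_KEEPCNT (8) keepalive intervals; the patch notes don’t say either explicitly.

```python
# Worked arithmetic for PR 831801 (FIN_WAIT_2 timer). Assumptions: the 75 is
# the BSD keepalive interval TCPTV_KEEPINTVL in seconds, and the classic BSD
# value of TCPTV_KEEPCNT (8 probes) is what the timer should have used.
KEEPINTVL_S = 75
WRONG_COUNT = 0x400   # 1024, as quoted in the patch notes
BSD_KEEPCNT = 8


def hms(seconds: int) -> str:
    """Format a duration in seconds, e.g. 76800 -> '21h 20m 00s'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


wrong = KEEPINTVL_S * WRONG_COUNT     # 76800 s
expected = KEEPINTVL_S * BSD_KEEPCNT  # 600 s
print(f"buggy:    {wrong} s = {hms(wrong)}")
print(f"expected: {expected} s = {hms(expected)}")
print(f"factor:   {wrong // expected}x")
```

Under those assumptions, a half-closed socket lingers for over 21 hours instead of 10 minutes, which explains how enough of them could pile up to affect new socket creation.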

By the way, you can install VMware Tools updates without maintenance mode or reboots, since they effectively only overwrite the Tools ISO images in /vmimages/tools-isoimages. Note that ESX(i) 4 patches are also pending release. Based on what VMware support told us on a case we have open, I also expect a real vCenter and ESXi 5.0 U2 to be released soon-ish. Maybe around VMworld 2012, along with the potential release of vSphere 5.1?

Linux not responding to ARP-Request from Juniper Firewall with subnet IP

We recently got a new firewalled subnet and VLAN for servers of a certain department and ran into a strange issue when we deployed our first Linux system there.
The system would randomly become unreachable for an arbitrary amount of time, ranging from a few minutes to hours. Plugging a 2nd VM into the VLAN (on a different host and a different switch) confirmed that connectivity inside the VLAN was working as usual, so the cause had to lie with the gateway or the routing to the subnet.
Note that the fact these are VMs is completely irrelevant here; the same would have happened with physical servers.

When the issue was brought to my attention, a quick trace of the network traffic and some tests with the 2nd VM revealed what was going wrong:
The Juniper firewall, acting as the gateway in that subnet, sent ARP requests asking for the answer to be “told” to the network address of the subnet, and the Linux OS wouldn’t respond.

We have a plain, simple subnet 10.9.24.0/24, with the Linux box in this case using the IP 10.9.24.56/24. The gateway is supposed to use 10.9.24.1/24, and it does. But when sending ARP requests to forward incoming traffic, the gateway puts the subnet IP into the ARP request (“who-has 10.9.24.56 tell 10.9.24.0“). What?
Linux, unlike Windows, doesn’t seem to like this and simply ignores the ARP request without replying. Consequently, the gateway can never resolve the MAC and thus never forwards traffic to that IP.

A quick confirmation of this behavior from the viewpoint of a Linux box in this subnet:

# ifconfig eth0 10.9.24.56/24
# tcpdump -i eth0 -nnev arp or rarp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
12:39:32.403516 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46
12:39:34.635769 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46
12:39:36.635680 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46

>The Juniper keeps sending ARPs with the subnet IP as the sender protocol address, but the Linux OS ignores them and doesn’t bother to respond.

Now we do the following:

# ifconfig eth0 10.9.24.56/16
# tcpdump -i eth0 -nnev arp or rarp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
12:40:22.833832 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46
12:40:22.833871 00:50:56:84:00:a2 > f8:c0:01:12:5d:48, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.9.24.56 is-at 00:50:56:84:00:a2, length 28

>Running with a /16 subnet mask, for example, the IP 10.9.24.0 is just like any other host IP to us, and the Linux OS responds to the ARPs properly.
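If you want to spot this pattern in a capture without squinting at tcpdump output, a small Python sketch like the following flags ARP requests whose sender is the network or broadcast address of the local subnet. It only mirrors what we observed above (the same 10.9.24.0 sender is a “special” address under /24 but an ordinary host under /16); it is not how the Linux kernel decides to drop the request, and the regex assumes the tcpdump -v output format shown here.

```python
# Flag ARP requests whose sender IP is the network (or broadcast) address of
# the interface's subnet, i.e. the Juniper's "tell 10.9.24.0" requests.
import ipaddress
import re
from typing import Iterable, Iterator, Tuple

# Matches tcpdump's ARP request summary, e.g.
# "Request who-has 10.9.24.56 tell 10.9.24.0, length 46"
ARP_REQUEST = re.compile(r"Request who-has (?P<target>[\d.]+) tell (?P<sender>[\d.]+)")


def odd_sender_requests(lines: Iterable[str], iface_cidr: str) -> Iterator[Tuple[str, str]]:
    """Yield (target, sender) for requests sent from the subnet's network or
    broadcast address, as seen from an interface configured with iface_cidr."""
    net = ipaddress.ip_interface(iface_cidr).network
    special = {net.network_address, net.broadcast_address}
    for line in lines:
        match = ARP_REQUEST.search(line)
        if match and ipaddress.ip_address(match.group("sender")) in special:
            yield match.group("target"), match.group("sender")


capture = [
    "12:39:32.403516 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), "
    "length 60: Ethernet (len 6), IPv4 (len 4), "
    "Request who-has 10.9.24.56 tell 10.9.24.0, length 46",
]
print(list(odd_sender_requests(capture, "10.9.24.56/24")))  # flagged under /24
print(list(odd_sender_requests(capture, "10.9.24.56/16")))  # nothing under /16
```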

Workaround

Not being in charge of the networking side, I raised the issue with our network guys and in the meantime worked around it by periodically broadcasting an ARP reply for our IP with the arping utility. This is a great resource on ARP basics, with arping counterparts for your experimenting pleasure.

The following command will broadcast a single ARP-Reply for our IP 10.9.24.56 through interface eth0:

# arping -c 1 -I eth0 -A 10.9.24.56
# tcpdump -i eth0 -nnev arp or rarp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
12:43:42.897764 00:50:56:84:00:a2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.9.24.56 is-at 00:50:56:84:00:a2, length 28

So here we have our more or less dirty but necessary workaround in the form of a gratuitous ARP. I cronjob’d this and the issue was gone.
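To see what’s actually inside that 42-byte frame, here’s a Python sketch that builds (but does not send) an unsolicited ARP reply like the one arping -A broadcasts. Actually putting it on the wire would need a raw AF_PACKET socket and root, which is what arping handles for you. The choice of target hardware address is an assumption, as noted in the code.

```python
# Build (but don't send) the kind of unsolicited ARP reply that
# `arping -A` broadcasts, to show what's inside the 42-byte frame.
import socket
import struct

ETH_P_ARP = 0x0806
ARPHRD_ETHER = 1
ETH_P_IP = 0x0800
ARPOP_REPLY = 2


def mac_to_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def gratuitous_arp_reply(mac: str, ip: str) -> bytes:
    sha = mac_to_bytes(mac)
    spa = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC as source, type ARP
    eth = b"\xff" * 6 + sha + struct.pack("!H", ETH_P_ARP)
    # ARP header: Ethernet/IPv4, 6-byte MACs, 4-byte IPs, opcode 2 (reply)
    arp = struct.pack("!HHBBH", ARPHRD_ETHER, ETH_P_IP, 6, 4, ARPOP_REPLY)
    # Sender and target IP are both our own address. Using our own MAC as the
    # target hardware address is an assumption; implementations differ here.
    arp += sha + spa + sha + spa
    return eth + arp


frame = gratuitous_arp_reply("00:50:56:84:00:a2", "10.9.24.56")
print(len(frame))  # 42, matching "length 42" in the tcpdump output above
```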
Afterwards, the networking guys set a static ARP entry on the Juniper, but I (and probably they) still have no idea why it sent ARPs with the subnet IP in the first place. We have other Juniper-firewalled subnets where it properly uses a physical host IP.

Again, Windows (tested with 2008 R2) doesn’t seem to care about the odd sender address and answers those requests anyway. I confirmed the Linux behavior with RHEL6 and Fedora 17, but I would be surprised if any other Linux with a halfway recent kernel handled this differently. Also, this issue is NOT specific to VMware ESXi VMs or virtualization in general.
Skimming briefly through the ARP RFC (RFC 826), I haven’t found anything definitive saying a system should not reply to a request from the subnet IP, though.

ESX host notify switches behavior

This section is completely unrelated to the issue at hand, but on the topic of ARPs, I was also curious what exactly the frames look like that an ESX(i) host sends with the “notify switches” option when you vMotion a VM. I was always under the impression it was a gratuitous ARP like the one I send with arping above, but it turns out it’s just Reverse ARP (RARP) broadcast frames. Not that it matters much, since they do the job of updating switching tables (which are NOT related to ARP tables; those stay unaffected anyway).
I also just found this: this KB article on NLB unicast issues actually explains quite well that it is RARP.

# tcpdump -i eth0 -nnev arp or rarp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
13:22:57.447460 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46
13:22:57.842676 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46
13:22:58.842661 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46
13:22:59.842650 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46
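For the curious, here’s a Python sketch that rebuilds such a frame from the fields tcpdump shows (EtherType 0x8035, RFC 903 reverse request, the VM’s MAC as both sender and target) and pads it to the 60-byte minimum seen in the capture. The protocol addresses are left as zeros, which is an assumption since tcpdump doesn’t print them. The learn() helper is a deliberately minimal, hypothetical model of a switch’s MAC learning, just to show that the source MAC alone is what updates the switching table.

```python
# Rebuild the RARP frame an ESX(i) host broadcasts for "notify switches",
# and show why it works: switches learn the *source* MAC, whatever the payload.
import struct
from typing import Dict

ETH_P_RARP = 0x8035
ARPHRD_ETHER = 1
ETH_P_IP = 0x0800
RARP_REQUEST_REVERSE = 3   # RFC 903 opcode
MIN_FRAME_LEN = 60         # minimum Ethernet frame size without FCS


def rarp_announcement(mac: str) -> bytes:
    vm_mac = bytes.fromhex(mac.replace(":", ""))
    eth = b"\xff" * 6 + vm_mac + struct.pack("!H", ETH_P_RARP)
    rarp = struct.pack("!HHBBH", ARPHRD_ETHER, ETH_P_IP, 6, 4, RARP_REQUEST_REVERSE)
    # "who-is <mac> tell <mac>": both hardware addresses are the VM's MAC.
    # The protocol addresses are left as 0.0.0.0 here (an assumption; tcpdump
    # doesn't print them).
    rarp += vm_mac + bytes(4) + vm_mac + bytes(4)
    frame = eth + rarp
    return frame + bytes(MIN_FRAME_LEN - len(frame))  # pad 42 -> 60 bytes


def learn(mac_table: Dict[bytes, str], frame: bytes, port: str) -> None:
    """Minimal switch MAC learning: map the frame's source MAC to the ingress port."""
    mac_table[frame[6:12]] = port


frame = rarp_announcement("00:50:56:84:03:78")
table: Dict[bytes, str] = {}
learn(table, frame, "port-to-new-host")   # hypothetical port name
print(len(frame), table)
```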