Invoke-VMScript issue: “The guest operations agent could not be contacted”

[This issue has been fixed with the VMware Tools releases of ESXi 5.0 Update 2 and ESXi 5.1 Update 1.
Note that you can run more recent versions of VMware Tools to fix this on your VMs even if you haven’t updated your hosts, this is fully supported]

The Invoke-VMScript PowerCLI function provides a cool way to execute scriptcode inside guests through the VMware tools, on Linux and Windows VMs (as long as a couple of prerequisites are met). While playing with it a while ago, I noticed that on many VMs, it wouldn’t run properly, instead just spilling the error “The guest operations agent could not be contacted”. To fix this, the VMware Tools service inside the guest had to be restarted. Not thinking too much of the oddness, I left it at that since I didn’t have a real use case for the Invoke-VMScript function.
This changed later on, when I ran into the same issue again that made me finally dig a bit deeper.

Quite a few other people reported this issue on the VMTN forums since some time, but there was no solution posted. A reference to Backup Exec initiated backup operations made me curious though – had I not fought with another problem induced by it already?

I then was able to track it down to the Reload Virtual Machine operation, which Backup Exec triggers each time before removing the snapshot of a backup too. But not only that, unlike my previous issue, this problem is fully reproducible manually without any involvement of Backup Exec: Continue reading

Snapshot removal issues with BackupExec and locked files

<Status update Nov 22nd 2012>
Check out the comments after reading this article, some interesting points there. The case at Symantec never got resolved to this very day. We’re also still using Backup Exec 2010 R3.
</Status update>

<Update Sept 12th 2013>
The issue is now finally resolved since our backup admins upgraded to Backup Exec 2010 R3 Service Pack 3.

We are currently facing an issue with a small number of VMs, where  snapshots created by our backup software, Symantec BackupExec, can’t be removed properly because of locked files (neither through BackupExec nor manually).
In vCenter, the warning “Virtual machine disks consolidation failed.” is being logged as a simple event (I might create an alarm for this now that I think about it).

The problem

You will not see these snapshots in the snapshot manager (fix this for good, VMware!), but only on the filesystem.
Unfortunately, unlike with vSphere 4 there is no obvious, specific error. The remove snapshot task completes successfully and you’ll only notice on the VM summary page that it needs disk consolidation.

Pre-vSphere 5, the task would fail with an error about how it couldn’t consolidate the snapshot due to a locked file. This info is now only found in the vmware.log file (and surely vmkernel log files) of the VM in its datastore:

# grep -i lock vmware.log
2012-06-04T08:06:21.069Z| vcpu-0| AIOGNRC: Failed to open ‘/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002-flat.vmdk’ : Failed to lock the file (400000003) (0x2013).2012-06-04T08:06:21.069Z| vcpu-0| AIOMGR: AIOMgr_OpenWithRetry: Descriptor file ‘/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002-flat.vmdk’ locked (try 0)2012-06-04T08:06:22.580Z| vcpu-0| DISKLIB-VMFS : “/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002-flat.vmdk” : failed to open (Failed to lock the file): AIOMgr_Open failed. Type 3
2012-06-04T08:06:22.580Z| vcpu-0| DISKLIB-LINK : “/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002.vmdk” : failed to open (Failed to lock the file).
2012-06-04T08:06:22.580Z| vcpu-0| DISKLIB-CHAIN : “/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002.vmdk” : failed to open (Failed to lock the file).
2012-06-04T08:06:22.580Z| vcpu-0| DISKLIB-LIB : Failed to open ‘/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002.vmdk’ with flags 0x20a Failed to lock the file (16392).
2012-06-04T08:06:22.580Z| vcpu-0| SNAPSHOT:Failed to open disk /vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002.vmdk : Failed to lock the file (16392)
2012-06-04T08:06:22.601Z| vcpu-0| DISK: Failed to open disk for consolidate ‘/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002-000004.vmdk’ : Failed to lock the file (16392) 53452012-06-04T08:06:22.657Z| vcpu-0| Vix: [675925 vigorCommands.c:577]: VigorSnapshotManagerConsolidateCallback: snapshotErr = Failed to lock the file (5:4008)

Nice, isn’t it? Creating a new snapshot and selecting delete all snapshots will not work because it’s still locked. It will only increase the number of delta files for your VM.

Digging down to the root of the issue

Continue reading