Snapshot removal issues with BackupExec and locked files

<Status update Nov 22nd 2012>
Check out the comments after reading this article; there are some interesting points there. The case at Symantec has not been resolved to this day. We’re also still using Backup Exec 2010 R3.
</Status update>

<Update Sept 12th 2013>
The issue is now finally resolved since our backup admins upgraded to Backup Exec 2010 R3 Service Pack 3.
</Update>

We are currently facing an issue with a small number of VMs where snapshots created by our backup software, Symantec Backup Exec, can’t be removed properly because of locked files, neither through Backup Exec nor manually.
In vCenter, the warning “Virtual machine disks consolidation failed.” is only logged as a plain event (I might create an alarm for this, now that I think about it).

The problem

These snapshots do not show up in the snapshot manager (fix this for good, VMware!); they are only visible on the filesystem.
Unfortunately, unlike with vSphere 4, there is no obvious, specific error. The remove snapshot task completes successfully, and you’ll only notice on the VM summary page that the VM needs disk consolidation.
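
Since the leftover snapshots are only visible on the filesystem, a quick check from the ESXi shell can help spot them. The following is a minimal sketch, not an official VMware tool: the function name `list_snapshot_deltas` is made up for illustration, and it assumes that snapshot descriptors follow the `<name>-NNNNNN.vmdk` naming that appears in this article's log excerpt (for example `SomeVM002-000004.vmdk`).

```shell
# Hypothetical helper: list snapshot descriptor files in a VM directory.
# Assumption: snapshot disks are named <name>-NNNNNN.vmdk (six digits).
# The matching -NNNNNN-delta.vmdk extent files are deliberately not listed.
list_snapshot_deltas() {
    vm_dir="$1"
    for f in "$vm_dir"/*-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk; do
        # An unmatched glob stays literal, so check the file really exists.
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
}

# Example call (adjust the path to your datastore and VM):
# list_snapshot_deltas /vmfs/volumes/<datastore>/SomeVM002
```

If this prints anything while the snapshot manager shows no snapshots, the VM most likely still has unconsolidated delta disks. Note that a VM whose own name ends in a dash and six digits would produce a false positive.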

Before vSphere 5, the task would fail with an error stating that it couldn’t consolidate the snapshot because of a locked file. This information is now only found in the VM’s vmware.log file in its datastore directory (and presumably in the vmkernel logs as well):

# grep -i lock vmware.log
2012-06-04T08:06:21.069Z| vcpu-0| AIOGNRC: Failed to open '/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002-flat.vmdk' : Failed to lock the file (400000003) (0x2013).
2012-06-04T08:06:21.069Z| vcpu-0| AIOMGR: AIOMgr_OpenWithRetry: Descriptor file '/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002-flat.vmdk' locked (try 0)
2012-06-04T08:06:22.580Z| vcpu-0| DISKLIB-VMFS : "/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002-flat.vmdk" : failed to open (Failed to lock the file): AIOMgr_Open failed. Type 3
2012-06-04T08:06:22.580Z| vcpu-0| DISKLIB-LINK : "/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002.vmdk" : failed to open (Failed to lock the file).
2012-06-04T08:06:22.580Z| vcpu-0| DISKLIB-CHAIN : "/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002.vmdk" : failed to open (Failed to lock the file).
2012-06-04T08:06:22.580Z| vcpu-0| DISKLIB-LIB : Failed to open '/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002.vmdk' with flags 0x20a Failed to lock the file (16392).
2012-06-04T08:06:22.580Z| vcpu-0| SNAPSHOT:Failed to open disk /vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002.vmdk : Failed to lock the file (16392)
2012-06-04T08:06:22.601Z| vcpu-0| DISK: Failed to open disk for consolidate '/vmfs/volumes/4fb20a9a-1b7f7c20-0363-002481e443c1/SomeVM002/SomeVM002-000004.vmdk' : Failed to lock the file (16392) 5345
2012-06-04T08:06:22.657Z| vcpu-0| Vix: [675925 vigorCommands.c:577]: VigorSnapshotManagerConsolidateCallback: snapshotErr = Failed to lock the file (5:4008)

Nice, isn’t it? Creating a new snapshot and then selecting “Delete All” will not work either, because the file is still locked. It will only increase the number of delta files for your VM.
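
To see at a glance which disk files the log complains about, the grep from earlier can be narrowed down to just the paths. This is a rough sketch under a few assumptions: the function name `locked_files` is invented for this example, the log lines contain the phrase “Failed to lock the file” as in the excerpt above, and the datastore paths contain no spaces.

```shell
# Hypothetical helper: print each .vmdk path that vmware.log reports as locked.
# Assumptions: lock errors contain "Failed to lock the file" and
# paths start with /vmfs/volumes/ and contain no spaces.
locked_files() {
    log_file="$1"
    grep -i 'failed to lock the file' "$log_file" \
        | grep -o '/vmfs/volumes/[^ ]*\.vmdk' \
        | sort -u
}

# Example call from within the VM's directory:
# locked_files vmware.log
```

For the excerpt above, this would narrow the output down to the base disk, its flat file, and the delta disk that failed to consolidate, which is a reasonable starting point for finding out who holds the lock.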

Digging down to the root of the issue
