Created attachment 604066
Patch to fix issue

When an error occurs during lockspace creation, libvirt deletes the lockspace. In a clustered environment, this completely broke lock management and allowed VMs to be started on multiple nodes, thereby causing corruption.

This was seen by running lsof -p `pidof sanlock` on all nodes. Most of the nodes showed that they were accessing a deleted __LIBVIRT__DISKS__ file:

sanlock 26804 root 9u REG 0,29 1048576 393228 /fs0/lock/sanlock/__LIBVIRT__DISKS__ (deleted)

Further investigation showed that the one node that was not using an unlinked __LIBVIRT__DISKS__ lockspace had the following in its log:

2012-08-07 19:44:18.029+0000: 18023: error : virLockManagerSanlockSetupLockspace:246 : Unable to add lockspace /fs0/lock/sanlock/__LIBVIRT__DISKS__: Operation now in progress

I've attached a patch that will not unlink lockspaces on sanlock daemon errors.

Version-Release number of selected component (if applicable):
This was reproduced with the RHEL 6.2 release, libvirt-0.9.4-23.el6_2.9
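For anyone who wants to run the same lsof check across many nodes, here is a minimal sketch in Python that picks out the "(deleted)" entries from lsof output. The function name deleted_open_files is hypothetical, and the sketch assumes the path is the field immediately before the "(deleted)" marker, so paths containing spaces are not handled.

```python
def deleted_open_files(lsof_output):
    """Return the paths that lsof reports as open but deleted.

    Assumption: each line ends with "<path> (deleted)" and the path
    contains no whitespace, as in the sample line from this report.
    """
    paths = []
    for line in lsof_output.splitlines():
        line = line.rstrip()
        if line.endswith("(deleted)"):
            fields = line.split()
            # The last field is "(deleted)"; the one before it is the path.
            paths.append(fields[-2])
    return paths


# Sample line copied from the report above.
sample = (
    "sanlock 26804 root 9u REG 0,29 1048576 393228 "
    "/fs0/lock/sanlock/__LIBVIRT__DISKS__ (deleted)"
)
print(deleted_open_files(sample))
```

The input text could come from running lsof -p `pidof sanlock` on each node and capturing its standard output; a non-empty result on a node suggests that its sanlock daemon is holding an unlinked lockspace file.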
Can you please also post your patch upstream to libvir-list, so it will get reviewed faster?
Will be in 0.10.0:

commit ff73c6d3bc60eb6557fedd12f14b8416c81fcda6
Author: Asad Saeed <asad.saeed>
Date:   Mon Aug 13 13:21:10 2012 -0700

    sanlock: don't unlink lockspace if registration fails

    This is a patch for bug 847848
    If registering an existing lockspace with the sanlock daemon
    returns an error, libvirt should not proceed to unlink the
    lockspace.

    Signed-off-by: Asad Saeed <asad.saeed>
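To illustrate the behaviour the commit message describes, here is a rough sketch in Python. It is not the libvirt code: setup_lockspace, RegistrationError and the register callable are hypothetical stand-ins for the lockspace setup routine and the sanlock daemon call. The only point it tries to show is that a failed registration leaves the lockspace file in place.

```python
import os


class RegistrationError(Exception):
    """Hypothetical stand-in for an error returned by the lock daemon."""


def setup_lockspace(path, register):
    """Make sure the lockspace file exists, then register it.

    `register` is a hypothetical callable standing in for the daemon
    call; it raises RegistrationError when registration fails.
    """
    if not os.path.exists(path):
        # Create a placeholder lockspace file if none exists yet.
        with open(path, "xb") as f:
            f.write(b"\0")

    # If registration fails, the error propagates to the caller.
    # The file is deliberately NOT unlinked here: other nodes may
    # already be using this lockspace, and removing it would leave
    # them holding a deleted file.
    register(path)
```

With this shape, a registration error such as "Operation now in progress" is still reported to the caller, but the shared lockspace file stays on disk for the other nodes.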