Description of problem:
This is related to bug #843073, which introduced passing io_timeout directly to add_lockspace, so the global io_timeout set by the -o switch is ignored. As a result, the lockspace io_timeout is always the default of 10 seconds, and VMs are killed after 80 seconds (8 * io_timeout) with no way to adjust this in any configuration file.

This is what happens. src/locking/lock_driver_sanlock.c calls sanlock_add_lockspace (line 337):

    if ((rv = sanlock_add_lockspace(&ls, 0)) < 0) {

Sanlock provides two functions, one that accepts a timeout value and one that does not (src/client.c:145):

    int sanlock_add_lockspace(struct sanlk_lockspace *ls, uint32_t flags)
    {
        return cmd_lockspace(SM_CMD_ADD_LOCKSPACE, ls, flags, 0);
    }

    int sanlock_add_lockspace_timeout(struct sanlk_lockspace *ls, uint32_t flags,
                                      uint32_t io_timeout)
    {
        return cmd_lockspace(SM_CMD_ADD_LOCKSPACE, ls, flags, io_timeout);
    }

Since libvirt calls sanlock_add_lockspace, the timeout passed is always 0, which sanlock then replaces with the default value (src/cmd.c:917):

    if (!io_timeout)
        io_timeout = DEFAULT_IO_TIMEOUT;

DEFAULT_IO_TIMEOUT is hardcoded to 10 seconds.

We urgently need a way to adjust the lockspace io_timeout; otherwise sanlock ALWAYS starts killing VMs after 80 seconds, regardless of any configuration setting (such as -o).

Suggested mechanism:
1. Add an io_timeout option to /etc/libvirt/qemu-sanlock.conf with a default of 10.
2. Call sanlock_add_lockspace_timeout() instead, passing the timeout specified in the configuration file.

Version-Release number of selected component (if applicable):
0.10.2-54.el6
Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2015-October/msg00717.html
Another try: https://www.redhat.com/archives/libvir-list/2015-October/msg00789.html
I've just pushed the patch upstream:

commit bd3e16a3cf89303c3ec5281c818acce418b75f50
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Oct 23 13:21:22 2015 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Wed Nov 18 10:56:56 2015 +0100

    locking: Add io_timeout to sanlock

    https://bugzilla.redhat.com/show_bug.cgi?id=1251190

    So, if domain loses access to storage, sanlock tries to kill it
    after some timeout. So far, the default is 80 seconds. But for
    some scenarios this might not be enough. We should allow users to
    adjust the timeout according to their needs.

    Signed-off-by: Michal Privoznik <mprivozn>

v1.2.21-76-gbd3e16a