Bug 1251190

Summary: No way to adjust io_timeout for sanlock lockspace via libvirt
Product: [Community] Virtualization Tools
Reporter: Konstantin Ryabitsev <icon>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED NEXTRELEASE
Severity: high
Priority: unspecified
Version: unspecified
CC: dyuan, herlo1, rbalakri, rday, xuzhang, yanyang
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: libvirt-1.2.22
Doc Type: Bug Fix
Last Closed: 2015-11-18 10:09:11 UTC
Type: Bug
Bug Blocks: 1292984

Description Konstantin Ryabitsev 2015-08-06 16:22:10 UTC
Description of problem:
This is related to bug #843073, which introduced passing io_timeout directly to add_lockspace, ignoring the global io_timeout set with sanlock's -o switch. As a result, io_timeout is always set to the default value of 10 seconds, and VMs are killed after 80 seconds (8 * io_timeout), with no way to adjust this through any configuration file.

This is what happens:

In src/locking/lock_driver_sanlock.c, libvirt calls sanlock_add_lockspace() (line 337):

    if ((rv = sanlock_add_lockspace(&ls, 0)) < 0) {

Sanlock provides two functions, one that accepts a timeout value and one that does not (src/client.c:145):

    int sanlock_add_lockspace(struct sanlk_lockspace *ls, uint32_t flags)
    {
        return cmd_lockspace(SM_CMD_ADD_LOCKSPACE, ls, flags, 0);
    }

    int sanlock_add_lockspace_timeout(struct sanlk_lockspace *ls, uint32_t flags, uint32_t io_timeout)
    {
        return cmd_lockspace(SM_CMD_ADD_LOCKSPACE, ls, flags, io_timeout);
    }

Because libvirt calls sanlock_add_lockspace(), the timeout is hardcoded to 0, which sanlock then replaces with the default value (src/cmd.c:917):

    if (!io_timeout)
        io_timeout = DEFAULT_IO_TIMEOUT;

...which is hardcoded to 10 seconds.

We desperately need a way to adjust the lockspace io_timeout; otherwise sanlock ALWAYS starts killing VMs after 80 seconds, regardless of any configuration settings (such as -o).

Suggested mechanism:
1. Add an io_timeout option to /etc/libvirt/qemu-sanlock.conf, with a default value of 10.
2. Call sanlock_add_lockspace_timeout() instead, passing the timeout specified in the configuration file.


Version-Release number of selected component (if applicable):
0.10.2-54.el6

Comment 3 Michal Privoznik 2015-10-23 11:38:22 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2015-October/msg00717.html

Comment 4 Michal Privoznik 2015-10-27 15:30:39 UTC
Another try:

https://www.redhat.com/archives/libvir-list/2015-October/msg00789.html

Comment 5 Michal Privoznik 2015-11-18 10:09:11 UTC
I've just pushed the patch upstream:

commit bd3e16a3cf89303c3ec5281c818acce418b75f50
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Oct 23 13:21:22 2015 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Wed Nov 18 10:56:56 2015 +0100

    locking: Add io_timeout to sanlock
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1251190
    
    So, if domain loses access to storage, sanlock tries to kill it
    after some timeout. So far, the default is 80 seconds. But for
    some scenarios this might not be enough. We should allow users to
    adjust the timeout according to their needs.
    
    Signed-off-by: Michal Privoznik <mprivozn>


v1.2.21-76-gbd3e16a