Bug 1251190 - No way to adjust io_timeout for sanlock lockspace via libvirt
Summary: No way to adjust io_timeout for sanlock lockspace via libvirt
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Michal Privoznik
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1292984
Reported: 2015-08-06 16:22 UTC by Konstantin Ryabitsev
Modified: 2015-12-21 08:58 UTC
CC List: 6 users

Fixed In Version: libvirt-1.2.22
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-18 10:09:11 UTC
Embargoed:



Description Konstantin Ryabitsev 2015-08-06 16:22:10 UTC
Description of problem:
This is related to bug #843073, which introduced passing io_timeout directly to add_lockspace, so the global io_timeout set by the -o switch is ignored. As a result, io_timeout is always set to the default value of 10 seconds, and VMs are killed after 80 seconds (8 * io_timeout), with no way to adjust this through any configuration file.

This is what happens:

In src/locking/lock_driver_sanlock.c, libvirt calls sanlock_add_lockspace() (line 337):

    if ((rv = sanlock_add_lockspace(&ls, 0)) < 0) {

Sanlock provides two functions, one that accepts a timeout value and one that does not (src/client.c:145):

    int sanlock_add_lockspace(struct sanlk_lockspace *ls, uint32_t flags)
    {
        return cmd_lockspace(SM_CMD_ADD_LOCKSPACE, ls, flags, 0);
    }

    int sanlock_add_lockspace_timeout(struct sanlk_lockspace *ls, uint32_t flags, uint32_t io_timeout)
    {
        return cmd_lockspace(SM_CMD_ADD_LOCKSPACE, ls, flags, io_timeout);
    }

Since libvirt calls sanlock_add_lockspace(), the timeout is hardcoded to 0, which sanlock then replaces with the default value (src/cmd.c:917):

    if (!io_timeout)
        io_timeout = DEFAULT_IO_TIMEOUT;

...which is hardcoded to 10 seconds.

We desperately need a way to adjust the lockspace io_timeout; otherwise sanlock ALWAYS starts killing VMs after 80 seconds, regardless of any configuration settings (such as -o).

Suggested mechanism:
1. Add an io_timeout option to /etc/libvirt/qemu-sanlock.conf with a default setting of 10.
2. Call the sanlock_add_lockspace_timeout() function instead, passing the timeout specified in the configuration file.


Version-Release number of selected component (if applicable):
0.10.2-54.el6

Comment 3 Michal Privoznik 2015-10-23 11:38:22 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2015-October/msg00717.html

Comment 4 Michal Privoznik 2015-10-27 15:30:39 UTC
Another try:

https://www.redhat.com/archives/libvir-list/2015-October/msg00789.html

Comment 5 Michal Privoznik 2015-11-18 10:09:11 UTC
I've just pushed the patch upstream:

commit bd3e16a3cf89303c3ec5281c818acce418b75f50
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Oct 23 13:21:22 2015 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Wed Nov 18 10:56:56 2015 +0100

    locking: Add io_timeout to sanlock
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1251190
    
    So, if domain loses access to storage, sanlock tries to kill it
    after some timeout. So far, the default is 80 seconds. But for
    some scenarios this might not be enough. We should allow users to
    adjust the timeout according to their needs.
    
    Signed-off-by: Michal Privoznik <mprivozn>


v1.2.21-76-gbd3e16a

