Bug 1292984 - Cannot adjust io_timeout for sanlock lockspace via libvirt
Summary: Cannot adjust io_timeout for sanlock lockspace via libvirt
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Michal Privoznik
QA Contact: Han Han
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Depends On: 1251190
Blocks: 1395265 1401400 1420851
 
Reported: 2015-12-18 23:24 UTC by nijin ashok
Modified: 2023-09-07 18:43 UTC
CC List: 14 users

Fixed In Version: libvirt-3.0.0-1.el7
Doc Type: Release Note
Doc Text:
Adjustable I/O timeout for *sanlock*
This update adds the "io_timeout" parameter to the `/etc/libvirt/qemu-sanlock.conf` file. The parameter sets the I/O timeout of the *sanlock* lockspace that libvirt creates, which determines how long the *sanlock* service waits when the lockspace becomes unavailable while *sanlock* is active. If that period is exceeded, *sanlock* aborts all QEMU processes associated with the guests that *sanlock* is locking. The default "io_timeout" is 10 seconds, which corresponds to guests being aborted after roughly 80 seconds.
Clone Of:
Environment:
Last Closed: 2019-03-07 08:40:09 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
The log of sanlock client log_dump (54.94 KB, text/plain) - 2016-03-30 11:18 UTC, Han Han
libvirtd log and scripts (3.54 MB, application/x-gzip) - 2016-04-29 11:33 UTC, Han Han
log and reproducing script (941 bytes, application/x-gzip) - 2017-03-15 10:14 UTC, Han Han
The script of reproducing (2.35 KB, text/plain) - 2017-04-25 06:28 UTC, Han Han
The script of reproducing for comment24 (1.08 KB, application/x-gzip) - 2017-04-26 09:03 UTC, Han Han
Scripts and logs (290.22 KB, application/x-gzip) - 2017-06-12 08:39 UTC, Han Han
The log of lockspace already existing (387.80 KB, application/x-gzip) - 2017-08-21 02:36 UTC, Han Han
The log of no lockspace file (249.55 KB, application/x-gzip) - 2017-08-21 05:29 UTC, Han Han

Description nijin ashok 2015-12-18 23:24:28 UTC
Description of problem:

Currently, libvirt ignores the global io_timeout set by sanlock's -o switch. As a result, the VM is killed after the default 80 seconds regardless of the global io_timeout.

A way to control io_timeout from the libvirt configuration file is needed.

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux 7.2
libvirt-1.2.17-13

How reproducible:

100%

Steps to Reproduce:

1. Configure io_timeout greater than 10 seconds

2. When sanlock loses the lockspace, the qemu process is killed after the default 80 seconds.

Actual results:

Libvirt does not have any option to adjust io_timeout.

Expected results:

Libvirt should have an option to adjust io_timeout, as it is helpful in some scenarios.

Additional info:

This is already fixed upstream, according to bug #1251190.

Comment 3 Han Han 2016-03-30 11:16:21 UTC
I tested on libvirt-1.3.2-1.el7.x86_64, but the result didn't meet the expectation.
I followed these steps:
1. Setup sanlock environment
# cat /etc/libvirt/qemu.conf
lock_manager = "sanlock"
# cat /etc/libvirt/qemu-sanlock.conf
user = "sanlock"
group = "sanlock"
host_id = 1 
auto_disk_leases = 1 
disk_lease_dir = "/var/lib/libvirt/sanlock"
io_timeout = 40
# systemctl restart wdmd; systemctl restart sanlock; systemctl restart libvirtd  
2. Check io_timeout
# for i in {1..5}; do virsh start g1; sleep 2; /usr/bin/time -f %e ./io_timeout; rm -f /var/lib/libvirt/sanlock/*; systemctl restart libvirtd; done
# cat ./io_timeout
#!/bin/bash
# Corrupt the lockspace, then busy-wait until sanlock kills the guest g1.
shred /var/lib/libvirt/sanlock/*
while true; do
    if ! virsh list | grep -q g1; then
        break
    fi
done
The test result is:
57.54
57.60
58.16
56.50
58.16

But the results don't match the io_timeout configuration.

Comment 4 Han Han 2016-03-30 11:18:17 UTC
Created attachment 1141692 [details]
The log of sanlock client log_dump

Comment 5 Michal Privoznik 2016-04-29 08:58:37 UTC
Interesting. Can you upload the daemon logs too please?

Comment 6 Han Han 2016-04-29 11:33:43 UTC
Created attachment 1152235 [details]
libvirtd log and scripts

Since the scripts run 'virsh list' in a loop, there are many 'virsh list' entries in the libvirtd log.

Comment 8 Michal Privoznik 2016-09-09 13:35:37 UTC
I've no idea why this doesn't work. Libvirt is calling the right API:

Breakpoint 1, sanlock_add_lockspace_timeout (ls=ls@entry=0x7fffab2316e0, flags=flags@entry=0, io_timeout=40) at client.c:180
180             return cmd_lockspace(SM_CMD_ADD_LOCKSPACE, ls, flags, io_timeout);
Missing separate debuginfos, use: debuginfo-install augeas-libs-1.4.0-2.el7.x86_64 boost-system-1.53.0-26.el7.x86_64 boost-thread-1.53.0-26.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.166-2.el7.x86_64 elfutils-libs-0.166-2.el7.x86_64 fuse-libs-2.9.2-7.el7.x86_64 glusterfs-api-3.7.9-12.el7.x86_64 glusterfs-libs-3.7.9-12.el7.x86_64 gmp-6.0.0-12.el7_1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-26.el7.x86_64 libacl-2.2.51-12.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 libffi-3.0.13-18.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libgcrypt-1.5.3-12.el7_1.1.x86_64 libgpg-error-1.12-3.el7.x86_64 libidn-1.28-4.el7.x86_64 libpcap-1.5.3-8.el7.x86_64 libpciaccess-0.13.4-2.el7.x86_64 librados2-0.94.5-1.el7.x86_64 librbd1-0.94.5-1.el7.x86_64 libsepol-2.5-6.el7.x86_64 libssh2-1.4.3-10.el7_2.1.x86_64 libstdc++-4.8.5-11.el7.x86_64 libtasn1-3.8-3.el7.x86_64 libxslt-1.1.28-5.el7.x86_64 netcf-libs-0.2.8-2.el7.x86_64 nettle-2.7.1-8.el7.x86_64 nspr-4.11.0-1.el7_2.x86_64 nss-3.21.0-17.el7.x86_64 nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 nss-util-3.21.0-2.2.el7_2.x86_64 openldap-2.4.40-13.el7.x86_64 openssl-libs-1.0.1e-58.el7.x86_64 p11-kit-0.20.7-3.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-29.el7.x86_64 trousers-0.3.13-1.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  sanlock_add_lockspace_timeout (ls=ls@entry=0x7fffab2316e0, flags=flags@entry=0, io_timeout=40) at client.c:180                                                                                                                                                             
#1  0x00007fffaa60e84c in virLockManagerSanlockSetupLockspace () at locking/lock_driver_sanlock.c:347                                                                                                                                                                          
#2  0x00007fffaa60f4fd in virLockManagerSanlockInit (version=<optimized out>, configFile=<optimized out>, flags=<optimized out>) at locking/lock_driver_sanlock.c:440                                                                                                          
#3  0x00007ffff73fcd44 in virLockManagerPluginNew (name=0x7fff9015cdb0 "sanlock", driverName=driverName@entry=0x7fffb2be28e8 "qemu", configDir=<optimized out>, flags=flags@entry=0) at locking/lock_manager.c:174
#4  0x00007fffb2b96bd6 in qemuStateInitialize (privileged=true, callback=0x55555556e4f0 <daemonInhibitCallback>, opaque=<optimized out>) at qemu/qemu_driver.c:712
#5  0x00007ffff73c2d1f in virStateInitialize (privileged=true, callback=callback@entry=0x55555556e4f0 <daemonInhibitCallback>, opaque=opaque@entry=0x5555557fceb0) at libvirt.c:770
#6  0x000055555556e54b in daemonRunStateInit (opaque=0x5555557fceb0) at libvirtd.c:959
#7  0x00007ffff732fe62 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#8  0x00007ffff4953dc5 in start_thread (arg=0x7fffab232700) at pthread_create.c:308
#9  0x00007ffff468273d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113


Look at frame #0: sanlock_add_lockspace_timeout (ls=ls@entry=0x7fffab2316e0, flags=flags@entry=0, io_timeout=40).

I think this is a bug in sanlock. David, can you provide more insight on this please?

Comment 9 David Teigland 2016-09-09 14:52:17 UTC
sanlock_add_lockspace() uses the existing io_timeout recorded in the lockspace when acquiring the delta lease within add_lockspace.  In your case, the existing io_timeout is 10 sec, so add_lockspace(40) will use 10 not 40.  After add_lockspace completes, 40 will be recorded as the existing io_timeout, and subsequent calls to add_lockspace will use the existing 40 not 10.  So, if you run your test a second time, it should use 40.

The reason that the initial io_timeout is 10 sec is because you are using sanlock_init() to initialize the lockspace.  This is the old function for creating a lockspace which does not support a user-specified io_timeout.  The new function for creating a lockspace is sanlock_write_lockspace() which does support specifying an io_timeout.  If you switch to sanlock_write_lockspace() you can use 40 sec from the start.
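
(A minimal sketch of that behaviour, using the CLI forms shown later in comment 23; the lockspace name, device and offsets are placeholders, and it assumes 'sanlock direct init' without -o records the default 10 sec io_timeout, as described above:)

# sanlock direct init -s test:0:/dev/loop0:0           # old-style init: records the default 10 sec io_timeout
# sanlock add_lockspace -o 40 -s test:1:/dev/loop0:0   # this add still runs with the on-disk 10 sec
# sanlock client read -s test:1:/dev/loop0:0           # ...but 40 is now recorded, so the next add_lockspace would use 40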

Comment 10 Han Han 2016-09-13 01:36:11 UTC
Hi Michal, 
Based on comment 9, do we need to apply a new patch for this bug fix? Is there any risk for RHEL 7.3? Please reply ASAP.

Comment 12 Michal Privoznik 2016-09-15 14:36:53 UTC
Patches proposed upstream:

https://www.redhat.com/archives/libvir-list/2016-September/msg00512.html

Comment 14 Michal Privoznik 2016-09-29 15:05:30 UTC
I've just pushed the patches upstream:

commit 5fe66ea3bfdae386064f782b9128f45f2ac37f64
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Sep 15 13:05:34 2016 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Sep 29 16:59:11 2016 +0200

    sanlock: Properly init io_timeout
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1292984
    
    Hold on to your hats, because this is gonna be wild.
    
    In bd3e16a3 I've tried to expose sanlock io_timeout. What I had
    not realized (because there is like no documentation for sanlock
    at all) was very unusual way their APIs work. Basically, what we
    do currently is:
    
        sanlock_add_lockspace_timeout(&ls, io_timeout);
    
    which adds a lockspace to sanlock daemon. One would expect that
    io_timeout sets the io_timeout for it. Nah! That's where you are
    completely off the tracks. It sets timeout for next lockspace you
    will probably add later. Therefore:
    
       sanlock_add_lockspace_timeout(&ls, io_timeout = 10);
       /* adds new lockspace with default io_timeout */
    
       sanlock_add_lockspace_timeout(&ls, io_timeout = 20);
       /* adds new lockspace with io_timeout = 10 */
    
       sanlock_add_lockspace_timeout(&ls, io_timeout = 40);
       /* adds new lockspace with io_timeout = 20 */
    
    And so on. You get the picture.
    Fortunately, we don't allow setting io_timeout per domain or per
    domain disk. So we just need to set the default used in the very
    first step and hope for the best (as all the io_timeout-s used
    later will have the same value).
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit 2bca7cec0ba9e861edb245f20d99d5b6fd22f760
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Sep 15 10:45:26 2016 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Sep 29 16:59:11 2016 +0200

    m4: Check for sanlock_write_lockspace
    
    Currently, we are checking for sanlock_add_lockspace_timeout
    which is good for now. But in a subsequent patch we are going to
    use sanlock_write_lockspace (which sets an initial value for io
    timeout for sanlock). Now, there is no reason to check for both
    functions in sanlock library as the sanlock_write_lockspace was
    introduced in 2.7 release and the one we are currently checking
    for in the 2.5 release. Therefore it is safe to assume presence
    of sanlock_add_lockspace_timeout when sanlock_write_lockspace
    is detected.
    
    Moreover, the macro for conditional compilation is renamed to
    HAVE_SANLOCK_IO_TIMEOUT (as it now encapsulates two functions).
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit 78da4296a614c5d8a777ea2de8380ea11208e59b
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Sep 15 13:18:13 2016 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Sep 29 16:59:11 2016 +0200

    lock_driver_sanlock: Avoid global driver variable whenever possible
    
    Global variables are bad, we should avoid using them.
    
    Signed-off-by: Michal Privoznik <mprivozn>

v2.3.0-rc1-13-g5fe66ea

Comment 17 Han Han 2017-03-15 10:13:00 UTC
Hi Michal, I found this doesn't work on the current libvirt and sanlock: libvirt-3.1.0-2.virtcov.el7.x86_64 sanlock-3.4.0-1.el7.x86_64
I tested it with the following steps:
1. Set configurations for libvirt sanlock environment:
qemu.conf: lock_manager = "sanlock"
qemu-sanlock.conf: user="sanlock"                                               
qemu-sanlock.conf: group="sanlock"                                              
qemu-sanlock.conf: host_id=1                                                    
qemu-sanlock.conf: auto_disk_leases=1                                           
qemu-sanlock.conf: disk_lease_dir="/var/lib/libvirt/sanlock"                    
qemu-sanlock.conf: io_timeout=50

2. Restart sanlock and libvirtd

3. Start vm

4. Shred and remove the lockspace file.

5. Wait for the VM to stop.

But the VM did not stop.
Please check whether there is a problem here.

Comment 18 Han Han 2017-03-15 10:14:22 UTC
Created attachment 1263273 [details]
log and reproducing script

Comment 19 Michal Privoznik 2017-03-16 17:07:39 UTC
(In reply to Han Han from comment #17)
> Hi Michal, I found it didn't work on current libvirt and sanlock:
> libvirt-3.1.0-2.virtcov.el7.x86_64 sanlock-3.4.0-1.el7.x86_64
> I test it with following steps:
> 1. Set configurations for libvirt sanlock environment:
> qemu.conf: lock_manager = "sanlock"
> qemu-sanlock.conf: user="sanlock"
> qemu-sanlock.conf: group="sanlock"
> qemu-sanlock.conf: host_id=1
> qemu-sanlock.conf: auto_disk_leases=1
> qemu-sanlock.conf: disk_lease_dir="/var/lib/libvirt/sanlock"
> qemu-sanlock.conf: io_timeout=50
> 
> 2. Restart sanlock and libvirtd
> 
> 3. Start vm
> 
> 4. Shred and remove lockspace file.
> 
> 5. Wait for VM stop
> 
> But VM didn't stop.
> Pls check if there are problems here.

I can see the VM getting killed in my scenario. Truth is, it takes slightly more than those 50 seconds. But that might be a sanlock bug. The VM is definitely getting killed. It may take some time for sanlock to realize that the lockspace is gone.

Comment 20 David Teigland 2017-03-16 17:39:48 UTC
With io_timeout=50, after destroying a lockspace that is in use, you should expect sanlock to kill the VM after id_renewal_fail_seconds = 8 * 50, or 400 seconds.  The 400 is actually a 50 second range because of the inherent skew between renewal and anything else you do, but I can't recall if the 50 second range extends before or after 400.
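
(For reference, the expected delay scales with the recorded io_timeout. A quick sketch of the figures for the timeouts that appear in this bug, following the 8 * io_timeout rule above; the 10 sec default corresponds to the "default 80 seconds" from the bug description:)

# expected kill delay is roughly 8 * io_timeout seconds, plus up to one renewal interval of slack
for t in 10 40 50 60 80; do
    echo "io_timeout=${t}s -> VM killed after about $((8 * t)) seconds"
done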

Comment 21 Han Han 2017-04-25 06:28:14 UTC
Created attachment 1273823 [details]
The script of reproducing

Hi David, I found that the iotimeout in the lockspace could not be updated.
In the script, I initialize the lockspace via:
sanlock.write_lockspace(LOCKSPACE_NAME, disk, iotimeout=IOTIMEOUT)

And check the iotimeout via:
sanlock.read_lockspace(disk)

I set iotimeout to 80, but the result is:
Lockspace info: '{'lockspace': 'lockspace1', 'iotimeout': 10}'

I also get a similar result when invoking sanlock via libvirt. After setting io_timeout and restarting libvirtd, the iotimeout value returned by sanlock.read_lockspace('/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__') does not update.
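
(For reference, the same recorded value can presumably also be read back with the sanlock CLI, using the lockspace spec format from comment 23; host_id 1 and offset 0 are the values this setup uses:)

# sanlock client read -s __LIBVIRT__DISKS__:1:/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__:0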

Comment 22 Han Han 2017-04-25 06:29:05 UTC
Please check if it is an issue in sanlock.

Comment 23 David Teigland 2017-04-25 14:49:14 UTC
You also need to pass the io timeout to add_lockspace().

# sanlock direct init -o 30 -s test:0:/dev/loop0:0
# sanlock add_lockspace -o 30 -s test:1:/dev/loop0:0
# sanlock client read -s test:1:/dev/loop0:0
s test:1:/dev/loop0:0
io_timeout 30

Comment 24 Han Han 2017-04-26 08:56:13 UTC
Hi Michal, I found that libvirtd does not change the iotimeout value after this value is reset.
I used the following steps:
1. Prepare the env as in comment 17. Set io_timeout to 10. Restart libvirtd.
2. Execute 'virsh list' and then start and destroy the VM.
3. Check the iotimeout value with 'sanlock.read_lockspace' and 'sanlock.get_hosts'.
4. Reset io_timeout to 20. Repeat steps 1-2.
5. Check the iotimeout value with 'sanlock.read_lockspace' and 'sanlock.get_hosts'.
6. Compare the results from step 3 and step 5.

The result of step3:
{'lockspace': '__LIBVIRT__DISKS__', 'iotimeout': 10}
[{'generation': 2, 'host_id': 1, 'flags': 3, 'io_timeout': 10, 'timestamp': 24602}]

The result of step5:
{'lockspace': '__LIBVIRT__DISKS__', 'iotimeout': 10}
[{'generation': 2, 'host_id': 1, 'flags': 3, 'io_timeout': 10, 'timestamp': 24602}]

As above, the iotimeout value is not updated.
Please check whether there is a problem here.

Comment 25 Han Han 2017-04-26 09:03:11 UTC
Created attachment 1274129 [details]
The script of reproducing for comment24

Please install augeas and sanlock-python first, and then run iotimeout.sh to reproduce the issue in comment 24. My result is in iotimeout.log.

Comment 26 Michal Privoznik 2017-05-02 12:13:36 UTC
(In reply to Han Han from comment #24)
> Hi Michal, I find libvirtd doesn't change iotimeout value after resetting
> this value.
> I did it with following steps:
> 1. Prepare env as comment17. Set io_timeout as 10. Restart libvirtd
> 2. Execute 'virsh list' and then start&destroy VM.
> 3. Check the iotimeout value with 'sanlock.read_lockspace' and
> 'sanlock.get_hosts'
> 4. Reset io_timeout as 20. Repeat step1~2.
> 5. Check the iotimeout value with 'sanlock.read_lockspace' and
> 'sanlock.get_hosts'
> 6. Compare the results in step3 and step5.

That's because we are not clearing any existing lockspace on daemon restart. Since detecting an io_timeout change is not going to be trivial (is it even safe to just destroy the lockspace and create a new one instead? or is there an API for changing the io_timeout?), I vote for just documenting this. Any intelligence we try to build here will have flaws and will not fit yet another scenario that nobody uses.
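
(If we go the documentation route, the documented workaround would presumably be a sketch along these lines, based on the steps Han uses in comment 33; it assumes no guests are using the leases any more, and 'g1' is just the example guest name from comment 3:)

# virsh destroy g1                         # stop or shut down all sanlock-protected guests first
# systemctl stop libvirtd sanlock
# rm -f /var/lib/libvirt/sanlock/*         # drop the old lockspace and lease files
# (edit io_timeout in /etc/libvirt/qemu-sanlock.conf)
# systemctl start sanlock libvirtd         # libvirt re-creates the lockspace with the new io_timeout
# sanlock client read -s __LIBVIRT__DISKS__:1:/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__:0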

Comment 27 Han Han 2017-06-12 08:36:40 UTC
Hi Michal, I found this doesn't work on libvirt-3.2.0-9.el7.x86_64 qemu-kvm-rhev-2.9.0-9.el7.x86_64 sanlock-3.5.0-1.el7.x86_64

1. To avoid possible SELinux denials, set SELinux to permissive mode
# setenforce 0

2. Apply the configuration from step 1 of comment 17, then start the sanlock and libvirtd services and start the VM.
# systemctl start sanlock
# systemctl restart libvirtd
# virsh start V
Domain V started

3. Use strace to monitor the signals to qemu-kvm
# strace -t -p `pgrep qemu` -e trace=signal -o qemu.log

4. Shred lockspace and resource files to trigger VM being killed
# date;shred /var/lib/libvirt/sanlock/*                 
Mon Jun 12 16:17:04 CST 2017

I waited for a long time, but the VM was not killed.


To check whether iotimeout works with sanlock alone, set iotimeout to 80:
# python /tmp/example.py
Creating the sanlock disk
Lockspace is /tmp/tmpV9hlAJ
Registering to sanlock
Initializing 'lockspace1'
Initializing 'resource1' on 'lockspace1'
Acquiring the id '2' on 'lockspace1'
Acquiring 'resource1' on 'lockspace1'
Trying to get lockspace 'lockspace1' hosts
Running main process

Use strace to monitor the process
# strace -t -p `pgrep -f example` -e trace=signal                                                                                              
strace: Process 5832 attached

Shred the lockspace file:
# shred /tmp/tmpV9hlAJ;date
Mon Jun 12 16:22:53 CST 2017
After 80 seconds, the process was killed
16:24:13 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=9776, si_uid=0} ---
16:24:13 rt_sigreturn({mask=[]})        = -1 EINTR (Interrupted system call)
16:24:13 +++ killed by SIGKILL +++

So sanlock works well.

Please check if there are problems in libvirt.

Comment 28 Han Han 2017-06-12 08:39:44 UTC
Created attachment 1286969 [details]
Scripts and logs

Note: the timezone in libvirtd.log is UTC. The other logs' timezone is UTC+8.

Comment 31 Michal Privoznik 2017-08-18 12:02:52 UTC
Han,

can you please retest with this package? It's basically current upstream with some more debug messages placed right after all the sanlock_* calls. Also, can you please try two scenarios: the first one is with the lockspace already existing; the second one is where you remove the lockspace after setting io_timeout (before starting libvirtd). I want to see how this problem behaves when a lockspace already exists and when libvirt creates one. IIUC, the latter works, right?

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13871678

If that expires, I've downloaded x86_64 rpms here:

https://mprivozn.fedorapeople.org/sanlock/

In both cases attach full debug logs please.

Comment 32 Han Han 2017-08-21 02:36:23 UTC
Created attachment 1316071 [details]
The log of lockspace already existing

I tested on libvirt-3.7.0-2.el7sanlockDebug.x86_64 with the following steps:
1. Set up the libvirt sanlock environment without the io_timeout setting (to generate an existing /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ file).
2. Set io_timeout=60, then restart libvirtd. Check the lockspace file:
In [4]: sanlock.read_lockspace('/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__')
Out[4]: {'iotimeout': 10, 'lockspace': '__LIBVIRT__DISKS__'}

3. Start a VM. Shred the __LIBVIRT__DISKS__ file. Wait for the VM to be killed. But the VM is not killed even after a long time.
4. Check libvirtd.log and the sanlock client log.
Error messages from the sanlock client log:
2017-08-21 10:09:38 8096342 [9783]: cmd_read_lockspace 3,15 1 /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__:0
2017-08-21 10:09:38 8096342 [9783]: verify_leader 1 wrong magic 1edd59e2 /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__
2017-08-21 10:09:38 8096342 [9783]: leader1 read_lockspace error -223 lockspace 2i;V<85>QÈëLÓõ<95>«5^E<96><8c>FÂêX<85>Mó<8e>Ê[ ÛÍ^UÁi¥æ¡A¿d!­Ú<87>N&Ø÷. host_id 1
2017-08-21 10:09:38 8096342 [9783]: leader2 path /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ offset 0
2017-08-21 10:09:38 8096342 [9783]: leader3 m 1edd59e2 v 99fc9e96 ss 445078408 nh 17073036744796602517 mh 2574576319209247394 oi 5664813464191896722 og 17535034264780123556 lv 12875562338973879744
2017-08-21 10:09:38 8096342 [9783]: leader4 sn 2i;V<85>QÈëLÓõ<95>«5^E<96><8c>FÂêX<85>Mó<8e>Ê[ ÛÍ^UÁi¥æ¡A¿d!­Ú<87>N&Ø÷. rn 8>}Î<84>^HN^X°sý#Ûdôó^F#ý<93>^V<87>¤ìbä ts 16035220541361229061 cs bb96a06e
2017-08-21 10:09:38 8096342 [9783]: cmd_read_lockspace 3,15 done -223
2017-08-21 10:09:38 8096342 [9775]: ci 3 fd 15 pid -1 recv errno 104

The error messages from libvirtd.log:
2017-08-21 02:07:37.425+0000: 5472: debug : virLockManagerSanlockInit:445 : version=1000000 configFile=/etc/libvirt/qemu-sanlock.conf flags=0
2017-08-21 02:07:37.425+0000: 5472: debug : virConfReadFile:783 : filename=/etc/libvirt/qemu-sanlock.conf
2017-08-21 02:07:37.425+0000: 5472: debug : virFileClose:110 : Closed fd 23
2017-08-21 02:07:37.425+0000: 5472: debug : virConfAddEntry:243 : Add entry user 0x7fd2900f9bf0
2017-08-21 02:07:37.425+0000: 5472: debug : virConfAddEntry:243 : Add entry group 0x7fd2900f9ce0
2017-08-21 02:07:37.425+0000: 5472: debug : virConfAddEntry:243 : Add entry host_id 0x7fd2900f9db0
2017-08-21 02:07:37.425+0000: 5472: debug : virConfAddEntry:243 : Add entry auto_disk_leases 0x7fd2900f9e80
2017-08-21 02:07:37.425+0000: 5472: debug : virConfAddEntry:243 : Add entry disk_lease_dir 0x7fd2900fb5d0
2017-08-21 02:07:37.425+0000: 5472: debug : virConfAddEntry:243 : Add entry io_timeout 0x7fd2900fb650
2017-08-21 02:07:37.425+0000: 5472: debug : virConfGetValueBool:1066 : Get value bool 0x7fd2900f9e80 2
2017-08-21 02:07:37.425+0000: 5472: debug : virConfGetValueString:937 : Get value string 0x7fd2900fb5d0 3
2017-08-21 02:07:37.425+0000: 5472: debug : virConfGetValueUInt:1163 : Get value uint 0x7fd2900f9db0 2
2017-08-21 02:07:37.425+0000: 5472: debug : virConfGetValueBool:1066 : Get value bool (nil) 0
2017-08-21 02:07:37.425+0000: 5472: debug : virConfGetValueUInt:1163 : Get value uint 0x7fd2900fb650 2
2017-08-21 02:07:37.425+0000: 5472: debug : virConfGetValueString:937 : Get value string 0x7fd2900f9bf0 3
2017-08-21 02:07:37.425+0000: 5472: debug : virConfGetValueString:937 : Get value string 0x7fd2900f9ce0 3
2017-08-21 02:07:37.425+0000: 5472: debug : virLockManagerSanlockSetupLockspace:362 : sanlock_read_lockspace(tmp=0x7fd2a37fd6b0, io_timeout=0) = -22
2017-08-21 02:07:37.453+0000: 5472: debug : virLockManagerSanlockSetupLockspace:376 : sanlock_add_lockspace_timeout(ls=0x7fd2a37fd260, io_timeout=60) = -17
2017-08-21 02:07:37.453+0000: 5472: debug : virLockManagerSanlockSetupLockspace:417 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ is already registered

For the full logs, please check the attachment.

Comment 33 Han Han 2017-08-21 05:29:55 UTC
Created attachment 1316091 [details]
The log of no lockspace file

Hi Michal, comment 32 is the first scenario. For the second scenario, I did the following:
1. Delete /var/lib/libvirt/sanlock/*
2. Set the libvirt sanlock configuration with io_timeout=60. Restart libvirtd.
3. Check the io_timeout value in the lockspace file:
In [6]: sanlock.read_lockspace('/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__')
Out[6]: {'iotimeout': 60, 'lockspace': '__LIBVIRT__DISKS__'}

4. Start a VM. Shred the lockspace file. Wait for the VM to go down.

Got the following sanlock-related log entries in libvirtd.log:
2017-08-21 02:41:57.703+0000: 9335: debug : virLockManagerSanlockSetupLockspace:250 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ does not yet exist
2017-08-21 02:41:57.705+0000: 9335: debug : virLockManagerSanlockSetupLockspace:288 : sanlock_align(host_id_disk=0x7fa1c67fb2a0) = 1048576
2017-08-21 02:41:57.706+0000: 9335: debug : virFileClose:110 : Closed fd 23
2017-08-21 02:41:57.751+0000: 9335: debug : virLockManagerSanlockInitLockspace:192 : sanlock_write_lockspace(ls=0x7fa1c67fb260, max_hosts=0, lockspaceFlags=0, io_timeout=60) = 0
2017-08-21 02:41:57.751+0000: 9335: debug : virLockManagerSanlockSetupLockspace:335 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ has been initialized
2017-08-21 02:41:57.751+0000: 9335: debug : virLockManagerSanlockSetupLockspace:376 : sanlock_add_lockspace_timeout(ls=0x7fa1c67fb260, io_timeout=60) = -17
2017-08-21 02:41:57.751+0000: 9335: debug : virLockManagerSanlockSetupLockspace:417 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ is already registered

Comment 34 Michal Privoznik 2017-08-31 10:06:26 UTC
David,

So I've put some debug messages into the libvirt code to see which sanlock APIs are called with which arguments. Here's the interesting snippet from the part where libvirt creates the lockspace:

2017-08-21 02:41:57.703+0000: 9335: debug : virLockManagerSanlockSetupLockspace:250 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ does not yet exist
2017-08-21 02:41:57.705+0000: 9335: debug : virLockManagerSanlockSetupLockspace:288 : sanlock_align(host_id_disk=0x7fa1c67fb2a0) = 1048576
2017-08-21 02:41:57.706+0000: 9335: debug : virFileClose:110 : Closed fd 23
2017-08-21 02:41:57.751+0000: 9335: debug : virLockManagerSanlockInitLockspace:192 : sanlock_write_lockspace(ls=0x7fa1c67fb260, max_hosts=0, lockspaceFlags=0, io_timeout=60) = 0
2017-08-21 02:41:57.751+0000: 9335: debug : virLockManagerSanlockSetupLockspace:335 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ has been initialized
2017-08-21 02:41:57.751+0000: 9335: debug : virLockManagerSanlockSetupLockspace:376 : sanlock_add_lockspace_timeout(ls=0x7fa1c67fb260, io_timeout=60) = -17
2017-08-21 02:41:57.751+0000: 9335: debug : virLockManagerSanlockSetupLockspace:417 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ is already registered


The corresponding part of sanlock log (you can find all logs in comment 33):

2017-06-12 09:23:35 2045579 [9775]: sanlock daemon started 3.5.0 host 5d1289e1-015f-4657-a3b1-c2e66098f174.iscsi-serv
2017-06-12 09:23:35 2045579 [9775]: set scheduler RR|RESET_ON_FORK priority 99 failed: Operation not permitted
2017-06-12 09:24:14 2045619 [9783]: cmd_align 2,10
2017-06-12 09:24:14 2045619 [9783]: cmd_align 2,10 done 1048576
2017-06-12 09:24:14 2045619 [9782]: cmd_write_lockspace 2,10 __LIBVIRT__DISKS__:0:/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__:0
2017-06-12 09:24:15 2045619 [9782]: cmd_write_lockspace 2,10 done 0
2017-06-12 09:24:15 2045619 [9783]: cmd_add_lockspace 2,10 __LIBVIRT__DISKS__:1:/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__:0 flags 0 timeout 50

I can still see that the timeout is '50' even though libvirt sets io_timeout to 60. Is libvirt doing something wrong?

Comment 35 David Teigland 2017-08-31 16:15:46 UTC
Aren't you looking at old log messages?

This libvirt message from 2017-08-21 at 41 past the hour:

2017-08-21 02:41:57.703+0000: 9335: debug : virLockManagerSanlockSetupLockspace:250 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ does not yet exist

corresponds to this sanlock message at the same time which shows the 60 second timeout:

2017-08-21 10:41:57 8098282 [9782]: cmd_add_lockspace 2,10 __LIBVIRT__DISKS__:1:/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__:0 flags 0 timeout 60

Comment 36 Han Han 2017-09-01 01:29:26 UTC
Hi Michal,
I didn't enable renewal_read_extend_sec in the sanlock config. My sanlock.conf is all at the default configuration, e.g.:
# renewal_read_extend_sec = <seconds>
# command line: n/a

Comment 37 Michal Privoznik 2017-09-04 10:45:55 UTC
(In reply to David Teigland from comment #35)
> Aren't you looking at old log messages?

No.

> 
> This libvirt message from 2017-08-21 at 41 past the hour:
> 
> 2017-08-21 02:41:57.703+0000: 9335: debug :
> virLockManagerSanlockSetupLockspace:250 : Lockspace
> /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ does not yet exist

Libvirt's timestamp is in UTC, regardless of local timezone ..

> 
> corresponds to this sanlock message at the same time which shows the 60
> second timeout:
> 
> 2017-08-21 10:41:57 8098282 [9782]: cmd_add_lockspace 2,10
> __LIBVIRT__DISKS__:1:/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__:0 flags 0
> timeout 60

.. which is not the case for sanlock I believe.

Comment 38 Michal Privoznik 2017-09-04 11:28:06 UTC
(In reply to Michal Privoznik from comment #37)
> (In reply to David Teigland from comment #35)
> > Aren't you looking at old log messages?
> 
> No.
> 

Oh, now I get it. Sorry. Yeah I was looking at old messages. Anyway, since libvirt is calling the right APIs and setting the right values, how come the io_timeout is not honoured?

Comment 39 David Teigland 2017-09-05 14:20:34 UTC
The sequence of lockspace operations shown in the log file does not make any sense; it's a jumbled collection of various errors which indicate that sanlock is not being used correctly.  If you start from a clean state and do a clean series of lockspace operations, send me the sanlock log_dump output from that if it doesn't do what you expect.

Comment 40 Michal Privoznik 2017-09-06 12:24:59 UTC
(In reply to David Teigland from comment #39)
> The sequence of lockspace operations shown in the log file does not make any
> sense, it's a jumbled collection of various errors which indicate that
> sanlock is not being used correctly.

For instance?

> If you start from a clean state, and
> do a clean series of lockspace operations, send me the sanlock log_dump
> output from that if it doesn't do what you expect.

Han, can you please provide that log?

Comment 41 Han Han 2017-09-13 02:14:19 UTC
(In reply to Michal Privoznik from comment #40)
> (In reply to David Teigland from comment #39)
> > The sequence of lockspace operations shown in the log file does not make any
> > sense, it's a jumbled collection of various errors which indicate that
> > sanlock is not being used correctly.
> 
> For instance?
> 
> > If you start from a clean state, and
> > do a clean series of lockspace operations, send me the sanlock log_dump
> > output from that if it doesn't do what you expect.
> 
> Han, can you please provide that log?

Michal, the log_dump output is in the sanlock_client file compressed in https://bugzilla.redhat.com/attachment.cgi?id=1316071 and https://bugzilla.redhat.com/attachment.cgi?id=1316091.

But they are not from a clean state. Do you need a log_dump from a clean state? If needed, I will test it again.

Comment 42 Michal Privoznik 2017-09-13 10:49:52 UTC
(In reply to Han Han from comment #41)

> But they are not from a clear state. Do you need log_dump from a clean
> state? If you needed, I will test it again.

Yes I believe that's what David needs.
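
(For what it's worth, a clean-state capture along the lines David describes might look like the following sketch; the guest name, lease directory and service names follow the earlier comments:)

# systemctl stop libvirtd sanlock wdmd
# rm -f /var/lib/libvirt/sanlock/*                   # start with no lockspace at all
# systemctl start wdmd sanlock libvirtd              # libvirt re-creates the lockspace using io_timeout from qemu-sanlock.conf
# virsh start g1
# date; shred /var/lib/libvirt/sanlock/*             # note the time, then corrupt the lockspace
# (wait for sanlock to kill the guest, or give up waiting, then collect the daemon log)
# sanlock client log_dump > sanlock_log_dump.txt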

