Bug 820173
Summary: | Libvirtd fails to initialize sanlock driver | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Luwen Su <lsu> | ||||||
Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | medium | ||||||||
Version: | 6.4 | CC: | acathrow, ajia, bili, dallan, dyasny, dyuan, jdenemar, mzhan, psingare, rwu, tlavigne, ydu | ||||||
Target Milestone: | rc | ||||||||
Target Release: | 6.4 | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | libvirt-0.10.2-14.el6 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | |||||||||
: | 850470 886421 (view as bug list) | Environment: | |||||||
Last Closed: | 2013-02-21 07:12:49 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 832156, 886421 | ||||||||
Attachments: |
|
Description
Luwen Su
2012-05-09 10:32:02 UTC
(In reply to comment #0) > Description of problem: > After config qemu.conf and qemu-sanlock.conf for use sanlocak , the libvirtd > will die > > Version-Release number of selected component (if applicable): > libvirt-lock-sanlock-0.9.10-16.el6.x86_64 > libvirt-0.9.10-16.el6.x86_64 > sanlock-1.8-2.el6.x86_64 Luwen, if you follow this link steps, can you still meet the issue on sanlock-1.8-2.el6.x86_64? http://libvirt.org/locking.html I haven't see you append SANLOCKOPTS="-w 0" into /etc/sysconfig/sanlock, in addition, please you show wdmd, sanlock daemon status and virt_use_sanlock selinux bool value, and also confirm you haven't a AVC error in /var/log/audit/audit.log, thanks. > Note : sometimes restart libvirtd will success , but it will die quickly. libvirtd is not dieing. It is intentionally shutting down because it failed to initialize sanlock. > 2012-05-09 07:45:32.689+0000: 13547: error : > virLockManagerSanlockSetupLockspace:191 : Unable to query sector size > /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: No such device This is the key problem to diagnose. What filesystem is the /var/lib/libvirt/sanlock directory on ? I can't reproduce this problem myself using local disk for that location I can start the libvirtd successfully without a shared filesystem at /var/lib/libvirt/sanlock. But when I mount a filesystem at /var/lib/libvirt/sanlock, I can get the error the same as lsu, and cann't start the sanlock and libvirtd even after umount the filesystem. 1. met the error with the following configure: # getsebool -a|grep sanlock_use_nfs sanlock_use_nfs --> on # mount 10.66.5.5:/tmp on /var/lib/libvirt/sanlock type nfs (rw,vers=3,addr=10.66.5.5) # ll -d /var/lib/libvirt/sanlock/ drwxrwxrwt. 16 root root 12288 May 9 22:02 /var/lib/libvirt/sanlock/ 2. Before I mount a filesystem at /var/lib/libvirt/sanlock, I cann't reproduce it with sanlock-1.8-2.el6 and sanlock-2.1-1.el6. libvirt-0.9.10-18.el6.x86_64 sanlock-lib-1.8-2.el6.x86_64 libvirt-lock-sanlock-0.9.10-18.el6.x86_64 sanlock-1.8-2.el6.x86_64 selinux-policy-3.7.19-148.el6.noarch # tail -1 /etc/libvirt/qemu.conf lock_manager = "sanlock" 2. enable host_id, auto_disk_leases and disk_lease_dir in qemu-sanlock.conf # tail -3 /etc/libvirt/qemu-sanlock.conf host_id = 1 auto_disk_leases = 1 disk_lease_dir = "/var/lib/libvirt/sanlock" # service wdmd start Starting wdmd: [ OK ] [root@rhel63 home]# service wdmd status wdmd (pid 8258) is running... # service sanlock start Starting sanlock: [ OK ] [root@rhel63 home]# service sanlock status sanlock (pid 8295) is running... # getsebool -a|grep sanlock virt_use_sanlock --> on # service libvirtd start Starting libvirtd daemon: [ OK ] # service libvirtd status libvirtd (pid 8488) is running... # virsh list Id Name State ---------------------------------------------------- 1 xp running # ls /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ -la -rw-------. 1 root root 1048576 May 9 22:03 /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ upgrade sanlock to 2.1-1.el6 # rpm -qa|grep sanlock sanlock-2.1-1.el6.x86_64 libvirt-lock-sanlock-0.9.10-18.el6.x86_64 sanlock-lib-2.1-1.el6.x86_64 # service libvirtd restart Stopping libvirtd daemon: [ OK ] Starting libvirtd daemon: [ OK ] # service libvirtd status libvirtd (pid 8780) is running... # service libvirtd status libvirtd (pid 8780) is running... # virsh list Id Name State ---------------------------------------------------- 1 xp running # ls /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ -la -rw-------. 1 root root 1048576 May 9 22:05 /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ # rm -f /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ # service libvirtd restart Stopping libvirtd daemon: [ OK ] Starting libvirtd daemon: [ OK ] # service libvirtd status libvirtd (pid 8940) is running... # ls /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ -la -rw-------. 1 root root 1048576 May 9 22:06 /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ (In reply to comment #3) > > Note : sometimes restart libvirtd will success , but it will die quickly. > > libvirtd is not dieing. It is intentionally shutting down because it failed to > initialize sanlock. # service libvirtd status libvirtd dead but subsys locked > > > > 2012-05-09 07:45:32.689+0000: 13547: error : > > virLockManagerSanlockSetupLockspace:191 : Unable to query sector size > > /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: No such device > > This is the key problem to diagnose. > > > What filesystem is the /var/lib/libvirt/sanlock directory on ? I can't > reproduce this problem myself using local disk for that location I can reproduce it in RHEL6.2. # rpm -q libvirt sanlock libvirt-lock-sanlock selinux-policy libvirt-0.9.4-23.el6.x86_64 sanlock-1.8-2.el6.x86_64 libvirt-lock-sanlock-0.9.4-23.el6.x86_64 selinux-policy-3.7.19-126.el6.noarch Are you getting any audit log AVC messages from selinux ? (In reply to comment #4) > upgrade sanlock to 2.1-1.el6 > > # rpm -qa|grep sanlock > sanlock-2.1-1.el6.x86_64 > libvirt-lock-sanlock-0.9.10-18.el6.x86_64 > sanlock-lib-2.1-1.el6.x86_64 In here, if you restart sanlock service and manually remove /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__, does it work after restarting libvirtd service? > > # service libvirtd restart > Stopping libvirtd daemon: [ OK ] > Starting libvirtd daemon: [ OK ] > # service libvirtd status > libvirtd (pid 8780) is running... > # service libvirtd status > libvirtd (pid 8780) is running... In addition, for sanlock-2.1-1.el6 or sanlock-2.2-1, if you append SANLOCKOPTS="-w 0" into /etc/sysconfig/sanlock then restart sanlock service, does sanlock service work? please double check it, in fact, the sanlock is still stopped status due to this error "sanlock[4091]: 181727 could not set socket /var/run/sanlock/sanlock.sock permissions: Operation not permitted", of course, it should be a sanlock bug, I will file a sanlock bug for this. (In reply to comment #6) > Are you getting any audit log AVC messages from selinux ? I cann't start the sanlock (sanlock-1.8-2.el6) with the following AVC deny: type=AVC msg=audit(1336725822.366:6933): avc: denied { dac_override } for pid=5047 comm="sanlock" capability=1 scontext=unconfined_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:sanlock_t:s0-s0:c0.c1023 tclass=capability type=SYSCALL msg=audit(1336725822.366:6933): arch=c000003e syscall=2 success=yes exit=3 a0=7fff286a0ad0 a1=80041 a2=1b6 a3=fffffff5 items=0 ppid=1 pid=5047 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=45 comm="sanlock" exe="/usr/sbin/sanlock" subj=unconfined_u:system_r:sanlock_t:s0-s0:c0.c1023 key=(null) no AVC msg when I use sanlock-2.1-1.el6, and can not start sanlock. # service sanlock start Starting sanlock: [ OK ] # service sanlock status sanlock is stopped I just filed a new bug 850470 against sanlock for the "No such device" error since according to the error logged by sanlock, sanlock_align() should have returned "Permission denied" error. The error indicates that sanlock daemon could not open /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ file. Is the NFS share you mount to /var/lib/libvirt/sanlock root squashed? Actually, since both libvirtd and sanlock are running as root, root squashed nfs should not be a real problem. Please, setup the host so that you are able to reproduce the issue and perform the following steps: 0. make sure sanlock daemon is up and running with -w 0 1. service libvirtd stop 2. killall -STOP sanlock 3. service libvirtd start 4. take the output of the following commands: ps -fZwC sanlock ls -lZa /var/lib/libvirt/sanlock/ 5. killall -CONT sanlock (In reply to comment #14) Now, there are three different situations , root_squash nfs , no_squash nfs , no nfs. Short summart: Test envirment: libvirt-0.10.0-1.el6.x86_64 sanlock-lib-2.3-4.el6_3.x86_64 libvirt-lock-sanlock-0.10.0-1.el6.x86_64 sanlock-2.3-4.el6_3.x86_64 sanlock-python-2.3-4.el6_3.x86_64 A.root_squash nfs Fail with error Permission denied. B.no_root_squash nfs Success with no error. C.no nfs Success with error Operation now in progress A.root_squash nfs 0. # ps aux | grep sanlock .... root 11662 0.0 0.8 404504 61896 ? SLsl 11:00 0:00 sanlock daemon -w 0 root 11663 0.0 0.0 20828 268 ? S 11:00 0:00 sanlock daemon -w 0 Nfs status: 10.66.5.140:/export/ on /var/lib/libvirt/sanlock type nfs (rw,vers=3,addr=10.66.5.140) /export *(rw,root_squash) 1. # service libvirtd status libvirtd dead but subsys locked # service libvirtd stop Stopping libvirtd daemon: [FAILED] # service libvirtd status libvirtd is stopped 2. # killall -STOP sanlock 3. #service libvirtd start Starting libvirtd daemon: [ OK ] # service libvirtd status libvirtd dead but subsys locked 4. #ps -fZwC sanlock LABEL UID PID PPID C STIME TTY TIME CMD unconfined_u:system_r:sanlock_t:s0-s0:c0.c1023 root 11662 1 0 11:00 ? 00:00:00 sanlock daemon -w 0 #ls -lZa /var/lib/libvirt/sanlock/ drwxr-xr-x. root root system_u:object_r:nfs_t:s0 . drwxr-xr-x. root root system_u:object_r:virt_var_lib_t:s0 .. libvirtd.log: error : virLockManagerSanlockSetupLockspace:196 : Unable to create lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Permission denied B.no_root_squash 10.66.100.103:/export/ on /var/lib/libvirt/sanlock type nfs (rw,vers=4,addr=10.66.100.103,clientaddr=10.66.100.103) /export *(rw,no_root_squash) # ps -fZwC sanlock LABEL UID PID PPID C STIME TTY TIME CMD unconfined_u:system_r:sanlock_t:s0-s0:c0.c1023 root 15657 1 0 11:00 ? 00:00:00 sanlock daemon -w 0 # ls -lZa /var/lib/libvirt/sanlock/ drwxr-xr-x. root root system_u:object_r:nfs_t:s0 . drwxr-xr-x. root root system_u:object_r:virt_var_lib_t:s0 .. -rw-------. root root system_u:object_r:nfs_t:s0 __LIBVIRT__DISKS__ C.no nfs #ps -fZwC sanlock LABEL UID PID PPID C STIME TTY TIME CMD unconfined_u:system_r:sanlock_t:s0-s0:c0.c1023 root 5981 1 0 11:43 ? 00:00:00 sanlock daemon -w 0 # ls -lZa /var/lib/libvirt/sanlock/ drwx------. root root system_u:object_r:virt_var_lib_t:s0 . drwxr-xr-x. root root system_u:object_r:virt_var_lib_t:s0 .. -rw-------. root root unconfined_u:object_r:virt_var_lib_t:s0 __LIBVIRT__DISKS__ libvirtd.log: error : virLockManagerSanlockSetupLockspace:259 : Unable to add lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Operation now in progress Regardless of whether there's any functional bug here I think that if more information can be provided to the user about what is going wrong that's a very useful fix. Additional documentation at http://libvirt.org/locking.html might also be helpful. Clear the needinfo flag per comment 16. (In reply to comment #16) > (In reply to comment #14) > Now, there are three different situations , root_squash nfs , no_squash nfs > , no nfs. > Short summart: > Test envirment: > libvirt-0.10.0-1.el6.x86_64 > sanlock-lib-2.3-4.el6_3.x86_64 > libvirt-lock-sanlock-0.10.0-1.el6.x86_64 > sanlock-2.3-4.el6_3.x86_64 > sanlock-python-2.3-4.el6_3.x86_64 > > A.root_squash nfs > Fail with error Permission denied. > > B.no_root_squash nfs > Success with no error. > > C.no nfs > Success with error Operation now in progress You mean fail, right? > > > > A.root_squash nfs > 0. > # ps aux | grep sanlock > .... > root 11662 0.0 0.8 404504 61896 ? SLsl 11:00 0:00 sanlock > daemon -w 0 > root 11663 0.0 0.0 20828 268 ? S 11:00 0:00 sanlock > daemon -w 0 > > Nfs status: > 10.66.5.140:/export/ on /var/lib/libvirt/sanlock type nfs > (rw,vers=3,addr=10.66.5.140) > /export *(rw,root_squash) > > 1. > # service libvirtd status > libvirtd dead but subsys locked > # service libvirtd stop > Stopping libvirtd daemon: [FAILED] > # service libvirtd status > libvirtd is stopped > > 2. > # killall -STOP sanlock > 3. > #service libvirtd start > Starting libvirtd daemon: [ OK ] > # service libvirtd status > libvirtd dead but subsys locked > 4. > #ps -fZwC sanlock > LABEL UID PID PPID C STIME TTY > TIME CMD > unconfined_u:system_r:sanlock_t:s0-s0:c0.c1023 root 11662 1 0 11:00 ? > 00:00:00 sanlock daemon -w 0 > > #ls -lZa /var/lib/libvirt/sanlock/ > drwxr-xr-x. root root system_u:object_r:nfs_t:s0 . > drwxr-xr-x. root root system_u:object_r:virt_var_lib_t:s0 .. > > libvirtd.log: > error : virLockManagerSanlockSetupLockspace:196 : Unable to create lockspace > /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Permission denied > This error has nothing to do with sanlock; it's open(, O_CREAT); that fails. I assume you don't have write permissions on /export/ fro nobody:nobody. > > B.no_root_squash > 10.66.100.103:/export/ on /var/lib/libvirt/sanlock type nfs > (rw,vers=4,addr=10.66.100.103,clientaddr=10.66.100.103) > /export *(rw,no_root_squash) > > # ps -fZwC sanlock > LABEL UID PID PPID C STIME TTY > TIME CMD > unconfined_u:system_r:sanlock_t:s0-s0:c0.c1023 root 15657 1 0 11:00 ? > 00:00:00 sanlock daemon -w 0 > > # ls -lZa /var/lib/libvirt/sanlock/ > drwxr-xr-x. root root system_u:object_r:nfs_t:s0 . > drwxr-xr-x. root root system_u:object_r:virt_var_lib_t:s0 .. > -rw-------. root root system_u:object_r:nfs_t:s0 __LIBVIRT__DISKS__ > Great. > > C.no nfs > #ps -fZwC sanlock > LABEL UID PID PPID C STIME TTY > TIME CMD > unconfined_u:system_r:sanlock_t:s0-s0:c0.c1023 root 5981 1 0 11:43 ? > 00:00:00 sanlock daemon -w 0 > # ls -lZa /var/lib/libvirt/sanlock/ > drwx------. root root system_u:object_r:virt_var_lib_t:s0 . > drwxr-xr-x. root root system_u:object_r:virt_var_lib_t:s0 .. > -rw-------. root root unconfined_u:object_r:virt_var_lib_t:s0 > __LIBVIRT__DISKS__ > > libvirtd.log: > error : virLockManagerSanlockSetupLockspace:259 : Unable to add lockspace > /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Operation now in progress I have no idea so far; I was unable to reproduce with sanlock-2.5-1.el6.x86_64 Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2012-October/msg01441.html Moving to POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-October/msg01229.html We need this patch as well: http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-November/msg00005.html The question still exists on libvirt-0.10.2-7.el6.x86_64 with sanlock-2.6-1.el6.x86_64, for details, please see the following steps. [root@201 ajia]# tail -1 /etc/libvirt/qemu.conf lock_manager = "sanlock" [root@201 ajia]# tail -3 /etc/libvirt/qemu-sanlock.conf disk_lease_dir = "/var/lib/libvirt/sanlock" host_id = 1 auto_disk_leases = 1 [root@201 ajia]# ps -ef|grep sanlock sanlock 16579 1 0 13:43 ? 00:00:00 sanlock daemon -U sanlock -G sanlock root 16580 16579 0 13:43 ? 00:00:00 sanlock daemon -U sanlock -G sanlock root 17065 5077 0 13:51 pts/3 00:00:00 grep sanlock [root@201 ajia]# getsebool -a|grep sanlock sanlock_use_fusefs --> off sanlock_use_nfs --> on sanlock_use_samba --> off virt_use_sanlock --> on [root@201 ajia]# grep avc /var/log/audit/audit.log |wc -l 0 [root@201 ajia]# service libvirtd restart Stopping libvirtd daemon: [FAILED] Starting libvirtd daemon: [ OK ] [root@201 ajia]# ll /var/lib/libvirt/sanlock/ total 0 [root@201 ajia]# service libvirtd status libvirtd dead but subsys locked [root@201 ajia]# tailf /var/log/libvirt/libvirtd.log ...... 2012-11-06 05:52:31.247+0000: 17109: debug : virLockManagerNopDeinit:42 : 2012-11-06 05:52:31.247+0000: 17109: debug : virLockManagerPluginNew:146 : Module load sanlock from /usr/lib64/libvirt/lock-driver 2012-11-06 05:52:31.247+0000: 17109: debug : virLockManagerSanlockInit:361 : version=1000000 configFile=/etc/libvirt/qemu-sanlock.conf flags=0 2012-11-06 05:52:31.247+0000: 17109: debug : virLockManagerSanlockSetupLockspace:219 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ does not yet exist 2012-11-06 05:52:31.249+0000: 17109: error : virLockManagerSanlockSetupLockspace:264 : Unable to query sector size /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: No such device 2012-11-06 05:52:31.249+0000: 17109: error : qemudLoadDriverConfig:558 : Failed to load lock manager sanlock 2012-11-06 05:52:31.249+0000: 17109: error : qemudStartup:751 : Missing lock manager implementation 2012-11-06 05:52:31.249+0000: 17109: error : virStateInitialize:798 : Initialization of QEMU state driver failed 2012-11-06 05:52:31.249+0000: 17109: error : daemonRunStateInit:739 : Driver state initialization failed (In reply to comment #29) I met your issue "No such device" before config user and group which introduced from patch comment 26 in qemu-sanlock.conf . After do that , libvirtd will success to start after a while.Other's config are as same as you BTW. #rpm -q libvirt sanlock libvirt-0.10.2-7.el6.x86_64 sanlock-2.6-2.el6.x86_64 #tail -5 /etc/libvirt/qemu-sanlock.conf disk_lease_dir = "/var/lib/libvirt/sanlock" host_id = 1 auto_disk_leases = 1 user="root" group="root" It means , it kept "dead but subsys locked" with the error "Operation now in progress" in log. The service will become to nomal after about 2-3m. #for i in {1..1000} ; do echo $i;service libvirtd restart ;sleep 2 ; done 81 Stopping libvirtd daemon: [FAILED] Starting libvirtd daemon: [ OK ] 82 Stopping libvirtd daemon: [ OK ] Starting libvirtd daemon: [ OK ] 83 Stopping libvirtd daemon: [ OK ] Starting libvirtd daemon: [ OK ] libvirtd.log always show this: error : virLockManagerSanlockSetupLockspace:334 : Unable to add lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Operation now in progress Then when i started the guest , i met the "No such device" . #virsh start test error: Failed to start domain test error: internal error Unable to query sector size /var/lib/libvirt/sanlock/f30a79d7d4b9c375cf1b1a5feb8e8e1f: No such device libvirtd.log error : virCommandHandshakeWait:2554 : internal error Unable to query sector size /var/lib/libvirt/sanlock/f30a79d7d4b9c375cf1b1a5feb8e8e1f: No such device (In reply to comment #30) > (In reply to comment #29) > I met your issue "No such device" before config user and group which > introduced from patch comment 26 in qemu-sanlock.conf . > > After do that , libvirtd will success to start after a while.Other's config > are as same as you BTW. > > #rpm -q libvirt sanlock > libvirt-0.10.2-7.el6.x86_64 > sanlock-2.6-2.el6.x86_64 > > #tail -5 /etc/libvirt/qemu-sanlock.conf > disk_lease_dir = "/var/lib/libvirt/sanlock" > host_id = 1 > auto_disk_leases = 1 > user="root" > group="root" > In fact, I also tried to uncomment user and group lines, but I got the same failed result, I forgot to add it into Comment 29. In addition, if we don't specify user and group in qemu-sanlock.conf, libvirt shouldn't also die. Alex There is a summary , i'm verifying this bug. So i keep keep ON_QA for further discuss. Pkg: #rpm -q sanlock libvirt sanlock-2.6-2.el6.x86_64 libvirt-0.10.2-7.el6.x86_64 Summary: 1.Basically all works fine with config below, but when restart libvirtd, i run for i in {1..1000} ; do echo $i;service libvirtd restart ;sleep 2 ; done ,libvirtd takes 20-50 times failed with "Operation now in progress" in log before success , seems it was waiting , but i was no idea what it is.No logs in message ,sanlock and libvirtd appeard about that. # tail -2 /etc/sysconfig/sanlock SANLOCKUSER="root" SANLOCKOPTS="-w 0" I test user&&group ="root" and user&&group="sanlock" separately, both of them wokrs fine. # ll /var/lib/libvirt/sanlock/ total 2048 -rw-------. 1 root root 1048576 Nov 6 17:10 f30a79d7d4b9c375cf1b1a5feb8e8e1f -rw-rw----. 1 root root 1048576 Nov 6 17:26 __LIBVIRT__DISKS__ and restart sanlock , libvirtd # ll /var/lib/libvirt/sanlock/ total 2048 -rw-------. 1 root root 1048576 Nov 6 17:32 f30a79d7d4b9c375cf1b1a5feb8e8e1f -rw-rw----. 1 sanlock sanlock 1048576 Nov 6 17:38 __LIBVIRT__DISKS__ Issue A: libvirtd need take some times to start normally. like this: for i in {1..1000} ; do echo $i;service libvirtd restart ;sleep 2 ; done 20 Stopping libvirtd daemon: [FAILED] Starting libvirtd daemon: [ OK ] 21 Stopping libvirtd daemon: [ OK ] Starting libvirtd daemon: [ OK ] 22 Stopping libvirtd daemon: [ OK ] Starting libvirtd daemon: [ OK ] error : virLockManagerSanlockSetupLockspace:334 : Unable to add lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Operation now in progress 2.Run in sanlock default configuration , qemu-sanlock is same as 1.Libvirtd can initialize sanlock successfully , but it still take a while to restart normally . Issue A: start guest get a "No such device" error #virsh start test error: Failed to start domain test error: internal error Unable to query sector size /var/lib/libvirt/sanlock/f30a79d7d4b9c375cf1b1a5feb8e8e1f: No such device libvirtd.log error : virCommandHandshakeWait:2554 : internal error Unable to query sector size /var/lib/libvirt/sanlock/f30a79d7d4b9c375cf1b1a5feb8e8e1f: No such device Issue B: Same to 1->Issue A . 81 Stopping libvirtd daemon: [FAILED] Starting libvirtd daemon: [ OK ] 82 Stopping libvirtd daemon: [ OK ] Starting libvirtd daemon: [ OK ] (In reply to comment #31) > > In addition, if we don't specify user and group in qemu-sanlock.conf, > libvirt shouldn't also die. > > Alex While I agree it would be nice to not have to configure these and libvirt would work right after unpacking out of the box, it is not possible. I mean, libvirt would have to guess under which user:group sanlock process runs (which is tricky since there can be more processes than one called sanlock, some of them may even be run by evil user trying to confuse libvirt or something). So you really need to configure those two variables. (In reply to comment #31) > In addition, if we don't specify user and group in qemu-sanlock.conf, > libvirt shouldn't also die. Actually, I think it *has* to die, since sanlock's purpose is to protect against data corruption caused my the starting of a VM multiple times, if a user has requested sanlock and it is not available, I think the only safe option is to refuse to start. The error should be clearly described in the logs however so that the user can fix the problem. (In reply to comment #33) > (In reply to comment #31) > > > > > In addition, if we don't specify user and group in qemu-sanlock.conf, > > libvirt shouldn't also die. > > > > Alex > > While I agree it would be nice to not have to configure these and libvirt > would work right after unpacking out of the box, it is not possible. I mean, > libvirt would have to guess under which user:group sanlock process runs > (which is tricky since there can be more processes than one called sanlock, > some of them may even be run by evil user trying to confuse libvirt or > something). So you really need to configure those two variables. Hi Michal , I still have another one question about the procedure. Like comment 32 descripted, does the process about libvirtd are expected? I mean that configuration are right and finally libvirtd works fine , but when first start libvirtd service, it "die" for a while with erro" Operation now in progress" . Luwen, I don't think it's okay. However, it may be sanlock bug as well. I mean, it takes some time sanlock to initialize. And if libvirtd is restarted meanwhile, we just request same operation and hence sanlock returns 'Operation now in progress' as it is still processing request from previous run. Honestly, I am not sure how to fix this behaviour. Waiting in loop while sanlock is returning that error is a bit tricky, as there can be very possible thousands of iterations as well as just a few ones. But let me see what can be done. I am afraid we need this patch as well: http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-November/msg00252.html I am moving this to POST again so it can be included in the next build. I still can reproduce this issue on libvirt-0.10.2-10.el6.x86_64 with sanlock-2.6-1.el6.x86_64 and selinux-policy-3.7.19-173.el6.noarch. # tail -1 /etc/libvirt/qemu.conf lock_manager = "sanlock" # tail -5 /etc/libvirt/qemu-sanlock.conf user = "root" group = "root" host_id = 1 auto_disk_leases = 1 disk_lease_dir = "/var/lib/libvirt/sanlock" # ps -ef|grep sanlock sanlock 950 1 0 17:01 ? 00:00:00 sanlock daemon -U sanlock -G sanlock root 951 950 0 17:01 ? 00:00:00 sanlock daemon -U sanlock -G sanlock root 1312 30654 0 17:04 pts/7 00:00:00 grep sanlock # service libvirtd restart Stopping libvirtd daemon: [FAILED] Starting libvirtd daemon: [ OK ] And check libvirtd.log: 2012-11-27 09:03:02.543+0000: 1227: debug : virLockManagerNopDeinit:42 : 2012-11-27 09:03:02.543+0000: 1227: debug : virLockManagerPluginNew:146 : Module load sanlock from /usr/lib64/libvirt/lock-driver 2012-11-27 09:03:02.543+0000: 1227: debug : virLockManagerSanlockInit:390 : version=1000000 configFile=/etc/libvirt/qemu-sanlock.conf flags=0 2012-11-27 09:03:02.543+0000: 1227: debug : virLockManagerSanlockSetupLockspace:227 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ does not yet exist 2012-11-27 09:03:02.544+0000: 1227: error : virLockManagerSanlockSetupLockspace:272 : Unable to query sector size /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: No such device 2012-11-27 09:03:02.544+0000: 1227: error : qemudLoadDriverConfig:558 : Failed to load lock manager sanlock 2012-11-27 09:03:02.544+0000: 1227: error : qemudStartup:765 : Missing lock manager implementation 2012-11-27 09:03:02.544+0000: 1227: error : virStateInitialize:798 : Initialization of QEMU state driver failed 2012-11-27 09:03:02.544+0000: 1227: error : daemonRunStateInit:739 : Driver state initialization failed # tailf /var/log/messages XXX Nov 27 17:08:27 localhost sanlock[950]: 2012-11-27 17:08:27+0800 4769667 [954]: open error -13 /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ Additional info: # grep avc /var/log/audit/audit.log Notes, no AVC denied on sanlock. # getsebool -a|grep sanlock sanlock_use_fusefs --> off sanlock_use_nfs --> on sanlock_use_samba --> off virt_use_sanlock --> on And libvirtd will be dead. # virsh list error: Failed to reconnect to the hypervisor error: no valid connection error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory Alex, can you please attach full debug logs so I can see what's going on? (In reply to comment #43) > Alex, can you please attach full debug logs so I can see what's going on? Michal, I think the other logs should be useless for you, but I still attach it as an attachment. Created attachment 652568 [details]
libvirtd.log
Well, I am afraid there's nothing left what we can do in here. Libvirt starts setting a sanlock up but gets killed meanwhile. So after 2 seconds, when it is started again, it sees that the lease file is created, however, a sanlock api to query its status fails as sanlockd hasn't initialized it yet. Hence, libvirt thinks sanlock is not usable at all and hence refuse to start. Moreover, nobody in production is doing the loop as Alex is doing - so this issue is harmless. Michal, so how does the user know what to do to fix the situation? From the logs. Either libvirt or sanlock logs. Note that this bug reproduces if and only if libvirtd is restarted during it's startup procedure. It won't reproduce if one restarts libvirtd after longer period (and by londer period I mean like 10+ seconds). Right, it's clearly a corner case, but it also has the potential to be pretty unpleasant for the user if, for example, libvirt crashed while it was in the window where sanlock is setting up the leases. IIUC, libvirt would then not start until the sanlock state was cleared. Maybe I just am not familiar enough with sanlock, but I got into this state once and was pretty frustrated at the lack of guidance from the logs about how to get out of it. Is this a situation that requires better information from sanlock on what's happening? I don't mean to be pedantic, just my experiences with sanlock make me think the whole area needs to be more user friendly because I think it's a really great piece of functionality. Is the user expected to know how to clear sanlock state? If so, can you provide a pointer to any documentation or if none exists let me know and I'll file a bug requesting it. (In reply to comment #49) > Right, it's clearly a corner case, but it also has the potential to be > pretty unpleasant for the user if, for example, libvirt crashed while it was > in the window where sanlock is setting up the leases. IIUC, libvirt would > then not start until the sanlock state was cleared. > > Maybe I just am not familiar enough with sanlock, but I got into this state > once and was pretty frustrated at the lack of guidance from the logs about > how to get out of it. Is this a situation that requires better information > from sanlock on what's happening? I don't mean to be pedantic, just my > experiences with sanlock make me think the whole area needs to be more user > friendly because I think it's a really great piece of functionality. Is the > user expected to know how to clear sanlock state? If so, can you provide a > pointer to any documentation or if none exists let me know and I'll file a > bug requesting it. Well, if sanlock wouldn't narrow all errors just to ENODEV but return the actual error it would be much more helpful and libvirt (or any other app) could then decide what to do instead of blindly claiming an error. Nevertheless, I've come up with possible patch which could prevent this. Luwen, can you please give it a try? https://brewweb.devel.redhat.com/taskinfo?taskID=5148224 Hi Michal, This is test procedure: libvirt-0.10.2-10.el6sanlockTest.x86_64 sanlock-2.6-2.el6.x86_64 Config same to Comment 41 #getsebool -a | grep sanlock sanlock_use_fusefs --> off sanlock_use_nfs --> on sanlock_use_samba --> off virt_use_sanlock --> on # ps -ef | grep sanlock root 1898 1 0 13:38 ? 00:00:00 wdmd -G sanlock sanlock 4265 1 0 13:44 ? 00:00:00 sanlock daemon -U sanlock -G sanlock root 4266 4265 0 13:44 ? 00:00:00 sanlock daemon -U sanlock -G sanlock root 13555 1427 0 14:07 pts/0 00:00:00 grep sanlock #service libvirtd restart Stopping libvirtd daemon: [FAILED] Starting libvirtd daemon: [ OK ] libvirtd.log: 2012-12-03 06:04:33.323+0000: 13485: debug : virLockManagerNopDeinit:42 : 2012-12-03 06:04:33.323+0000: 13485: debug : virLockManagerPluginNew:146 : Module load sanlock from /usr/lib64/libvirt/lock-driver 2012-12-03 06:04:33.327+0000: 13485: debug : virLockManagerSanlockInit:397 : version=1000000 configFile=/etc/libvirt/qemu-sanlock.conf flags=0 2012-12-03 06:04:33.327+0000: 13485: debug : virFileClose:72 : Closed fd 21 2012-12-03 06:04:33.327+0000: 13485: debug : virLockManagerSanlockSetupLockspace:228 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ does not yet exist 2012-12-03 06:04:33.328+0000: 13485: debug : virLockManagerSanlockSetupLockspace:266 : Waiting for sanlock to settle down 2012-12-03 06:04:33.328+0000: 13485: debug : virLockManagerSanlockSetupLockspace:268 : Waiting failed with -2 2012-12-03 06:04:33.328+0000: 13485: debug : virLockManagerSanlockSetupLockspace:269 : Dome waiting 2012-12-03 06:04:33.329+0000: 13485: error : virLockManagerSanlockSetupLockspace:280 : Unable to query sector size /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: No such device 2012-12-03 06:04:33.329+0000: 13485: debug : virFileClose:72 : Closed fd 21 2012-12-03 06:04:33.331+0000: 13485: error : qemudLoadDriverConfig:558 : Failed to load lock manager sanlock 2012-12-03 06:04:33.332+0000: 13485: error : qemudStartup:765 : Missing lock manager implementation 2012-12-03 06:04:33.332+0000: 13485: error : virStateInitialize:798 : Initialization of QEMU state driver failed 2012-12-03 06:04:33.332+0000: 13485: error : daemonRunStateInit:739 : Driver state initialization failed 2012-12-03 06:04:33.332+0000: 13485: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x789e70 Does it is the patch want to do? And any sussesgion about it? Thanks. Well, the patch shows there's nothing libvirt can do here. If sanlock isn't able to tell us recommended lease file size, we must fail instead of allocating a file of random size. I've tried to insert some wait for sanlock settling down, but it didn't work neither. And I've tried client versions of some init sanlock calls as recommended by a sanlock developer (by client version I mean the one that doesn't talk to sanlockd but run locally instead), but it didn't work neither. So I guess there's nothing left to do for libvirt here .... (In reply to comment #52) > Well, the patch shows there's nothing libvirt can do here. If sanlock isn't > able to tell us recommended lease file size, we must fail instead of > allocating a file of random size. I've tried to insert some wait for sanlock > settling down, but it didn't work neither. And I've tried client versions of > some init sanlock calls as recommended by a sanlock developer (by client > version I mean the one that doesn't talk to sanlockd but run locally > instead), but it didn't work neither. So I guess there's nothing left to do > for libvirt here .... Hi Michal , I have some questions... 1. A.Per comment 32 , libvirtd can start though die for a while error:Operation now in progress B.After use comment 38's patch , libvirtd always die, error:No such device 2. For this bug Which method libvirt shoud take ? Use or not use comment 38's patch? 3. For this issue Could we clone a sanlock bug to track it? Thanks. 1) Right. The error comes from sanlock where error reporting is not fortunate. in most cases, sanlock overwrites the real error with ENODEV leaving us to guess what real error is. 2) Use it. It tries to deal with one exception from the rule from 1). If we obtain EINPROGRESS we try again or - when sufficient new sanlock is presented - wait till state of lease file changes. 3) Sure. We should do that. Ok , I test again with libvirt-0.10.2-11.el6 and get the same result with comment 51.Per comment 52 , verify this bug and i will clone a sanlock one to track this issue. It turns out we need this patch as well: http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-December/msg00366.html Hi Michal , Get a No such device again.. :( ------------------------------Configuration---------------------------------- # rpm -q libvirt kernel sanlock libvirt-0.10.2-13.el6.x86_64 kernel-2.6.32-348.el6.x86_64 sanlock-2.6-2.el6.x86_64 # getsebool -a | grep sanlock sanlock_use_fusefs --> off sanlock_use_nfs --> on sanlock_use_samba --> off virt_use_sanlock --> on # tail -5 /etc/libvirt/qemu-sanlock.conf user = "sanlock" group = "sanlock" host_id = 1 auto_disk_leases = 1 disk_lease_dir = "/var/lib/libvirt/sanlock" # tail -1 /etc/libvirt/qemu.conf lock_manager = "sanlock" ----------------------------------------------------------------------------- 1.Start wdmd -> sanlock -> libvirtd on by on 2.#virsh list //hang a whill , then work as expected Id Name State ---------------------------------------------------- 3. # virsh start test error: Failed to start domain test error: internal error Unable to query sector size /var/lib/libvirt/sanlock/f30a79d7d4b9c375cf1b1a5feb8e8e1f: No such device 4. # ll /var/lib/libvirt/sanlock/ total 1024 -rw-rw----. 1 sanlock sanlock 1048576 Dec 20 20:16 __LIBVIRT__DISKS__ log: sanlock: 531 [3786]: sanlock daemon started 2.6 host 19aa10d4-3555-41be-bd47-26bdcb7099b2.localhost. 535 [3790]: s1 lockspace __LIBVIRT__DISKS__:1:/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__:0 574 [3791]: open error -13 /var/lib/libvirt/sanlock/f30a79d7d4b9c375cf1b1a5feb8e8e1f libvirt: error : virCommandHandshakeWait:2554 : internal error Unable to query sector size /var/lib/libvirt/sanlock/f30a79d7d4b9c375cf1b1a5feb8e8e1f: No such device PS: Since Bug 888197 is not a bug , i use sanlock client shutdown -f 1 to unlock sanlock service.And i test both user are sanlock and root , get the same result. So anything i missed or that's the libvirt expected? So this happens not when you just restart libvirtd quickly, but even if you just start libvirtd and then start a domain? Moreover, can you please attach libvirt debug logs? Created attachment 667105 [details]
libvirt-13 log
(In reply to comment #59) > So this happens not when you just restart libvirtd quickly, but even if you > just start libvirtd and then start a domain? Yeah , now libvirtd can start normally with a short time waiting.But can't start guest normally. I am afraid we need this patch then: http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-December/msg00590.html # rpm -q libvirt sanlock kernel libvirt-0.10.2-14.el6.x86_64 sanlock-2.6-2.el6.x86_64 kernel-2.6.32-348.el6.x86_64 Configuration is as same as comment 58 1.#virsh start test //hang a while , then start succuss 2.#ll /var/lib/libvirt/sanlock total 2048 -rw-------. 1 sanlock sanlock 1048576 Jan 5 14:03 f30a79d7d4b9c375cf1b1a5feb8e8e1f -rw-r-----. 1 sanlock sanlock 1048576 Jan 5 14:04 __LIBVIRT__DISKS__ 3. For: user = "root" group = "root" sanlock get a "open error -13" error libvirtd get a "No such device" error and faild to start, it's expected. So set this bug to verify Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0276.html |