Bug 1189414
| Summary: | Errors in /var/log/messages and /var/log/sanlock.log after a successful live migration in RHEV. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Chen <cchen> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.6 | CC: | agk, cluster-maint, dyuan, jdenemar, mzhan, rbalakri, teigland, xuzhang, zhwang, zpeng |
| Target Milestone: | rc | Keywords: | Reopened, Upstream |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-0.10.2-51.el6 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-07-22 05:48:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Chen
2015-02-05 09:25:46 UTC
*** This bug has been marked as a duplicate of bug 1088034 ***

"cmd 9 target pid X not found" means that someone is calling sanlock_inquire(X) to get information about leases held by pid X. sanlock prints this error when it does not find any record of pid X. This may or may not be OK; in bug 1088034 it was not a problem. I assume that the caller of sanlock_inquire() is still libvirt. What's surprising to me is that sanlock_inquire() is called on the migration destination. The method for migration with sanlock involves calling sanlock_inquire() on the source to collect lease information, passing this lease information to the destination through libvirt during migration, then reacquiring the leases on the destination using sanlock_acquire(LVER) before the new pid runs qemu. Perhaps the libvirt folks could check if/why sanlock_inquire() is being run on the migration destination.

This is all fairly theoretical, and something of a waste of time, because RHEV has never actually used sanlock leases for VMs through libvirt. So libvirt is performing many of the sanlock steps that we're debugging here, even though they do nothing. Perhaps there is a way to disable all the sanlock-related steps until they are actually tested and used, so we're not wasting our time. When RHEV does start using sanlock leases for VMs, and we first test this, I expect we'll find actual bugs and things to fix in this area.

So it seems this could actually be related to bug 1088034. We fixed that bug by calling sanlock_inquire to check whether a domain is registered with sanlock or not. So in case we do this during incoming migration (which would be pretty stupid, to be honest), we'd see such errors in the logs... I'll look at it.

Confirmed and reproduced with upstream. Once the sanlock locking driver is enabled in qemu.conf, every time libvirt starts a new domain, sanlock logs "cmd 9 target pid 2135544 not found". The reason is what I described in comment 7.
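The behaviour described above can be modelled with a short sketch. All names here are hypothetical stand-ins; the real sanlock daemon tracks clients that called sanlock_register() and returns lease state from sanlock_inquire():

```python
# Hypothetical model of why sanlock logs "cmd 9 target pid X not found":
# sanlock_inquire(X) can only return lease state for a pid that previously
# registered with the sanlock daemon; for any other pid the daemon has no
# record, so it logs the error instead.

class SanlockDaemon:
    def __init__(self):
        self.registered = set()   # pids that called sanlock_register()
        self.log = []

    def register(self, pid):
        self.registered.add(pid)

    def inquire(self, pid):
        """Model of sanlock_inquire(): lease records, or None if unknown."""
        if pid in self.registered:
            return []             # held-lease records would be returned here
        self.log.append(f"cmd 9 target pid {pid} not found")
        return None

daemon = SanlockDaemon()
daemon.register(1234)             # e.g. a qemu process that acquired leases
assert daemon.inquire(1234) == []
# A freshly started domain never registered, so inquiring about its pid
# produces exactly the kind of message seen in sanlock.log in this bug:
assert daemon.inquire(2135544) is None
assert daemon.log == ["cmd 9 target pid 2135544 not found"]
```

On the migration path, this is why an inquire on the destination, where the new pid has not registered yet, shows up as an error in the log.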
We check whether a process is registered with sanlock even when we have just started it.

> We check whether a process is registered with sanlock even when
> we have just started it.

Could you give me a pointer to that code so I can check what sequence of sanlock calls libvirt is running? I'd like to see whether you're using the sanlock interface as expected, or whether there could be a better alternative.
The code was added as a fix to bug 1088034 in patch https://www.redhat.com/archives/libvir-list/2014-May/msg00347.html The sanlock_inquire call which results in the error messages in sanlock.log is at http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/locking/lock_driver_sanlock.c;h=60f305c43bbba94834f430aa6365f884d50b99a1;hb=HEAD#l501 in virLockManagerSanlockNew, which is called whenever we initialize our lock manager structure for a domain. I made a patch that avoids this call when we have just started the domain, because we know it is not registered. I will send the patch upstream for review soon, but I'll happily rework it if you suggest a better way of checking whether a domain is registered with the sanlock daemon or not.

Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2015-March/msg00337.html

Thanks, that looks like a good solution. I see that sanlock_inquire() is the only good way that sanlock provides to check if a pid is registered. It wasn't envisioned to be used for that, which is why sanlock logs the "not found" error. I'll also remove that error message from sanlock, since we clearly need a legitimate way to check for a pid, and inquire seems a reasonable way to do it.

Fixed upstream by v1.2.13-88-g54972be:
commit 54972be8430c709a338c0716ebc48b02be25dcb7
Author: Jiri Denemark <jdenemar>
Date: Fri Mar 6 15:58:55 2015 +0100
sanlock: Don't spam logs with "target pid not found"
Commit v1.2.4-52-gda879e5 fixed issues with domains started before
sanlock driver was enabled by checking whether a running domain is
registered with sanlock and if it's not, sanlock driver is basically
ignored for the domain.
However, it was checking this even for domain which has just been
started and no sanlock_* API was called for them yet. This results in
cmd 9 target pid 2135544 not found
error messages to appear in sanlock.log whenever we start a new domain.
This patch avoids this useless check for freshly started domains.
Signed-off-by: Jiri Denemark <jdenemar>
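The effect of the patch can be sketched as follows. The function and flag names below are illustrative, not libvirt's actual identifiers; see lock_driver_sanlock.c for the real code:

```python
# Sketch of the fix: when the lock manager is initialized for a domain that
# libvirt itself has just started, skip the sanlock_inquire() probe, because
# the new pid cannot possibly be registered with sanlock yet.
# (All names here are illustrative, not libvirt's real identifiers.)

class FakeSanlock:
    def __init__(self):
        self.registered = set()
        self.log = []

    def inquire(self, pid):
        if pid in self.registered:
            return []
        self.log.append(f"cmd 9 target pid {pid} not found")
        return None

def lock_manager_new(daemon, pid, just_started):
    """Rough model of virLockManagerSanlockNew after the fix."""
    if just_started:
        # Freshly started by libvirt: known to be unregistered, don't probe.
        return {"pid": pid, "registered": False}
    # Domain was already running before the sanlock driver was enabled:
    # probe the daemon to decide whether to manage it.
    state = daemon.inquire(pid)   # may log "target pid not found"
    return {"pid": pid, "registered": state is not None}

d = FakeSanlock()
# Before the fix, starting a new domain hit inquire() and spammed the log;
# after the fix, the probe is skipped and nothing is logged:
lock_manager_new(d, 2135544, just_started=True)
assert d.log == []
# Domains that predate the sanlock driver are still probed as before:
lock_manager_new(d, 4321, just_started=False)
assert d.log == ["cmd 9 target pid 4321 not found"]
```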
Hi Jiri
I could reproduce this bug with libvirt-50. However, I met another issue: the VM fails to start after updating libvirt from libvirt-50 to libvirt-51 (I met the same issue when updating from libvirt-49 to libvirt-50). Could you help me check it? Thanks.
Steps:
1.Configure the nfs server
[nfs_server]# cat /etc/exports
/export *(rw,no_root_squash)
[nfs_server]# ll /export -Zd
drwxr-xr-x. root root system_u:object_r:default_t:s0 /export
[nfs_server]# ll /export -Z
-rw-------. root root unconfined_u:object_r:default_t:s0 rhel67.img
2.Mount the NFS export on both the source and target hosts
[nfs_client]#mount $nfs_serverip:/export /var/lib/libvirt/sanlock -o vers=3
3.Configure the sanlock environment on both the source and target hosts
[nfs_client]#setsebool virt_use_sanlock 1
[nfs_client]#setsebool virt_use_nfs 1
[nfs_client]# service libvirtd stop
[nfs_client]# augtool
augtool> set /files/etc/libvirt/qemu.conf/lock_manager "sanlock"
Note: on the second host, set host_id = 2
augtool> set /files/etc/libvirt/qemu-sanlock.conf/host_id 1
augtool> set /files/etc/libvirt/qemu-sanlock.conf/auto_disk_leases 1
Note: disk_lease_dir must point to the NFS mount target
augtool> set /files/etc/libvirt/qemu-sanlock.conf/disk_lease_dir
"/var/lib/libvirt/sanlock"
augtool> save
Saved 1 file(s)
augtool> quit
[nfs_client]#service sanlock start
[nfs_client]#service libvirtd start
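For reference, the augtool session above is equivalent to the following settings in the two configuration files (values as used in these steps; host_id would be 2 on the second host):

```
# /etc/libvirt/qemu.conf
lock_manager = "sanlock"

# /etc/libvirt/qemu-sanlock.conf
host_id = 1
auto_disk_leases = 1
disk_lease_dir = "/var/lib/libvirt/sanlock"
```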
4.Re-check the nfs share directory
[nfs_server]# ll /export/ -dZ
drwxr-xr-x. root root unconfined_u:object_r:default_t:s0 /export/
[nfs_server]# ll /export/ -Z
-rw-------. root root system_u:object_r:default_t:s0 __LIBVIRT__DISKS__
-rw-------. root root unconfined_u:object_r:default_t:s0 rhel67.img
5.Start the guest on the NFS client, then re-check the share directory on the NFS server
[nfs_client]# virsh start virt-tests-vm1
Domain virt-tests-vm1 started
[nfs_server]#ll /export/ -Z
-rw-------. root root system_u:object_r:default_t:s0 cde96501e7938c78acee7b2139cd05c7
-rw-------. root root system_u:object_r:default_t:s0 __LIBVIRT__DISKS__
-rw-------. qemu qemu unconfined_u:object_r:default_t:s0 rhel67.img
[nfs_server]# ll /export/ -Zd
drwxr-xr-x. root root unconfined_u:object_r:default_t:s0 /export/
6.Destroy the guest, then re-check the share directory on the NFS server
[nfs_client]# virsh destroy virt-tests-vm1
Domain virt-tests-vm1 destroyed
[nfs_server]# ll /export -dZ
drwxr-xr-x. root root system_u:object_r:default_t:s0 /taa
[nfs_server]# ll /export -Z
-rw-------. root root system_u:object_r:default_t:s0 cde96501e7938c78acee7b2139cd05c7
-rw-------. root root system_u:object_r:default_t:s0 __LIBVIRT__DISKS__
-rw-------. qemu qemu unconfined_u:object_r:default_t:s0 rhel67.img
7.Update libvirt to libvirt-51, then re-check the share directory on the NFS server; the group of the share directory has changed to sanlock
[nfs_client]#yum update libvirt-51
[nfs_client]#service libvirtd restart
[nfs_server]# ll /export/ -dZ
drwxrwx---. root sanlock unconfined_u:object_r:default_t:s0 /export/
[nfs_server]# ll /export/ -Z
-rw-------. root root system_u:object_r:default_t:s0 cde96501e7938c78acee7b2139cd05c7
-rw-------. root root system_u:object_r:default_t:s0 __LIBVIRT__DISKS__
-rw-------. root root unconfined_u:object_r:default_t:s0 rhel67.img
8.Start the guest; the guest fails to start
[nfs_client]# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: internal error process exited while connecting to monitor: char device redirected to /dev/pts/1
2015-03-25T08:58:00.357760Z qemu-kvm: -drive file=/var/lib/libvirt/sanlock/rhel67.img,if=none,id=drive-virtio-disk0,format=raw,cache=none: could not open disk image /var/lib/libvirt/sanlock/rhel67.img: Permission denied
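One plausible reading of the step-8 failure, sketched as a plain Unix permission check (an illustrative model, not libvirt code): after the update, disk_lease_dir is root:sanlock with mode 0770, and the qemu user, being neither root nor a member of the sanlock group, can no longer traverse the directory to open a disk image stored inside it.

```python
# Illustrative model of the step-8 failure: a Unix directory-traversal
# permission check. Before the update, /var/lib/libvirt/sanlock was
# root:root 0755 (drwxr-xr-x); after the update it is root:sanlock 0770
# (drwxrwx---), and qemu runs as user "qemu", not in group "sanlock".

def can_traverse(mode, owner, group, user, user_groups):
    """Return True if `user` has execute (search) permission on a directory."""
    if user == owner:
        return bool(mode & 0o100)
    if group in user_groups:
        return bool(mode & 0o010)
    return bool(mode & 0o001)

# Before the update: world-searchable, so qemu can reach the image.
assert can_traverse(0o755, "root", "root", "qemu", {"qemu"})
# After the update: the "others" bit is clear and qemu is not in the
# sanlock group, hence "could not open disk image ...: Permission denied".
assert not can_traverse(0o770, "root", "sanlock", "qemu", {"qemu"})
```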
Hi Jiri,
The original issue for this bug (errors in /var/log/messages and /var/log/sanlock.log) has been fixed in libvirt-0.10.2-51.el6, and the issue mentioned in comment 16 could also be hit with the previous libvirt package, so it may not be a new issue. Could we verify this bug first and track the comment 16 issue in a separate bug, if it turns out to be one? Thanks.

You shouldn't use disk_lease_dir for storing disk image data.

Thanks for Jiri's response. I can now start the guest successfully, even after upgrading the libvirt package, after moving the disk image data from disk_lease_dir to another directory, so I am marking this bug verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-1252.html