
Bug 1189414

Summary: Errors in /var/log/messages and /var/log/sanlock.log after a successful live migration in RHEV.
Product: Red Hat Enterprise Linux 6
Reporter: Chen <cchen>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.6
CC: agk, cluster-maint, dyuan, jdenemar, mzhan, rbalakri, teigland, xuzhang, zhwang, zpeng
Target Milestone: rc
Keywords: Reopened, Upstream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: libvirt-0.10.2-51.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-22 05:48:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Chen 2015-02-05 09:25:46 UTC
Description of problem:

Errors appear in /var/log/messages and /var/log/sanlock.log after a successful live migration in RHEV.

Version-Release number of selected component (if applicable):

RHEL 6.6
sanlock-2.8-1.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Live-migrate a VM in RHEV.

2. On the destination side, the following error is logged in /var/log/messages and /var/log/sanlock.log:

2015-02-05 04:01:04-0500 1190 [8805]: cmd 9 target pid 10744 not found

(10744 is the live-migrated VM's qemu-kvm process.)


Actual results:

Errors appear in /var/log/messages and /var/log/sanlock.log even though the live migration is successful.

Expected results:

No errors should appear in the log files.

Additional info:

Comment 3 David Teigland 2015-02-26 15:53:16 UTC

*** This bug has been marked as a duplicate of bug 1088034 ***

Comment 5 David Teigland 2015-02-27 15:23:52 UTC
"cmd 9 target pid X not found" means that someone is doing sanlock_inquire(X) to get information about leases held by pid X.  sanlock prints this error when it does not find any record of pid X.  This may or may not be ok; in bug 1088034 it was not a problem.

I assume that the caller of sanlock_inquire() is still libvirt.  What's surprising to me is that sanlock_inquire() is called on the migration destination.  The method for migration with sanlock involves sanlock_inquire() being called on the source to collect lease information, passing this lease information to the destination through libvirt during migration, then reacquiring the leases on the destination using sanlock_acquire(LVER) before the new pid runs qemu.

Perhaps the libvirt folks could check if/why sanlock_inquire() is being run on the migration destination.

This is all fairly theoretical, and something of a waste of time, because RHEV has never actually used sanlock leases for vms through libvirt.  So, libvirt is performing many of the sanlock steps that we're debugging here, even though they do nothing.  Perhaps there is a way to disable all the sanlock related stuff until it's actually tested and used, so we're not wasting our time.  When RHEV does start using sanlock leases for vms, and we first test this, I expect we'll find actual bugs and things to fix in this area.

Comment 7 Jiri Denemark 2015-03-06 09:48:32 UTC
So it seems this could actually be related to bug 1088034. We fixed the bug by calling sanlock_inquire to check whether a domain is registered with sanlock or not. So in case we do this during incoming migration (which would be pretty stupid to be honest), we'd see such errors in the logs... I'll look at it.

Comment 8 Jiri Denemark 2015-03-06 14:26:02 UTC
Confirmed and reproduced with upstream. Once the sanlock locking driver is enabled in qemu.conf, every time libvirt starts a new domain sanlock logs "cmd 9 target pid 2135544 not found". The reason is what I described in comment 7. We check whether a process is registered with sanlock even when we have just started it.

Comment 9 David Teigland 2015-03-06 15:23:23 UTC
> We check whether a process is registered with sanlock even when
> we have just started it.

Could you give me a pointer to that code so I can check what sequence of sanlock calls libvirt is running?  I'd like to see if you're using the sanlock interface as expected, or if there could be a better alternative.

Comment 10 Jiri Denemark 2015-03-06 15:48:30 UTC
The code was added as a fix to bug 1088034 in patch https://www.redhat.com/archives/libvir-list/2014-May/msg00347.html

The sanlock_inquire call which results in the error messages in sanlock.log is at http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/locking/lock_driver_sanlock.c;h=60f305c43bbba94834f430aa6365f884d50b99a1;hb=HEAD#l501 in virLockManagerSanlockNew, which is called whenever we initialize our lock manager structure for a domain. I made a patch that avoids this call when we have just started the domain, because we know it is not registered. I will send the patch upstream for review soon, but I'll happily rework it if you suggest a better way of checking whether a domain is registered with sanlock or not.

Comment 11 Jiri Denemark 2015-03-06 15:58:55 UTC
Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2015-March/msg00337.html

Comment 12 David Teigland 2015-03-06 17:01:56 UTC
Thanks, that looks like a good solution.  I see that sanlock_inquire() is the only good way that sanlock provides to check if a pid is registered.  It wasn't envisioned to be used for that, which is why sanlock logs the "not found" error.  I'll also remove that error message from sanlock since we clearly need a legitimate way to check for a pid, and inquire seems like a reasonable way to do it.

Comment 13 Jiri Denemark 2015-03-09 12:41:11 UTC
Fixed upstream by v1.2.13-88-g54972be:

commit 54972be8430c709a338c0716ebc48b02be25dcb7
Author: Jiri Denemark <jdenemar>
Date:   Fri Mar 6 15:58:55 2015 +0100

    sanlock: Don't spam logs with "target pid not found"
    
    Commit v1.2.4-52-gda879e5 fixed issues with domains started before
    sanlock driver was enabled by checking whether a running domain is
    registered with sanlock and if it's not, sanlock driver is basically
    ignored for the domain.
    
    However, it was checking this even for domain which has just been
    started and no sanlock_* API was called for them yet. This results in
    
        cmd 9 target pid 2135544 not found
    
    error messages to appear in sanlock.log whenever we start a new domain.
    
    This patch avoids this useless check for freshly started domains.
    
    Signed-off-by: Jiri Denemark <jdenemar>

Comment 16 zhenfeng wang 2015-03-25 09:44:12 UTC
Hi Jiri
I could reproduce this bug with libvirt-50. However, I met another issue: the VM fails to start after updating libvirt from libvirt-50 to libvirt-51. By the way, I hit the same issue when updating from libvirt-49 to libvirt-50. Could you help me check it? Thanks.

Steps:
1.Configure the nfs server
[nfs_server]# cat /etc/exports 
/export  *(rw,no_root_squash)

[nfs_server]# ll /export -Zd
drwxr-xr-x. root root system_u:object_r:default_t:s0   /export
[nfs_server]# ll /export -Z
-rw-------. root root unconfined_u:object_r:default_t:s0 rhel67.img

2.Mount the nfs on both server and client
[nfs_client]#mount $nfs_serverip:/export /var/lib/libvirt/sanlock -o vers=3

3.Configure the sanlock env in the both the source and target host
[nfs_client]#setsebool virt_use_sanlock 1
[nfs_client]#setsebool virt_use_nfs 1

[nfs_client]# service libvirtd stop

[nfs_client]# augtool
augtool> set /files/etc/libvirt/qemu.conf/lock_manager "sanlock"

Note: on the second host, set host_id = 2.

augtool> set /files/etc/libvirt/qemu-sanlock.conf/host_id 1
augtool> set /files/etc/libvirt/qemu-sanlock.conf/auto_disk_leases 1

Note: disk_lease_dir is the NFS mount target.

augtool> set /files/etc/libvirt/qemu-sanlock.conf/disk_lease_dir "/var/lib/libvirt/sanlock"
augtool> save
Saved 1 file(s)
augtool> quit

[nfs_client]#service sanlock start
[nfs_client]#service libvirtd start

4.Re-check the nfs share directory
[nfs_server]# ll /export/ -dZ
drwxr-xr-x. root root unconfined_u:object_r:default_t:s0 /export/

[nfs_server]# ll /export/ -Z
-rw-------. root root system_u:object_r:default_t:s0   __LIBVIRT__DISKS__
-rw-------. root root unconfined_u:object_r:default_t:s0 rhel67.img

5.Start the guest on the nfs client, then recheck the nfs share directory in the nfs server

[nfs_client]# virsh start virt-tests-vm1
Domain virt-tests-vm1 started

[nfs_server]#ll /export/ -Z
-rw-------. root root system_u:object_r:default_t:s0   cde96501e7938c78acee7b2139cd05c7
-rw-------. root root system_u:object_r:default_t:s0   __LIBVIRT__DISKS__
-rw-------. qemu qemu unconfined_u:object_r:default_t:s0 rhel67.img

[nfs_server]# ll /export/ -Zd
drwxr-xr-x. root root unconfined_u:object_r:default_t:s0 /export/

6.Destroy the guest, then recheck the nfs share directory in the nfs server
[nfs_client]# virsh destroy virt-tests-vm1
Domain virt-tests-vm1 destroyed

[nfs_server]# ll /export -dZ
drwxr-xr-x. root root system_u:object_r:default_t:s0   /taa
[nfs_server# ll /export -Z
-rw-------. root root system_u:object_r:default_t:s0   cde96501e7938c78acee7b2139cd05c7
-rw-------. root root system_u:object_r:default_t:s0   __LIBVIRT__DISKS__
-rw-------. qemu qemu unconfined_u:object_r:default_t:s0 rhel67.img



7. Update libvirt to libvirt-51, then re-check the share directory on the NFS server; the group of the NFS share directory has changed to sanlock.

[nfs_client]#yum update libvirt-51

[nfs_client]#service libvirtd restart

[nfs_server]# ll /export/ -dZ
drwxrwx---. root sanlock unconfined_u:object_r:default_t:s0 /export/

[nfs_server]# ll /export/ -Z
-rw-------. root root system_u:object_r:default_t:s0   cde96501e7938c78acee7b2139cd05c7
-rw-------. root root system_u:object_r:default_t:s0   __LIBVIRT__DISKS__
-rw-------. root root unconfined_u:object_r:default_t:s0 rhel67.img

8. Start the guest; the guest fails to start.

[nfs_client]# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: internal error process exited while connecting to monitor: char device redirected to /dev/pts/1
2015-03-25T08:58:00.357760Z qemu-kvm: -drive file=/var/lib/libvirt/sanlock/rhel67.img,if=none,id=drive-virtio-disk0,format=raw,cache=none: could not open disk image /var/lib/libvirt/sanlock/rhel67.img: Permission denied

Comment 17 zhenfeng wang 2015-03-30 09:23:48 UTC
Hi jiri
The original issue in this bug (errors in /var/log/messages and /var/log/sanlock.log) has been fixed in libvirt-0.10.2-51.el6, and the issue mentioned in comment 16 can also be hit with previous libvirt packages, so it may not be a new issue. Could we verify this bug first and track comment 16's issue in a separate bug if it turns out to be one? Thanks.

Comment 18 Jiri Denemark 2015-03-30 14:45:13 UTC
You shouldn't use disk_lease_dir for storing disk image data.

Comment 19 zhenfeng wang 2015-04-01 09:00:06 UTC
Thanks for Jiri's response. After moving the disk image data from disk_lease_dir to another directory, I can now start the guest successfully even after upgrading the libvirt package, so I am marking this bug verified.

Comment 21 errata-xmlrpc 2015-07-22 05:48:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1252.html