Bug 1200149

Summary: Race starting multiple libvirtd user sessions at the same time
Product: [Fedora] Fedora
Reporter: Richard W.M. Jones <rjones>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 22
CC: agedosier, berrange, clalancette, crobinso, dyuan, itamar, jforbes, kchamart, laine, libvirt-maint, mkletzan, mprivozn, rjones, veillard, virt-maint, zhwang
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-1.2.13-3.fc22
Doc Type: Bug Fix
Clones: 1208176
Last Closed: 2015-04-22 22:58:03 UTC
Type: Bug
Bug Blocks: 910269, 1208176

Description Richard W.M. Jones 2015-03-09 20:42:19 UTC
Description of problem:

As NON-root:

killall libvirtd ; for i in `seq 1 5`; do libguestfs-test-tool >/tmp/log$i 2>&1 & done

Wait a few seconds, then hit return and look for processes
which exit with an error.  Sometimes there won't be any, but
sometimes:

[1]   Exit 1                  libguestfs-test-tool > /tmp/log$i 2>&1
[2]   Done                    libguestfs-test-tool > /tmp/log$i 2>&1
[3]   Done                    libguestfs-test-tool > /tmp/log$i 2>&1
[4]-  Done                    libguestfs-test-tool > /tmp/log$i 2>&1
[5]+  Done                    libguestfs-test-tool > /tmp/log$i 2>&1

In the cases where you see process(es) exiting with an error, examine
the /tmp/log* files.  In one of them you should see a failure
like this:

libguestfs: opening libvirt handle: URI = qemu:///session, auth = default+wrapper, flags = 0
libvirt: XML-RPC error : Failed to connect socket to '/run/user/1000/libvirt/libvirt-sock': No such file or directory
libguestfs: error: could not connect to libvirt (URI = qemu:///session): Failed to connect socket to '/run/user/1000/libvirt/libvirt-sock': No such file or directory [code=38 domain=7]

This error message is consistent for me, i.e. there are not
multiple causes.

Version-Release number of selected component (if applicable):

libvirt-1.2.13-1.fc23.x86_64
(Also happened with libvirt from F22)

How reproducible:

Not 100%, but quite often.

Steps to Reproduce:
1. See description above.

Comment 1 Richard W.M. Jones 2015-03-09 20:44:30 UTC
Note this doesn't depend on libguestfs.  The following also fails
(NB: NON-root):

killall libvirtd ; for i in `seq 1 5`; do virsh list >/tmp/log$i 2>&1 & done
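Rather than pressing Return and reading the job-status lines by hand, the same check can be scripted. The following is a minimal bash sketch that is not part of the original report; the `race` helper name and its output format are our own. It starts N copies of a command at once, waits for each PID individually, and prints the log of every copy that exited non-zero.

```bash
#!/usr/bin/env bash
# race N CMD [ARGS...]: start N copies of CMD at the same time, wait for
# every one of them, and print the log of each copy that exited non-zero.
# Hypothetical helper, not part of libvirt or libguestfs.
race() {
    local n=$1
    shift
    local logdir i failed=0
    local -a pids=()
    logdir=$(mktemp -d)
    for ((i = 1; i <= n; i++)); do
        "$@" > "$logdir/log$i" 2>&1 &
        pids+=("$!")
    done
    for ((i = 1; i <= n; i++)); do
        # "wait PID" returns the exit status of that particular job
        if ! wait "${pids[i - 1]}"; then
            echo "run $i failed:"
            cat "$logdir/log$i"
            failed=$((failed + 1))
        fi
    done
    echo "$failed of $n runs failed (logs in $logdir)"
    # Succeed only if every copy succeeded
    [ "$failed" -eq 0 ]
}
```

As a non-root user, something like `killall libvirtd; race 5 virsh list` (or `race 5 libguestfs-test-tool`) should then print the "Failed to connect socket" error whenever the race is hit, and return a non-zero status.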

Comment 2 Richard W.M. Jones 2015-04-01 11:20:32 UTC
Still happens in:

libvirt-1.2.13-2.fc23.x86_64

Comment 3 Kashyap Chamarthy 2015-04-01 13:51:58 UTC
I can reproduce the error with the versions below (same as Rich):

  $ uname -r; rpm -q libvirt-daemon-kvm qemu-system-x86
  4.0.0-0.rc5.git4.1.fc22.x86_64
  libvirt-daemon-kvm-1.2.13-2.fc22.x86_64
  qemu-system-x86-2.3.0-0.2.rc1.fc22.x86_64


As noted by Rich earlier, the test was done as NON-root:

$ id -u -n
kashyapc
$ killall libvirtd ; for i in `seq 1 5`; do virsh list >/tmp/log$i 2>&1 & done
[. . .]

Hit enter:

$
[1]   Exit 1                  virsh list > /tmp/log$i 2>&1
[2]   Done                    virsh list > /tmp/log$i 2>&1
[3]   Exit 1                  virsh list > /tmp/log$i 2>&1
[4]-  Exit 1                  virsh list > /tmp/log$i 2>&1
[5]+  Exit 1                  virsh list > /tmp/log$i 2>&1
[kashyapc@foo ~]$ 


`grep` the logs:

$ grep Fail /tmp/log*
/tmp/log1:error: Failed to connect socket to '/home/kashyapc/.cache/libvirt/libvirt-sock': No such file or directory
/tmp/log3:error: Failed to connect socket to '/home/kashyapc/.cache/libvirt/libvirt-sock': No such file or directory
/tmp/log4:error: Failed to connect socket to '/home/kashyapc/.cache/libvirt/libvirt-sock': No such file or directory
/tmp/log5:error: Failed to connect socket to '/home/kashyapc/.cache/libvirt/libvirt-sock': No such file or directory
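The socket path here (`~/.cache/libvirt`) differs from the one in the description (`/run/user/1000/libvirt`). Our understanding, which is an assumption rather than something stated in this bug, is that libvirt of this era placed the session socket under `$XDG_RUNTIME_DIR/libvirt` when that variable is set, and otherwise under the user cache directory (`$XDG_CACHE_HOME/libvirt`, defaulting to `~/.cache/libvirt`). A hypothetical bash helper expressing that assumed lookup:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print where a session-mode libvirtd socket would be
# expected, ASSUMING the XDG_RUNTIME_DIR -> cache-directory fallback
# described above. Not taken from libvirt's source.
session_sock_path() {
    if [ -n "${XDG_RUNTIME_DIR:-}" ]; then
        echo "$XDG_RUNTIME_DIR/libvirt/libvirt-sock"
    else
        echo "${XDG_CACHE_HOME:-$HOME/.cache}/libvirt/libvirt-sock"
    fi
}

session_sock_path
```

Either way the failure is the same: the client looks for a socket that the still-starting daemon has not created yet.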

Comment 4 Martin Kletzander 2015-04-01 14:51:32 UTC
(In reply to Kashyap Chamarthy from comment #3)
You have certainly not done this as root, so you haven't reproduced the original problem.

Having said that, I'm not sure what the problem is; I can't find it from the description.  I'm running the following with no output from "diff":

j=50; killall libvirtd; for i in {1..$j}; do virsh list >/tmp/log$i 2>&1 & done; wait $!; for i in {2..$j}; do diff /tmp/log$((i-1)) /tmp/log$i; done

Changing $j is not doing anything.  All processes exit with "1", and there are no changes there either.  Could you please describe what should happen and what happens instead?  Sorry if I just misunderstood the description.
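A side note on the command above, not raised in the original thread: bash performs brace expansion before parameter expansion, so `{1..$j}` is not treated as a range, and the loop body runs once with `i` set to the literal string `{1..50}`. Likewise, `wait $!` waits only for the most recently started background job. A small demonstration:

```bash
#!/usr/bin/env bash
j=3

# Brace expansion happens before $j is substituted, so {1..$j} is not a
# valid sequence expression and survives as the literal word "{1..3}".
for i in {1..$j}; do echo "brace: $i"; done

# Either of these iterates from 1 to $j as intended.
for i in $(seq 1 "$j"); do echo "seq: $i"; done
for ((i = 1; i <= j; i++)); do echo "arith: $i"; done
```

So in the loop above, `for ((i = 1; i <= j; i++))` (or `$(seq 1 "$j")`) would be needed for `$j` to take effect, and a bare `wait` would let every `virsh list` finish before the `diff` loop runs. This would at least partly explain why changing `$j` made no difference.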

Comment 5 Richard W.M. Jones 2015-04-01 14:55:42 UTC
(In reply to Martin Kletzander from comment #4)
> (In reply to Kashyap Chamarthy from comment #3)
> You have certainly not done this as root, so you haven't reproduced the
> original problem.
> 
> Having said that, I'm not sure what the problem is; I can't find it from the
> description.  I'm running the following with no output from "diff":
> 
> j=50; killall libvirtd; for i in {1..$j}; do virsh list >/tmp/log$i 2>&1 &
> done; wait $!; for i in {2..$j}; do diff /tmp/log$((i-1)) /tmp/log$i; done
> 
> Changing $j is not doing anything.  All processes exit with "1", and there
> are no changes there either.  Could you please describe what should happen
> and what happens instead?  Sorry if I just misunderstood the description.

Going back to the simple test case:

killall libvirtd ; for i in `seq 1 5`; do virsh list >/tmp/log$i 2>&1 & done

If you hit [Return] you'll notice that (sometimes) one or more
of the virsh commands fails (Exit 1):

[1]   Done                    virsh list > /tmp/log$i 2>&1
[2]   Exit 1                  virsh list > /tmp/log$i 2>&1
[3]   Done                    virsh list > /tmp/log$i 2>&1
[4]-  Done                    virsh list > /tmp/log$i 2>&1
[5]+  Done                    virsh list > /tmp/log$i 2>&1

Then look at the log files:

$ cat /tmp/log?
 Id    Name                           State
----------------------------------------------------

error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/run/user/1000/libvirt/libvirt-sock': No such file or directory

 Id    Name                           State
----------------------------------------------------

 Id    Name                           State
----------------------------------------------------

 Id    Name                           State
----------------------------------------------------

Comment 6 Laine Stump 2015-04-01 14:58:23 UTC
(In reply to Martin Kletzander from comment #4)
> (In reply to Kashyap Chamarthy from comment #3)
> You have certainly not done this as root, so you haven't reproduced the
> original problem.

The whole point of the BZ is that this happens when running as non-root (which is how libguestfs uses libvirt).

Comment 7 Martin Kletzander 2015-04-01 14:59:41 UTC
Sorry to both of you, I was wrong here: I read NON-root as if it were "YES-root".  I'll continue looking into it...

Comment 8 Michal Privoznik 2015-04-02 13:08:25 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2015-April/msg00107.html

Comment 10 Michal Privoznik 2015-04-15 11:48:17 UTC
I've just pushed the patch upstream:

commit be78814ae07f092d9c4e71fd82dd1947aba2f029
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Apr 2 14:41:17 2015 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Wed Apr 15 13:39:13 2015 +0200

    virNetSocketNewConnectUNIX: Use flocks when spawning a daemon
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1200149
    
    Even though we have a mutex mechanism so that two clients don't spawn
    two daemons, it's not strong enough. It can happen that while one
    client is spawning the daemon, the other one fails to connect.
    Basically two possible errors can happen:
    
      error: Failed to connect socket to '/home/mprivozn/.cache/libvirt/libvirt-sock': Connection refused
    
    or:
    
      error: Failed to connect socket to '/home/mprivozn/.cache/libvirt/libvirt-sock': No such file or directory
    
    The problem in both cases is that the daemon is still starting up
    while we are trying to connect (and fail). We should postpone the
    connecting phase until the daemon is started (by the other thread
    that is spawning it). In order to do that, create a file lock
    'libvirt-lock' in the directory where the session daemon would create
    its socket. Even when called from multiple processes, spawning a
    daemon will then serialize on the file lock, so only the first client
    to arrive will spawn the daemon.
    
    Tested-by: Richard W. M. Jones <rjones>
    Signed-off-by: Michal Privoznik <mprivozn>


v1.2.14-174-gbe78814
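The commit message describes serializing daemon start-up on a lock file ('libvirt-lock') placed next to the session socket, so that only the first client spawns the daemon and the others wait instead of failing to connect. The following bash sketch illustrates that idea with flock(1) from util-linux. It is not libvirt's implementation (the real fix is C code in virNetSocketNewConnectUNIX), and `fake_daemon`, the file names, the one-second start-up delay and the polling loop are all hypothetical stand-ins.

```bash
#!/usr/bin/env bash
# Sketch: serialize "spawn the daemon if it is not running" across processes
# with an advisory file lock. fake_daemon, the paths and the polling interval
# are hypothetical; this is NOT libvirt's code. Requires flock(1).
set -u

rundir=$(mktemp -d)          # stands in for the session socket directory
trap 'rm -rf "$rundir"' EXIT
sock="$rundir/fake-sock"     # hypothetical socket path (a plain file here)
lock="$rundir/fake-lock"     # lock file in the same directory
spawns="$rundir/spawns"      # one line per daemon started
: > "$spawns"

fake_daemon() {
    echo started >> "$spawns"
    sleep 1                  # simulate slow daemon start-up
    touch "$sock"            # the "socket" only appears once start-up is done
}

connect_or_spawn() {
    (
        flock -x 9           # block until this client holds the lock
        if [ ! -e "$sock" ]; then
            # Start the daemon without handing it the lock fd, then keep
            # holding the lock until its socket exists.
            fake_daemon 9>&- &
            while [ ! -e "$sock" ]; do sleep 0.1; done
        fi
    ) 9> "$lock"
    # The lock has been released here, and the socket is known to exist.
    [ -e "$sock" ] && echo connected
}

# Five clients racing, as in the reproducer from the description.
for i in 1 2 3 4 5; do connect_or_spawn > "$rundir/out$i" & done
wait

echo "daemons spawned: $(wc -l < "$spawns")"
echo "clients connected: $(cat "$rundir"/out* | grep -c connected)"
```

Run as is, it should report one daemon spawned and five clients connected. With the `flock -x 9` line removed, several clients can find the socket missing at the same moment and each start a daemon of its own, which is roughly the shell analogue of the race reported here.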

Comment 11 Cole Robinson 2015-04-15 13:47:38 UTC
Reopening to track backporting this to f22

Comment 12 Fedora Update System 2015-04-15 19:11:25 UTC
libvirt-1.2.13-3.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/libvirt-1.2.13-3.fc22

Comment 13 Fedora Update System 2015-04-17 18:37:14 UTC
Package libvirt-1.2.13-3.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-1.2.13-3.fc22'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-6245/libvirt-1.2.13-3.fc22
then log in and leave karma (feedback).

Comment 14 Fedora Update System 2015-04-22 22:58:03 UTC
libvirt-1.2.13-3.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.