Bug 1200149
Summary: | Race starting multiple libvirtd user sessions at the same time | |||
---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Richard W.M. Jones <rjones> | |
Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> | |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | |
Severity: | unspecified | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 22 | CC: | agedosier, berrange, clalancette, crobinso, dyuan, itamar, jforbes, kchamart, laine, libvirt-maint, mkletzan, mprivozn, rjones, veillard, virt-maint, zhwang | |
Target Milestone: | --- | Keywords: | Reopened | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | libvirt-1.2.13-3.fc22 | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1208176 | Environment: | ||
Last Closed: | 2015-04-22 22:58:03 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 910269, 1208176 |
Description
Richard W.M. Jones
2015-03-09 20:42:19 UTC
Note this doesn't depend on libguestfs. The following also fails (NB: NON-root):

    killall libvirtd ; for i in `seq 1 5`; do virsh list >/tmp/log$i 2>&1 & done

Still happens in: libvirt-1.2.13-2.fc23.x86_64

I can reproduce the error with the versions below (same as Rich):

    $ uname -r; rpm -q libvirt-daemon-kvm qemu-system-x86
    4.0.0-0.rc5.git4.1.fc22.x86_64
    libvirt-daemon-kvm-1.2.13-2.fc22.x86_64
    qemu-system-x86-2.3.0-0.2.rc1.fc22.x86_64

As noted by Rich earlier, the test was done as NON-root:

    $ id -u -n
    kashyapc

    $ killall libvirtd ; for i in `seq 1 5`; do virsh list >/tmp/log$i 2>&1 & done
    [. . .]

Hit enter:

    $
    [1]   Exit 1    virsh list > /tmp/log$i 2>&1
    [2]   Done      virsh list > /tmp/log$i 2>&1
    [3]   Exit 1    virsh list > /tmp/log$i 2>&1
    [4]-  Exit 1    virsh list > /tmp/log$i 2>&1
    [5]+  Exit 1    virsh list > /tmp/log$i 2>&1
    [kashyapc@foo ~]$

`grep` the logs:

    $ grep Fail /tmp/log*
    /tmp/log1:error: Failed to connect socket to '/home/kashyapc/.cache/libvirt/libvirt-sock': No such file or directory
    /tmp/log3:error: Failed to connect socket to '/home/kashyapc/.cache/libvirt/libvirt-sock': No such file or directory
    /tmp/log4:error: Failed to connect socket to '/home/kashyapc/.cache/libvirt/libvirt-sock': No such file or directory
    /tmp/log5:error: Failed to connect socket to '/home/kashyapc/.cache/libvirt/libvirt-sock': No such file or directory

(In reply to Kashyap Chamarthy from comment #3)
You have certainly not done this as root, so you haven't reproduced the original problem.

Having said that, I'm not sure what the problem is; I can't find it from the description. I'm running the following with no output from "diff":

    j=50; killall libvirtd; for i in {1..$j}; do virsh list >/tmp/log$i 2>&1 & done; wait $!; for i in {2..$j}; do diff /tmp/log$((i-1)) /tmp/log$i; done

Changing $j is not doing anything. All processes exit with "1", and there are no differences either. Could you please describe what should happen and what happens instead? Sorry if I just misunderstood the description.
(In reply to Martin Kletzander from comment #4)
> (In reply to Kashyap Chamarthy from comment #3)
> You have certainly not done this as root, so you haven't reproduced the
> original problem.
>
> Having said that, I'm not sure what the problem is, I can't find it from the
> description. I'm running the following with no output from "diff":
>
> j=50; killall libvirtd; for i in {1..$j}; do virsh list >/tmp/log$i 2>&1 &
> done; wait $!; for i in {2..$j}; do diff /tmp/log$((i-1)) /tmp/log$i; done
>
> Changing $j is not doing anything. All processes exit with "1", no changes
> are there either. Could you please describe what should happen and what
> happens instead? Sorry if I just misunderstood the description.

Get back to the simple test case:

    killall libvirtd ; for i in `seq 1 5`; do virsh list >/tmp/log$i 2>&1 & done

If you hit [Return] you'll notice that (sometimes) one or more of the virsh commands fails (Exit 1):

    [1]   Done      virsh list > /tmp/log$i 2>&1
    [2]   Exit 1    virsh list > /tmp/log$i 2>&1
    [3]   Done      virsh list > /tmp/log$i 2>&1
    [4]-  Done      virsh list > /tmp/log$i 2>&1
    [5]+  Done      virsh list > /tmp/log$i 2>&1

Then look at the log files:

    $ cat /tmp/log?
    Id Name State
    ----------------------------------------------------
    error: failed to connect to the hypervisor
    error: no valid connection
    error: Failed to connect socket to '/run/user/1000/libvirt/libvirt-sock': No such file or directory
    Id Name State
    ----------------------------------------------------
    Id Name State
    ----------------------------------------------------
    Id Name State
    ----------------------------------------------------

(In reply to Martin Kletzander from comment #4)
> (In reply to Kashyap Chamarthy from comment #3)
> You have certainly not done this as root, so you haven't reproduced the
> original problem.

The whole point of the BZ is that this happens when running as non-root (which is how libguestfs uses libvirt).

Sorry to both of you, I am the one in the wrong here; I read NON-root as if it were "YES-root".
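For anyone who wants to script the reproducer above instead of reading job-control output, here is a minimal sketch in Python (not something used in this bug). The names `run_concurrently` and `failed` are hypothetical helpers invented for illustration. The demo at the bottom launches a harmless Python one-liner so the sketch runs anywhere.

```python
import subprocess
import sys


def run_concurrently(argv, count):
    """Start `count` copies of `argv` at once; return (exit_code, output) pairs."""
    procs = [
        subprocess.Popen(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as the 2>&1 in the shell loop
            text=True,
        )
        for _ in range(count)
    ]
    results = []
    for proc in procs:
        output, _ = proc.communicate()
        results.append((proc.returncode, output))
    return results


def failed(results):
    """Return only the runs that exited non-zero."""
    return [(rc, out) for rc, out in results if rc != 0]


# Harmless stand-in command; swap in ["virsh", "list"] to exercise libvirt.
demo = run_concurrently([sys.executable, "-c", "print('ok')"], 5)
print(f"{len(failed(demo))} of {len(demo)} runs failed")
```

To test the race itself, one would run `killall libvirtd` as a non-root user first and then call `run_concurrently(["virsh", "list"], 5)`. On an affected build, `failed()` should sometimes return runs whose output contains the "Failed to connect socket" error.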
I'll continue having a look at it...

Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2015-April/msg00107.html

I've just pushed the patch upstream:

    commit be78814ae07f092d9c4e71fd82dd1947aba2f029
    Author:     Michal Privoznik <mprivozn>
    AuthorDate: Thu Apr 2 14:41:17 2015 +0200
    Commit:     Michal Privoznik <mprivozn>
    CommitDate: Wed Apr 15 13:39:13 2015 +0200

    virNetSocketNewConnectUNIX: Use flocks when spawning a daemon

    https://bugzilla.redhat.com/show_bug.cgi?id=1200149

    Even though we have a mutex mechanism so that two clients don't spawn
    two daemons, it's not strong enough. It can happen that while one
    client is spawning the daemon, the other one fails to connect.
    Basically two possible errors can happen:

      error: Failed to connect socket to '/home/mprivozn/.cache/libvirt/libvirt-sock': Connection refused

    or:

      error: Failed to connect socket to '/home/mprivozn/.cache/libvirt/libvirt-sock': No such file or directory

    The problem in both cases is, the daemon is only starting up, while we
    are trying to connect (and fail). We should postpone the connecting
    phase until the daemon is started (by the other thread that is
    spawning it). In order to do that, create a file lock 'libvirt-lock'
    in the directory where session daemon would create its socket. So even
    when called from multiple processes, spawning a daemon will serialize
    on the file lock. So only the first to come will spawn the daemon.

    Tested-by: Richard W. M. Jones <rjones>
    Signed-off-by: Michal Privoznik <mprivozn>

v1.2.14-174-gbe78814

Reopening to track backporting this to f22

libvirt-1.2.13-3.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/libvirt-1.2.13-3.fc22

Package libvirt-1.2.13-3.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.

Update it with:

    # su -c 'yum update --enablerepo=updates-testing libvirt-1.2.13-3.fc22'

as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-6245/libvirt-1.2.13-3.fc22
then log in and leave karma (feedback).

libvirt-1.2.13-3.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
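For readers who want to see the idea from the commit message in isolation, here is a minimal sketch of serializing "spawn the daemon" on a file lock. It is written in Python and is not the libvirt C code. The function names `connect_or_spawn` and `demo` are hypothetical, and so are the simulated start-up delay and the plain file standing in for the daemon's socket. Only the file names `libvirt-lock` and `libvirt-sock` come from the report.

```python
import fcntl
import os
import tempfile
import threading
import time


def connect_or_spawn(sock_dir, spawn_log):
    """Return the socket path, spawning the (simulated) daemon at most once."""
    lock_path = os.path.join(sock_dir, "libvirt-lock")
    sock_path = os.path.join(sock_dir, "libvirt-sock")
    # Every caller opens its own descriptor, so the exclusive flock() makes
    # callers wait for each other, whether they are threads or processes.
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if not os.path.exists(sock_path):
                spawn_log.append(threading.get_ident())  # record who spawned
                time.sleep(0.1)  # assumption: stand-in for daemon start-up time
                open(sock_path, "w").close()  # stand-in for the daemon's socket
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    # Only now would a real client attempt to connect to sock_path.
    return sock_path


def demo(clients=5):
    """Run several clients at once; return how many of them spawned a daemon."""
    with tempfile.TemporaryDirectory() as sock_dir:
        spawn_log = []
        threads = [
            threading.Thread(target=connect_or_spawn, args=(sock_dir, spawn_log))
            for _ in range(clients)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return len(spawn_log)


print(demo())
```

The point of the design is that the check for the socket and the spawn both happen while the lock is held. A late client therefore blocks until the first one has finished starting the daemon, instead of failing with "No such file or directory" or "Connection refused".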