Bug 1285368
Summary: | Running 'virt-sysprep' in parallel on EL7 fails | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Barak Korren <bkorren> |
Component: | libvirt | Assignee: | Ján Tomko <jtomko> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 7.1 | CC: | bkorren, dyuan, huzhan, inetkach, jsuchane, pcfe, ptoscano, rbalakri, rjones, zhwang |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2016-01-07 14:14:16 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 910269 | | |
Description
Barak Korren
2015-11-25 13:28:56 UTC
Can you see if running this command as *NON*-root in parallel also fails:

    virsh list

For example, you could test this by doing:

    for f in `seq 1 100`; do virsh list & done

If any background command fails, that's a libvirt error.

Also, to disable libvirt and get stuff done, you can do:

    export LIBGUESTFS_BACKEND=direct

(although of course this is a workaround - we need to fix the bug in libvirt).

Also, which version of libvirt is this?

We worked around for now by making our scripts not run in parallel. That slows down things obviously...

Trying this:

    for f in `seq 1 100`; do virsh -c 'qemu:///system' list & done

None of the commands failed; maybe bash is not invoking them fast enough?

Everything described above was done as non-root.

Libvirt is: libvirt-1.2.8-16.el7_1.5.x86_64

This bug sure looks a lot like:

https://bugzilla.redhat.com/show_bug.cgi?id=1138604
https://bugzilla.redhat.com/show_bug.cgi?id=927369

However your version of libvirt is supposed to include a fix.

Is there a $HOME directory for this user? Is the home directory writable? Does /run/user/24044 get created? How about /run/user/24044/libvirt?

I'm not clear about how /run/user/... is created. I think systemd is supposed to create it, but if you're not logging in (ie. it's not an ordinary user) then it won't be created, and libvirt will just not work in that case.

24044 is a perfectly normal user with a writeable $HOME (it's just my own UID). /run/user/24044/libvirt exists.

As this bug doesn't seem to be reproducible outside virt-sysprep, I suggest enabling libvirtd debugging and trying to collect the libvirtd logs during a failure.

How to enable libvirt{,d} debugging:

* This is all as NON-root *

    killall libvirtd
    cd ~/.config/libvirt
    cat > libvirtd.conf <<'EOF'
    log_level=1
    log_outputs="1:file:/tmp/libvirtd.log"
    EOF
    export LIBVIRT_DEBUG=1

Run your commands until you see the failure. Attach the output of the commands AND the contents of /tmp/libvirtd.log to this bug.

Jano, please have a look.
Erik won't be available for a while. Thanks.

I was able to reproduce the bug with libvirt-daemon-1.2.8-16.el7.x86_64.rpm and I no longer get the error with the RHEL 7.2 version: libvirt-daemon-1.2.17-13.el7.x86_64

This has been fixed upstream by:

commit be78814ae07f092d9c4e71fd82dd1947aba2f029
Author:     Michal Privoznik <mprivozn>
CommitDate: 2015-04-15 13:39:13 +0200

    virNetSocketNewConnectUNIX: Use flocks when spawning a daemon

    https://bugzilla.redhat.com/show_bug.cgi?id=1200149

    Even though we have a mutex mechanism so that two clients don't spawn two daemons, it's not strong enough. It can happen that while one client is spawning the daemon, the other one fails to connect. Basically two possible errors can happen:

      error: Failed to connect socket to '/home/mprivozn/.cache/libvirt/libvirt-sock': Connection refused

    or:

      error: Failed to connect socket to '/home/mprivozn/.cache/libvirt/libvirt-sock': No such file or directory

    The problem in both cases is, the daemon is only starting up, while we are trying to connect (and fail). We should postpone the connecting phase until the daemon is started (by the other thread that is spawning it). In order to do that, create a file lock 'libvirt-lock' in the directory where session daemon would create its socket. So even when called from multiple processes, spawning a daemon will serialize on the file lock. So only the first to come will spawn the daemon.

    Tested-by: Richard W. M. Jones <rjones>
    Signed-off-by: Michal Privoznik <mprivozn>

git describe: v1.2.14-174-gbe78814 contains: v1.2.15-rc1~165

Could reproduce the bug with: libvirt-daemon-1.2.8-16.el7.x86_64.rpm
Not reproduced with the RHEL 7.2 version: libvirt-daemon-1.2.17-13.el7.x86_64

Verify steps:

1. $ killall libvirtd ; for i in `seq 1 10`; do libguestfs-test-tool >./log$i 2>&1 & done
   [1]   Done                    libguestfs-test-tool > ./log$i 2>&1
   [2]   Done                    libguestfs-test-tool > ./log$i 2>&1
   [3]   Done                    libguestfs-test-tool > ./log$i 2>&1
   [4]   Done                    libguestfs-test-tool > ./log$i 2>&1
   [5]   Done                    libguestfs-test-tool > ./log$i 2>&1
   [6]   Done                    libguestfs-test-tool > ./log$i 2>&1
   [7]   Done                    libguestfs-test-tool > ./log$i 2>&1
   [8]   Done                    libguestfs-test-tool > ./log$i 2>&1
   [9]-  Done                    libguestfs-test-tool > ./log$i 2>&1
   [10]+  Done                    libguestfs-test-tool > ./log$i 2>&1

2. Try several times, no errors return.
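
For readers who want to see the file-lock idea from the upstream commit message in isolation, here is a rough shell analogy. It is not libvirt's code (the real change is in C, in virNetSocketNewConnectUNIX) and it assumes flock(1) from util-linux is installed. The function name ensure_daemon, the marker file standing in for the daemon's socket, and the temporary directory are all hypothetical. Ten clients start in parallel; each takes an exclusive lock before checking whether the "daemon" exists, so only the first one through should spawn it, and the exit status of every client is collected with wait rather than read off the job-control output.

```shell
#!/bin/sh
# Sketch only: a shell analogy for "serialize daemon spawning on a file lock".
# Nothing here talks to libvirt; all names below are hypothetical.

workdir=$(mktemp -d)
lockfile="$workdir/libvirt-lock"   # lock file name borrowed from the commit message
marker="$workdir/daemon-started"   # stands in for the session daemon's socket
spawnlog="$workdir/spawn.log"      # gets one line per actual spawn

ensure_daemon() {
    (
        # Block until this client holds an exclusive lock on fd 9.
        flock -x 9
        if [ ! -e "$marker" ]; then
            # Only the first client to get the lock reaches this branch.
            echo "spawned by client $1" >> "$spawnlog"
            sleep 1                # simulate a slow daemon startup
            : > "$marker"
        fi
    ) 9> "$lockfile"
    # Once the lock is released, the "socket" must exist for this client.
    [ -e "$marker" ]
}

# Start ten clients in parallel and remember their PIDs.
pids=""
for i in $(seq 1 10); do
    ensure_daemon "$i" &
    pids="$pids $!"
done

# Collect each client's exit status instead of eyeballing "[n] Done" lines.
failures=0
for pid in $pids; do
    wait "$pid" || failures=$((failures + 1))
done

spawns=$(wc -l < "$spawnlog")
echo "spawns=$spawns failures=$failures"
rm -rf "$workdir"
```

With the lock in place, the script should report a single spawn and no failed clients. Commenting out the flock line lets several clients pass the marker check at the same time, which mirrors the race described in the commit message.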