Bug 1285368
| Summary: | Running 'virt-sysprep' in parallel on EL7 fails | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Barak Korren <bkorren> |
| Component: | libvirt | Assignee: | Ján Tomko <jtomko> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.1 | CC: | bkorren, dyuan, huzhan, inetkach, jsuchane, pcfe, ptoscano, rbalakri, rjones, zhwang |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-01-07 14:14:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 910269 | ||
Can you see if running this command as *NON*-root in parallel also fails:
virsh list
For example, you could test this by doing:
for f in `seq 1 100`; do virsh list & done
If any background command fails, that's a libvirt error.
Also, to disable libvirt and get stuff done, you can do:
export LIBGUESTFS_BACKEND=direct
(although of course this is a workaround - we need to fix the bug in libvirt).
Also, which version of libvirt is this?

We worked around this for now by making our scripts not run in parallel. That slows things down, obviously...
Trying this:
for f in `seq 1 100`; do virsh -c 'qemu:///system' list & done
None of the commands failed; maybe bash is not invoking them fast enough?
Everything described above was done as non-root.
Libvirt is:
libvirt-1.2.8-16.el7_1.5.x86_64

This bug sure looks a lot like:
https://bugzilla.redhat.com/show_bug.cgi?id=1138604
https://bugzilla.redhat.com/show_bug.cgi?id=927369
However, your version of libvirt is supposed to include a fix.
Is there a $HOME directory for this user? Is the home directory writable? Does /run/user/24044 get created? How about /run/user/24044/libvirt?
I'm not clear about how /run/user/... is created. I think systemd is supposed to create it, but if you're not logging in (i.e. it's not an ordinary user) then it won't be created, and libvirt will just not work in that case.

24044 is a perfectly normal user with a writable $HOME (it's just my own UID). /run/user/24044/libvirt exists.

As this bug doesn't seem to be reproducible outside virt-sysprep,
I suggest enabling libvirtd debugging and trying to collect the
libvirtd logs during a failure.
How to enable libvirt{,d} debugging:
* This is all as NON-root *
killall libvirtd
cd ~/.config/libvirt
cat > libvirtd.conf <<'EOF'
log_level=1
log_outputs="1:file:/tmp/libvirtd.log"
EOF
export LIBVIRT_DEBUG=1
Run your commands until you see the failure.
Attach the output of the commands AND the contents of /tmp/libvirtd.log
to this bug.
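
The parallel `virsh list` loop suggested earlier starts the commands in the background but never checks their exit codes, so a failure is easy to miss. The following is a minimal sketch of one way to run a command several times in parallel, keep the output of each run, and count the failures. The helper name `run_parallel` and the log file layout are assumptions made for illustration; they are not part of libvirt or libguestfs.

```shell
#!/bin/sh
# Sketch only: run a command N times in parallel, save each run's output,
# and report how many runs exited non-zero.
# run_parallel is a hypothetical helper, not a libvirt/libguestfs tool.

run_parallel() {
    count=$1
    logdir=$2
    shift 2
    pids=""
    failures=0

    i=1
    while [ "$i" -le "$count" ]; do
        # Start the command in the background, one log file per run.
        "$@" > "$logdir/run$i.log" 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done

    # 'wait PID' returns that job's exit status, so no failure is lost.
    i=1
    for pid in $pids; do
        if ! wait "$pid"; then
            failures=$((failures + 1))
            echo "run $i failed, see $logdir/run$i.log" >&2
        fi
        i=$((i + 1))
    done

    # Print the number of failed runs on stdout.
    echo "$failures"
}

logdir=$(mktemp -d)
# Stand-in command so the sketch runs anywhere; on the affected host,
# replace 'true' with the real command, for example: virsh list
failed=$(run_parallel 5 "$logdir" true)
echo "$failed of 5 runs failed (logs in $logdir)"
```

With `LIBVIRT_DEBUG=1` exported as described above, the per-run log files would also contain the client-side debug output to attach alongside /tmp/libvirtd.log.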
Jano, please have a look. Erik won't be available for a while. Thanks.

I was able to reproduce the bug with
libvirt-daemon-1.2.8-16.el7.x86_64.rpm
and I no longer get the error with the RHEL 7.2 version:
libvirt-daemon-1.2.17-13.el7.x86_64
This has been fixed upstream by:
commit be78814ae07f092d9c4e71fd82dd1947aba2f029
Author: Michal Privoznik <mprivozn>
CommitDate: 2015-04-15 13:39:13 +0200
virNetSocketNewConnectUNIX: Use flocks when spawning a daemon
https://bugzilla.redhat.com/show_bug.cgi?id=1200149
Even though we have a mutex mechanism so that two clients don't spawn
two daemons, it's not strong enough. It can happen that while one
client is spawning the daemon, the other one fails to connect.
Basically two possible errors can happen:
error: Failed to connect socket to '/home/mprivozn/.cache/libvirt/libvirt-sock': Connection refused
or:
error: Failed to connect socket to '/home/mprivozn/.cache/libvirt/libvirt-sock': No such file or directory
The problem in both cases is, the daemon is only starting up, while we
are trying to connect (and fail). We should postpone the connecting
phase until the daemon is started (by the other thread that is
spawning it). In order to do that, create a file lock 'libvirt-lock'
in the directory where session daemon would create its socket. So even
when called from multiple processes, spawning a daemon will serialize
on the file lock. So only the first to come will spawn the daemon.
Tested-by: Richard W. M. Jones <rjones>
Signed-off-by: Michal Privoznik <mprivozn>
git describe: v1.2.14-174-gbe78814 contains: v1.2.15-rc1~165
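
The serialization idea in the commit message can be illustrated outside libvirt. The sketch below is not the libvirt implementation (which is C code in virNetSocketNewConnectUNIX); it only simulates the behavior with the flock(1) utility from util-linux. The lock file name is borrowed from the commit message, while `fake-sock`, `spawned`, and `connect_or_spawn` are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch only: several "clients" race to start a daemon, but an exclusive
# file lock ensures that only the first one does the spawning.
# Assumes flock(1) from util-linux is installed.

workdir=$(mktemp -d)
lockfile="$workdir/libvirt-lock"   # lock name taken from the commit message
sockfile="$workdir/fake-sock"      # hypothetical stand-in for the daemon socket
spawnlog="$workdir/spawned"        # one line per simulated daemon spawn
: > "$spawnlog"

connect_or_spawn() {
    (
        # Block until this client holds an exclusive lock on fd 9.
        flock -x 9
        if [ ! -e "$sockfile" ]; then
            # First client through the lock: pretend to start the daemon.
            echo "client $1 spawned the daemon" >> "$spawnlog"
            sleep 0.1              # simulated daemon start-up time
            : > "$sockfile"
        fi
        # The "socket" exists by now, so a connect attempt would succeed.
    ) 9> "$lockfile"
}

# Launch ten clients at once, as in the parallel test loops above.
for i in $(seq 1 10); do
    connect_or_spawn "$i" &
done
wait

# Arithmetic expansion strips any padding that wc may add.
spawn_count=$(($(wc -l < "$spawnlog")))
echo "daemon spawned $spawn_count time(s)"
rm -rf "$workdir"
```

Without the `flock` call, several clients could see the socket as missing during the start-up delay and each try to spawn a daemon, or try to connect before the socket exists, which corresponds to the two error messages quoted in the commit.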

Could reproduce the bug with:
libvirt-daemon-1.2.8-16.el7.x86_64.rpm
Not reproduced with the RHEL 7.2 version:
libvirt-daemon-1.2.17-13.el7.x86_64
Verify steps:
1. $ killall libvirtd ; for i in `seq 1 10`; do libguestfs-test-tool >./log$i 2>&1 & done
[1] Done libguestfs-test-tool > ./log$i 2>&1
[2] Done libguestfs-test-tool > ./log$i 2>&1
[3] Done libguestfs-test-tool > ./log$i 2>&1
[4] Done libguestfs-test-tool > ./log$i 2>&1
[5] Done libguestfs-test-tool > ./log$i 2>&1
[6] Done libguestfs-test-tool > ./log$i 2>&1
[7] Done libguestfs-test-tool > ./log$i 2>&1
[8] Done libguestfs-test-tool > ./log$i 2>&1
[9]- Done libguestfs-test-tool > ./log$i 2>&1
[10]+ Done libguestfs-test-tool > ./log$i 2>&1
2. Repeat several times; no errors are returned.

Description of problem:
When running 'virt-sysprep' in parallel to set up two different VMs, one of the commands fails with the following error while the other succeeds:
libvirt: XML-RPC error : Failed to connect socket to '/run/user/24044/libvirt/libvirt-sock': No such file or directory
virt-sysprep: error: libguestfs error: could not connect to libvirt (URI = qemu:///session): Failed to connect socket to '/run/user/24044/libvirt/libvirt-sock': No such file or directory [code=38 domain=7]
If reporting bugs, run virt-sysprep with debugging enabled and include the complete output:
virt-sysprep -v -x [...]

Version-Release number of selected component (if applicable):
libguestfs-tools-c-1.28.1-1.18.el7.x86_64

Steps to Reproduce:
1. Create two VM images with something like the following commands:
qemu-img create -f qcow2 -b /path/to/base/qcow2/image /path/to/your/vm1.qcow2
qemu-img create -f qcow2 -b /path/to/base/qcow2/image /path/to/your/vm2.qcow2
2. Attempt to run the following two commands in parallel:
virt-sysprep --connect 'qemu:///system' -a '/path/to/your/vm1.qcow2' --selinux-relabel --hostname vm1 \
  --root-password 'password:123456' --mkdir '/root/.ssh' --chmod '0700:/root/.ssh' \
  --upload '/path/to/id_rsa.pub:/root/.ssh/authorized_keys' \
  --run-command 'chown root.root /root/.ssh/authorized_keys' \
  --mkdir '/etc/iscsi' --chmod '0755:/etc/iscsi' \
  --write '/etc/iscsi/initiatorname.iscsi:InitiatorName=iqn.2014-07.org.lago:vm1' \
  --mkdir '/etc/selinux' --chmod '0755:/etc/selinux' \
  --write '/etc/selinux/config:SELINUX=enforcing\nSELINUXTYPE=targeted\n' \
  --mkdir '/etc/sysconfig/network-scripts' --chmod '0755:/etc/sysconfig/network-scripts' \
  --write '/etc/sysconfig/network-scripts/ifcfg-eth0:HWADDR="54:52:c0:a8:c9:02"\nBOOTPROTO="dhcp"\nTYPE="Ethernet"\nONBOOT="yes"\nNAME="eth0"'
virt-sysprep --connect 'qemu:///system' -a '/path/to/your/vm2.qcow2' --selinux-relabel --hostname vm2 \
  --root-password 'password:123456' --mkdir '/root/.ssh' --chmod '0700:/root/.ssh' \
  --upload '/path/to/id_rsa.pub:/root/.ssh/authorized_keys' \
  --run-command 'chown root.root /root/.ssh/authorized_keys' \
  --mkdir '/etc/iscsi' --chmod '0755:/etc/iscsi' \
  --write '/etc/iscsi/initiatorname.iscsi:InitiatorName=iqn.2014-07.org.lago:vm2' \
  --mkdir '/etc/selinux' --chmod '0755:/etc/selinux' \
  --write '/etc/selinux/config:SELINUX=enforcing\nSELINUXTYPE=targeted\n' \
  --mkdir '/etc/sysconfig/network-scripts' --chmod '0755:/etc/sysconfig/network-scripts' \
  --write '/etc/sysconfig/network-scripts/ifcfg-eth0:HWADDR="54:52:c0:a8:c9:02"\nBOOTPROTO="dhcp"\nTYPE="Ethernet"\nONBOOT="yes"\nNAME="eth0"'

Actual results:
One of the virt-sysprep commands fails with the error pasted above.

Expected results:
Both commands should succeed as they do when run serially.