Bug 1285368

Summary: Running 'virt-sysprep' in parallel on EL7 fails
Product: Red Hat Enterprise Linux 7 Reporter: Barak Korren <bkorren>
Component: libvirt Assignee: Ján Tomko <jtomko>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.1 CC: bkorren, dyuan, huzhan, inetkach, jsuchane, pcfe, ptoscano, rbalakri, rjones, zhwang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-01-07 14:14:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 910269    

Description Barak Korren 2015-11-25 13:28:56 UTC
Description of problem:
When running 'virt-sysprep' in parallel to set up two different VMs, one of the commands fails with the following error while the other succeeds:

libvirt: XML-RPC error : Failed to connect socket to '/run/user/24044/libvirt/libvirt-sock': No such file or directory
virt-sysprep: error: libguestfs error: could not connect to libvirt (URI = qemu:///session): Failed to connect socket to '/run/user/24044/libvirt/libvirt-sock': No such file or directory [code=38 domain=7]

If reporting bugs, run virt-sysprep with debugging enabled and include the 
complete output:

  virt-sysprep -v -x [...]


Version-Release number of selected component (if applicable):
libguestfs-tools-c-1.28.1-1.18.el7.x86_64


Steps to Reproduce:
1. Create two VM images with something like the following commands:
    qemu-img create -f qcow2 -b /path/to/base/qcow2/image /path/to/your/vm1.qcow2
    qemu-img create -f qcow2 -b /path/to/base/qcow2/image /path/to/your/vm2.qcow2
2. Attempt to run the following two commands in parallel:
    virt-sysprep --connect 'qemu:///system' -a '/path/to/your/vm1.qcow2' --selinux-relabel --hostname vm1 \
      --root-password 'password:123456' --mkdir '/root/.ssh' --chmod '0700:/root/.ssh' \
      --upload '/path/to/id_rsa.pub:/root/.ssh/authorized_keys' \
      --run-command 'chown root.root /root/.ssh/authorized_keys' \
      --mkdir '/etc/iscsi' --chmod '0755:/etc/iscsi' \
      --write '/etc/iscsi/initiatorname.iscsi:InitiatorName=iqn.2014-07.org.lago:vm1' \
      --mkdir '/etc/selinux' --chmod '0755:/etc/selinux' \
      --write '/etc/selinux/config:SELINUX=enforcing\nSELINUXTYPE=targeted\n' \
      --mkdir '/etc/sysconfig/network-scripts' --chmod '0755:/etc/sysconfig/network-scripts' \
      --write '/etc/sysconfig/network-scripts/ifcfg-eth0:HWADDR="54:52:c0:a8:c9:02"\nBOOTPROTO="dhcp"\nTYPE="Ethernet"\nONBOOT="yes"\nNAME="eth0"'
    virt-sysprep --connect 'qemu:///system' -a '/path/to/your/vm2.qcow2' --selinux-relabel --hostname vm2 \
      --root-password 'password:123456' --mkdir '/root/.ssh' --chmod '0700:/root/.ssh' \
      --upload '/path/to/id_rsa.pub:/root/.ssh/authorized_keys' \
      --run-command 'chown root.root /root/.ssh/authorized_keys' \
      --mkdir '/etc/iscsi' --chmod '0755:/etc/iscsi' \
      --write '/etc/iscsi/initiatorname.iscsi:InitiatorName=iqn.2014-07.org.lago:vm2' \
      --mkdir '/etc/selinux' --chmod '0755:/etc/selinux' \
      --write '/etc/selinux/config:SELINUX=enforcing\nSELINUXTYPE=targeted\n' \
      --mkdir '/etc/sysconfig/network-scripts' --chmod '0755:/etc/sysconfig/network-scripts' \
      --write '/etc/sysconfig/network-scripts/ifcfg-eth0:HWADDR="54:52:c0:a8:c9:02"\nBOOTPROTO="dhcp"\nTYPE="Ethernet"\nONBOOT="yes"\nNAME="eth0"'


Actual results:
One of the virt-sysprep commands fails with the error pasted above.

Expected results:
Both commands should succeed as they do when run serially.

Comment 1 Richard W.M. Jones 2015-11-25 13:33:00 UTC
Can you see if running this command as *NON*-root
in parallel also fails:

  virsh list

For example, you could test this by doing:

  for f in `seq 1 100`; do virsh list & done

If any background command fails, that's a libvirt error.
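
The loop above backgrounds the commands but does not collect their exit statuses, so a failure can only be spotted in the error output. A minimal sketch of a wrapper that counts failures instead; count_parallel_failures is a hypothetical helper name, not part of libvirt or libguestfs:

```shell
# Hypothetical helper: run a command N times in parallel and print how
# many of the invocations exited with a non-zero status.
# Usage: count_parallel_failures N COMMAND [ARGS...]
count_parallel_failures() {
    n=$1; shift
    pids=""
    failed=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Start the command in the background and remember its PID.
        "$@" >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Wait for each PID individually so every exit status is seen.
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

For the test suggested here it could be called as `count_parallel_failures 100 virsh list`. The command's output is discarded, so rerun a failing command by hand to see the actual error.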

Comment 2 Richard W.M. Jones 2015-11-25 13:34:38 UTC
Also, to disable libvirt and get stuff done, you can do:

  export LIBGUESTFS_BACKEND=direct

(although of course this is a workaround - we need to fix the
bug in libvirt).

Also, which version of libvirt is this?

Comment 3 Barak Korren 2015-11-25 14:11:56 UTC
We worked around it for now by making our scripts not run in parallel. That obviously slows things down...

Trying this:

  for f in `seq 1 100`; do virsh -c 'qemu:///system' list & done

None of the commands failed. Maybe bash is not invoking them fast enough?
Everything described above was done as non-root.

Libvirt is:
  libvirt-1.2.8-16.el7_1.5.x86_64

Comment 4 Richard W.M. Jones 2015-11-25 14:42:07 UTC
This bug sure looks a lot like:
https://bugzilla.redhat.com/show_bug.cgi?id=1138604
https://bugzilla.redhat.com/show_bug.cgi?id=927369
However your version of libvirt is supposed to include a fix.

Is there a $HOME directory for this user?

Is the home directory writable?

Does /run/user/24044 get created?  How about
/run/user/24044/libvirt?  I'm not clear about how /run/user/...
is created.  I think systemd is supposed to create it, but if
you're not logging in (ie. it's not an ordinary user) then it
won't be created, and libvirt will just not work in that case.

Comment 5 Barak Korren 2015-11-25 14:52:52 UTC
24044 is a perfectly normal user with a writable $HOME (it's just my own UID). /run/user/24044/libvirt exists.

Comment 6 Richard W.M. Jones 2015-12-02 10:41:46 UTC
As this bug doesn't seem to be reproducible outside virt-sysprep,
I suggest enabling libvirtd debugging and trying to collect the
libvirtd logs during a failure.

How to enable libvirt{,d} debugging:

* This is all as NON-root *

killall libvirtd

cd ~/.config/libvirt

cat > libvirtd.conf <<'EOF'
log_level=1
log_outputs="1:file:/tmp/libvirtd.log"
EOF

export LIBVIRT_DEBUG=1

Run your commands until you see the failure.

Attach the output of the commands AND the contents of /tmp/libvirtd.log
to this bug.

Comment 9 Jaroslav Suchanek 2016-01-04 13:00:22 UTC
Jano, please have a look. Erik won't be available for a while. Thanks.

Comment 10 Ján Tomko 2016-01-04 13:38:21 UTC
I was able to reproduce the bug with
libvirt-daemon-1.2.8-16.el7.x86_64.rpm
and I no longer get the error with the RHEL 7.2 version:
libvirt-daemon-1.2.17-13.el7.x86_64

This has been fixed upstream by:
commit be78814ae07f092d9c4e71fd82dd1947aba2f029
Author:     Michal Privoznik <mprivozn>
CommitDate: 2015-04-15 13:39:13 +0200

    virNetSocketNewConnectUNIX: Use flocks when spawning a daemon
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1200149
    
    Even though we have a mutex mechanism so that two clients don't spawn
    two daemons, it's not strong enough. It can happen that while one
    client is spawning the daemon, the other one fails to connect.
    Basically two possible errors can happen:
    
      error: Failed to connect socket to '/home/mprivozn/.cache/libvirt/libvirt-sock': Connection refused
    
    or:
    
      error: Failed to connect socket to '/home/mprivozn/.cache/libvirt/libvirt-sock': No such file or directory
    
    The problem in both cases is, the daemon is only starting up, while we
    are trying to connect (and fail). We should postpone the connecting
    phase until the daemon is started (by the other thread that is
    spawning it). In order to do that, create a file lock 'libvirt-lock'
    in the directory where session daemon would create its socket. So even
    when called from multiple processes, spawning a daemon will serialize
    on the file lock. So only the first to come will spawn the daemon.
    
    Tested-by: Richard W. M. Jones <rjones>
    Signed-off-by: Michal Privoznik <mprivozn>

git describe: v1.2.14-174-gbe78814 contains: v1.2.15-rc1~165
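
The serialization described in the commit message can be illustrated with a shell analogy. This is only a sketch using the flock(1) utility from util-linux, not libvirt's actual C code: the directory argument stands in for the session socket directory, and the "daemon-started" and "spawn.log" files are hypothetical stand-ins for the daemon's socket and for the act of spawning it.

```shell
# Hypothetical stand-in for a client that connects to a session daemon,
# spawning it first if needed. All callers serialize on "libvirt-lock".
connect_or_spawn() {
    dir=$1
    (
        # Block until this process holds an exclusive lock on fd 9.
        flock -x 9
        if [ ! -e "$dir/daemon-started" ]; then
            # Only the first caller to take the lock gets here.
            echo spawned >> "$dir/spawn.log"
            sleep 1                     # simulate slow daemon start-up
            : > "$dir/daemon-started"   # the "socket" now exists
        fi
    ) 9> "$dir/libvirt-lock"
    # The lock is released when the subshell exits; by then the marker
    # exists, so the "connect" step cannot race with start-up.
    [ -e "$dir/daemon-started" ]
}
```

Callers that arrive while the first one is still starting the daemon wait on the lock instead of failing with "No such file or directory", and then find the marker already present.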

Comment 11 Hu Zhang 2016-01-07 07:34:27 UTC
Could reproduce the bug with:
libvirt-daemon-1.2.8-16.el7.x86_64.rpm
Not reproduced with the RHEL 7.2 version:
libvirt-daemon-1.2.17-13.el7.x86_64

Verify steps:
1. $ killall libvirtd ; for i in `seq 1 10`; do libguestfs-test-tool >./log$i 2>&1 & done
[1]   Done                    libguestfs-test-tool > ./log$i 2>&1
[2]   Done                    libguestfs-test-tool > ./log$i 2>&1
[3]   Done                    libguestfs-test-tool > ./log$i 2>&1
[4]   Done                    libguestfs-test-tool > ./log$i 2>&1
[5]   Done                    libguestfs-test-tool > ./log$i 2>&1
[6]   Done                    libguestfs-test-tool > ./log$i 2>&1
[7]   Done                    libguestfs-test-tool > ./log$i 2>&1
[8]   Done                    libguestfs-test-tool > ./log$i 2>&1
[9]-  Done                    libguestfs-test-tool > ./log$i 2>&1
[10]+  Done                    libguestfs-test-tool > ./log$i 2>&1

2. Tried several times; no errors were returned.
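
One way to check the collected logs rather than relying on the terminal output is to search them for the connection error quoted in the description. A minimal sketch; count_failed_logs is a hypothetical helper name, and it only looks for that one error string:

```shell
# Hypothetical helper: print how many of the given log files contain
# the libvirt socket connection error from this bug.
count_failed_logs() {
    failed=0
    for f in "$@"; do
        if grep -q "Failed to connect socket" "$f"; then
            failed=$((failed + 1))
        fi
    done
    echo "$failed"
}
```

After step 1 it could be run as `count_failed_logs ./log*`. A result of 0 only means that this particular error did not occur.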