Bug 1028728 - virCommandProcessIO: hangs on poll() when nfds equals 1
Keywords:
Status: CLOSED DUPLICATE of bug 999765
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-11-10 03:18 UTC by Sipingal Liu
Modified: 2016-04-27 01:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-10 15:20:39 UTC
Embargoed:


Attachments
libvirtd log (853.49 KB, text/x-log)
2013-11-12 00:16 UTC, Sipingal Liu

Description Sipingal Liu 2013-11-10 03:18:49 UTC
Description of problem:

  libvirtd hangs while determining the QEMU capabilities.

Version-Release number of selected component (if applicable):
   Versions later than 1.0.1; the version currently in use is 1.1.4.

How reproducible:
run libvirtd

Steps to Reproduce:
1. build libvirtd from source code

./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64 --disable-silent-rules --disable-dependency-tracking --with-libvirtd --with-avahi --without-xen --without-xen-inotify --without-xenapi --without-libxl --without-openvz --with-lxc --without-vbox --with-uml --with-qemu --with-yajl --with-phyp --with-esx --with-vmware --with-network --with-storage-fs --with-storage-lvm --with-storage-iscsi --with-storage-disk --with-storage-mpath --without-storage-rbd --without-numactl --without-numad --without-selinux --with-fuse --with-udev --with-capng --without-polkit --with-sasl --with-macvtap --with-libpcap --with-virtualport --without-firewalld --enable-nls --with-python --with-qemu-user=qemu --with-qemu-group=qemu --with-audit --without-netcf --without-hal --without-sanlock --with-init-script=systemd --disable-static --docdir=/usr/share/doc/libvirt-1.1.4 --with-remote --localstatedir=/var

2. run libvirtd

Workaround:
  Apply the following patch:

--- libvirt-1.1.4.orig/src/util/vircommand.c    2013-10-29 16:27:14.000000000 +0800
+++ libvirt-1.1.4/src/util/vircommand.c 2013-11-10 10:47:02.967066588 +0800
@@ -1906,7 +1906,7 @@
             nfds++;
         }

-        if (nfds == 0)
+        if (nfds <= 1)
             break;

         if (poll(fds, nfds, -1) < 0)

Comment 2 Sipingal Liu 2013-11-10 03:43:20 UTC
Another workaround:

--- libvirt-1.0.5.orig/src/qemu/qemu_capabilities.c     2013-04-26 14:17:40.000000000 +0800
+++ libvirt-1.0.5/src/qemu/qemu_capabilities.c  2013-05-19 17:52:23.712369142 +0800
@@ -2452,16 +2452,19 @@ virQEMUCapsInitQMP(virQEMUCapsPtr qemuCa
                                "-M", "none",
                                "-qmp", monarg,
                                "-pidfile", pidfile,
-                               "-daemonize",
                                NULL);
     virCommandAddEnvPassCommon(cmd);
     virCommandClearCaps(cmd);
     virCommandSetGID(cmd, runGid);
     virCommandSetUID(cmd, runUid);
+    virCommandSetPidFile(cmd, pidfile);
+    virCommandDaemonize(cmd);

     if (virCommandRun(cmd, &status) < 0)
         goto cleanup;

Comment 3 Jiri Denemark 2013-11-11 10:33:47 UTC
Neither of the two patches makes sense. What version of QEMU do you use, and could you attach debug logs from libvirtd?

Comment 4 Eric Blake 2013-11-11 13:34:46 UTC
If you post your proposed patches upstream on libvir-list, you will get a wider review from developers more familiar with the real root cause you are trying to solve.

Comment 5 Sipingal Liu 2013-11-12 00:14:45 UTC
I'm using QEMU 1.6.1, but this issue can also be reproduced with earlier versions (e.g. 1.4.x, 1.5.x).


QEMU was built with the following configuration (note ** --target-list=,x86_64-linux-user,arm-linux-user **):
./configure --cc=x86_64-pc-linux-gnu-gcc --host-cc=x86_64-pc-linux-gnu-gcc --prefix=/usr --sysconfdir=/etc --libdir=/usr/lib64 --docdir=/usr/share/doc/qemu-1.6.1/html --disable-bsd-user --disable-guest-agent --disable-strip --disable-werror --python=/usr/bin/python2.7 --enable-linux-user --disable-system --target-list=,x86_64-linux-user,arm-linux-user --disable-blobs --disable-bluez --disable-curses --disable-kvm --disable-libiscsi --disable-glusterfs --enable-seccomp --disable-sdl --disable-smartcard-nss --disable-tools --disable-vde --disable-libssh2 --disable-libusb --disable-debug-info --disable-debug-tcg --enable-docs --enable-tcg-interpreter


When libvirtd starts, it hangs while determining the QEMU capabilities. You can see a zombie process 2375 (qemu-system-arm) and its child 2378 (the actual daemon), which has not terminated. Because qemu-system-<arch> runs as a daemon, it forks and the original process exits, but that process is not wait()ed for by its parent (libvirtd). The child does not exit (and is not terminated) either.
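
To illustrate the zombie state described above, here is a minimal sketch (not libvirt code) of a parent reaping a child that exits right away, the way the intermediate process of a self-daemonizing program does. The helper name spawn_and_reap is hypothetical. Until waitpid() runs in the parent, such a child shows up as <defunct> in ps.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only: fork a child that exits
 * immediately with `code`, then reap it in the parent.
 * Returns the child's exit status, or -1 on error. */
int spawn_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: exit at once, like the intermediate process */

    /* Between the child's exit and this call, the child is a zombie. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the parent is blocked elsewhere (for example in poll() with an infinite timeout) and never reaches waitpid(), the zombie stays in the process table.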

While reviewing the source code and debugging the issue, I found that nfds equals 1.

I wonder whether the only available fd is cmd->inpipe, because when I printed the value of nfds, it was 1.

I'll attach the libvirtd log shortly.

    for (;;) {
        size_t i;
        struct pollfd fds[3];
        int nfds = 0;

        if (cmd->inpipe != -1) {
            fds[nfds].fd = cmd->inpipe;
            fds[nfds].events = POLLOUT;
            fds[nfds].revents = 0;
            nfds++;
        }
        if (outfd != -1) {
            fds[nfds].fd = outfd;
            fds[nfds].events = POLLIN;
            fds[nfds].revents = 0;
            nfds++;
        }
        if (errfd != -1) {
            fds[nfds].fd = errfd;
            fds[nfds].events = POLLIN;
            fds[nfds].revents = 0;
            nfds++;
        }

        if (nfds == 0)
            break;

        if (poll(fds, nfds, -1) < 0) {
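
The descriptor setup in the loop above can be isolated into a small standalone sketch to check which combinations yield nfds == 1. build_pollfds is a hypothetical helper, not a libvirt function; it only mirrors the excerpt and assumes that -1 means "descriptor not in use".

```c
#include <poll.h>

/* Hypothetical helper mirroring the excerpt above: fill fds[] from the
 * three descriptors and return how many entries were used. */
int build_pollfds(int inpipe, int outfd, int errfd, struct pollfd fds[3])
{
    int nfds = 0;

    if (inpipe != -1) {
        fds[nfds].fd = inpipe;
        fds[nfds].events = POLLOUT;   /* wait until the child's stdin is writable */
        fds[nfds].revents = 0;
        nfds++;
    }
    if (outfd != -1) {
        fds[nfds].fd = outfd;
        fds[nfds].events = POLLIN;    /* wait for data on the child's stdout */
        fds[nfds].revents = 0;
        nfds++;
    }
    if (errfd != -1) {
        fds[nfds].fd = errfd;
        fds[nfds].events = POLLIN;    /* wait for data on the child's stderr */
        fds[nfds].revents = 0;
        nfds++;
    }
    return nfds;
}
```

Any call with exactly one descriptor set returns 1, so nfds == 1 on its own does not say which of the three descriptors is still open; that has to be checked separately.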


Workaround without any patch: kill 2378 (qemu-system-arm); libvirtd then starts 2534 and 2536 (qemu-system-x86_64). Kill 2536; libvirtd then starts 2612 and 2614 (qemu-kvm, which is a link to qemu-system-x86_64). Kill 2614 as well.

$ virsh list    # hangs

$ ps -elfy ff | grep "libvirt\|qemu"
S root      2271  2987  0  80   0  3164 45251 poll_s 07:50 pts/1      0:00              |       \_ sudo ./daemon/libvirtd -f /etc/libvirt/libvirtd.conf
S root      2273  2271  1  80   0 10196 114812 poll_s 07:50 pts/1     0:00              |           \_ /home/sipingal/libvirt-1.1.4.orig/daemon/.libslibvirtd -f /etc/libvirt/libvirtd.conf
Z qemu      2375  2273  0  80   0     0     0 exit   07:50 pts/1      0:00              |               \_ [qemu-system-arm] <defunct>
S sipingal  2429 32008  0  80   0   960 28093 pipe_w 07:50 pts/2      0:00                      \_ grep --colour=auto libvirt\|qemu
S nobody    1884     1  0  80   0   952 31244 poll_s 07:19 ?          0:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf
S qemu      2378     1  0  80   0  5308 95085 poll_s 07:50 ?          0:00 /usr/bin/qemu-system-arm -S -no-user-config -nodefaults -nographic -M none -qmp unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile /var/lib/libvirt/qemu/capabilities.pidfile -daemonize
$ sudo kill 2378
$ ps -elfy ff | grep "qemu"
Z qemu      2534  2273  0  80   0     0     0 exit   07:50 pts/1      0:00              |               \_ [qemu-system-x86] <defunct>
S sipingal  2564 32008  0  80   0   972 28093 pipe_w 07:50 pts/2      0:00                      \_ grep --colour=auto qemu
S qemu      2536     1  0  80   0  5068 95487 poll_s 07:50 ?          0:00 /usr/bin/qemu-system-x86_64 -S -no-user-config -nodefaults -nographic -M none -qmp unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile /var/lib/libvirt/qemu/capabilities.pidfile -daemonize
$ sudo kill 2536
$ ps -elfy ff | grep "qemu"
Z qemu      2612  2273  1  80   0     0     0 exit   07:50 pts/1      0:00              |               \_ [qemu-system-x86] <defunct>
S sipingal  2627 32008  0  80   0   968 28093 pipe_w 07:50 pts/2      0:00                      \_ grep --colour=auto qemu
S qemu      2614     1  0  80   0  4744 79618 poll_s 07:50 ?          0:00 /usr/bin/qemu-system-x86_64 -machine accel=kvm -S -no-user-config -nodefaults -nographic -M none -qmp unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile /var/lib/libvirt/qemu/capabilities.pidfile -daemonize
sipingal@spad ~ $ sudo kill 2614
sipingal@spad ~ $ ps -elfy ff | grep "qemu"
S sipingal  2675 32008  0  80   0   972 28093 pipe_w 07:51 pts/2      0:00                      \_ grep --colour=auto qemu


FYI, my system is Gentoo:
 $ eix -I qemu
[I] app-emulation/qemu
     Available versions:  1.4.2 (~)1.5.2-r1 (~)1.5.2-r2 1.5.3 (~)1.6.0 (~)1.6.0-r1 (~)1.6.1 **9999 {accessibility +aio alsa bluetooth +caps +curl debug (+)fdt +filecaps glusterfs gtk iscsi +jpeg mixemu ncurses opengl +png pulseaudio python rbd sasl sdl +seccomp selinux smartcard spice ssh static static-softmmu static-user systemtap tci test +threads tls usb usbredir +uuid vde +vhost-net virtfs +vnc xattr xen xfs KERNEL="FreeBSD linux" PYTHON_TARGETS="python2_6 python2_7" QEMU_SOFTMMU_TARGETS="alpha arm cris i386 lm32 m68k microblaze microblazeel mips mips64 mips64el mipsel moxie or32 ppc ppc64 ppcemb s390x sh4 sh4eb sparc sparc64 unicore32 x86_64 xtensa xtensaeb" QEMU_USER_TARGETS="alpha arm armeb cris i386 m68k microblaze microblazeel mips mips64 mips64el mipsel mipsn32 mipsn32el or32 ppc ppc64 ppc64abi32 s390x sh4 sh4eb sparc sparc32plus sparc64 unicore32 x86_64"}
     Installed versions:  1.6.1(06:22:15 PM 10/24/2013)(aio alsa bluetooth caps curl fdt filecaps gtk jpeg mixemu ncurses opengl png pulseaudio python rbd sasl sdl seccomp spice tci threads tls usbredir uuid vde vhost-net virtfs vnc xattr xfs -accessibility -debug -glusterfs -iscsi -selinux -smartcard -ssh -static -static-softmmu -static-user -systemtap -test -usb -xen KERNEL="linux -FreeBSD" PYTHON_TARGETS="python2_7 -python2_6" QEMU_SOFTMMU_TARGETS="arm x86_64 -alpha -cris -i386 -lm32 -m68k -microblaze -microblazeel -mips -mips64 -mips64el -mipsel -moxie -or32 -ppc -ppc64 -ppcemb -s390x -sh4 -sh4eb -sparc -sparc64 -unicore32 -xtensa -xtensaeb" QEMU_USER_TARGETS="arm x86_64 -alpha -armeb -cris -i386 -m68k -microblaze -microblazeel -mips -mips64 -mips64el -mipsel -mipsn32 -mipsn32el -or32 -ppc -ppc64 -ppc64abi32 -s390x -sh4 -sh4eb -sparc -sparc32plus -sparc64 -unicore32")
     Homepage:            http://www.qemu.org http://www.linux-kvm.org
     Description:         QEMU + Kernel-based Virtual Machine userland tools

Comment 6 Sipingal Liu 2013-11-12 00:16:34 UTC
Created attachment 822695 [details]
libvirtd log

Comment 7 Sipingal Liu 2013-11-12 00:28:27 UTC
The second workaround avoids running the capability detection with QEMU's own -daemonize. virCommandRun then does not execute it via virCommandRunAsync() as a daemon but runs it as a foreground process, so libvirtd can capture its stdout/stderr, which changes nfds to 2.

    /*
     * We explicitly need to use -daemonize here, rather than
     * virCommandDaemonize, because we need to synchronize
     * with QEMU creating its monitor socket API. Using
     * daemonize guarantees control won't return to libvirt
     * until the socket is present.
     */
    cmd = virCommandNewArgList(qemuCaps->binary,
                               "-S",
                               "-no-user-config",
                               "-nodefaults",
                               "-nographic",
                               "-M", "none",
                               "-qmp", monarg,
                               "-pidfile", pidfile,
                               "-daemonize",
                               NULL);
    virCommandAddEnvPassCommon(cmd);
    virCommandClearCaps(cmd);
    virCommandSetGID(cmd, runGid);
    virCommandSetUID(cmd, runUid);

    if (virCommandRun(cmd, &status) < 0)
        goto cleanup;

src/util/vircommand.c

/**
 * virCommandRun:
 * @cmd: command to run
 * @exitstatus: optional status collection
 *
 * Run the command and wait for completion.
 * Returns -1 on any error executing the
 * command. Returns 0 if the command executed,
 * with the exit status set.  If @exitstatus is NULL, then the
 * child must exit with status 0 for this to succeed.
 */
int
virCommandRun(virCommandPtr cmd, int *exitstatus)
{
-----------------8<-----------------------
    /* If caller hasn't requested capture of stdout/err, then capture
     * it ourselves so we can log it.  But the intermediate child for
     * a daemon has no expected output, and we don't want our
     * capturing pipes passed on to the daemon grandchild.
     */
    if (!(cmd->flags & VIR_EXEC_DAEMON)) {
        if (!cmd->outfdptr) {
            cmd->outfdptr = &cmd->outfd;
            cmd->outbuf = &outbuf;
            string_io = true;
        }
        if (!cmd->errfdptr) {
            cmd->errfdptr = &cmd->errfd;
            cmd->errbuf = &errbuf;
            string_io = true;
        }
    }

    cmd->flags |= VIR_EXEC_RUN_SYNC;
    if (virCommandRunAsync(cmd, NULL) < 0) {
        cmd->has_error = -1;
        return -1;
    }
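
The comment in virCommandRun about not passing capture pipes on to the daemon grandchild can be illustrated in isolation. The sketch below is not libvirt code and does not claim to be the root cause of this hang; it only shows that poll() on the read end of a pipe sees no hangup while any copy of the write end stays open. poll_read_end is a hypothetical helper, and dup() stands in for a grandchild inheriting the descriptor. A finite timeout is used so the example terminates.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only: create a pipe, close the
 * original write end, and poll the read end for up to timeout_ms.
 * If keep_extra_writer is nonzero, a duplicate of the write end is kept
 * open while polling, standing in for a daemon that inherited it.
 * Returns poll()'s return value, or -1 if pipe() fails. */
int poll_read_end(int keep_extra_writer, int timeout_ms)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int extra = keep_extra_writer ? dup(p[1]) : -1;
    close(p[1]);                       /* the direct child has "exited" */

    struct pollfd pfd = { .fd = p[0], .events = POLLIN, .revents = 0 };
    int ret = poll(&pfd, 1, timeout_ms);

    close(p[0]);
    if (extra != -1)
        close(extra);
    return ret;
}
```

On Linux, poll_read_end(0, 100) reports one ready descriptor (hangup) at once, while poll_read_end(1, 100) times out and returns 0; with a timeout of -1, as in the loop quoted earlier, the second case would block indefinitely.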

Comment 8 Dave Allan 2013-11-12 02:19:37 UTC
As Eric mentioned in comment 4, you really need to take this discussion to the libvirt list.

Comment 9 Cole Robinson 2016-04-10 15:20:39 UTC
I'm pretty sure this is long since fixed

*** This bug has been marked as a duplicate of bug 999765 ***

