Bug 770788 - virCommandProcessIO sometimes hang
Summary: virCommandProcessIO sometimes hang
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: All
OS: All
unspecified
high
Target Milestone: ---
Assignee: Eric Blake
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 758026
TreeView+ depends on / blocked
 
Reported: 2011-12-29 05:31 UTC by Bitman Zhou
Modified: 2012-02-08 16:02 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-01-04 15:22:23 UTC
Embargoed:


Attachments (Terms of Use)

Description Bitman Zhou 2011-12-29 05:31:36 UTC
Description of problem:
When I was running libvirtd (0.9.7) in a VMware VM Ubuntu 11.10 server, dmidecode never exit and libvirtd is hang. After some debugging, I found there is a wrong use of poll(). This patch can fix it:

diff --git a/src/util/command.c b/src/util/command.c
index f5effdf..a038701 100644
--- a/src/util/command.c
+++ b/src/util/command.c
@@ -1616,6 +1616,7 @@ virCommandProcessIO(virCommandPtr cmd)
         int i;
         struct pollfd fds[3];
         int nfds = 0;
+       int pfds = 0;
 
         if (infd != -1) {
             fds[nfds].fd = infd;
@@ -1635,8 +1636,9 @@ virCommandProcessIO(virCommandPtr cmd)
 
         if (nfds == 0)
             break;
-
-        if (poll(fds, nfds, -1) < 0) {
+       
+       pfds = poll(fds, nfds, -1);
+        if (pfds < 0) {
             if ((errno == EAGAIN) || (errno == EINTR))
                 continue;
             virReportSystemError(errno, "%s",
@@ -1644,7 +1646,7 @@ virCommandProcessIO(virCommandPtr cmd)
             goto cleanup;
         }
 
-        for (i = 0; i < nfds ; i++) {
+        for (i = 0; i < pfds ; i++) {
             if (fds[i].fd == errfd ||
                 fds[i].fd == outfd) {
                 char data[1024];



Version-Release number of selected component (if applicable):
I also checked the latest git version, the bug is still there.

Comment 1 Eric Blake 2012-01-02 19:16:40 UTC
Thanks for the report; can you please also post your patch on libvir-list for better feedback from upstream?

Comment 2 Eric Blake 2012-01-02 19:20:21 UTC
Actually, your patch is incorrect.  poll() may return a value less than nfds, but you _still_ have to cycle through all nfds elements of the array; for example, if nfds is 2 on entry and the return value is 1, you can't guarantee that fds[0] was updated (it might have been fds[1]).

You need to further describe the hang that you are seeing, as I don't think it is a mis-use of poll() causing the hang.

Comment 3 Bitman Zhou 2012-01-03 03:52:44 UTC
Yes, it's my fault to give the wrong patch. But the hang does exist. In my case, it's hang on read() on stderr of the executing command(i.e. dmidecode) which never writes to stderr in my case. And there are still out messgaes not handled. So you cannot just do a simple loop on all fds[]. According to poll() man page, we should also check if there is revents:

       The  field  revents  is  an  output parameter, filled by the kernel with the events that actually occurred.  The bits returned in revents can include any of
       those specified in events, or one of the values POLLERR, POLLHUP, or POLLNVAL.  (These three bits are meaningless in the events field, and will  be  set  in
       the revents field whenever the corresponding condition is true.)

So the patch should be like:

--- libvirt-0.9.7.orig/src/util/command.c	2012-01-01 13:45:19.566400135 +0800
+++ libvirt-0.9.7/src/util/command.c	2012-01-03 11:46:39.998207510 +0800
@@ -1650,6 +1650,9 @@
                 /* Silence a false positive from clang. */
                 sa_assert(buf);
 
+                if (fds[i].revents == 0)
+                    continue;
+
                 done = read(fds[i].fd, data, sizeof(data));
                 if (done < 0) {
                     if (errno != EINTR &&

Comment 4 Bitman Zhou 2012-01-03 04:14:01 UTC
This case should happen if the output of the executed command is large.
bzhou@ybox-s1:~$ sudo dmidecode -q -t 0,1,4,7 >& dmi.log
bzhou@ybox-s1:~$ wc -c dmi.log
105014 dmi.log

Comment 5 Eric Blake 2012-01-03 19:01:26 UTC
Looks like this more comprehensive patch proposed upstream should fix the bug in a better manner:

https://www.redhat.com/archives/libvir-list/2012-January/msg00079.html

Comment 6 Michal Privoznik 2012-01-04 10:05:38 UTC
I've just pushed that patch:

commit 06b9c5b9231ef4dbd4b5ff69564305cd4f814879
Author:     Michal Privoznik <mprivozn>
AuthorDate: Tue Jan 3 18:40:55 2012 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Wed Jan 4 10:40:23 2012 +0100

    virCommand: Properly handle POLLHUP
    
    It is a good practise to set revents to zero before doing any poll().
    Moreover, we should check if event we waited for really occurred or
    if any of fds we were polling on didn't encountered hangup.


Should we move this to POST?

Comment 7 Michal Privoznik 2012-01-04 14:53:46 UTC
*** Bug 758026 has been marked as a duplicate of this bug. ***

Comment 8 Eric Blake 2012-01-04 15:22:23 UTC
(In reply to comment #6)
> I've just pushed that patch:
> 
> Should we move this to POST?

Actually, since this was filed against upstream libvirt.git, and the patch is now applied, we can outright close this BZ.  0.9.9 will include the fix.

Comment 9 Anil Vettathu 2012-02-08 16:02:11 UTC
(In reply to comment #4)
> This case should happen if the output of the executed command is large.
> bzhou@ybox-s1:~$ sudo dmidecode -q -t 0,1,4,7 >& dmi.log
> bzhou@ybox-s1:~$ wc -c dmi.log
> 105014 dmi.log

Hi Bitman,

Just curious how you came to know its the dmidecode that make libvirtd hung?


Note You need to log in before you can comment on or make changes to this bug.