Description of problem: When I was running libvirtd (0.9.7) in a VMware VM Ubuntu 11.10 server, dmidecode never exit and libvirtd is hang. After some debugging, I found there is a wrong use of poll(). This patch can fix it: diff --git a/src/util/command.c b/src/util/command.c index f5effdf..a038701 100644 --- a/src/util/command.c +++ b/src/util/command.c @@ -1616,6 +1616,7 @@ virCommandProcessIO(virCommandPtr cmd) int i; struct pollfd fds[3]; int nfds = 0; + int pfds = 0; if (infd != -1) { fds[nfds].fd = infd; @@ -1635,8 +1636,9 @@ virCommandProcessIO(virCommandPtr cmd) if (nfds == 0) break; - - if (poll(fds, nfds, -1) < 0) { + + pfds = poll(fds, nfds, -1); + if (pfds < 0) { if ((errno == EAGAIN) || (errno == EINTR)) continue; virReportSystemError(errno, "%s", @@ -1644,7 +1646,7 @@ virCommandProcessIO(virCommandPtr cmd) goto cleanup; } - for (i = 0; i < nfds ; i++) { + for (i = 0; i < pfds ; i++) { if (fds[i].fd == errfd || fds[i].fd == outfd) { char data[1024]; Version-Release number of selected component (if applicable): I also checked the latest git version, the bug is still there.
Thanks for the report; can you please also post your patch on libvir-list for better feedback from upstream?
Actually, your patch is incorrect. poll() may return a value less than nfds, but you _still_ have to cycle through all nfds elements of the array; for example, if nfds is 2 on entry and the return value is 1, you can't guarantee that fds[0] was updated (it might have been fds[1]). You need to further describe the hang that you are seeing, as I don't think it is a mis-use of poll() causing the hang.
Yes, it's my fault to give the wrong patch. But the hang does exist. In my case, it's hang on read() on stderr of the executing command(i.e. dmidecode) which never writes to stderr in my case. And there are still out messgaes not handled. So you cannot just do a simple loop on all fds[]. According to poll() man page, we should also check if there is revents: The field revents is an output parameter, filled by the kernel with the events that actually occurred. The bits returned in revents can include any of those specified in events, or one of the values POLLERR, POLLHUP, or POLLNVAL. (These three bits are meaningless in the events field, and will be set in the revents field whenever the corresponding condition is true.) So the patch should be like: --- libvirt-0.9.7.orig/src/util/command.c 2012-01-01 13:45:19.566400135 +0800 +++ libvirt-0.9.7/src/util/command.c 2012-01-03 11:46:39.998207510 +0800 @@ -1650,6 +1650,9 @@ /* Silence a false positive from clang. */ sa_assert(buf); + if (fds[i].revents == 0) + continue; + done = read(fds[i].fd, data, sizeof(data)); if (done < 0) { if (errno != EINTR &&
This case should happen if the output of the executed command is large. bzhou@ybox-s1:~$ sudo dmidecode -q -t 0,1,4,7 >& dmi.log bzhou@ybox-s1:~$ wc -c dmi.log 105014 dmi.log
Looks like this more comprehensive patch proposed upstream should fix the bug in a better manner: https://www.redhat.com/archives/libvir-list/2012-January/msg00079.html
I've just pushed that patch: commit 06b9c5b9231ef4dbd4b5ff69564305cd4f814879 Author: Michal Privoznik <mprivozn> AuthorDate: Tue Jan 3 18:40:55 2012 +0100 Commit: Michal Privoznik <mprivozn> CommitDate: Wed Jan 4 10:40:23 2012 +0100 virCommand: Properly handle POLLHUP It is a good practise to set revents to zero before doing any poll(). Moreover, we should check if event we waited for really occurred or if any of fds we were polling on didn't encountered hangup. Should we move this to POST?
*** Bug 758026 has been marked as a duplicate of this bug. ***
(In reply to comment #6) > I've just pushed that patch: > > Should we move this to POST? Actually, since this was filed against upstream libvirt.git, and the patch is now applied, we can outright close this BZ. 0.9.9 will include the fix.
(In reply to comment #4) > This case should happen if the output of the executed command is large. > bzhou@ybox-s1:~$ sudo dmidecode -q -t 0,1,4,7 >& dmi.log > bzhou@ybox-s1:~$ wc -c dmi.log > 105014 dmi.log Hi Bitman, Just curious how you came to know its the dmidecode that make libvirtd hung?