Bug 868575 - libvirt is often failing to show qemu stderr/stdout when startup fails
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 18
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
Duplicates: 922425
Depends On:
Reported: 2012-10-20 22:23 UTC by Cole Robinson
Modified: 2013-10-31 20:39 UTC (History)

Clone Of:
Last Closed: 2013-10-31 20:39:01 UTC

Description Cole Robinson 2012-10-20 22:23:52 UTC
One of the most common classes of bugs we get is errors launching qemu, and at least 95% of the time the only useful info is qemu stderr/stdout. However, we aren't doing a good job of getting that info to the user right now. Detail-less errors here like 'handshake failed' or 'couldn't connect to monitor' suck for users and libvirt devs alike.

There are 2 problems:

1) The conditions in qemuWaitForMonitor for scraping the logs don't always trigger.

I can consistently reproduce an issue here on libvirt.git, by sticking <readonly/> in an IDE disk block, which correctly causes qemu to bail out. When we reach the kill(2) check in the cleanup: block, the guest is still running, so we don't scrape the log output.

Possible solution could be to give a short wait loop for the VM to exit (we used to do this but not sure what happened to it). I'm sure there's a more c
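
To illustrate the short wait loop idea, here is a minimal sketch in Python rather than libvirt's C. It is not libvirt code. The names wait_for_exit, startup_error, is_running and read_log are all hypothetical: is_running stands in for the kill(2) liveness probe, and read_log stands in for whatever scrapes the qemu log file. The timeout and polling interval are arbitrary values picked for the example.

```python
import time


def wait_for_exit(is_running, timeout=1.0, interval=0.05):
    """Poll is_running() until it returns False or `timeout` seconds pass.

    Returns True if the process was seen to exit, False if it was still
    running when the grace period ran out.
    """
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def startup_error(message, is_running, read_log, timeout=1.0):
    """Build the error text for a failed startup.

    If the process exits within the grace period, append its log output
    to the generic message. Otherwise return the message unchanged.
    """
    if wait_for_exit(is_running, timeout):
        log = read_log().strip()
        if log:
            return f"{message}: {log}"
    return message
```

The point of the grace period is that a process which is about to bail out gets a moment to do so, so the log is scraped instead of being skipped because the liveness check happened to run too early.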

2) Anything that errors between qemuProcessStart:virCommandRun and qemuProcessStart:qemuProcessWaitForMonitor won't report log output.

I was consistently seeing an issue here when hitting #809910, but it can be artificially reproduced quite easily by using an XML config that upsets qemu and sticking a sleep before each function call after virCommandRun.

All the bits here that can error depending on the qemu process state (particularly the virCommandHandshake bit) need to show log output when they fail.

(that's what the long dead '#if 0' block in the code was trying to accomplish but it was turned off without an alternative provided)
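
One way to picture the second fix, again as a hedged Python sketch and not the actual libvirt implementation: run every step that follows the process spawn through a single wrapper, so that whichever step fails, the error carries the process output. StartupError, run_startup_steps and read_log are hypothetical names invented for this illustration.

```python
class StartupError(Exception):
    """Startup failure that carries the process log output, if any."""


def run_startup_steps(steps, read_log):
    """Run (name, callable) steps in order.

    On the first failure, stop and raise StartupError with the step name,
    the original error text and whatever read_log() returns.
    """
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            detail = f"{name} failed: {exc}"
            log = read_log().strip()
            if log:
                detail += f"\nprocess output:\n{log}"
            raise StartupError(detail) from exc
```

With this shape, a failure in a handshake step and a failure in a later monitor step both end up reporting the same log output, instead of each call site having to remember to scrape it.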

This isn't specific to F18 but it would be nice if we could get fixes queued here before the testday.

Comment 1 Cole Robinson 2013-05-24 14:53:08 UTC
dallan, can this be prioritized? Pretty much every failure I'm hitting on F19 gives me the useless startup error 'connection reset by peer'.

Comment 2 Dave Allan 2013-06-18 18:41:08 UTC
*** Bug 922425 has been marked as a duplicate of this bug. ***

Comment 3 Cole Robinson 2013-10-31 20:39:01 UTC
This should already be fixed in F20, so closing.