Bug 868575

Summary: libvirt is often failing to show qemu stderr/stdout when startup fails
Product: [Fedora] Fedora Reporter: Cole Robinson <crobinso>
Component: libvirtAssignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 18CC: berrange, clalancette, dallan, djasa, itamar, jforbes, jyang, laine, libvirt-maint, veillard, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-10-31 16:39:01 EDT Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:

Description Cole Robinson 2012-10-20 18:23:52 EDT
One of the most common class of bugs we get are errors launching qemu, and in at least 95% of the time the only useful info is qemu stderr/stdout. However we aren't doing a good job of getting that info to the user right now. Having detail-less error's here like 'handshake failed' or 'couldn't connect to monitor' suck for users and libvirt devs alike.

There are 2 problems:


1) The conditions in qemuWaitForMonitor for scraping the logs don't always trigger.

I can consistently reproduce an issue here on libvirt.git, by sticking <readonly/> in an IDE disk block, which correctly causes qemu to bail out. When we reach the kill(2) check in the cleanup: block, the guest is still running, so we don't scrape the log output.

Possible solution could be to give a short wait loop for the VM to exit (we used to do this but not sure what happened to it). I'm sure there's a more c



2) Anything that errors between qemuProcessStart:virCommandRun and qemuProcessStart:qemuProcessWaitForMonitor won't report log output.

I was consistently seeing an issue here when hitting #809910, but it can be artificially reproduced quite easily by using an XML config that upsets qemu and sticking a sleep before each function call after virCommandRun.

All the bits here that can error depending on the qemu process state (particularly the virCommandHandshake bit) need to show log output when they fail.

(that's what the long dead '#if 0' block in the code was trying to accomplish but it was turned off without an alternative provided)


This isn't specific to F18 but it would be nice if we could get fixes queued here before the testday.
Comment 1 Cole Robinson 2013-05-24 10:53:08 EDT
dallan, can this be prioritized? pretty much every failure I'm hitting on F19 gives me the useless startup error 'connection reset by peer'
Comment 2 Dave Allan 2013-06-18 14:41:08 EDT
*** Bug 922425 has been marked as a duplicate of this bug. ***
Comment 3 Cole Robinson 2013-10-31 16:39:01 EDT
This should already be fixed in F20, so closing