Red Hat Bugzilla – Bug 537938
restore fails with large files; libvirtd becomes unresponsive
Last modified: 2010-05-26 13:59:29 EDT
Description of problem:
When trying to do a "virsh restore" from a very large file (typically >2GB), libvirtd fails with an error such as the following:
14:28:39.112: error : qemudOpenMonitorUnix:1007 : monitor socket did not show up.: Connection refused
14:28:39.112: debug : qemudShutdownVMDaemon:2177 : Shutting down VM 'fvte-6ba0a31cc2a987befbcc48ee66be19d857eb9a8b'
When this occurs, libvirtd is no longer responsive (even to completely separate connections or commands) until restarted; "virsh list", for instance, hangs indefinitely.
How reproducible:
>50% (100% on present hardware and configuration [a Dell R900], but timing-dependent by nature; may behave differently on other hardware)
Steps to Reproduce:
1. Use "virsh save" to save memory and device state of a large VM (that is, one with at least 2GB of memory actively in use and populated).
2. Move the backing store used by this image out of the way, and replace it with a qcow2 file which points at that prior backing store as its parent state. (optional; this allows a single ramsave file to be reused more than once safely by repeating this step).
3. Use "virsh restore" to resume the guest
Actual results:
libvirtd complains that the monitor socket did not show up and reports that it is shutting down the VM. Henceforth no virsh commands (even "virsh help") function.

Expected results:
The guest should resume running where it left off.
Created attachment 369822 [details]
Debug log showing a failure
This failure occurred on a fairly small ramsave file (102MB with LZO compression, 239MB after extracting and decompressing the qemu migration stream), after eight prior uneventful save/restore cycles.
Rerunning a restore from the same source file resulted in success; as such, this is almost certainly a race condition.

Another notable symptom: after forcefully restarting libvirtd, the guest that libvirt had hung while attempting to start is running (in the state in which it was saved to disk) and functioning correctly.
I ran some tests using "script -t" to record timing, with qemu run with an -incoming exec: command arranged to emit a sigil to stderr on migration completion (a sketch of this harness follows the observations).
Observations as follows:
- The qemu monitor prompt most often displays immediately, but very occasionally displays only after a substantial delay; difficult to say with certainty, but this latter case appears to happen more often with caches cold.
- In the common case, the monitor appears quickly, but is unresponsive (despite the prompt having appeared) while migration is underway.
- For a 1.3GB lzop-compressed saved state file, this migration process takes on the scale of 12 seconds on my QA system (a fairly high-end server marketed as virtualization-centric hardware); decompressing ahead-of-time and only loading the raw data makes this closer to 11sec with caches hot, and 20sec with caches cold.
- Libvirt's timeouts waiting for the monitor to appear, and for the monitor to be responsive, are generally much smaller than the 11-20sec observed.
As such, these timeouts are problematic, causing failures during restore processes which would succeed if it weren't for libvirt bailing early.
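For reference, the harness was along these lines (paths, guest options, and the sigil string are illustrative, not the exact invocation used):

  # Pre-decompress the saved state so only raw data is loaded.
  lzop -dc bigguest.ramsave.lzo > bigguest.ramsave

  # Record the session with per-line timing data; the sigil on stderr marks
  # the point at which qemu has consumed the entire migration stream.
  script -t 2>timing.log -c \
    'qemu-kvm -m 2048 -hda bigguest.img -monitor stdio \
       -incoming "exec:cat bigguest.ramsave; echo MIGRATION-DONE >&2"' \
    typescript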
Created attachment 370241 [details]
Proposed fix - let intermediate process finish before looking for console
The attached patch moves the waitpid() for the intermediate child (i.e., the decompressor) to happen immediately before qemudWaitForMonitor() is called.
This appears to resolve the issue for me. It has not yet been tested in cases where no compression is in use.
> Observations as follows:
> - The qemu monitor prompt most often displays immediately, but very
> occasionally displays only after a substantial delay; difficult to say with
> certainty, but this latter case appears to happen more often with caches cold.
> - In the common case, the monitor appears quickly, but is unresponsive
> (despite the prompt having appeared) while migration is underway.
> - For a 1.3GB lzop-compressed saved state file, this migration process takes
> on the scale of 12 seconds on my QA system (a fairly high-end server marketed
> as virtualization-centric hardware); decompressing ahead-of-time and only
> loading the raw data makes this closer to 11sec with caches hot, and 20sec with
> caches cold.
> - Libvirt's timeouts waiting for the monitor to appear, and for the monitor to
> be responsive, are generally much smaller than the 11-20sec observed.
All these points are related to the fact that in libvirt <= 0.7.2, we actually wait for the '(qemu)' prompt to appear on the monitor console before considering the VM to have successfully launched. IIUC, the appearance of this '(qemu)' prompt on the monitor is blocked pending completion of reading data from the -incoming arg (or at least it is often delayed)
In the current upstream libvirt, I totally rewrote the monitor interaction, and (by accident!) removed the need to wait for the '(qemu)' prompt when starting a VM. Thus when starting VMs now, the only place where we have a timeout is reading the STDOUT logs to find the PTY associated with any serial/parallel ports. QEMU writes the PTY names to STDOUT before processing the data from -incoming, so that should not cause any problematic delays. That said, the current timeout in libvirt for reading PTYs is only 3 seconds, which is probably a little too low for a highly loaded host - it would be worth increasing that to 10 seconds as a safety net.
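As a quick illustration of that ordering (this qemu invocation is a hypothetical example, not something libvirt runs): with a pty-backed serial port, the "char device redirected to /dev/pts/N" line should appear on stdout right away, even if the incoming stream is deliberately stalled:

  # Stall the migration stream for 30 seconds; the PTY name should still be
  # printed immediately, so PTY scraping need not wait on the -incoming data.
  qemu-kvm -m 2048 -hda bigguest.img -serial pty \
    -incoming "exec:sleep 30; cat bigguest.ramsave"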
So while your patch may have done the job for 0.7.2, I'm not sure that it is needed with current libvirt upstream code anymore.
(In reply to comment #5)
> So while your patch may have done the job for 0.7.2, I'm not sure that it is
> needed with current libvirt upstream code anymore.
Should we apply Charles' patch to F-12?
This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle.
Changing version to '13'.
More information and the reason for this action are here:
Patch available in comment #4, worth applying to F12.
Actually, on second thought: since this is non-upstream code, and we haven't had more than this one report of large-file restore issues, I don't think it's worth a backport. If we get other reports of this issue we can reconsider, but it doesn't sound like many people are hitting it in practice.
Since this is fixed in F13, closing as CURRENTRELEASE.