Bug 537938 - restore fails with large files; libvirtd becomes unresponsive
Summary: restore fails with large files; libvirtd becomes unresponsive
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 12
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Daniel Berrangé
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F12VirtTarget
 
Reported: 2009-11-16 20:43 UTC by Charles Duffy
Modified: 2010-05-26 17:59 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-05-26 17:59:29 UTC
Type: ---
Embargoed:


Attachments
Debug log showing a failure (7.49 KB, text/x-log), 2009-11-17 03:46 UTC, Charles Duffy
Proposed fix - let intermediate process finish before looking for console (4.14 KB, patch), 2009-11-19 00:56 UTC, Charles Duffy

Description Charles Duffy 2009-11-16 20:43:05 UTC
Description of problem:

  When trying to do a "virsh restore" from a very large file (typically >2GB), libvirtd fails with an error such as the following:

    14:28:39.112: error : qemudOpenMonitorUnix:1007 : monitor socket did not show up.: Connection refused
    14:28:39.112: debug : qemudShutdownVMDaemon:2177 : Shutting down VM 'fvte-6ba0a31cc2a987befbcc48ee66be19d857eb9a8b'

  When this occurs, libvirtd is no longer responsive (even to completely separate connections or commands) until restarted; "virsh list", for instance, hangs indefinitely.

Version-Release number of selected component (if applicable):

  0.7.2-5.fc13

How reproducible:

  >50% (100% on present hardware and configuration [a Dell R900], but timing-dependent by nature; may behave differently on other hardware)

Steps to Reproduce:

  1. Use "virsh save" to save memory and device state of a large VM (that is, one with at least 2GB of memory actively in use and populated).

  2. Move the backing store used by this image out of the way and replace it with a qcow2 file that uses the prior backing store as its backing file. (Optional; repeating this step allows a single ramsave file to be safely reused more than once.)

  3. Use "virsh restore" to resume the guest
  
Actual results:

  libvirtd complains that the monitor socket did not show up and says that it is shutting down the VM. Henceforth no virsh commands (even "virsh help") function.

Expected results:

  The guest should resume running where it left off.

Comment 1 Charles Duffy 2009-11-17 03:46:49 UTC
Created attachment 369822 [details]
Debug log showing a failure

This failure occurred on a fairly small (102MB with LZO compression, 239MB after extracting and decompressing the qemu migration stream) ramsave file, after eight prior uneventful save/restore cycles.

Rerunning a restore from the same source file resulted in success; as such, this is unquestionably a race condition.

Comment 2 Charles Duffy 2009-11-17 03:57:30 UTC
Another notable symptom --

After forcefully restarting libvirtd, the guest which libvirt had hung attempting to start is running (in the state in which it was saved to disk) and functioning correctly.

Comment 3 Charles Duffy 2009-11-17 05:50:59 UTC
Ran some tests using script -t to record timing, with qemu launched by hand using an -incoming exec: command set to emit a sigil to stderr when migration completes.
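
A rough sketch of that measurement setup; the guest options, file paths, and sigil string below are illustrative assumptions, not values recorded in this report:

    # Hypothetical sketch only; real qemu arguments and paths differ.
    # script -t writes per-line timing data to stderr, redirected here to timing.log
    script -t 2> timing.log restore-typescript
    # inside the script session, launch qemu by hand with the monitor on stdio and an
    # -incoming exec: command that prints a sigil once the migration stream is consumed;
    # the .lzo file stands in for the qemu migration stream extracted from the libvirt
    # save file (see comment 1)
    qemu-kvm -m 2048 -hda /var/lib/libvirt/images/fvte.img -monitor stdio \
        -incoming 'exec:lzop -dc /tmp/fvte-migration-stream.lzo; echo MIGRATION-DONE >&2'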

Observations as follows:

 - The qemu monitor prompt most often displays immediately, but very occasionally displays only after a substantial delay; difficult to say with certainty, but this latter case appears to happen more often with caches cold.

 - In the common case, the monitor appears quickly, but is unresponsive (despite the prompt having appeared) while migration is underway.

 - For a 1.3GB lzop-compressed saved state file, this migration process takes on the order of 12 seconds on my QA system (a fairly high-end server marketed as virtualization-centric hardware); decompressing ahead of time and loading only the raw data makes this closer to 11 sec with caches hot and 20 sec with caches cold.

 - Libvirt's timeouts waiting for the monitor to appear, and for the monitor to be responsive, are generally much smaller than the 11-20sec observed.

As such, these timeouts are problematic, causing failures during restore processes which would succeed if it weren't for libvirt bailing early.

Comment 4 Charles Duffy 2009-11-19 00:56:20 UTC
Created attachment 370241 [details]
Proposed fix - let intermediate process finish before looking for console

The attached patch moves the waitpid() for the intermediate child (i.e. the decompressor) so that it happens immediately before qemudWaitForMonitor() is called.

This appears to resolve the issue for me. It has not yet been tested in cases where no compression is in use.

Comment 5 Daniel Berrangé 2009-11-19 11:23:57 UTC
> Observations as follows:
>
>  - The qemu monitor prompt most often displays immediately, but very
> occasionally displays only after a substantial delay; difficult to say with
> certainty, but this latter case appears to happen more often with caches cold.
>
> - In the common case, the monitor appears quickly, but is unresponsive
> (despite the prompt having appeared) while migration is underway.
>
>  - For a 1.3GB lzop-compressed saved state file, this migration process takes
> on the scale of 12 seconds on my QA system (a fairly high-end server marketed
> as virtualization-centric hardware); decompressing ahead-of-time and only
> loading the raw data makes this closer to 11sec with caches hot, and 20sec with
> caches cold.
> 
>  - Libvirt's timeouts waiting for the monitor to appear, and for the monitor to
> be responsive, are generally much smaller than the 11-20sec observed.

All these points are related to the fact that in libvirt <= 0.7.2, we actually wait for the '(qemu)' prompt to appear on the monitor console before considering the VM to have successfully launched. IIUC, the appearance of this '(qemu)' prompt on the monitor is blocked pending completion of reading data from the -incoming arg (or at least it is often delayed).

In the current upstream libvirt, I totally re-wrote the monitor interaction and (by accident!) removed the need to wait for the '(qemu)' prompt when starting a VM. Thus when starting VMs now, the only place where we have a timeout is reading the STDOUT logs to find the PTY associated with any serial/parallel ports. QEMU writes the PTY names to STDOUT before processing the data from -incoming, so that should not cause any problematic delays. That said, the current timeout in libvirt for reading PTYs is only 3 seconds, which is probably a little too low for a highly loaded host; it would be worth increasing that to 10 seconds as a safety net.

So while your patch may have done the job for 0.7.2, I'm not sure that it is needed with current libvirt upstream code anymore.

Comment 6 Mark McLoughlin 2009-11-19 13:01:58 UTC
(In reply to comment #5)
 
> So while your patch may have done the job for 0.7.2, I'm not sure that it is
> needed with current libvirt upstream code anymore.  

Should we apply Charles' patch in F-12?

Comment 7 Bug Zapper 2010-03-15 13:02:43 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle.
Changing version to '13'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 Cole Robinson 2010-05-17 18:55:33 UTC
Patch available in comment #4, worth applying to F12.

Comment 9 Cole Robinson 2010-05-26 17:59:29 UTC
Actually, on second thought: since this is non-upstream code and we haven't had more than this one report of large-file restore issues, I don't think it's worth a backport. If we get other reports of this issue we can reconsider, but it doesn't sound like many people are hitting it in practice.

Since this is fixed in F13, closing CURRENTRELEASE.

