Bug 730750 - libvirt error in restoring domain with corrupt managedsave image
Summary: libvirt error in restoring domain with corrupt managedsave image
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Eric Blake
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 638510
 
Reported: 2011-08-15 16:05 UTC by Grant Williamson
Modified: 2011-12-06 11:26 UTC (History)

Fixed In Version: libvirt-0.9.4-8.el6
Doc Type: Bug Fix
Doc Text:
Cause: Libvirt would attempt to load a managed save file in preference to starting a domain from scratch, even if the managed save file was damaged and could not be loaded.
Consequence: Users were complaining about the inability to start domains, not realizing that the domain had a corrupt managed save image that was being retried in a loop, and without realizing an obscure 'virsh managedsave-remove' could resolve the problem.
Fix: Libvirt introduced 'virsh start --force-boot', as well as some improved logic in ensuring that a managed save file would not be tried if it was corrupt, to make it less likely that a corrupted managed save file can interfere with guest startup.
Result: Use of managed save images is less likely to cause confusion due to a corrupted image.
Clone Of:
Environment:
Last Closed: 2011-12-06 11:26:41 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2011:1513 | 0 | normal | SHIPPED_LIVE | libvirt bug fix and enhancement update | 2011-12-06 01:23:30 UTC

Description Grant Williamson 2011-08-15 16:05:36 UTC
Description of problem:
If a managed save image cannot be restored, the user is presented with the following error message:
"Error restoring domain: cannot send monitor command '{"execute":"qmp_capabilities"}': Connection reset by peer"

Version-Release number of selected component (if applicable):
libvirt 0.8.7-18

How reproducible:
- Power on a Windows XP guest using virt-manager.

- Start saving the guest from Virtual Machine Manager (Shutdown, then Save).

- Before the save file is complete, make a copy of it, then cancel the save. For example:
  cp /var/lib/libvirt/qemu/save/winxp.raw /root/winxp.raw
  This simulates a corrupt image.

- Shut down the Windows XP guest.

- Copy the incomplete file back, for example:
  cp /root/winxp.raw /var/lib/libvirt/qemu/save/winxp.raw

- Power on the Windows XP guest again. It quits with the error message shown above, and the machine will not boot successfully until the corrupt file is removed.
  

Expected results:
libvirt or virt-manager should determine that the save file is corrupt and either continue to boot or ask the user whether to remove the file before continuing to boot.

Additional info:

Comment 2 Grant Williamson 2011-08-16 07:14:56 UTC
So I found this thread.
http://www.redhat.com/archives/libvir-list/2011-April/msg00385.html

Red Hat's view on this: if the restore fails, data loss may occur when/if the saved state is removed. I agree.

However, desktop KVM users get confused by cryptic error messages. Would it be possible for virt-manager to handle this in some fashion, for example by prompting the user on failure to either remove the saved state or retry the restore?

Comment 3 Osier Yang 2011-08-17 12:56:39 UTC
I'm not sure whether we can add a field, such as "complete", to the header of the save image
so that the save image can be checked when restoring/starting. But this is the only way I can think of so far.

Comment 4 Satya Komaragiri 2011-08-30 08:04:03 UTC
Invalid (or missing) info:
     * Version field: '['6.1']'
     * Platform field (Architecture): 'Unspecified'
Please set valid values for above.
Once values are set,  please change status back to 'NEW'.
Regards,

Comment 5 Eric Blake 2011-08-30 16:05:03 UTC
(In reply to comment #3)
> I'm not sure whether we can add a field, such as "complete", to the header of
> the save image so that the save image can be checked when restoring/starting.
> But this is the only way I can think of so far.

Upstream has tackled this problem on two fronts:

1. Yes, we can, and we should, modify the save image header to mark incomplete images.  Back-compatibility says that the best way to do this is by modifying the magic number - an unknown or missing value will treat the file as unknown and refuse to use it, a special number treats the file as incomplete (and managed save will know to warn about the incomplete managed save image, then proceed to boot normally), and the existing magic number is only written in on completion (safe to use).
https://www.redhat.com/archives/libvir-list/2011-August/msg00854.html
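
To illustrate the magic-number scheme described in approach 1, here is a minimal Python sketch. It is not libvirt's code: the magic strings, the `SaveState` enum, and the functions `classify_header` and `write_image` are hypothetical names and values chosen for the example. It only shows the idea that the "incomplete" marker is written first and the "complete" marker is written over it once the payload is finished.

```python
import io
from enum import Enum

# Hypothetical 16-byte magic values, for illustration only.
# They are NOT taken from libvirt's source.
MAGIC_COMPLETE = b"ExampleSaveDone!"
MAGIC_PARTIAL = b"ExampleSavePart!"


class SaveState(Enum):
    COMPLETE = "complete"      # safe to restore from
    INCOMPLETE = "incomplete"  # save was interrupted; warn and boot normally
    UNKNOWN = "unknown"        # unrecognised header; refuse to use the file


def classify_header(header: bytes) -> SaveState:
    """Classify a save image by the magic bytes at the start of its header."""
    magic = header[:len(MAGIC_COMPLETE)]
    if magic == MAGIC_COMPLETE:
        return SaveState.COMPLETE
    if magic == MAGIC_PARTIAL:
        return SaveState.INCOMPLETE
    return SaveState.UNKNOWN


def write_image(stream, payload: bytes) -> None:
    """Write the partial magic first, and the final magic only on completion."""
    stream.write(MAGIC_PARTIAL)
    stream.write(payload)
    # Only after the whole payload is written is the header marked complete.
    stream.seek(0)
    stream.write(MAGIC_COMPLETE)


# Usage: an interrupted save keeps the partial magic, a finished one does not.
interrupted = io.BytesIO(MAGIC_PARTIAL + b"half of the guest state")
finished = io.BytesIO()
write_image(finished, b"all of the guest state")
print(classify_header(interrupted.getvalue()).value)
print(classify_header(finished.getvalue()).value)
```

Because a file with a missing or unrecognised magic falls into the unknown case, images written by a hypothetical future format are rejected rather than misread, which is the back-compatibility argument made above.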

2. Expose the capability of deleting (failed) managed save images more prominently.  Done with this upstream commit:
commit 27c85260532f879be5674a4eed0811c21fd34f94
Author: Eric Blake <eblake>
Date:   Sat Aug 27 17:07:18 2011 -0600

    start: allow discarding managed save
    
    There have been several instances of people having problems with
    a broken managed save file, and not aware that they could use
    'virsh managedsave-remove dom' to fix things.  Making it possible
    to do this as part of starting a domain makes the same functionality
    easier to find, and one less API call.
    
    * include/libvirt/libvirt.h.in (VIR_DOMAIN_START_FORCE_BOOT): New
    flag.
    * src/libvirt.c (virDomainCreateWithFlags): Document it.
    * src/qemu/qemu_driver.c (qemuDomainObjStart): Alter signature.
    (qemuAutostartDomain, qemuDomainStartWithFlags): Update callers.
    * tools/virsh.c (cmdStart): Expose it in virsh.
    * tools/virsh.pod (start): Document it.

as well as this followup to make the virsh capability work even with older servers:
https://www.redhat.com/archives/libvir-list/2011-August/msg01440.html
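
For readers who want to script the recovery described in approach 2, the following Python sketch only builds the virsh command lines and does not run them. The function name `recovery_commands` and the `force_boot_supported` parameter are hypothetical. The fallback of running `virsh managedsave-remove` followed by a plain `virsh start` is an assumption based on the commit message above, not a description of what the follow-up patch does.

```python
import shlex
from typing import List


def recovery_commands(domain: str, force_boot_supported: bool = True) -> List[List[str]]:
    """Return the virsh invocations that discard a managed save and boot fresh.

    force_boot_supported is a hypothetical switch for this sketch: when False,
    fall back to the two-step sequence mentioned in the commit message.
    """
    if force_boot_supported:
        # One step: discard any managed save image as part of starting.
        return [["virsh", "start", "--force-boot", domain]]
    # Two steps: remove the managed save image, then start normally.
    return [
        ["virsh", "managedsave-remove", domain],
        ["virsh", "start", domain],
    ]


# Usage: print the commands for a domain called "winxp" (example name).
for argv in recovery_commands("winxp", force_boot_supported=False):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Either route discards the saved guest state, so it is only appropriate once the managed save image is known to be unusable.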

I think both approaches need to be backported into RHEL before we can call this issue complete (which implies that approach 1 still needs to be coded and accepted upstream, and that patch 2/1 of approach 2 still needs ack upstream).

Comment 6 Eric Blake 2011-08-30 21:09:54 UTC
approach 1 also posted upstream:
https://www.redhat.com/archives/libvir-list/2011-August/msg01458.html
https://www.redhat.com/archives/libvir-list/2011-August/msg01459.html

Additionally, at least one of my pending snapshot patches wants to use the refactored qemuOpenFile() method from msg01458, so I'm marking this as a prereq to bug 638510 (support for live snapshots via the snapshot_blkdev qemu monitor command).

Comment 9 dyuan 2011-09-07 07:32:45 UTC
Reproduced this bug on libvirt-0.9.4-7.el6: the domain fails to start with the incomplete save file. Verified PASS with libvirt-0.9.4-9.el6: the domain boots normally and the incomplete save file is removed.

Comment 10 dyuan 2011-09-07 07:41:15 UTC
(In reply to comment #9)
> Reproduced this bug on libvirt-0.9.4-7.el6: the domain fails to start with the
> incomplete save file. Verified PASS with libvirt-0.9.4-9.el6: the domain boots
> normally and the incomplete save file is removed.

I also get the following in libvirtd.log:
15:30:36.635: 10074: warning : qemuDomainObjStart:4857 : Ignoring incomplete managed state /var/lib/libvirt/qemu/save/rhel6.save
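
The start-up behaviour verified here (warn about the incomplete image, remove it, then boot normally) can be pictured as a small decision function. This is a simplified Python sketch, not the logic of qemuDomainObjStart: the function `plan_start`, the state names, and the action names are all hypothetical, and it does not cover images with an unrecognised header.

```python
from typing import List

# Hypothetical state names for the managed save image of a domain.
VALID_STATES = ("absent", "complete", "incomplete")


def plan_start(managed_save: str, force_boot: bool = False) -> List[str]:
    """Return the ordered actions to take when starting a domain."""
    if managed_save not in VALID_STATES:
        raise ValueError("unknown managed save state: %r" % (managed_save,))
    if managed_save == "absent":
        return ["boot"]
    if force_boot:
        # The caller explicitly asked to discard any managed save image.
        return ["remove-managed-save", "boot"]
    if managed_save == "incomplete":
        # Matches the verified behaviour: warn, drop the file, boot normally.
        return ["warn", "remove-managed-save", "boot"]
    return ["restore"]


# Usage: an interrupted save no longer blocks start-up.
print(plan_start("incomplete"))
```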

Comment 11 Daniel Veillard 2011-09-07 09:24:41 UTC
yep, that's normal :-)

Daniel

Comment 12 John Walicki 2011-09-07 14:07:39 UTC
Many thanks for the patch. 
Will this fix be included in RHEL 6.2?

Comment 13 Daniel Veillard 2011-10-13 14:28:17 UTC
Re comment #12: yes, definitely.

Daniel

Comment 14 Eric Blake 2011-11-10 23:56:44 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
    Libvirt would attempt to load a managed save file in preference to starting a domain from scratch, even if the managed save file was damaged and could not be loaded.
Consequence
    Users were complaining about the inability to start domains, not realizing that the domain had a corrupt managed save image that was being retried in a loop, and without realizing an obscure 'virsh managedsave-remove' could resolve the problem.
Fix
    Libvirt introduced 'virsh start --force-boot', as well as some improved logic in ensuring that a managed save file would not be tried if it was corrupt, to make it less likely that a corrupted managed save file can interfere with guest startup.
Result
    Use of managed save images is less likely to cause confusion due to a corrupted image.

Comment 15 errata-xmlrpc 2011-12-06 11:26:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html

