Bug 692998 - data loss if restoring libvirt domain encounters transient error
Summary: data loss if restoring libvirt domain encounters transient error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Osier Yang
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-04-01 22:30 UTC by Eric Blake
Modified: 2015-09-28 02:24 UTC
CC List: 5 users

Fixed In Version: libvirt-0.8.7-17.el6
Doc Type: Bug Fix
Doc Text:
libvirt removed the managed state file (created by "virsh managedsave dom") even if it failed to restore and start the domain using that file. This caused data loss. The managed state file is now removed only if the restore operation succeeds.
Clone Of:
Environment:
Last Closed: 2011-05-19 13:29:36 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2011:0596 | 0 | normal | SHIPPED_LIVE | libvirt bug fix and enhancement update | 2011-05-18 17:56:36 UTC

Description Eric Blake 2011-04-01 22:30:29 UTC
Description of problem:
https://www.redhat.com/archives/libvir-list/2011-April/msg00071.html
Libvirt blindly unlink()s a saved domain image when completing 'virsh restore file' (or, with managedsave, 'virsh start' from the managed save location).  However, if qemu failed to start (particularly if the failure is transient, such as lack of memory due to other pressures on the system that can be resolved before retrying the restore), this is a form of data loss.
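The ordering the report asks for (keep the image until the guest is actually running) can be sketched as follows. This is a minimal illustration, not libvirt's code: `restore_managed_save`, the `start_fn` callback type, and `fake_start` are hypothetical names, with `fake_start` standing in for launching qemu from the saved image so that a transient failure can be simulated.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical callback: launches the guest from the saved image at `path`.
 * Returns 0 on success, -1 on failure (e.g. qemu could not start). */
typedef int (*start_fn)(const char *path, void *opaque);

/* Restore from a saved image. The image is unlinked only after the guest
 * has started; on failure it is left intact so the restore can be retried. */
int restore_managed_save(const char *path, start_fn start, void *opaque)
{
    assert(path != NULL && start != NULL);

    if (start(path, opaque) < 0) {
        fprintf(stderr, "restore from %s failed; keeping save file\n", path);
        return -1;
    }

    /* Start succeeded: the saved state now lives in the running guest, so a
     * failure to remove the image is only reported, not treated as fatal. */
    if (unlink(path) < 0 && errno != ENOENT)
        perror("unlink");
    return 0;
}

/* Hypothetical stand-in for launching qemu: succeeds only when the int that
 * `opaque` points to is nonzero, e.g. once memory pressure has been relieved. */
static int fake_start(const char *path, void *opaque)
{
    (void)path;
    return *(const int *)opaque ? 0 : -1;
}
```

With `fake_start` failing, the first call returns -1 and the file survives; once the simulated condition clears, a second call starts the guest and only then removes the image, which matches the expected results below.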

Version-Release number of selected component (if applicable):
libvirt-0.8.7-15.el6

How reproducible:
Not sure how to reliably make qemu fail to start, but it should be possible to come up with some scenarios.

Steps to Reproduce:
1. Save a running qemu domain (either with 'virsh managedsave dom' or 'virsh save dom file').
2. Set up conditions where qemu will fail to start (possibly by reverting to a known-buggy qemu, or by intentionally allocating enough memory in some other program that qemu's memory request will be denied).
3. Try to start the saved domain ('virsh start' or 'virsh restore file', accordingly).
4. Revert the temporary conditions (restore qemu to a non-buggy version, or release memory back to the system).
5. Try to start the saved domain again.
  
Actual results:
Step 3 unlinked the save file even though the restore failed, losing all state from the guest's memory at the time it was saved.

Expected results:
Step 3 should fail but leave the save file intact. Step 5 should then succeed.


Additional info:
No upstream patch available yet, but data loss is severe, hence requesting exception for inclusion in RHEL 6.1.

Comment 2 Osier Yang 2011-04-05 06:50:38 UTC
The problem exists only for "virsh managedsave dom; virsh start dom". "virsh save dom dom.save; virsh restore dom.save" works fine, as it doesn't try to unlink the saved state.

Patch posted upstream:
http://www.redhat.com/archives/libvir-list/2011-April/msg00215.html

Comment 6 zhanghaiyan 2011-04-18 10:04:45 UTC
Verified this bug as fixed with libvirt-0.8.7-17.el6.x86_64:
1. # virsh start rhel61
Perform some operation in the guest, for example open a document.
2. # virsh managedsave rhel61
# ll /var/lib/libvirt/qemu/save/rhel61.save 
-rw-------. 1 root root 497719393 Apr 18 05:58 /var/lib/libvirt/qemu/save/rhel61.save
3. # rpm -e qemu-kvm --nodeps
4. # virsh start rhel61
error: Failed to start domain rhel61
error: Cannot find QEMU binary /usr/libexec/qemu-kvm: No such file or directory
# ll /var/lib/libvirt/qemu/save/rhel61.save 
-rw-------. 1 root root 497719393 Apr 18 05:58 /var/lib/libvirt/qemu/save/rhel61.save
5. # rpm -ivh qemu-kvm-0.12.1.2-2.158.el6.x86_64.rpm 
# service libvirtd restart
6. # virsh start rhel61
The guest is restored from the save file, and the document is still open in the guest

Reproduced this bug with libvirt-0.8.7-16.el6.x86_64:
In step 4, the save file is deleted.
In step 6, the guest boots up fresh.
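The key check in the verification above is that /var/lib/libvirt/qemu/save/rhel61.save still exists, with the same size, after the failed start in step 4. A minimal C sketch of that check, assuming the managed save image is named `<domain>.save` inside the save directory as in the listing above; `managed_save_present` is a hypothetical helper, not a libvirt API.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if <dir>/<domain>.save exists as a regular file (storing its size
 * in *size_out when non-NULL), 0 if it does not, -1 if the path is too long. */
int managed_save_present(const char *dir, const char *domain, long long *size_out)
{
    char path[4096];
    struct stat st;

    assert(dir != NULL && domain != NULL);

    int n = snprintf(path, sizeof path, "%s/%s.save", dir, domain);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return 0;

    if (size_out != NULL)
        *size_out = (long long)st.st_size;
    return 1;
}
```

Calling it before and after the failed `virsh start` and comparing the reported sizes mirrors the two `ll` listings in steps 2 and 4; with the buggy build, the second call would return 0.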

Comment 7 Osier Yang 2011-05-03 07:06:52 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
libvirt removes the managed state file (created by "virsh managedsave dom") even if it fails to start up the domain from restoring with the managed state file, which causes data loss, with this update, it removes the managed state file only if restoring from the managed state file succeeded.

Comment 10 Laura Bailey 2011-05-04 04:17:24 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-libvirt removes the managed state file (created by "virsh managedsave dom") even if it fails to start up the domain from restoring with the managed state file, which causes data loss, with this update, it removes the managed state file only if restoring from the managed state file succeeded.+libvirt removed the managed state file (created by "virsh managedsave dom") even if it failed to restore and start the domain using that file. This caused data loss. The managed state file is now removed only if the restore operation succeeds.

Comment 11 errata-xmlrpc 2011-05-19 13:29:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html

