Bug 730750
Summary: | libvirt error in restoring domain with corrupt managedsave image | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Grant Williamson <grant_williamson> |
Component: | libvirt | Assignee: | Eric Blake <eblake> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 6.1 | CC: | dallan, dyuan, eblake, malittle, mzhan, rwu, veillard, walicki, yupzhang |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | libvirt-0.9.4-8.el6 | Doc Type: | Bug Fix |
Doc Text: |
Cause
Libvirt would attempt to load a managed save file in preference to starting a domain from scratch, even if the managed save file was damaged and could not be loaded.
Consequence
Users were complaining about the inability to start domains, not realizing that the domain had a corrupt managed save image that was being retried in a loop, and without realizing an obscure 'virsh managedsave-remove' could resolve the problem.
Fix
Libvirt introduced 'virsh start --force-boot', as well as some improved logic in ensuring that a managed save file would not be tried if it was corrupt, to make it less likely that a corrupted managed save file can interfere with guest startup.
Result
Use of managed save images is less likely to cause confusion due to a corrupted image.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2011-12-06 11:26:41 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 638510 |
Description
Grant Williamson
2011-08-15 16:05:36 UTC
So I found this thread.
http://www.redhat.com/archives/libvir-list/2011-April/msg00385.html

Red Hat's view on this - if the restore fails, data loss may occur when/if the saved state is removed. I agree. However, desktop KVM users get confused by cryptic error messages. Would it be possible for virt-manager to handle this in some fashion by prompting the user, on failure, to remove the saved state or retry the restore?

I'm not sure whether we can add some field to the header of the save image, such as "complete", so that the save image can be checked when restoring/starting. But this is the only way as far as I can get.

Invalid (or missing) info:
* Version field: '['6.1']'
* Platform field (Architecture): 'Unspecified'
Please set valid values for above. Once values are set, please change status back to 'NEW'.
Regards,

(In reply to comment #3)
> I'm not sure whether we can add some field to the header of the save image, such as
> "complete", so that the save image can be checked when restoring/starting. But this
> is the only way as far as I can get.

Upstream has tackled this problem on two fronts:

1. Yes, we can, and we should, modify the save image header to mark incomplete images. Back-compatibility says that the best way to do this is by modifying the magic number - an unknown or missing value will treat the file as unknown and refuse to use it, a special number treats the file as incomplete (and managed save will know to warn about the incomplete managed save image, then proceed to boot normally), and the existing magic number is only written in on completion (safe to use).
https://www.redhat.com/archives/libvir-list/2011-August/msg00854.html

2. Expose the capability of deleting (failed) managed save images more prominently. Done with this upstream commit:

commit 27c85260532f879be5674a4eed0811c21fd34f94
Author: Eric Blake <eblake>
Date: Sat Aug 27 17:07:18 2011 -0600

    start: allow discarding managed save

    There have been several instances of people having problems with
    a broken managed save file, and not aware that they could use
    'virsh managedsave-remove dom' to fix things. Making it possible
    to do this as part of starting a domain makes the same functionality
    easier to find, and one less API call.

    * include/libvirt/libvirt.h.in (VIR_DOMAIN_START_FORCE_BOOT): New flag.
    * src/libvirt.c (virDomainCreateWithFlags): Document it.
    * src/qemu/qemu_driver.c (qemuDomainObjStart): Alter signature.
    (qemuAutostartDomain, qemuDomainStartWithFlags): Update callers.
    * tools/virsh.c (cmdStart): Expose it in virsh.
    * tools/virsh.pod (start): Document it.

as well as this followup to make the virsh capability work even with older servers:
https://www.redhat.com/archives/libvir-list/2011-August/msg01440.html

I think both approaches need to be backported into RHEL before we can call this issue complete (which implies that approach 1 still needs to be coded and accepted upstream, and that patch 2/1 of approach 2 still needs ack upstream).

approach 1 also posted upstream:
https://www.redhat.com/archives/libvir-list/2011-August/msg01458.html
https://www.redhat.com/archives/libvir-list/2011-August/msg01459.html

Additionally, at least one of my pending snapshot patches wants to use the refactored qemuOpenFile() method from msg01458, so I'm marking this as a prereq to bug 638510, support for live snapshots via the snapshot_blkdev qemu monitor command.

Reproduced this bug on libvirt-0.9.4-7.el6: the domain fails to start with the incomplete save file.
Verified PASS with libvirt-0.9.4-9.el6: the domain boots normally and the incomplete save file is removed.

(In reply to comment #9)
> Reproduced this bug on libvirt-0.9.4-7.el6: the domain fails to start with the
> incomplete save file.
> Verified PASS with libvirt-0.9.4-9.el6: the domain boots normally and the
> incomplete save file is removed.

Also get the following in libvirtd.log:
15:30:36.635: 10074: warning : qemuDomainObjStart:4857 : Ignoring incomplete managed state /var/lib/libvirt/qemu/save/rhel6.save

yep, that's normal :-)
Daniel

Many thanks for the patch. Will this fix be included in RHEL 6.2?

c.f. comment #12, yes definitely,
Daniel

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause
Libvirt would attempt to load a managed save file in preference to starting a domain from scratch, even if the managed save file was damaged and could not be loaded.
Consequence
Users were complaining about the inability to start domains, not realizing that the domain had a corrupt managed save image that was being retried in a loop, and without realizing an obscure 'virsh managedsave-remove' could resolve the problem.
Fix
Libvirt introduced 'virsh start --force-boot', as well as some improved logic in ensuring that a managed save file would not be tried if it was corrupt, to make it less likely that a corrupted managed save file can interfere with guest startup.
Result
Use of managed save images is less likely to cause confusion due to a corrupted image.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2011-1513.html
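
For readers following the discussion, the magic-number scheme and the force-boot behaviour described in this report can be sketched roughly as below. This is only an illustration in Python, not libvirt's code: the magic values, the function names (`write_image`, `classify_image`, `choose_start_action`) and the returned action strings are all hypothetical, and the real logic lives in the C sources named in the commit message.

```python
import enum
import os

# Hypothetical 16-byte magic values, for illustration only. They are NOT
# claimed to be the constants libvirt actually writes into save images.
MAGIC_COMPLETE = b"ExampleSaveDone!"
MAGIC_PARTIAL = b"ExampleSavePart!"
MAGIC_LEN = 16


class ImageState(enum.Enum):
    COMPLETE = "complete"  # safe to restore from
    PARTIAL = "partial"    # save was interrupted
    UNKNOWN = "unknown"    # unrecognised or missing magic


def write_image(path, payload):
    """Write the partial magic first and stamp the complete magic last,
    so an interrupted save is recognisable as incomplete."""
    with open(path, "wb") as f:
        f.write(MAGIC_PARTIAL)
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
        f.seek(0)
        f.write(MAGIC_COMPLETE)


def classify_image(path):
    """Read the leading magic bytes and classify the image."""
    with open(path, "rb") as f:
        magic = f.read(MAGIC_LEN)
    if magic == MAGIC_COMPLETE:
        return ImageState.COMPLETE
    if magic == MAGIC_PARTIAL:
        return ImageState.PARTIAL
    return ImageState.UNKNOWN


def choose_start_action(path, force_boot=False):
    """Return "restore", "boot" or "error" for a domain start request."""
    if not os.path.exists(path):
        return "boot"
    if force_boot:
        # Mirrors the idea of 'virsh start --force-boot': discard the image.
        os.remove(path)
        return "boot"
    state = classify_image(path)
    if state is ImageState.COMPLETE:
        return "restore"
    if state is ImageState.PARTIAL:
        # Incomplete image: drop it and boot normally (a warning would be
        # logged here in a real implementation).
        os.remove(path)
        return "boot"
    # Unknown magic: refuse to use the file rather than guess.
    return "error"
```

Stamping the "complete" magic only after the payload is on disk means a crash mid-save leaves a file that is recognisably partial, which is what lets the start path fall back to a normal boot instead of failing repeatedly.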