Bug 730750
| Summary: | libvirt error in restoring domain with corrupt managedsave image | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Grant Williamson <grant_williamson> |
| Component: | libvirt | Assignee: | Eric Blake <eblake> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6.1 | CC: | dallan, dyuan, eblake, malittle, mzhan, rwu, veillard, walicki, yupzhang |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-0.9.4-8.el6 | Doc Type: | Bug Fix |
| Doc Text: |
Cause
Libvirt would attempt to load a managed save file in preference to starting a domain from scratch, even if the managed save file was damaged and could not be loaded.
Consequence
Users were complaining about the inability to start domains, not realizing that the domain had a corrupt managed save image that was being retried in a loop, and without realizing an obscure 'virsh managedsave-remove' could resolve the problem.
Fix
Libvirt introduced 'virsh start --force-boot', as well as some improved logic in ensuring that a managed save file would not be tried if it was corrupt, to make it less likely that a corrupted managed save file can interfere with guest startup.
Result
Use of managed save images is less likely to cause confusion due to a corrupted image.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-12-06 11:26:41 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 638510 | ||
So I found this thread:
http://www.redhat.com/archives/libvir-list/2011-April/msg00385.html

Red Hat's view on this: if the restore fails, data loss may occur when/if the saved state is removed. I agree. However, desktop KVM users get confused by cryptic error messages. Would it be possible for virt-manager to handle this in some fashion, by prompting the user on failure to either remove the saved state or retry the restore?

I'm not sure whether we can add a field to the header of the save image, such as "complete", so that the save image can be checked at restoring/starting. But this is the only way I can see so far.

Invalid (or missing) info:
* Version field: '['6.1']'
* Platform field (Architecture): 'Unspecified'
Please set valid values for above.
Once values are set, please change status back to 'NEW'.
Regards,
(In reply to comment #3)
> I'm not sure we can add some field to the header of save image, such as
> "complete".
> So that can check the save image at restoring/starting. But this is only way as
> far I can get.

Upstream has tackled this problem on two fronts:

1. Yes, we can, and we should, modify the save image header to mark incomplete images. Back-compatibility says that the best way to do this is by modifying the magic number - an unknown or missing value will treat the file as unknown and refuse to use it, a special number treats the file as incomplete (and managed save will know to warn about the incomplete managed save image, then proceed to boot normally), and the existing magic number is only written in on completion (safe to use).
https://www.redhat.com/archives/libvir-list/2011-August/msg00854.html

2. Expose the capability of deleting (failed) managed save images more prominently. Done with this upstream commit:

commit 27c85260532f879be5674a4eed0811c21fd34f94
Author: Eric Blake <eblake>
Date:   Sat Aug 27 17:07:18 2011 -0600

    start: allow discarding managed save

    There have been several instances of people having problems with
    a broken managed save file, and not aware that they could use
    'virsh managedsave-remove dom' to fix things. Making it possible
    to do this as part of starting a domain makes the same functionality
    easier to find, and one less API call.

    * include/libvirt/libvirt.h.in (VIR_DOMAIN_START_FORCE_BOOT): New flag.
    * src/libvirt.c (virDomainCreateWithFlags): Document it.
    * src/qemu/qemu_driver.c (qemuDomainObjStart): Alter signature.
    (qemuAutostartDomain, qemuDomainStartWithFlags): Update callers.
    * tools/virsh.c (cmdStart): Expose it in virsh.
    * tools/virsh.pod (start): Document it.
as well as this followup to make the virsh capability work even with older servers:
https://www.redhat.com/archives/libvir-list/2011-August/msg01440.html

I think both approaches need to be backported into RHEL before we can call this issue complete (which implies that approach 1 still needs to be coded and accepted upstream, and that patch 2/1 of approach 2 still needs ack upstream).

Approach 1 also posted upstream:
https://www.redhat.com/archives/libvir-list/2011-August/msg01458.html
https://www.redhat.com/archives/libvir-list/2011-August/msg01459.html

Additionally, at least one of my pending snapshot patches wants to use the refactored qemuOpenFile() method from msg01458, so I'm marking this as a prereq to bug 638510, support for live snapshots via the snapshot_blkdev qemu monitor command.

Reproduced this bug on libvirt-0.9.4-7.el6: the domain fails to start with the incomplete save file.
Verified PASS with libvirt-0.9.4-9.el6: the domain boots normally and the incomplete save file is removed.

(In reply to comment #9)
> Reproduced this bug on libvirt-0.9.4-7.el6, domain will start fail with the
> incomplete save file. Verified PASS with libvirt-0.9.4-9.el6, domain will boot
> normally and remove the incomplete save file.

Also get the following libvirtd.log:
15:30:36.635: 10074: warning : qemuDomainObjStart:4857 : Ignoring incomplete managed state /var/lib/libvirt/qemu/save/rhel6.save

yep, that's normal :-)
Daniel

Many thanks for the patch. Will this fix be included in RHEL 6.2?

c.f. comment #12, yes definitely,
Daniel
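
For readers following the thread, the magic-number scheme from approach 1 can be illustrated with a small, self-contained Python sketch. It is not libvirt's code: the marker values, the function names, and the action strings are all hypothetical, chosen only to show the three header states (complete, partial, unknown) and how a start request might react to each. The `force_boot` argument is a plain boolean standing in for the `VIR_DOMAIN_START_FORCE_BOOT` flag named in the commit above.

```python
import io
from enum import Enum

# Hypothetical 16-byte markers for this sketch only; the real values
# used by libvirt are not stated in this report.
MAGIC_COMPLETE = b"EXAMPLE-SAVE-OK!"
MAGIC_PARTIAL = b"EXAMPLE-SAVE-TMP"
MAGIC_LEN = 16


class SaveState(Enum):
    COMPLETE = "complete"
    PARTIAL = "partial"
    UNKNOWN = "unknown"


def write_image(stream, payload, interrupt=False):
    """Write the partial marker first, then the payload, and only
    overwrite the marker with the complete one once everything is written."""
    stream.seek(0)
    stream.write(MAGIC_PARTIAL)
    stream.write(payload)
    if interrupt:
        return  # simulate a save that was cancelled or crashed
    stream.seek(0)
    stream.write(MAGIC_COMPLETE)


def classify_header(data):
    """Map the leading bytes of a save image to one of the three states."""
    magic = data[:MAGIC_LEN]
    if magic == MAGIC_COMPLETE:
        return SaveState.COMPLETE
    if magic == MAGIC_PARTIAL:
        return SaveState.PARTIAL
    return SaveState.UNKNOWN  # missing or unrecognised marker


def plan_start(state, force_boot=False):
    """Return a hypothetical action name for a start request."""
    if force_boot:
        return "discard-and-boot"  # caller asked to ignore any saved state
    if state is SaveState.COMPLETE:
        return "restore"
    if state is SaveState.PARTIAL:
        return "warn-and-boot"  # incomplete image: warn, then boot normally
    return "refuse"  # unknown file: do not try to use it


if __name__ == "__main__":
    buf = io.BytesIO()
    write_image(buf, b"guest memory", interrupt=True)
    print(plan_start(classify_header(buf.getvalue())))
```

The point of writing the partial marker first is that an interrupted save can never be mistaken for a usable image: the complete marker only appears once the payload is fully written.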
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2011-1513.html
Description of problem:
If a managed save image cannot be restored, the user is presented with the following error message:
"Error restoring domain: cannot send monitor command '{"execute":"qmp_capabilities"}': Connection reset by peer"

Version-Release number of selected component (if applicable):
libvirt 0.8.7-18

How reproducible:
- Power on a Windows XP guest using virt-manager.
- Start to save the image using Virtual Manager: Shutdown, Save.
- Before the save file is complete, make a copy of it, then cancel the save process, i.e.
  cp /var/lib/libvirt/qemu/save/winxp.raw /root/winxp.raw
  This simulates a corrupt image.
- Shut down the Windows XP guest.
- Copy the incomplete file back, i.e.
  cp /root/winxp.raw /var/lib/libvirt/qemu/save/winxp.raw
- Now power on the Windows XP image; it will quit with the error message shown above. The machine will not power on/boot successfully until this corrupt file is removed.

Expected results:
libvirt or virt-manager should determine that the save file is corrupt and either continue to boot or prompt the user to remove it before continuing to boot.

Additional info: