Bug 1460962
Summary: | vm cannot be started if it has a corrupted managedsave file | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | yisun |
Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
Status: | CLOSED ERRATA | QA Contact: | Yanqiu Zhang <yanqzhan> |
Severity: | low | Priority: | low |
Version: | 7.4 | Target Milestone: | rc |
Hardware: | x86_64 | OS: | Linux |
Keywords: | Regression, Upstream | Type: | Bug |
Fixed In Version: | libvirt-3.9.0-1.el7 | Last Closed: | 2018-04-10 10:48:37 UTC |
CC: | chhu, dyuan, fjin, jdenemar, rbalakri, xuzhang, yafu, yanqzhan, yisun, zpeng | | |
Description
yisun
2017-06-13 09:15:03 UTC
Oops, caused by

commit ac793bd7195ab99445cf6c6d6053439c56cef922
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jun 6 22:27:57 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 7 13:36:01 2017 +0200

    qemu: Fix memory leaks in qemuDomainSaveImageOpen

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>

which switched from directly returning with -3 to a goto, but failed to change the "return -1" statement at the end of the error path.

Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-June/msg00541.html

Fixed upstream now by

commit 16e31fb38da3c2b9a35faff9ac626d947199cf13
Refs: v3.4.0-97-g16e31fb38
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jun 13 13:25:07 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Jun 13 13:46:40 2017 +0200

    qemu: Fix starting a domain with corrupted managed save file

    Commit v3.4.0-44-gac793bd71 fixed a memory leak, but failed to return the special -3 value. Thus an attempt to start a domain with corrupted managed save file would removed the corrupted file and report "An error occurred, but the cause is unknown" instead of starting the domain from scratch.

    https://bugzilla.redhat.com/show_bug.cgi?id=1460962

Hit another issue that should have the same root cause. Documenting it here; Jiri, please help to confirm.

Summary: cannot undefine a VM that used to have a corrupted managedsave file, even though the file has already been removed

Steps:
1. make a managedsave
root@localhost ~ ## virsh managedsave avocado-vt-vm1
Domain avocado-vt-vm1 state saved by libvirt

2. corrupt the managedsave file
root@localhost ~ ## echo > /var/lib/libvirt/qemu/save/avocado-vt-vm1.save

3. try to start the vm
root@localhost ~ ## virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown

4. now we can see the managedsave file removed
root@localhost ~ ## ll /var/lib/libvirt/qemu/save/avocado-vt-vm1.save
ls: cannot access /var/lib/libvirt/qemu/save/avocado-vt-vm1.save: No such file or directory

5. try to undefine the vm
root@localhost ~ ## virsh undefine avocado-vt-vm1
error: Refusing to undefine while domain managed save image exists
<=== now, we cannot undefine the

Yeah, it's caused by the same bug. Libvirtd still thinks the domain has a saved state, because it didn't notice that the image was removed for being corrupted. Restarting libvirtd should let you undefine the domain.

Reproduce this bug with libvirt-3.2.0-14.el7_4.2.x86_64

Steps to reproduce:
1.
# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              shut off

# ls /var/lib/libvirt/qemu/save/
# echo > /var/lib/libvirt/qemu/save/V.save
# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              shut off

# virsh start V
error: Failed to start domain V
error: An error occurred, but the cause is unknown
<== Reproduced

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              shut off

2.
# virsh start V
Domain V started

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 206   V                              running

# virsh managedsave V
Domain V state saved by libvirt

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              saved

# echo > /var/lib/libvirt/qemu/save/V.save
# virsh start V
error: Failed to start domain V
error: An error occurred, but the cause is unknown
<== Reproduced

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              saved

# virsh start V
Domain V started

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 207   V                              running

# virsh destroy V
Domain V destroyed

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              saved

Verify this bug with libvirt-3.8.0-1.el7.x86_64.

Steps to verify:
1.
# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              shut off

# ls /var/lib/libvirt/qemu/save/
# echo > /var/lib/libvirt/qemu/save/V.save
# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              shut off

# virsh start V
Domain V started
<== Successfully started without error.

2.
# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 1     V                              running

# virsh managedsave V
Domain V state saved by libvirt

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              saved

# echo > /var/lib/libvirt/qemu/save/V.save
# virsh start V
Domain V started
<== Successfully started without error.

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 2     V                              running

# virsh destroy V
Domain V destroyed

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              saved

# virsh undefine V
error: Refusing to undefine while domain managed save image exists

# systemctl restart libvirtd
# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              shut off

# virsh undefine V
Domain V has been undefined

The 'start' behavior above gives the expected result.

But, Jiri, one more question: in the last few steps, after the guest is started with a corrupted image, the managed-save status can only be cleared by restarting libvirtd. It stays set no matter how many times I start and destroy the guest. Do you think that's okay?

Oops, looks like we don't reset the managed-saved status after deleting a corrupted save image. An additional trivial patch is needed...
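
The stale "saved" status discussed in the last two comments can be modelled with a small self-contained C sketch. This is not libvirt code: the struct, the two functions and the reset_flag switch are hypothetical and exist only for illustration. It assumes that the undefine check consults a cached per-domain flag (called hasManagedSave here) rather than looking at the file on disk, which matches the behaviour seen in the transcripts above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-domain state; not libvirt's real struct. */
struct domain {
    bool hasManagedSave;  /* cached "a managed save image exists" flag */
    bool imageOnDisk;     /* simulated presence of the .save file */
    bool imageCorrupted;  /* simulated corruption of that file */
};

/* Simulated start: a corrupted image is dropped and the domain boots
 * from scratch. reset_flag selects whether the cached flag is cleared
 * together with the file. Always returns 0 in this model. */
static int domain_start(struct domain *dom, bool reset_flag)
{
    assert(dom != NULL);
    if (dom->imageOnDisk && dom->imageCorrupted) {
        dom->imageOnDisk = false;         /* "unlink" the corrupted image */
        if (reset_flag)
            dom->hasManagedSave = false;  /* keep the cache in sync */
    }
    return 0;
}

/* Models the "Refusing to undefine" check: only the cached flag is read.
 * Returns -1 when the undefine is refused, 0 otherwise. */
static int domain_undefine(const struct domain *dom)
{
    assert(dom != NULL);
    return dom->hasManagedSave ? -1 : 0;
}
```

Starting a domain with `reset_flag` set to false leaves `hasManagedSave` true although the file is gone, so `domain_undefine()` keeps refusing. With `reset_flag` set to true the flag follows the file and the undefine goes through.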
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-October/msg01079.html

Fixed upstream by

commit f26636887fee11b3ecaa5c0a0734687cded8ed28
Refs: v3.8.0-237-gf26636887
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Oct 24 10:32:03 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Oct 24 11:07:10 2017 +0200

    qemu: Reset hasManagedSave after removing a corrupted image

    When starting a domain with managed save image, we try to restore it first. If the image is corrupted, we silently unlink it and just normally start the domain. At this point the domain has no managed save image, yet we did not reset the hasManagedSave flag.

    https://bugzilla.redhat.com/show_bug.cgi?id=1460962

    Signed-off-by: Jiri Denemark <jdenemar>

Verify this bug with libvirt-3.9.0-2.el7.x86_64:

1. Newly create a corrupted saved image:
# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              shut off

# ls /var/lib/libvirt/qemu/save/
# echo > /var/lib/libvirt/qemu/save/V.save
# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              shut off

# virsh start V
Domain V started

# ls /var/lib/libvirt/qemu/save/V.save
# virsh destroy V
Domain V destroyed

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              shut off
<== status is not "saved"

2. Corrupt an existing saved image:
# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 6     V                              running

# virsh managedsave V
Domain V state saved by libvirt

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              saved

# echo > /var/lib/libvirt/qemu/save/V.save
# virsh start V
Domain V started

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 7     V                              running

# virsh destroy V
Domain V destroyed

# virsh list --all --managed-save
 Id    Name                           State
----------------------------------------------------
 -     V                              shut off
<== status is not "saved"

And the guest can be undefined.

According to comment 9 and this comment, marking this bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704
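
For reference, the original regression in this report (an error path converted to a goto that still ends in a hard-coded "return -1", so the special -3 value is lost) can be sketched in C as follows. This is a simplified illustration under stated assumptions, not the code of qemuDomainSaveImageOpen: the function names, the enum, the 16-byte header size and the length check are all hypothetical, and only the -1 and -3 values come from the report.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; only the values -1 and -3 appear in the report. */
enum { OPEN_OK = 0, OPEN_ERROR = -1, OPEN_CORRUPTED = -3 };

#define HEADER_SIZE 16  /* made-up size, purely for illustration */

/* Buggy shape: the corrupted-image branch now jumps to a shared label
 * so the buffer is freed, but that label always returns -1. */
static int open_image_buggy(const char *data, size_t len)
{
    char *header = malloc(HEADER_SIZE);

    assert(data != NULL);
    if (!header)
        return OPEN_ERROR;
    if (len < HEADER_SIZE)
        goto error;            /* corrupted image: -3 was intended */
    memcpy(header, data, HEADER_SIZE);
    free(header);
    return OPEN_OK;

 error:
    free(header);
    return OPEN_ERROR;         /* the special value is lost here */
}

/* Fixed shape: the result travels in a variable through one cleanup
 * path, so the leak stays fixed and -3 survives. */
static int open_image_fixed(const char *data, size_t len)
{
    int ret = OPEN_ERROR;
    char *header = malloc(HEADER_SIZE);

    assert(data != NULL);
    if (!header)
        goto cleanup;
    if (len < HEADER_SIZE) {
        ret = OPEN_CORRUPTED;  /* caller may drop the image and boot fresh */
        goto cleanup;
    }
    memcpy(header, data, HEADER_SIZE);
    ret = OPEN_OK;

 cleanup:
    free(header);              /* free(NULL) is a no-op */
    return ret;
}
```

A one-byte input, like the file produced by `echo > V.save` in the transcripts, makes the buggy variant report a generic failure, while the fixed variant returns the value that tells the caller the image is corrupted.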