Description of problem:
Failed to start a domain with corrupted managed save file

Version-Release number of selected component (if applicable):
libvirt-daemon-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-kvm-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Managedsave a running domain
# virsh managedsave 7
Domain 7 state saved by libvirt

2. Corrupt the saved file
# echo > /var/lib/libvirt/qemu/save/7.save

3. Try to start the domain
# virsh start 7
error: Failed to start domain 7
error: An error occurred, but the cause is unknown

Actual results:
The first start of a domain with corrupted managed save file failed

Expected results:

Additional info:
1. Second start will succeed:
# virsh start 7
Domain 7 started

2. Not produced on:
libvirt-5.10.0-2.module+el8.2.0+5274+60f836b5.x86_64
qemu-kvm-4.2.0-5.module+el8.2.0+5389+367d9739.x86_64
And what do you expect to happen?

The managed save image (if present) is restored by virsh start [1], so if the image is corrupted it's expected that the VM will fail to start. If it fails, libvirt then removes the failed image and the VM will be started fresh. It has always worked this way, hence my question.

We could argue that the error message could be improved, though.

[1] https://libvirt.org/manpages/virsh.html#managedsave
    https://libvirt.org/manpages/virsh.html#start
Hi Peter,

Previously, libvirt removed the corrupt image and made a fresh start on the first try. You can also refer to bz1460962 and bz730750.

Thanks.
> And what do you expect to happen?
>
> The managed save image (if present) restored by using virsh start [1], so if
> the image is corrupted it's expected that the VM will fail to start. If it
> fails libvirt then removes the failed image and the VM will be started fresh.

This implicit start after detecting an error should not have been implemented in the first place, IMO, especially when there's an API to remove the managedsave file. I agree the error should be improved, though.
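The explicit route mentioned here can be sketched from the command line. This is only a minimal sketch, not libvirt's own logic: `virsh managedsave-remove` and `virsh start` are the documented subcommands, while the helper name `discard_and_start` and the `VIRSH` override variable are hypothetical and exist only so the snippet can be dry-run without a running libvirtd. Note that removing the image throws away the saved guest state.

```shell
#!/bin/sh
# Hypothetical helper: explicitly drop the managed save image, then boot fresh.
# VIRSH may be overridden (assumption, for dry runs or to add connection
# options); it defaults to the real virsh binary.
VIRSH="${VIRSH:-virsh}"

discard_and_start() {
    dom="$1"
    if [ -z "$dom" ]; then
        echo "usage: discard_and_start DOMAIN" >&2
        return 2
    fi
    # Remove the managed save image; the saved guest state is discarded.
    # $VIRSH is intentionally unquoted so it can carry extra arguments.
    $VIRSH managedsave-remove "$dom" || return 1
    # With no image left to restore, start performs a normal boot.
    $VIRSH start "$dom"
}
```

For the domain in this report, the call would be `discard_and_start 7`. The virsh man page also documents `virsh start --force-boot`, which discards any managed save state and boots fresh in one step.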
(In reply to Erik Skultety from comment #4)
> > And what do you expect to happen?
> >
> > The managed save image (if present) restored by using virsh start [1], so if
> > the image is corrupted it's expected that the VM will fail to start. If it
> > fails libvirt then removes the failed image and the VM will be started fresh.
>
> This implicit start after detecting an error should have not been IMO
> implemented in the first place especially when there's an API to remove the
> managedsave file. I agree the error should be improved though.

In hindsight, it looks like questionable behaviour, but nonetheless this is what we explicitly did in the past, so we should fix the API regression here.
Proposed fix: https://www.redhat.com/archives/libvir-list/2020-April/msg01073.html
Fixed by

d9792233ec qemuDomainSaveImageOpen: Refactor handling of errors
9219424f56 qemuDomainSaveImageOpen: Use 'g_new0' instead of VIR_ALLOC(_N)
db907a4d9c qemuDomainSaveImageOpen: Automatically close 'fd' if unneeded
3850add603 qemuDomainSaveImageOpen: Use g_autoptr for 'def'
92b9657986 virQEMUSaveData: Register autoclear function and use it in qemuDomainSaveImageOpen
f76a571820 qemu: fix domain start with corrupted save file

v6.2.0-224-gf76a571820
Verified this bug on:
libvirt-daemon-6.3.0-1.module+el8.3.0+6478+69f490bb.x86_64
qemu-kvm-4.2.0-19.module+el8.3.0+6478+69f490bb.x86_64

Steps and results are the same as in bz1460962#c13.

Log message:
2020-05-09 08:11:35.505+0000: 332050: warning : qemuDomainObjStart:7526 : Ignoring incomplete managed state /var/lib/libvirt/qemu/save/avocado-vt-vm1.save
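To repeat this check from the log side, one can search for the warning quoted above. A minimal sketch, assuming the daemon log is available as a plain file: the log location depends on the local logging configuration, so the caller passes it in, and the helper name `has_incomplete_state_warning` is hypothetical.

```shell
# Hypothetical helper: succeed if the log contains the "Ignoring incomplete
# managed state" warning for the given save file.
has_incomplete_state_warning() {
    logfile="$1"
    savefile="$2"
    # -F matches the message literally; -q returns only an exit status.
    grep -qF -- "Ignoring incomplete managed state $savefile" "$logfile"
}
```

With the log path filled in for the local setup, the check for the domain above would be `has_incomplete_state_warning "$LOGFILE" /var/lib/libvirt/qemu/save/avocado-vt-vm1.save`, where `LOGFILE` is a placeholder for wherever libvirtd writes its log.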
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137