Bug 1791522 - Failed to start a domain with corrupted managed save file
Summary: Failed to start a domain with corrupted managed save file
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: Pavel Mores
QA Contact: Yanqiu Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-16 02:54 UTC by Yanqiu Zhang
Modified: 2020-11-17 17:47 UTC (History)
CC List: 13 users

Fixed In Version: libvirt-6.3.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-17 17:46:36 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Yanqiu Zhang 2020-01-16 02:54:49 UTC
Description of problem:
Failed to start a domain with corrupted managed save file

Version-Release number of selected component (if applicable):
libvirt-daemon-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-kvm-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Managed-save a running domain
# virsh managedsave 7

Domain 7 state saved by libvirt

2. Corrupt the saved file
# echo > /var/lib/libvirt/qemu/save/7.save

3. Try to start the domain
# virsh start 7
error: Failed to start domain 7
error: An error occurred, but the cause is unknown


Actual results:
The first start of a domain with a corrupted managed save file fails.

Expected results:


Additional info:
1. The second start succeeds:
# virsh start 7
Domain 7 started
2. Not reproducible on:
libvirt-5.10.0-2.module+el8.2.0+5274+60f836b5.x86_64
qemu-kvm-4.2.0-5.module+el8.2.0+5389+367d9739.x86_64
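
The steps above can be scripted. The following is a minimal sketch, not a supported test: it uses only the virsh subcommands and the save path shown in this report. The function name `reproduce` and the injectable `run` and `save_dir` parameters are hypothetical, added so the flow can be exercised without a real host. It needs root and a running domain when used for real.

```python
import subprocess
from pathlib import Path


def reproduce(domain, save_dir="/var/lib/libvirt/qemu/save", run=subprocess.run):
    """Replay the report's steps; return the exit codes of the start attempts."""
    save_file = Path(save_dir) / f"{domain}.save"

    # Step 1: managed-save the running domain.
    run(["virsh", "managedsave", domain], check=True)

    # Step 2: overwrite the image with a single newline, like `echo > file`.
    save_file.write_text("\n")

    # Step 3: try to start; retry once only if the first attempt failed.
    codes = [run(["virsh", "start", domain]).returncode]
    if codes[0] != 0:
        codes.append(run(["virsh", "start", domain]).returncode)
    return codes
```

On the affected build listed above, this would be expected to return a non-zero code followed by 0, matching the two start attempts in the report.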

Comment 1 Peter Krempa 2020-01-16 08:02:05 UTC
And what do you expect to happen? 

The managed save image (if present) is restored by using virsh start [1], so if the image is corrupted it's expected that the VM will fail to start. If it fails, libvirt then removes the failed image and the VM will be started fresh.

This has always worked this way, hence my question. We could argue that the error message could be improved, though.


[1] https://libvirt.org/manpages/virsh.html#managedsave
    https://libvirt.org/manpages/virsh.html#start
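
To make the flow described in this comment concrete, here is a toy model, not libvirt code. It assumes a hypothetical file header (`TOY_MAGIC`) in place of the real save format, a hypothetical `RestoreError`, and that the image is removed whether or not the restore succeeds. It reproduces the reported sequence: the first start fails, the second starts fresh.

```python
import os

TOY_MAGIC = b"TOYSAVE"  # hypothetical header; not libvirt's real save format


class RestoreError(Exception):
    """Raised when the toy managed save image cannot be restored."""


def start(save_path):
    """Toy model: restore from the image if present, else start fresh."""
    if os.path.exists(save_path):
        with open(save_path, "rb") as f:
            valid = f.read(len(TOY_MAGIC)) == TOY_MAGIC
        os.unlink(save_path)  # assumption: the image is dropped in both cases
        if not valid:
            # The start attempt fails, but the next one finds no image.
            raise RestoreError("failed to restore from managed save image")
        return "restored"
    return "fresh"
```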

Comment 2 Yanqiu Zhang 2020-01-16 08:15:01 UTC
Hi Peter,

Previously, libvirt removed the corrupt image and made a fresh start on the first try. You can also refer to bz1460962 and bz730750.

Thanks.

Comment 4 Erik Skultety 2020-01-16 10:29:16 UTC
> And what do you expect to happen? 
> 
> The managed save image (if present) restored by using virsh start [1], so if
> the image is corrupted it's expected that the VM will fail to start. If it
> fails libvirt then removes the failed image and the VM will be started fresh.

IMO, this implicit start after detecting an error should not have been implemented in the first place, especially when there's an API to remove the managedsave file. I agree the error should be improved, though.

Comment 6 Daniel Berrangé 2020-01-16 10:45:28 UTC
(In reply to Erik Skultety from comment #4)
> > And what do you expect to happen? 
> > 
> > The managed save image (if present) restored by using virsh start [1], so if
> > the image is corrupted it's expected that the VM will fail to start. If it
> > fails libvirt then removes the failed image and the VM will be started fresh.
> 
> This implicit start after detecting an error should have not been IMO
> implemented in the first place especially when there's an API to remove the
> managedsave file. I agree the error should be improved though.

In hindsight, it looks like questionable behaviour, but nonetheless this is what we explicitly did in the past, so we should be fixing the API regression here.

Comment 8 Pavel Mores 2020-04-22 11:01:12 UTC
Proposed fix:

https://www.redhat.com/archives/libvir-list/2020-April/msg01073.html

Comment 9 Pavel Mores 2020-04-23 15:29:36 UTC
Fixed by

d9792233ec qemuDomainSaveImageOpen: Refactor handling of errors
9219424f56 qemuDomainSaveImageOpen: Use 'g_new0' instead of VIR_ALLOC(_N)
db907a4d9c qemuDomainSaveImageOpen: Automatically close 'fd' if unneeded
3850add603 qemuDomainSaveImageOpen: Use g_autoptr for 'def'
92b9657986 virQEMUSaveData: Register autoclear function and use it in qemuDomainSaveImageOpen
f76a571820 qemu: fix domain start with corrupted save file

v6.2.0-224-gf76a571820

Comment 12 Yanqiu Zhang 2020-05-09 09:01:36 UTC
Verified this bug on:
libvirt-daemon-6.3.0-1.module+el8.3.0+6478+69f490bb.x86_64
qemu-kvm-4.2.0-19.module+el8.3.0+6478+69f490bb.x86_64

Steps and results are the same as bz1460962#c13.

Log message:
2020-05-09 08:11:35.505+0000: 332050: warning : qemuDomainObjStart:7526 : Ignoring incomplete managed state /var/lib/libvirt/qemu/save/avocado-vt-vm1.save
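
As a rough illustration of the verified behaviour, the toy model below (not libvirt code) treats a corrupt image as a warning instead of a failed start. The header `TOY_MAGIC`, the `warn` callback, and the removal of the image in both cases are assumptions made for the sketch. Only the warning text is taken from the log line above.

```python
import logging
import os

TOY_MAGIC = b"TOYSAVE"  # hypothetical header; not libvirt's real save format


def start(save_path, warn=logging.warning):
    """Toy model of the post-fix flow: a corrupt image does not fail the start."""
    if os.path.exists(save_path):
        with open(save_path, "rb") as f:
            valid = f.read(len(TOY_MAGIC)) == TOY_MAGIC
        os.unlink(save_path)  # assumption: the image is dropped in both cases
        if valid:
            return "restored"
        # Message text copied from the verification log above.
        warn("Ignoring incomplete managed state %s" % save_path)
    return "fresh"
```

With this flow, the first start of a domain whose image is corrupt yields a fresh boot plus a warning, which is the behaviour comment 2 describes as the previous one.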

Comment 15 errata-xmlrpc 2020-11-17 17:46:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137

