1791522 – Failed to start a domain with corrupted managed save file

Summary: Failed to start a domain with corrupted managed save file

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux Advanced Virtualization
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	8.2
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	8.0
Assignee:	Pavel Mores
QA Contact:	Yanqiu Zhang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-01-16 02:54 UTC by Yanqiu Zhang
Modified:	2020-11-17 17:47 UTC
CC List:	13 users
Fixed In Version:	libvirt-6.3.0-1.el8
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-11-17 17:46:36 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Description Yanqiu Zhang 2020-01-16 02:54:49 UTC

Description of problem:
Failed to start a domain with corrupted managed save file

Version-Release number of selected component (if applicable):
libvirt-daemon-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-kvm-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Managedsave a running domain:
# virsh managedsave 7

Domain 7 state saved by libvirt

2. Corrupt the saved file:
# echo > /var/lib/libvirt/qemu/save/7.save

3. Try to start the domain:
# virsh start 7
error: Failed to start domain 7
error: An error occurred, but the cause is unknown
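Step 2 overwrites the image with a single newline byte (echo with no arguments prints only "\n"), so whatever header libvirt expects at the start of the file is gone. A quick sanity check before running virsh start could look at the first bytes of the file. This sketch assumes, beyond what this report states, that a complete save image written by libvirt's QEMU driver begins with the 16-byte magic string LibvirtQemudSave; the helper name looks_like_save_image is hypothetical.

```python
import os
import tempfile

# Assumption (not confirmed in this report): a complete save image written by
# libvirt's QEMU driver starts with this 16-byte magic string at offset 0.
SAVE_MAGIC = b"LibvirtQemudSave"


def looks_like_save_image(path: str) -> bool:
    """Hypothetical helper: True if the file starts with the assumed magic."""
    with open(path, "rb") as f:
        header = f.read(len(SAVE_MAGIC))
    return header == SAVE_MAGIC


if __name__ == "__main__":
    # Simulate step 2 (`echo > 7.save`): the file ends up holding only "\n".
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "7.save")
        with open(path, "wb") as f:
            f.write(b"\n")
        print(looks_like_save_image(path))
```

If the magic assumption holds, the check returns False for the file produced in step 2, which is exactly the situation in which libvirt has to decide between failing and booting fresh.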


Actual results:
The first start of a domain with a corrupted managed save file fails, and the error message does not state the cause.

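Expected results:
The corrupted managed save image is discarded and the domain starts fresh on the first attempt, as in earlier releases (see comment 2).
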

Additional info:
1. A second start succeeds:
# virsh start 7
Domain 7 started
2. Not reproducible with:
libvirt-5.10.0-2.module+el8.2.0+5274+60f836b5.x86_64
qemu-kvm-4.2.0-5.module+el8.2.0+5389+367d9739.x86_64

Comment 1 Peter Krempa 2020-01-16 08:02:05 UTC

And what do you expect to happen? 

The managed save image (if present) is restored by virsh start [1], so if the image is corrupted, the VM is expected to fail to start. If restoring fails, libvirt then removes the failed image and the VM is started fresh.

It has always worked this way, hence my question. Arguably, though, the error message could be improved.


[1] https://libvirt.org/manpages/virsh.html#managedsave
    https://libvirt.org/manpages/virsh.html#start

Comment 2 Yanqiu Zhang 2020-01-16 08:15:01 UTC

Hi Peter,

Previously, libvirt removed the corrupt image and started the domain fresh on the first try. See also bz1460962 and bz730750.

Thanks.
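Comments 1 and 2 describe two different behaviours for the same corrupt image. The toy model below contrasts them. It is not libvirt code: the Domain class, the start function and its fixed flag are hypothetical, and the only facts it encodes are those in this report (the first-start error text, the second start succeeding, and the "Ignoring incomplete managed state" warning logged after the fix).

```python
from dataclasses import dataclass, field
from typing import List, Optional


class StartError(Exception):
    """Stands in for virsh's 'error: Failed to start domain'."""


@dataclass
class Domain:
    # None: no managed save image; True: valid image; False: corrupt image.
    image_ok: Optional[bool] = None
    log: List[str] = field(default_factory=list)


def start(dom: Domain, fixed: bool = True) -> str:
    """Toy model of `virsh start` on a domain that may have a managed save image.

    fixed=True:  a corrupt image is dropped and the guest boots fresh in the
                 same call (behaviour described in comment 2).
    fixed=False: the first call fails after dropping the image and only the
                 next call boots (behaviour reported in this bug).
    """
    if dom.image_ok is True:
        dom.image_ok = None  # the image is consumed by a successful restore
        return "restored"
    if dom.image_ok is False:
        dom.image_ok = None  # the corrupt image is discarded in both variants
        if not fixed:
            raise StartError("An error occurred, but the cause is unknown")
        dom.log.append("Ignoring incomplete managed state")
    return "booted fresh"


if __name__ == "__main__":
    buggy = Domain(image_ok=False)
    try:
        start(buggy, fixed=False)
    except StartError as err:
        print("first start:", err)
    print("second start:", start(buggy, fixed=False))
    print("fixed:", start(Domain(image_ok=False)))
```

The point of the model is the extra round trip: with fixed=False the caller sees an error for a start that libvirt could have completed on its own.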

Comment 4 Erik Skultety 2020-01-16 10:29:16 UTC

> And what do you expect to happen? 
> 
> The managed save image (if present) restored by using virsh start [1], so if
> the image is corrupted it's expected that the VM will fail to start. If it
> fails libvirt then removes the failed image and the VM will be started fresh.

IMO, this implicit start after detecting an error should not have been implemented in the first place, especially since there is an API to remove the managed save file. I agree the error message should be improved, though.
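For comparison, a caller who wants to discard the image can do so explicitly instead of relying on the implicit fallback. A sketch using the libvirt Python bindings follows; the methods hasManagedSaveImage, managedSaveRemove and create are the binding names for virDomainHasManagedSaveImage, virDomainManagedSaveRemove and virDomainCreate, while the helper start_discarding_managed_save is hypothetical. It relies only on duck typing, so it can be exercised without a running daemon.

```python
def start_discarding_managed_save(dom) -> bool:
    """Hypothetical helper: boot `dom` from scratch, dropping any managed save.

    `dom` is expected to behave like a libvirt.virDomain object from the
    libvirt-python bindings, i.e. to provide hasManagedSaveImage(flags),
    managedSaveRemove(flags) and create(). Returns True if an image was
    removed before booting.
    """
    removed = False
    if dom.hasManagedSaveImage(0):
        # Same effect as `virsh managedsave-remove <domain>`.
        dom.managedSaveRemove(0)
        removed = True
    # With no managed save image left, create() performs a normal boot.
    dom.create()
    return removed
```

On a real host, dom would come from libvirt.open("qemu:///system").lookupByName(...). From the command line, the equivalent is virsh managedsave-remove 7 followed by virsh start 7, or virsh start --force-boot 7.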

Comment 6 Daniel Berrangé 2020-01-16 10:45:28 UTC

(In reply to Erik Skultety from comment #4)
> > And what do you expect to happen? 
> > 
> > The managed save image (if present) restored by using virsh start [1], so if
> > the image is corrupted it's expected that the VM will fail to start. If it
> > fails libvirt then removes the failed image and the VM will be started fresh.
> 
> This implicit start after detecting an error should have not been IMO
> implemented in the first place especially when there's an API to remove the
> managedsave file. I agree the error should be improved though.

In hindsight it looks like questionable behaviour, but nonetheless it is what we explicitly did in the past, so we should fix the API regression here.

Comment 8 Pavel Mores 2020-04-22 11:01:12 UTC

Proposed fix:

https://www.redhat.com/archives/libvir-list/2020-April/msg01073.html

Comment 9 Pavel Mores 2020-04-23 15:29:36 UTC

Fixed by

d9792233ec qemuDomainSaveImageOpen: Refactor handling of errors
9219424f56 qemuDomainSaveImageOpen: Use 'g_new0' instead of VIR_ALLOC(_N)
db907a4d9c qemuDomainSaveImageOpen: Automatically close 'fd' if unneeded
3850add603 qemuDomainSaveImageOpen: Use g_autoptr for 'def'
92b9657986 virQEMUSaveData: Register autoclear function and use it in qemuDomainSaveImageOpen
f76a571820 qemu: fix domain start with corrupted save file

v6.2.0-224-gf76a571820

Comment 12 Yanqiu Zhang 2020-05-09 09:01:36 UTC

Verified this bug on:
libvirt-daemon-6.3.0-1.module+el8.3.0+6478+69f490bb.x86_64
qemu-kvm-4.2.0-19.module+el8.3.0+6478+69f490bb.x86_64

Steps and results are the same as in bz1460962#c13.

Log message:
2020-05-09 08:11:35.505+0000: 332050: warning : qemuDomainObjStart:7526 : Ignoring incomplete managed state /var/lib/libvirt/qemu/save/avocado-vt-vm1.save
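When verifying, the fresh-start path can be confirmed by finding this warning in the daemon log. The parser below assumes the "timestamp: thread: level : function:line : message" layout of the single line quoted above; other log configurations may format lines differently, and parse_log_line and ignored_managed_save are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the one log line quoted in this comment.
LOG_RE = re.compile(
    r"^(?P<ts>\S+ \S+): (?P<tid>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<line>\d+) : (?P<msg>.*)$"
)
PREFIX = "Ignoring incomplete managed state "


class LogEntry(NamedTuple):
    timestamp: str
    thread: int
    level: str
    function: str
    line: int
    message: str


def parse_log_line(text: str) -> Optional[LogEntry]:
    """Split one log line into its fields, or return None if it does not match."""
    m = LOG_RE.match(text.strip())
    if m is None:
        return None
    return LogEntry(m["ts"], int(m["tid"]), m["level"], m["func"],
                    int(m["line"]), m["msg"])


def ignored_managed_save(text: str) -> Optional[str]:
    """Return the save-file path if the line reports a discarded image."""
    entry = parse_log_line(text)
    if entry and entry.level == "warning" and entry.message.startswith(PREFIX):
        return entry.message[len(PREFIX):]
    return None
```

Applied to the line above, ignored_managed_save returns the path of the discarded avocado-vt-vm1.save image.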

Comment 15 errata-xmlrpc 2020-11-17 17:46:36 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137
