Bug 2177957 - Qemu core dump if cut off nfs storage during migration
Summary: Qemu core dump if cut off nfs storage during migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.8
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: rc
Target Release: ---
Assignee: Eric Blake
QA Contact: aihua liang
Docs Contact: Parth Shah
URL:
Whiteboard:
Depends On: 2058982
Blocks:
 
Reported: 2023-03-14 03:42 UTC by aihua liang
Modified: 2023-11-14 16:54 UTC (History)
CC: 20 users

Fixed In Version: qemu-kvm-6.2.0-34.module+el8.9.0+18868+5565e56d
Doc Type: Known Issue
Doc Text:
.NFS failure during VM migration causes migration failure and source VM coredump
Currently, if the NFS service or server is shut down during virtual machine (VM) migration, the source VM's QEMU is unable to reconnect to the NFS server when the service starts running again. As a result, the migration fails and a coredump is initiated on the source VM. There is currently no workaround available.
Clone Of: 2058982
Environment:
Last Closed: 2023-11-14 15:33:28 UTC
Type: ---
Target Upstream Version:
Embargoed:




Links
- Gitlab: redhat/rhel/src/qemu-kvm merge request 273 (last updated 2023-05-02 02:30:12 UTC)
- Red Hat Issue Tracker: RHELPLAN-151667 (last updated 2023-03-14 03:44:00 UTC)
- Red Hat Product Errata: RHSA-2023:6980 (last updated 2023-11-14 15:34:08 UTC)

Comment 2 aihua liang 2023-03-16 05:56:26 UTC
Hit this issue on:
   Qemu-kvm-6.2.0-20.module+el8.8.0+16744+d3c7858f
   Qemu-kvm-6.2.0-9.module+el8.7.0+14737+6552dcb8
   Qemu-kvm-6.2.0-11.module+el8.6.0+18167+43cf40f3.8
   Qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949

Did not hit this issue on:
   Qemu-kvm-6.0.0-29.module+el8.6.0+12490+ec3e565c
   Qemu-kvm-6.1.0-1.module+el8.6.0+12535+4e2af250
   Qemu-kvm-6.1.0-5.module+el8.6.0+13430+8fdd5f85


So it's a regression, introduced in qemu-kvm 6.2: it first occurs in qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949.

Comment 7 Eric Blake 2023-05-01 18:39:05 UTC
Patches backported for 9.3 were simple enough that I will repeat the exercise for 8.9; will post a link to the merge request once it is available...

Comment 16 aihua liang 2023-05-23 06:29:48 UTC
Tested on qemu-kvm-6.2.0-34.module+el8.9.0+18868+5565e56d: the coredump issue has been resolved.

Test Steps:
 1. Mount the NFS export on both src and dst with nfsv4,soft
   (src)#mount 10.73.114.30:/home/kvm_autotest_root/images /mnt/nfs
   (dst)#mount 10.73.114.30:/home/kvm_autotest_root/images /mnt/nfs
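The "nfsv4,soft" options named in step 1 are not visible in the mount commands above; spelled out as a config fragment, the equivalent /etc/fstab entry would look like this (server and mount point taken from the commands above):

```
# soft NFSv4 mount used for the migration test; I/O errors out instead of hanging
10.73.114.30:/home/kvm_autotest_root/images  /mnt/nfs  nfs4  soft  0 0
```

With a soft mount, an unreachable server eventually makes I/O return an error to QEMU rather than blocking forever, which is what exercises the failure path in this bug.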

 2. Start src guest with qemu cmdline:
   /usr/libexec/qemu-kvm \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -machine q35,memory-backend=mem-machine_mem \
     -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
     -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
     -nodefaults \
     -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
     -m 30720 \
     -object '{"qom-type": "memory-backend-ram", "size": 32212254720, "id": "mem-machine_mem"}'  \
     -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2  \
     -cpu 'Skylake-Server',+kvm_pv_unhalt \
     -chardev socket,wait=off,server=on,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20230202-211855-PNb4QIQg  \
     -mon chardev=qmp_id_qmpmonitor1,mode=control \
     -chardev socket,wait=off,server=on,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20230202-211855-PNb4QIQg  \
     -mon chardev=qmp_id_catch_monitor,mode=control \
     -device '{"ioport": 1285, "driver": "pvpanic", "id": "idcE8sYZ"}' \
     -chardev socket,wait=off,server=on,id=chardev_serial0,path=/var/tmp/serial-serial0-20230202-211855-PNb4QIQg \
     -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
     -chardev socket,id=seabioslog_id_20230202-211855-PNb4QIQg,path=/var/tmp/seabios-20230202-211855-PNb4QIQg,server=on,wait=off \
     -device isa-debugcon,chardev=seabioslog_id_20230202-211855-PNb4QIQg,iobase=0x402 \
     -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
     -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
     -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
     -object '{"qom-type": "iothread", "id": "iothread0"}' \
     -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
     -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0", "iothread": "iothread0"}' \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/mnt/nfs/rhel930-64-virtio.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
     -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
     -device '{"driver": "virtio-net-pci", "mac": "9a:df:c4:ac:9f:d2", "id": "idDHNSFZ", "netdev": "id29g0e4", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
     -netdev tap,id=id29g0e4,vhost=on \
     -vnc :0  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \
     -device '{"id": "pcie_extra_root_port_0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3", "chassis": 5}' \
     -monitor stdio \

 3. Check src block info
   (qemu)info block
     drive_image1: /mnt/nfs/rhel930-64-virtio.qcow2 (qcow2)
    Attached to:      image1
    Cache mode:       writeback, direct

 4. Start dst guest with qemu cmdline:
     /usr/libexec/qemu-kvm \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -machine q35,memory-backend=mem-machine_mem \
     -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
     -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
     -nodefaults \
     -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
     -m 30720 \
     -object '{"qom-type": "memory-backend-ram", "size": 32212254720, "id": "mem-machine_mem"}'  \
     -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2  \
     -cpu 'Skylake-Server',+kvm_pv_unhalt \
     -chardev socket,wait=off,server=on,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20230202-211855-PNb4QIQg  \
     -mon chardev=qmp_id_qmpmonitor1,mode=control \
     -chardev socket,wait=off,server=on,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20230202-211855-PNb4QIQg  \
     -mon chardev=qmp_id_catch_monitor,mode=control \
     -device '{"ioport": 1285, "driver": "pvpanic", "id": "idcE8sYZ"}' \
     -chardev socket,wait=off,server=on,id=chardev_serial0,path=/var/tmp/serial-serial0-20230202-211855-PNb4QIQg \
     -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
     -chardev socket,id=seabioslog_id_20230202-211855-PNb4QIQg,path=/var/tmp/seabios-20230202-211855-PNb4QIQg,server=on,wait=off \
     -device isa-debugcon,chardev=seabioslog_id_20230202-211855-PNb4QIQg,iobase=0x402 \
     -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
     -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
     -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
     -object '{"qom-type": "iothread", "id": "iothread0"}' \
     -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
     -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0", "iothread": "iothread0"}' \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/mnt/nfs/rhel930-64-virtio.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
     -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
     -device '{"driver": "virtio-net-pci", "mac": "9a:df:c4:ac:9f:d2", "id": "idDHNSFZ", "netdev": "id29g0e4", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
     -netdev tap,id=id29g0e4,vhost=on \
     -vnc :0  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \
     -device '{"id": "pcie_extra_root_port_0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3", "chassis": 5}' \
     -monitor stdio \
     -incoming tcp:0:5000,server=on,wait=off \

 5. Migrate from src to dst
     {"execute": "migrate","arguments":{"uri": "tcp:10.73.114.141:5000"}}
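The QMP migrate command in step 5 is a single JSON line; a minimal Python sketch of constructing it (the URI is the example destination used in this report, and nothing QEMU-specific is needed to build the message):

```python
import json

def build_migrate_cmd(uri: str) -> str:
    """Serialize the QMP 'migrate' command from step 5 as a JSON line."""
    return json.dumps({"execute": "migrate", "arguments": {"uri": uri}})

print(build_migrate_cmd("tcp:10.73.114.141:5000"))
```

The resulting line can be written to the QMP socket after the usual `qmp_capabilities` handshake.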

 6. Check migration info
    (qemu)info migration

 7. While migration is active, stop the nfs server
    (nfs-server):systemctl stop nfs-server.service

 8. Wait about 30 minutes

 9. Start the nfs server, wait a few minutes, then check block info on src.
    (nfs-server):systemctl start nfs-server.service
    (qemu)info block


Actual Result:
 In step 8, after about 30 minutes,
  dst qemu quits automatically with an error.
    (qemu) info status
          VM status: paused (inmigrate)
    (qemu) qemu-kvm: load of migration failed: Input/output error

  src qemu returns to running status with errors reported.
    (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
qemu-kvm: Could not reopen qcow2 layer: Could not read qcow2 header: Input/output error
    {"execute": "migrate","arguments":{"uri": "tcp:10.73.114.141:5000"}}
{"return": {}}
{"timestamp": {"seconds": 1684814182, "microseconds": 254216}, "event": "STOP"}
{"timestamp": {"seconds": 1684814723, "microseconds": 733510}, "event": "RESUME"}
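The STOP and RESUME events above carry epoch timestamps; as a quick sanity check (pure-Python sketch, timestamps copied from the log above), the source guest stayed paused for roughly nine minutes:

```python
import json

stop = json.loads('{"timestamp": {"seconds": 1684814182, "microseconds": 254216}, "event": "STOP"}')
resume = json.loads('{"timestamp": {"seconds": 1684814723, "microseconds": 733510}, "event": "RESUME"}')

def to_seconds(event):
    """Convert a QMP event timestamp to a float number of seconds."""
    ts = event["timestamp"]
    return ts["seconds"] + ts["microseconds"] / 1e6

paused = to_seconds(resume) - to_seconds(stop)
print(f"guest paused for {paused:.1f} s")  # ~541.5 s
```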


 In step 9, after the nfs server is restored, src can't reconnect to the nfs server automatically.
   (qemu) info block
image1: [not inserted]
    Attached to:      image1


 The only thing we can do is quit the src vm and then start it again.

Expected Result:
 src qemu can reconnect to the nfs server automatically.


So, Eric

  I'm not sure whether this result is expected. We know that an nfs+hard mount will keep retrying the connection to the nfs server, but in nfs+soft mode, how should qemu behave after the nfs server is restored?


BR,
Aliang

Comment 17 Eric Blake 2023-05-23 12:59:39 UTC
(In reply to aihua liang from comment #16)
> Test on qemu-kvm-6.2.0-34.module+el8.9.0+18868+5565e56d, the coredump issue
> has been resolved.

That was the intent of the patches (prevent the core dump, even if the disk image itself can't be recovered), so we've made progress.


...
> 
>  9. Start nfs server, then wait for some minutes, check block info in src.
>     (nfs-server):systemctl start nfs-server.service
>     (qemu)info block
> 
> 
> Actual Result:
>  In step8, 30 minutes later,
>   dst qemu quit automatically for error.
>     (qemu) info status
>           VM status: paused (inmigrate)
>     (qemu) qemu-kvm: load of migration failed: Input/output error
> 
>   src qemu return to running status with error reported.
>     (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable:
> bdrv_inactivate_all() failed (-1)
> qemu-kvm: Could not reopen qcow2 layer: Could not read qcow2 header:
> Input/output error

This matches expectations that qemu is now remembering that I/O failed during the time that NFS was unavailable.

>     {"execute": "migrate","arguments":{"uri": "tcp:10.73.114.141:5000"}}
> {"return": {}}
> {"timestamp": {"seconds": 1684814182, "microseconds": 254216}, "event":
> "STOP"}
> {"timestamp": {"seconds": 1684814723, "microseconds": 733510}, "event":
> "RESUME"}
> 
> 
>  In step9, after nfs server restored, src can't connect to nfs server
> automatically.
>    (qemu) info block
> image1: [not inserted]
>     Attached to:      image1
> 
> 
>  They only thing we can do is to quit src vm then start it.

I'm not sure if this can be helped, but it is beyond the scope of what my patches touched.  I'm adding needinfo on Juan (migration) and Kevin (block layer) to chime in on whether there are any ideas on whether the source of a migration should be able to reconnect to an NFS server after previously getting an I/O error, but we may be at the point where qemu has done the best it can of diagnosing that state was lost and the only safe way to recover is to restart the source.

> 
> Expected Result:
>  src qemu can connect to nfs server automatically.
> 
> 
> So, Eric
> 
>   I'm not sure if the result is as expected. We know that nfs+hard will
> continuously retry to connect to nfs server. But for nfs+soft mode, how it
> will act after nfs server restored? 

You've verified that we avoided qemu dumping core on an assertion failure (the most important thing), so if there is still more to be done at letting qemu regain access to an NFS drive after an NFS failure has caused I/O failures, that may be worth splitting into a separate bug.

Comment 18 Kevin Wolf 2023-05-23 15:50:37 UTC
NFS still returning I/O errors after the connection has restored sounds a bit like one part of the thing recently discussed in bug 2178024.

There, Jeff Layton mentioned bug 2180124 as probably addressing the I/O errors and our QE confirmed. The behaviour still isn't entirely as documented (soft mounts don't reliably timeout, but may just hang), but that's not the part we're seeing here. So I expect that on 9.3, we'll be able to recover successfully on the source.

Of course, I don't know if there are any plans to backport the same to RHEL 8.9, but I wouldn't expect it.

Comment 21 Yanan Fu 2023-05-24 12:44:06 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 (18894) test pass.

Comment 31 Juan Quintela 2023-06-27 09:47:03 UTC
(In reply to Eric Blake from comment #17)

> > 
> >   src qemu return to running status with error reported.
> >     (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable:
> > bdrv_inactivate_all() failed (-1)
> > qemu-kvm: Could not reopen qcow2 layer: Could not read qcow2 header:
> > Input/output error
> 
> This matches expectations that qemu is now remembering that I/O failed
> during the time that NFS was unavailable.

right.


> >  In step9, after nfs server restored, src can't connect to nfs server
> > automatically.
> >    (qemu) info block
> > image1: [not inserted]
> >     Attached to:      image1
> > 
> > 
> >  They only thing we can do is to quit src vm then start it.
> 
> I'm not sure if this can be helped, but it is beyond the scope of what my
> patches touched.  I'm adding needinfo on Juan (migration) and Kevin (block
> layer) to chime in on whether there are any ideas on whether the source of a
> migration should be able to reconnect to an NFS server after previously
> getting an I/O error

We can only recover from synchronous errors.  And what migration does is "basically" a fsync().
If fsync() fails, we don't know what has happened with the pending writes.

Is there a way to recover all that pending writes?
I am not sure that the block layer knows how to recover from this (i.e. that it has enough information to retry all the pending writes).



> but we may be at the point where qemu has done the
> best it can of diagnosing that state was lost and the only safe way to
> recover is to restart the source.

I fully agree here.  I can't think of a way to recover the source, especially after a real error.
 
> > Expected Result:
> >  src qemu can connect to nfs server automatically.
> > 
> > 
> > So, Eric
> > 
> >   I'm not sure if the result is as expected. We know that nfs+hard will
> > continuously retry to connect to nfs server. But for nfs+soft mode, how it
> > will act after nfs server restored? 
> 
> You've verified that we avoided qemu dumping core on an assertion failure
> (the most important thing), so if there is still more to be done at letting
> qemu regain access to an NFS drive after an NFS failure has caused I/O
> failures, that may be worth splitting into a separate bug.

But what can we do here?  The code does something like:


write(...)
...

write(...)
...  /* several more times and even from more than one thread */

fsync() /* here is where we fail */

We can detect the error, but how can we recover?
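A minimal sketch of why a failed fsync() is unrecoverable (illustrative Python only; the toy class below models a page cache that discards dirty pages once writeback fails, which is how Linux reports errors through fsync()): the retry "succeeds", but the earlier writes are already gone.

```python
class FakePageCache:
    """Toy model of a page cache: dirty pages are dropped when writeback fails."""
    def __init__(self):
        self.dirty = []        # buffered writes not yet on "disk"
        self.disk = []         # durable data
        self.fail_next = False

    def write(self, data):
        self.dirty.append(data)

    def fsync(self):
        if self.fail_next:
            self.fail_next = False
            self.dirty.clear()   # the kernel drops the dirty pages...
            raise OSError("EIO") # ...and reports the error once
        self.disk.extend(self.dirty)
        self.dirty.clear()

cache = FakePageCache()
cache.write("block A")
cache.write("block B")
cache.fail_next = True
try:
    cache.fsync()            # fails: pending writes discarded
except OSError:
    pass
cache.fsync()                # the retry "succeeds"...
print(cache.disk)            # [] -- ...but the pending writes are lost
```

This is why detecting the error at fsync() time leaves no list of pending writes to replay.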

Note that we do have code to fix some of these problems: we implement "poor man's" sparse images with limited space, and when there is an -ENOSPC, the guest is stopped; you can add space at that point and then continue.

But I don't know how much, if any, extra handling we have here.
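The -ENOSPC handling Juan describes maps to QEMU's per-device write-error policy; on the command lines in comment 16 it would be selected through the scsi-hd werror/rerror properties (illustrative fragment only; accepted values include report, ignore, stop, and enospc):

```
-device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "werror": "stop", "rerror": "report"}'
```

With werror=stop (or enospc, which stops only on out-of-space errors), the guest pauses instead of seeing the write failure and can be resumed once the condition is fixed; that is the mechanism behind the "add space and continue" workflow mentioned above.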

Comment 33 errata-xmlrpc 2023-11-14 15:33:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6980

