Bug 2177957

Summary: QEMU core dumps if NFS storage is cut off during migration
Product: Red Hat Enterprise Linux 8
Reporter: aihua liang <aliang>
Component: qemu-kvm
qemu-kvm sub component: Storage
Assignee: Eric Blake <eblake>
QA Contact: aihua liang <aliang>
Status: CLOSED ERRATA
Docs Contact: Parth Shah <pashah>
Severity: high
Priority: high
CC: chayang, coli, eblake, fjin, hreitz, jherrman, jinzhao, jmaloy, juzhang, kwolf, mdean, meili, mrezanin, nilal, qinwang, quintela, vgoyal, virt-maint, xiaohli, yfu
Version: 8.8
Keywords: CustomerScenariosInitiative, Regression, Triaged
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-6.2.0-34.module+el8.9.0+18868+5565e56d
Doc Type: Known Issue
Doc Text:
.NFS failure during VM migration causes migration failure and source VM coredump
Currently, if the NFS service or server is shut down during virtual machine (VM) migration, the source VM's QEMU is unable to reconnect to the NFS server when it starts running again. As a result, the migration fails and a coredump is initiated on the source VM. Currently, there is no workaround available.
Story Points: ---
Clone Of: 2058982
Environment:
Last Closed: 2023-11-14 15:33:28 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2058982
Bug Blocks:

Comment 2 aihua liang 2023-03-16 05:56:26 UTC
Hit this issue on:
   qemu-kvm-6.2.0-20.module+el8.8.0+16744+d3c7858f
   qemu-kvm-6.2.0-9.module+el8.7.0+14737+6552dcb8
   qemu-kvm-6.2.0-11.module+el8.6.0+18167+43cf40f3.8
   qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949

Did not hit this issue on:
   qemu-kvm-6.0.0-29.module+el8.6.0+12490+ec3e565c
   qemu-kvm-6.1.0-1.module+el8.6.0+12535+4e2af250
   qemu-kvm-6.1.0-5.module+el8.6.0+13430+8fdd5f85


So this is a regression, introduced with qemu-kvm 6.2 (first seen in qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949).

Comment 7 Eric Blake 2023-05-01 18:39:05 UTC
The patches backported for 9.3 were simple enough that I will repeat the exercise for 8.9; I will post a link to the merge request once it is available.

Comment 16 aihua liang 2023-05-23 06:29:48 UTC
Tested on qemu-kvm-6.2.0-34.module+el8.9.0+18868+5565e56d; the coredump issue has been resolved.

Test Steps:
 1. Mount the NFS share on both src and dst with nfsv4,soft (see the note on mount options below)
   (src)#mount 10.73.114.30:/home/kvm_autotest_root/images /mnt/nfs
   (dst)#mount 10.73.114.30:/home/kvm_autotest_root/images /mnt/nfs
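   Note: the soft/NFSv4 options are not shown in the commands above; a mount matching the "nfsv4,soft" description would typically look like the following (illustrative only, the exact options used in the test are not recorded here):
   (src)#mount -t nfs -o vers=4,soft 10.73.114.30:/home/kvm_autotest_root/images /mnt/nfs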

 2. Start src guest with qemu cmdline:
   /usr/libexec/qemu-kvm \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -machine q35,memory-backend=mem-machine_mem \
     -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
     -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
     -nodefaults \
     -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
     -m 30720 \
     -object '{"qom-type": "memory-backend-ram", "size": 32212254720, "id": "mem-machine_mem"}'  \
     -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2  \
     -cpu 'Skylake-Server',+kvm_pv_unhalt \
     -chardev socket,wait=off,server=on,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20230202-211855-PNb4QIQg  \
     -mon chardev=qmp_id_qmpmonitor1,mode=control \
     -chardev socket,wait=off,server=on,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20230202-211855-PNb4QIQg  \
     -mon chardev=qmp_id_catch_monitor,mode=control \
     -device '{"ioport": 1285, "driver": "pvpanic", "id": "idcE8sYZ"}' \
     -chardev socket,wait=off,server=on,id=chardev_serial0,path=/var/tmp/serial-serial0-20230202-211855-PNb4QIQg \
     -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
     -chardev socket,id=seabioslog_id_20230202-211855-PNb4QIQg,path=/var/tmp/seabios-20230202-211855-PNb4QIQg,server=on,wait=off \
     -device isa-debugcon,chardev=seabioslog_id_20230202-211855-PNb4QIQg,iobase=0x402 \
     -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
     -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
     -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
     -object '{"qom-type": "iothread", "id": "iothread0"}' \
     -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
     -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0", "iothread": "iothread0"}' \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/mnt/nfs/rhel930-64-virtio.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
     -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
     -device '{"driver": "virtio-net-pci", "mac": "9a:df:c4:ac:9f:d2", "id": "idDHNSFZ", "netdev": "id29g0e4", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
     -netdev tap,id=id29g0e4,vhost=on \
     -vnc :0  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \
     -device '{"id": "pcie_extra_root_port_0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3", "chassis": 5}' \
     -monitor stdio \

 3. Check src block info
   (qemu)info block
     drive_image1: /mnt/nfs/rhel930-64-virtio.qcow2 (qcow2)
    Attached to:      image1
    Cache mode:       writeback, direct

 4. Start dst guest with qemu cmdline:
     /usr/libexec/qemu-kvm \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -machine q35,memory-backend=mem-machine_mem \
     -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
     -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
     -nodefaults \
     -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
     -m 30720 \
     -object '{"qom-type": "memory-backend-ram", "size": 32212254720, "id": "mem-machine_mem"}'  \
     -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2  \
     -cpu 'Skylake-Server',+kvm_pv_unhalt \
     -chardev socket,wait=off,server=on,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20230202-211855-PNb4QIQg  \
     -mon chardev=qmp_id_qmpmonitor1,mode=control \
     -chardev socket,wait=off,server=on,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20230202-211855-PNb4QIQg  \
     -mon chardev=qmp_id_catch_monitor,mode=control \
     -device '{"ioport": 1285, "driver": "pvpanic", "id": "idcE8sYZ"}' \
     -chardev socket,wait=off,server=on,id=chardev_serial0,path=/var/tmp/serial-serial0-20230202-211855-PNb4QIQg \
     -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
     -chardev socket,id=seabioslog_id_20230202-211855-PNb4QIQg,path=/var/tmp/seabios-20230202-211855-PNb4QIQg,server=on,wait=off \
     -device isa-debugcon,chardev=seabioslog_id_20230202-211855-PNb4QIQg,iobase=0x402 \
     -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
     -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
     -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
     -object '{"qom-type": "iothread", "id": "iothread0"}' \
     -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
     -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0", "iothread": "iothread0"}' \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/mnt/nfs/rhel930-64-virtio.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
     -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
     -device '{"driver": "virtio-net-pci", "mac": "9a:df:c4:ac:9f:d2", "id": "idDHNSFZ", "netdev": "id29g0e4", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
     -netdev tap,id=id29g0e4,vhost=on \
     -vnc :0  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \
     -device '{"id": "pcie_extra_root_port_0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3", "chassis": 5}' \
     -monitor stdio \
     -incoming tcp:0:5000,server=on,wait=off \

 5. Migrate from src to dst
     {"execute": "migrate","arguments":{"uri": "tcp:10.73.114.141:5000"}}

 6. Check migration info
    (qemu)info migrate
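    The same information can also be polled over the QMP monitor (shown for reference; not part of the original test steps):
     {"execute": "query-migrate"}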

 7. While migration is active, stop the nfs server
    (nfs-server):systemctl stop nfs-server.service

 8. Wait about 30 minutes

 9. Start the nfs server, wait a few minutes, then check block info on src.
    (nfs-server):systemctl start nfs-server.service
    (qemu)info block


Actual Result:
 In step 8, about 30 minutes later,
  dst qemu quit automatically with an error.
    (qemu) info status
          VM status: paused (inmigrate)
    (qemu) qemu-kvm: load of migration failed: Input/output error

  src qemu returned to running status with an error reported.
    (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
qemu-kvm: Could not reopen qcow2 layer: Could not read qcow2 header: Input/output error
    {"execute": "migrate","arguments":{"uri": "tcp:10.73.114.141:5000"}}
{"return": {}}
{"timestamp": {"seconds": 1684814182, "microseconds": 254216}, "event": "STOP"}
{"timestamp": {"seconds": 1684814723, "microseconds": 733510}, "event": "RESUME"}


 In step 9, after the nfs server was restored, src could not reconnect to the nfs server automatically.
   (qemu) info block
image1: [not inserted]
    Attached to:      image1


 The only thing we can do is quit the src VM and then start it again.

Expected Result:
 src qemu can reconnect to the nfs server automatically.


So, Eric,

  I'm not sure if this result is expected. We know that nfs+hard will keep retrying to connect to the nfs server. But in nfs+soft mode, how should it behave after the nfs server is restored?


BR,
Aliang

Comment 17 Eric Blake 2023-05-23 12:59:39 UTC
(In reply to aihua liang from comment #16)
> Test on qemu-kvm-6.2.0-34.module+el8.9.0+18868+5565e56d, the coredump issue
> has been resolved.

That was the intent of the patches (prevent the core dump, even if the disk image itself can't be recovered), so we've made progress.


...
> 
>  9. Start nfs server, then wait for some minutes, check block info in src.
>     (nfs-server):systemctl start nfs-server.service
>     (qemu)info block
> 
> 
> Actual Result:
>  In step8, 30 minutes later,
>   dst qemu quit automatically for error.
>     (qemu) info status
>           VM status: paused (inmigrate)
>     (qemu) qemu-kvm: load of migration failed: Input/output error
> 
>   src qemu return to running status with error reported.
>     (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable:
> bdrv_inactivate_all() failed (-1)
> qemu-kvm: Could not reopen qcow2 layer: Could not read qcow2 header:
> Input/output error

This matches expectations that qemu is now remembering that I/O failed during the time that NFS was unavailable.

>     {"execute": "migrate","arguments":{"uri": "tcp:10.73.114.141:5000"}}
> {"return": {}}
> {"timestamp": {"seconds": 1684814182, "microseconds": 254216}, "event":
> "STOP"}
> {"timestamp": {"seconds": 1684814723, "microseconds": 733510}, "event":
> "RESUME"}
> 
> 
>  In step9, after nfs server restored, src can't connect to nfs server
> automatically.
>    (qemu) info block
> image1: [not inserted]
>     Attached to:      image1
> 
> 
>  They only thing we can do is to quit src vm then start it.

I'm not sure if this can be helped, but it is beyond the scope of what my patches touched.  I'm adding needinfo on Juan (migration) and Kevin (block layer) to chime in on whether there are any ideas on whether the source of a migration should be able to reconnect to an NFS server after previously getting an I/O error, but we may be at the point where qemu has done the best it can of diagnosing that state was lost and the only safe way to recover is to restart the source.

> 
> Expected Result:
>  src qemu can connect to nfs server automatically.
> 
> 
> So, Eric
> 
>   I'm not sure if the result is as expected. We know that nfs+hard will
> continuously retry to connect to nfs server. But for nfs+soft mode, how it
> will act after nfs server restored? 

You've verified that we avoided qemu dumping core on an assertion failure (the most important thing), so if there is still more to be done at letting qemu regain access to an NFS drive after an NFS failure has caused I/O failures, that may be worth splitting into a separate bug.

Comment 18 Kevin Wolf 2023-05-23 15:50:37 UTC
NFS still returning I/O errors after the connection has been restored sounds a bit like one part of what was recently discussed in bug 2178024.

There, Jeff Layton mentioned bug 2180124 as probably addressing the I/O errors, and our QE confirmed it. The behaviour still isn't entirely as documented (soft mounts don't reliably time out, but may just hang), but that's not the part we're seeing here. So I expect that on 9.3, we'll be able to recover successfully on the source.
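
For context, the documented soft-mount timeout behaviour referred to above is governed by the timeo/retrans mount options; a sketch of a tuned soft mount (the values are illustrative, not taken from this bug):

  # With -o soft, the client is documented to give up and return an I/O error
  # to the application after retrans retransmissions, each waiting timeo
  # tenths of a second:
  mount -t nfs -o vers=4,soft,timeo=600,retrans=3 server:/export /mnt/nfs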

Of course, I don't know if there are any plans to backport the same to RHEL 8.9, but I wouldn't expect it.

Comment 21 Yanan Fu 2023-05-24 12:44:06 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 (18894) test pass.

Comment 31 Juan Quintela 2023-06-27 09:47:03 UTC
(In reply to Eric Blake from comment #17)

> > 
> >   src qemu return to running status with error reported.
> >     (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable:
> > bdrv_inactivate_all() failed (-1)
> > qemu-kvm: Could not reopen qcow2 layer: Could not read qcow2 header:
> > Input/output error
> 
> This matches expectations that qemu is now remembering that I/O failed
> during the time that NFS was unavailable.

right.


> >  In step9, after nfs server restored, src can't connect to nfs server
> > automatically.
> >    (qemu) info block
> > image1: [not inserted]
> >     Attached to:      image1
> > 
> > 
> >  They only thing we can do is to quit src vm then start it.
> 
> I'm not sure if this can be helped, but it is beyond the scope of what my
> patches touched.  I'm adding needinfo on Juan (migration) and Kevin (block
> layer) to chime in on whether there are any ideas on whether the source of a
> migration should be able to reconnect to an NFS server after previously
> getting an I/O error

We can only recover from synchronous errors.  And what migration does is "basically" a fsync().
If fsync() fails, we don't know what has happened with the pending writes.

Is there a way to recover all those pending writes?
I am not sure that the block layer knows how to recover from this (i.e. that it has enough information to retry all the pending writes).



> but we may be at the point where qemu has done the
> best it can of diagnosing that state was lost and the only safe way to
> recover is to restart the source.

I fully agree here.  I can't think of how to recover the source, especially after a real error.
 
> > Expected Result:
> >  src qemu can connect to nfs server automatically.
> > 
> > 
> > So, Eric
> > 
> >   I'm not sure if the result is as expected. We know that nfs+hard will
> > continuously retry to connect to nfs server. But for nfs+soft mode, how it
> > will act after nfs server restored? 
> 
> You've verified that we avoided qemu dumping core on an assertion failure
> (the most important thing), so if there is still more to be done at letting
> qemu regain access to an NFS drive after an NFS failure has caused I/O
> failures, that may be worth splitting into a separate bug.

But what can we do here?  The code does something like this:


write(...)
...

write(...)
...  /* several more times and even from more than one thread */

fsync() /* here is where we fail */

We can detect the error, but how can we recover?

Notice that I know we have code to fix "some" of these problems: we implement "poor man's" sparse images with limited space, and when there is an -ENOSPC, the guest is stopped, you can add space at that point, and continue.
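
(For reference, the pause-and-continue behaviour described above is the block device error policy; a minimal sketch of how it can be configured, reusing the scsi-hd device from the test command line earlier in this bug -- the "stop" values are illustrative, not what the test used:)

  -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on", "werror": "stop", "rerror": "stop"}' \
  # With werror=stop (or the default werror=enospc, which only stops on ENOSPC),
  # a failed write pauses the guest and emits a BLOCK_IO_ERROR event; once the
  # storage is fixed (e.g. space added), "cont" in the monitor resumes the guest
  # and the failed request is retried.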

But I don't know how much (if any) extra handling we have here.

Comment 33 errata-xmlrpc 2023-11-14 15:33:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6980