Bug 1685775

Summary: Boot Win2019 guest without network devices will cause qemu crash
Product: Red Hat Enterprise Linux 8 Reporter: Lei Yang <leiyang>
Component: qemu-kvm Assignee: Marc-Andre Lureau <marcandre.lureau>
qemu-kvm sub component: General QA Contact: Lei Yang <leiyang>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: ailan, chayang, ddepaula, jen, juzhang, kanderso, marcandre.lureau, pezhang, philmd, rbalakri, virt-maint, ybendito, yvugenfi
Version: 8.2 Keywords: TestOnly
Target Milestone: rc Flags: knoel: mirror+
Target Release: 8.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-4.2.0-20 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-28 07:12:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1810193, 1829082    
Bug Blocks: 1771318    

Description Lei Yang 2019-03-06 02:34:19 UTC
Description of problem:
With 1 PF attached on the QEMU command line, the Win2019 guest works well. However, QEMU crashes when that PF is hot-unplugged.

Version-Release number of selected component (if applicable):
qemu-kvm-2.12.0-63.module+el8+2833+c7d6d092.x86_64
virtio-win-1.9.7-3.el8.noarch
kernel-4.18.0-74.el8.x86_64

How reproducible:
4/4

Steps to Reproduce:
1. Boot the guest

/usr/libexec/qemu-kvm -name Win2019 \
-M q35,kernel-irqchip=split -m 1G \
-cpu Haswell-noTSX,hv_stimer,hv_synic,hv_time,hv_relaxed,hv_vpindex,hv_spinlocks=0xfff,hv_vapic,hv_reset,hv_crash \
-device intel-iommu,intremap=true,caching-mode=true \
-smp 2,sockets=1,cores=2,threads=1 \
-device pcie-root-port,id=root.1,chassis=1 \
-device pcie-root-port,id=root.2,chassis=2 \
-device pcie-root-port,id=root.3,chassis=3 \
-device pcie-root-port,id=root.4,chassis=4 \
-device pcie-root-port,id=root.5,chassis=5 \
-device pcie-root-port,id=root.6,chassis=6 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/win2019.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,drive=my,id=virtio-blk0,bus=root.1 \
-drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/home/en_windows_server_2019_x64_dvd_4cb967d8.iso \
-device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
-drive id=drive_winutils,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/usr/share/virtio-win/virtio-win-1.9.7.iso \
-device ide-cd,id=winutils,drive=drive_winutils,bus=ide.1,unit=0 \
-vnc :0 \
-vga qxl \
-monitor stdio \
-usb -device usb-tablet \
-boot menu=on \
-qmp tcp:0:5555,server,nowait \
-device vfio-pci,host=04:00.0,bus=root.3,id=pf-1 \

2. Hot unplug 1 PF

$ telnet dell-per730-29.lab.eng.pek2.redhat.com 5555
Trying 10.73.73.75...
Connected to dell-per730-29.lab.eng.pek2.redhat.com.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 12, "major": 2}, "package": "qemu-kvm-2.12.0-63.module+el8+2833+c7d6d092"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"device_del","arguments":{"id":"pf-1"}}
{"return": {}}
{"timestamp": {"seconds": 1551838544, "microseconds": 321081}, "event": "DEVICE_DELETED", "data": {"device": "pf-1", "path": "/machine/peripheral/pf-1"}}
Connection closed by foreign host.
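
The QMP exchange in step 2 can also be scripted instead of typed into telnet. The following is only a minimal sketch, assuming the QMP server from the command line above is reachable over TCP on port 5555; the helper names (qmp_command, wait_for, hot_unplug, unplug_over_tcp) are hypothetical and not part of QEMU. It assumes the command reply arrives before the DEVICE_DELETED event, as in the log above.

```python
import json
import socket


def qmp_command(name, arguments=None):
    """Serialize one QMP command as a single JSON line."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg) + "\r\n"


def wait_for(reader, predicate):
    """Read JSON lines until predicate(msg) is true; return None at EOF."""
    while True:
        line = reader.readline()
        if not line:
            return None  # connection closed, e.g. because QEMU died
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if predicate(msg):
            return msg


def is_reply(msg):
    """True for a command reply (success or error), False for events."""
    return "return" in msg or "error" in msg


def hot_unplug(reader, writer, device_id):
    """Negotiate capabilities, send device_del, wait for DEVICE_DELETED."""
    if wait_for(reader, lambda m: "QMP" in m) is None:
        raise ConnectionError("no QMP greeting received")
    for cmd in (qmp_command("qmp_capabilities"),
                qmp_command("device_del", {"id": device_id})):
        writer.write(cmd)
        writer.flush()
        reply = wait_for(reader, is_reply)
        if reply is None:
            raise ConnectionError("connection closed before reply")
        if "error" in reply:
            raise RuntimeError(reply["error"])
    # Returns the event, or None if the connection closed first.
    return wait_for(
        reader,
        lambda m: m.get("event") == "DEVICE_DELETED"
        and m.get("data", {}).get("device") == device_id,
    )


def unplug_over_tcp(host, port, device_id):
    """Run hot_unplug against a QMP TCP server (host is an assumption)."""
    with socket.create_connection((host, port)) as sock:
        # Separate reader and writer objects keep buffered input intact.
        reader = sock.makefile("r", encoding="utf-8")
        writer = sock.makefile("w", encoding="utf-8")
        return hot_unplug(reader, writer, device_id)
```

For the setup in this report, the call would be unplug_over_tcp(<host>, 5555, "pf-1"). A None result means the connection closed before the event was seen.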

Actual results:
QEMU crashes when the PF is hot-unplugged.

Expected results:
No call trace or error is observed in either the guest or the host.

Additional info:
1. (qemu) info
free(): invalid next size (normal)

Thread 6 "qemu-kvm" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe97d6700 (LWP 14348)]
0x00007ffff2bd093f in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff2bd093f in raise () from /lib64/libc.so.6
#1  0x00007ffff2bbac95 in abort () from /lib64/libc.so.6
#2  0x00007ffff2c13d57 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff2c1a68c in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ffff2c1c0b4 in _int_free () from /lib64/libc.so.6
#5  0x00007ffff76b8552 in g_free () from /lib64/libglib-2.0.so.0
#6  0x0000555555a7c869 in m_free ()
#7  0x0000555555a80d35 in tcp_input ()
#8  0x0000555555a7c050 in slirp_input ()
#9  0x0000555555a6bfd4 in net_slirp_receive ()
#10 0x0000555555a6444e in qemu_deliver_packet_iov ()
#11 0x0000555555a66d68 in qemu_net_queue_send_iov ()
#12 0x0000555555a672cb in net_hub_port_receive_iov ()
#13 0x0000555555a64387 in qemu_deliver_packet_iov ()
#14 0x0000555555a66d68 in qemu_net_queue_send_iov ()
#15 0x00005555559fb711 in net_tx_pkt_do_sw_fragmentation ()
#16 0x00005555559fc393 in net_tx_pkt_send ()
#17 0x0000555555a00618 in e1000e_start_xmit.isra ()
#18 0x0000555555a00721 in e1000e_set_tdt ()
#19 0x000055555588ab86 in memory_region_write_accessor ()
#20 0x0000555555888f36 in access_with_adjusted_size ()
#21 0x000055555588cd7a in memory_region_dispatch_write ()
#22 0x000055555583b1c3 in flatview_write ()
#23 0x000055555583f683 in address_space_write ()
#24 0x000055555589bce8 in kvm_cpu_exec ()
#25 0x000055555587874e in qemu_kvm_cpu_thread_fn ()
#26 0x00007ffff2f652de in start_thread () from /lib64/libpthread.so.0
#27 0x00007ffff2c95a63 in clone () from /lib64/libc.so.6

2. # dmesg
[80164.731310] vfio-pci 0000:04:00.0: enabling device (0400 -> 0402)
[80164.844794] vfio-pci 0000:04:00.0: Masking broken INTx support
[80164.851378] vfio_ecap_init: 0000:04:00.0 hiding ecap 0x19@0x1d0
[80573.896468] vfio-pci 0000:04:00.0: enabling device (0400 -> 0402)
[80574.010930] vfio-pci 0000:04:00.0: Masking broken INTx support
[80574.017517] vfio_ecap_init: 0000:04:00.0 hiding ecap 0x19@0x1d0
[80575.321712] vfio_bar_restore: 0000:04:00.0 reset recovery - restoring bars
[80575.593586] vfio_bar_restore: 0000:04:00.0 reset recovery - restoring bars

Comment 1 Lei Yang 2019-03-06 07:40:30 UTC
If QEMU is booted with "-nodefaults" (with the rest of the command line kept the same as above), everything works well. So this should not be a VFIO issue.

Comment 2 Lei Yang 2019-03-06 08:34:40 UTC
Booting a RHEL 8 guest with the same QEMU command line works well.

Comment 4 Yvugenfi@redhat.com 2019-03-12 09:20:23 UTC
(In reply to Lei Yang from comment #1)
> If boot qemu with "-nodefaults" (other cmd line keep same as above),
> everything works well. So this should not be VFIO issue.

That's the thing. It looks like the problem is in the e1000e implementation in QEMU. When you start the VM with "-nodefaults", the e1000e device is not present.

In the crash trace we can see that the issue originates in the e1000e card implementation:
#17 0x0000555555a00618 in e1000e_start_xmit.isra ()
#18 0x0000555555a00721 in e1000e_set_tdt ()
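
The frame inspection above can be sketched as a small script that picks device-model frames out of a gdb backtrace. This is only an illustration, assuming the plain "bt" output format shown in the description (no debug symbols). The helper names (parse_frames, qemu_frames) are hypothetical, and the sample is an excerpt of the original trace.

```python
import re

# Matches lines such as:
#   #5  0x00007ffff76b8552 in g_free () from /lib64/libglib-2.0.so.0
#   #17 0x0000555555a00618 in e1000e_start_xmit.isra ()
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(\)(?: from (\S+))?"
)

SAMPLE_BT = """\
#5  0x00007ffff76b8552 in g_free () from /lib64/libglib-2.0.so.0
#6  0x0000555555a7c869 in m_free ()
#7  0x0000555555a80d35 in tcp_input ()
#17 0x0000555555a00618 in e1000e_start_xmit.isra ()
#18 0x0000555555a00721 in e1000e_set_tdt ()
"""


def parse_frames(bt_text):
    """Return (frame_number, function, library_or_None) per frame line."""
    frames = []
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


def qemu_frames(frames):
    """Keep frames without a 'from <library>' suffix.

    Assumption: in this trace those frames belong to the qemu-kvm binary.
    """
    return [frame for frame in frames if frame[2] is None]


frames = parse_frames(SAMPLE_BT)
first_qemu_frame = qemu_frames(frames)[0]
e1000e_functions = [name for _, name, _ in frames if name.startswith("e1000e")]
```

On this excerpt, first_qemu_frame is the m_free frame (slirp) where the bad free is reported, and e1000e_functions holds the two e1000e transmit-path functions quoted above.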

Comment 14 Ademar Reis 2020-02-05 22:54:56 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 24 Marc-Andre Lureau 2020-03-03 11:17:06 UTC
This reminds me of bug 1734745, although we didn't save typical stack traces, as they could occur in various places.
It was fixed with qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0.
Can you reproduce with this version or more recent?
thanks

Comment 25 Lei Yang 2020-03-04 05:39:03 UTC
(In reply to Marc-Andre Lureau from comment #24)
> This reminds me of bug 1734745, although we didn't save typical stack
> traces, as they could occur in various places.
> It was fixed with qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0.
> Can you reproduce with this version or more recent?
> thanks
Hi, Marc-Andre

Booting a Win2019 guest without "-nodefaults" on the QEMU command line still results in a guest crash on qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0.
qemu cli:
/usr/libexec/qemu-kvm -name Win2019 \
-M q35,kernel-irqchip=split -m 1G \
-cpu Haswell-noTSX,hv_stimer,hv_synic,hv_time,hv_relaxed,hv_vpindex,hv_spinlocks=0xfff,hv_vapic,hv_reset,hv_crash \
-device intel-iommu,intremap=true,caching-mode=true \
-smp 2,sockets=1,cores=2,threads=1 \
-device pcie-root-port,id=root.1,chassis=1 \
-device pcie-root-port,id=root.2,chassis=2 \
-device pcie-root-port,id=root.3,chassis=3 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/win2019.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,drive=my,id=virtio-blk0,bus=root.1 \
-drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/home/en_windows_server_2019_updated_march_2019_x64_dvd_2ae967ab.iso \
-device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
-drive id=drive_winutils,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/usr/share/virtio-win/virtio-win-1.9.8.iso \
-device ide-cd,id=winutils,drive=drive_winutils,bus=ide.1,unit=0 \
-vnc :0 \
-vga qxl \
-monitor stdio \
-usb -device usb-tablet \
-boot menu=on \
-nodefaults \
-qmp tcp:0:5555,server,nowait \

Best regards
Lei

Comment 30 Yvugenfi@redhat.com 2020-03-12 09:15:55 UTC
*** Bug 1685445 has been marked as a duplicate of this bug. ***

Comment 31 Danilo de Paula 2020-03-16 18:07:49 UTC
So having this BZ in AV seems to be a mistake.
According to Marc, this is already fixed in AV.
According to my findings, the patches addressing this BZ have been included since AV-8.0.1.

So, we need this in RHEL.
I don't know why it got moved to AV, but Amnon, if my understanding of this is incorrect, please let me know.
Meanwhile, we will need justification for exception+. Marc, can you provide it, please?

Comment 32 Danilo de Paula 2020-03-17 22:07:48 UTC
QA_ACK please?

Comment 37 Danilo de Paula 2020-05-11 15:29:28 UTC
Both patches fixing this BZ were merged into 8.2 from AV-8.2.1.
Moving this BZ to MODIFIED.

Comment 39 Lei Yang 2020-05-12 06:53:21 UTC
===Verified with qemu-kvm-4.2.0-20.module+el8.2.1+6467+49dc3278.x86_64

1. Boot a Win2019 guest without "-nodefaults" on the QEMU command line.
qemu cmd:
/usr/libexec/qemu-kvm -name Win2019 \
-M q35,kernel-irqchip=split -m 4G \
-cpu Haswell-noTSX,hv_stimer,hv_synic,hv_time,hv_relaxed,hv_vpindex,hv_spinlocks=0xfff,hv_vapic,hv_reset,hv_crash \
-device intel-iommu,intremap=on,caching-mode=on \
-smp 2,sockets=1,cores=2,threads=1 \
-device pcie-root-port,id=root.1,chassis=1 \
-device pcie-root-port,id=root.2,chassis=2 \
-device pcie-root-port,id=root.3,chassis=3 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/win2019.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,drive=my,id=virtio-blk0,bus=root.1 \
-drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/home/en_windows_server_2019_x64_dvd_4cb967d8.iso \
-device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
-drive id=drive_winutils,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/usr/share/virtio-win/virtio-win-1.9.8.iso \
-device ide-cd,id=winutils,drive=drive_winutils,bus=ide.1,unit=0 \
-vnc :0 \
-vga qxl \
-monitor stdio \
-usb -device usb-tablet \
-boot menu=on \
-qmp tcp:0:5555,server,nowait \

2. The guest does not crash.

So this bug has been fixed. Moving to 'VERIFIED'.

Comment 41 errata-xmlrpc 2020-07-28 07:12:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3172