Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1737400

Summary:	QEMU coredump when hot-plugging an e1000e nic to a q35 machine
Product:	Red Hat Enterprise Linux Advanced Virtualization
Component:	qemu-kvm
qemu-kvm sub component:	Networking
Reporter:	Gal Hammer <ghammer>
Assignee:	Yvugenfi <yvugenfi>
QA Contact:	Quan Wenli <wquan>
CC:	chayang, ehabkost, jinzhao, juzhang, leiyang, mrezanin, pezhang, philmd, virt-maint, yvugenfi
Status:	CLOSED CURRENTRELEASE
Severity:	unspecified
Priority:	low
Keywords:	Triaged
Flags:	pm-rhel: mirror+
Target Milestone:	rc
Target Release:	8.0
Hardware:	Unspecified
OS:	Linux
Type:	Bug
Fixed In Version:	qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420
Last Closed:	2021-01-08 16:59:45 UTC

Description Gal Hammer 2019-08-05 09:56:35 UTC

Description of problem: QEMU crashes and dumps core after running "device_add e1000e,id=net0,bus=pcie_extra_root_port_0" in the monitor.

Version-Release number of selected component (if applicable): Upstream QEMU emulator version 4.0.93 (v4.1.0-rc3-3-g02ac2f7f61)

How reproducible: 100%, although it may take several plug/unplug cycles before the crash occurs.

Steps to Reproduce:

1. Start qemu with a Linux guest (I used Fedora 29):

qemu-system-x86_64 -m 2G -smp 2 -enable-kvm -M q35 -nodefaults \
    -device pcie-root-port,id=pcie_extra_root_port_0 \
    -monitor stdio \
    -vga cirrus \
    -device ide-drive,drive=drive-virtio0-0-0,id=virtio0-0-0,bootindex=1 \
    -drive file=fedora-29.qcow2,if=none,id=drive-virtio0-0-0,format=qcow2,werror=stop,rerror=stop,cache=none

2. Add an e1000e device: device_add e1000e,id=net0,bus=pcie_extra_root_port_0

3. If no crash has occurred, run "device_del net0", wait a few seconds until the device is removed, and plug it in again.
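Because step 3 may need to be repeated several times, the plug/unplug loop can be scripted over QMP. The following is a minimal sketch, assuming QEMU is started with the step 1 command line plus a hypothetical `-qmp unix:/tmp/qmp.sock,server,nowait` option (the socket path is an assumption, not part of the report). The helper names `QMPClient` and `hotplug_cycles` are illustrative, not QEMU APIs. Instead of a fixed delay after `device_del`, it waits for QEMU's `DEVICE_DELETED` event, so the guest must acknowledge each unplug; a QEMU crash shows up as a `ConnectionError` when the QMP socket closes.

```python
import json
import socket
import time

# Assumption: QEMU was started with an extra QMP socket, e.g.
#   -qmp unix:/tmp/qmp.sock,server,nowait
# The path is hypothetical and not part of the original command line.


def device_add_cmd(dev_id, bus):
    """QMP form of 'device_add e1000e,id=<dev_id>,bus=<bus>' (no netdev attached)."""
    return {"execute": "device_add",
            "arguments": {"driver": "e1000e", "id": dev_id, "bus": bus}}


def device_del_cmd(dev_id):
    """QMP form of 'device_del <dev_id>'."""
    return {"execute": "device_del", "arguments": {"id": dev_id}}


class QMPClient:
    """Minimal QMP client: one JSON object per line in each direction."""

    def __init__(self, path):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(path)
        self.reader = self.sock.makefile("r")
        self.pending_events = []
        self._read()  # server greeting: {"QMP": {...}}
        self.execute({"execute": "qmp_capabilities"})  # leave negotiation mode

    def _read(self):
        line = self.reader.readline()
        if not line:
            # EOF on the socket: QEMU has exited, e.g. after a segfault.
            raise ConnectionError("QMP connection closed; did QEMU crash?")
        return json.loads(line)

    def execute(self, cmd):
        """Send one command and return its 'return' payload."""
        self.sock.sendall((json.dumps(cmd) + "\n").encode())
        while True:
            msg = self._read()
            if "event" in msg:
                self.pending_events.append(msg)  # asynchronous; keep for later
            elif "error" in msg:
                raise RuntimeError(msg["error"].get("desc", msg["error"]))
            elif "return" in msg:
                return msg["return"]

    def wait_device_deleted(self, dev_id):
        """Block until QEMU emits DEVICE_DELETED for dev_id."""
        while True:
            for i, ev in enumerate(self.pending_events):
                if (ev.get("event") == "DEVICE_DELETED"
                        and ev.get("data", {}).get("device") == dev_id):
                    return self.pending_events.pop(i)
            self.pending_events.append(self._read())


def hotplug_cycles(path, cycles=10, bus="pcie_extra_root_port_0",
                   dev_id="net0", settle=2.0):
    """Repeat steps 2 and 3: plug, let the guest use the NIC, unplug, wait."""
    qmp = QMPClient(path)
    for n in range(1, cycles + 1):
        qmp.execute(device_add_cmd(dev_id, bus))
        time.sleep(settle)  # give the guest driver time to bind and transmit
        qmp.execute(device_del_cmd(dev_id))
        qmp.wait_device_deleted(dev_id)  # guest must acknowledge the unplug
        print(f"cycle {n}: ok")
    return cycles
```

Usage: `hotplug_cycles("/tmp/qmp.sock", cycles=10)` mirrors the 10-cycle verification run described in comment 12.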

Actual results:

#0  0x0000557c83f62707 in filter_receive_iov (nc=0x557c8547b150, direction=NET_FILTER_DIRECTION_RX, sender=0x557c85463d00, flags=0, iov=0x557c862f57c0, iovcnt=4, sent_cb=0x0) at net/net.c:563
563	        QTAILQ_FOREACH_REVERSE(nf, &nc->filters, next) {
[Current thread is 1 (Thread 0x7f0a614f6700 (LWP 2493))]
(gdb) bt
#0  0x0000557c83f62707 in filter_receive_iov (nc=0x557c8547b150, direction=NET_FILTER_DIRECTION_RX, sender=0x557c85463d00, flags=0, iov=0x557c862f57c0, iovcnt=4, sent_cb=0x0) at net/net.c:563
#1  0x0000557c83f62d3d in qemu_sendv_packet_async (sender=0x557c85463d00, iov=0x557c862f57c0, iovcnt=4, sent_cb=0x0) at net/net.c:764
#2  0x0000557c83f62db7 in qemu_sendv_packet (nc=0x557c85463d00, iov=0x557c862f57c0, iovcnt=4) at net/net.c:780
#3  0x0000557c83e6a799 in net_tx_pkt_sendv (pkt=0x557c85eae2b0, nc=0x557c85463d00, iov=0x557c862f57c0, iov_cnt=4) at hw/net/net_tx_pkt.c:546
#4  0x0000557c83e6aa94 in net_tx_pkt_send (pkt=0x557c85eae2b0, nc=0x557c85463d00) at hw/net/net_tx_pkt.c:620
#5  0x0000557c83e7179b in e1000e_tx_pkt_send (core=0x557c8650d4e0, tx=0x557c8652d748, queue_index=0) at hw/net/e1000e_core.c:665
#6  0x0000557c83e71a91 in e1000e_process_tx_desc (core=0x557c8650d4e0, tx=0x557c8652d748, dp=0x7f0a614f5480, queue_index=0) at hw/net/e1000e_core.c:742
#7  0x0000557c83e720d5 in e1000e_start_xmit (core=0x557c8650d4e0, txr=0x7f0a614f54d0) at hw/net/e1000e_core.c:933
#8  0x0000557c83e756ce in e1000e_set_tdt (core=0x557c8650d4e0, index=3590, val=1) at hw/net/e1000e_core.c:2450
#9  0x0000557c83e762d0 in e1000e_core_write (core=0x557c8650d4e0, addr=14360, val=1, size=4) at hw/net/e1000e_core.c:3255
#10 0x0000557c83e6cd2b in e1000e_mmio_write (opaque=0x557c8650a810, addr=14360, val=1, size=4) at hw/net/e1000e.c:106
#11 0x0000557c83beb384 in memory_region_write_accessor (mr=0x557c8650d110, addr=14360, value=0x7f0a614f5658, size=4, shift=0, mask=4294967295, attrs=...) at /home/scm/qemu/memory.c:508
#12 0x0000557c83beb594 in access_with_adjusted_size (addr=14360, value=0x7f0a614f5658, size=4, access_size_min=4, access_size_max=4, access_fn=
    0x557c83beb29b <memory_region_write_accessor>, mr=0x557c8650d110, attrs=...) at /home/scm/qemu/memory.c:574
#13 0x0000557c83bee57e in memory_region_dispatch_write (mr=0x557c8650d110, addr=14360, data=1, size=4, attrs=...) at /home/scm/qemu/memory.c:1502
#14 0x0000557c83b8eac7 in flatview_write_continue
    (fv=0x7f0a400cfe00, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4, addr1=14360, l=4, mr=0x557c8650d110) at /home/scm/qemu/exec.c:3337
#15 0x0000557c83b8ec0c in flatview_write (fv=0x7f0a400cfe00, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4) at /home/scm/qemu/exec.c:3376
#16 0x0000557c83b8ef11 in address_space_write (as=0x557c849544c0 <address_space_memory>, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4)
    at /home/scm/qemu/exec.c:3466
#17 0x0000557c83b8ef63 in address_space_rw
    (as=0x557c849544c0 <address_space_memory>, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4, is_write=true) at /home/scm/qemu/exec.c:3477
#18 0x0000557c83c044df in kvm_cpu_exec (cpu=0x557c85703fe0) at /home/scm/qemu/accel/kvm/kvm-all.c:2286
#19 0x0000557c83bddd0f in qemu_kvm_cpu_thread_fn (arg=0x557c85703fe0) at /home/scm/qemu/cpus.c:1285
#20 0x0000557c8410904d in qemu_thread_start (args=0x557c854bd950) at util/qemu-thread-posix.c:502
#21 0x00007f0a73c1358e in start_thread () at /lib64/libpthread.so.0
#22 0x00007f0a73b42713 in clone () at /lib64/libc.so.6

Comment 2 Pei Zhang 2019-08-06 06:23:31 UTC

Thanks Gal for reporting this issue.

I tried to reproduce it. The bug is triggered when the e1000e device is hot-plugged on its own, without a tap backend. If we hot-plug a tap netdev first and then hot-plug the e1000e device, QEMU works well:

(qemu) netdev_add tap,id=hostnet1
(qemu) device_add e1000e,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=root.5
(qemu) 

Because the e1000e device needs a tap backend for the guest to communicate with external servers, our past testing always hot-plugged a tap device first and then the e1000e device.
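For scripted testing over QMP instead of the HMP monitor, the workaround above maps to the command sequence below. This sketch only builds and prints the JSON commands; the helper names are hypothetical, `root.5` is the root port from the tester's own configuration, and whether `netdev_add` with `"type": "tap"` succeeds without further options depends on the host (creating a tap device normally requires root or CAP_NET_ADMIN).

```python
import json


def plug_with_tap(netdev_id, dev_id, bus, mac=None):
    """Create the tap backend first, then attach an e1000e front end to it."""
    nic = {"driver": "e1000e", "id": dev_id, "netdev": netdev_id, "bus": bus}
    if mac is not None:
        nic["mac"] = mac
    return [
        {"execute": "netdev_add", "arguments": {"type": "tap", "id": netdev_id}},
        {"execute": "device_add", "arguments": nic},
    ]


def unplug_with_tap(netdev_id, dev_id):
    """Tear down in reverse order: the NIC first, then its backend.

    In a real script, wait for the DEVICE_DELETED event before sending
    netdev_del, since device_del only requests removal from the guest.
    """
    return [
        {"execute": "device_del", "arguments": {"id": dev_id}},
        {"execute": "netdev_del", "arguments": {"id": netdev_id}},
    ]


# Values taken from the HMP commands in this comment.
for cmd in plug_with_tap("hostnet1", "net1", "root.5", mac="88:66:da:5f:dd:02"):
    print(json.dumps(cmd))
```

Ordering is the point of the workaround: the front end is only added once a backend with the given `netdev` ID exists, and it is removed before that backend.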

Comment 6 Ademar Reis 2020-02-05 23:02:01 UTC

QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 7 John Ferlan 2020-02-06 21:04:18 UTC

This looks like Yan forgot to set the Triaged keyword when changing the subcomponent and reassigning to virt-maint.

I'll place the needinfo on Yan to double check - Yan - if you want this back, then just reassign to yourself; otherwise, just clear the needinfo

Comment 8 Yvugenfi@redhat.com 2020-02-09 08:52:06 UTC

(In reply to John Ferlan from comment #7)
> This looks like Yan forgot to set the Triaged keyword when changing the
> subcomponent and reassigning to virt-maint.
> 
> I'll place the needinfo on Yan to double check - Yan - if you want this
> back, then just reassign to yourself; otherwise, just clear the needinfo

Sorry, John. Reassigning back to myself.

Comment 9 Philippe Mathieu-Daudé 2020-03-04 00:12:28 UTC

Likely fix posted: https://www.mail-archive.com/qemu-devel@nongnu.org/msg685060.html

Comment 10 Yvugenfi@redhat.com 2020-03-04 09:46:28 UTC

(In reply to Philippe Mathieu-Daudé from comment #9)
> Likely fix posted:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg685060.html

Yes, this is a fix.

Comment 11 John Ferlan 2020-04-21 16:49:28 UTC

Setting the ITR to 8.3.0, since having a bug in POST in the backlog (ITR = '---') doesn't make sense. Also setting devel_ack+.

NB: Fix is included in qemu-5.0 (commit id f22a57ac09abdd5afd8a974b52c19eda9347cffd)
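Since the fix first appears upstream in QEMU 5.0, a rough check of a running instance can use the `qemu` field of the QMP `query-version` reply. This is a heuristic sketch with hypothetical helper names: downstream packages such as RHEL's qemu-kvm can backport fixes independently of the upstream version number (the fixed RHEL build is identified by its package string in "Fixed In Version"), so the result is only indicative for those builds.

```python
FIXED_IN_UPSTREAM = (5, 0)  # first upstream release containing the fix, per comment 11


def upstream_version(info):
    """Return (major, minor, micro) from a query-version reply's 'qemu' field."""
    q = info["qemu"]
    return (q["major"], q["minor"], q["micro"])


def likely_has_fix(info):
    """Heuristic only: ignores downstream backports recorded in info['package']."""
    return upstream_version(info)[:2] >= FIXED_IN_UPSTREAM


# The reporter's build, v4.1.0-rc3, identifies itself as version 4.0.93.
reporter_build = {"qemu": {"major": 4, "minor": 0, "micro": 93}, "package": ""}
print(likely_has_fix(reporter_build))  # False
```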

Comment 12 Quan Wenli 2020-06-28 05:31:06 UTC

Reproduced it with qemu-kvm-4.2.0-26.module+el8.2.1+7079+67e06423.x86_64.

After hot-plugging and unplugging the e1000e NIC 6 times, QEMU crashed:

(qemu) device_add e1000e,id=net0,bus=pcie_extra_root_port_0 
(qemu) device_del net0
(qemu) 156579 Segmentation fault      (core dumped) /usr/libexec/qemu-kvm -m 2G -smp 2 -enable-kvm -M q35 -nodefaults -device pcie-root-port,id=pcie_extra_root_port_0 -monitor stdio -vga cirrus -device ide-drive,drive=drive-virtio0-0-0,id=virtio0-0-0,bootindex=1 -drive file=/root/rhel830-64-virtio.qcow2,if=none,id=drive-virtio0-0-0,format=qcow2,werror=stop,rerror=stop,cache=none


Retried 10 times with qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420; QEMU works well.


Will set it to VERIFIED once it reaches ON_QA status.

Comment 13 Quan Wenli 2020-06-29 07:07:14 UTC

Based on comment #12, setting it to VERIFIED.

Comment 15 Jeff Nelson 2021-01-08 16:59:45 UTC

This BZ was not attached to an advisory and therefore was not closed when RHEL AV was shipped. Correcting this now by marking the BZ CLOSED CURRENTRELEASE.