Bug 1737400 - QEMU coredump when hot-plugging an e1000e nic to a q35 machine
Summary: QEMU coredump when hot-plugging an e1000e nic to a q35 machine
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: Unspecified
OS: Linux
Priority: low
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Yvugenfi@redhat.com
QA Contact: Quan Wenli
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-05 09:56 UTC by Gal Hammer
Modified: 2021-01-08 16:59 UTC

Fixed In Version: qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-08 16:59:45 UTC
Type: Bug
Target Upstream Version:


Description Gal Hammer 2019-08-05 09:56:35 UTC
Description of problem: QEMU coredump after "device_add e1000e,id=net0,bus=pcie_extra_root_port_0"

Version-Release number of selected component (if applicable): Upstream QEMU emulator version 4.0.93 (v4.1.0-rc3-3-g02ac2f7f61)

How reproducible: 100% (but might require several attempts).

Steps to Reproduce:

1. Start qemu with a Linux guest (I used Fedora 29):

qemu-system-x86_64 -m 2G -smp 2 -enable-kvm -M q35 -nodefaults \
        -device pcie-root-port,id=pcie_extra_root_port_0 \
        -monitor stdio \
        -vga cirrus \
        -device ide-drive,drive=drive-virtio0-0-0,id=virtio0-0-0,bootindex=1 \
        -drive file=fedora-29.qcow2,if=none,id=drive-virtio0-0-0,format=qcow2,werror=stop,rerror=stop,cache=none

2. Add an e1000e device: device_add e1000e,id=net0,bus=pcie_extra_root_port_0

3. If no crash has occurred, run "device_del net0", wait a few seconds until the device is removed, and try plugging it in again.
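
Since the crash may need several plug/unplug attempts, the loop in steps 2 and 3 can be scripted. The following is only a minimal sketch and not part of the original reproduction: it assumes QEMU was additionally started with a QMP socket (for example "-qmp unix:/tmp/qmp.sock,server,nowait"; the socket path, the function names, and the 5-second settle time are all hypothetical choices) and that QMP replies arrive as one JSON object per line. It uses the QMP forms of the device_add and device_del commands above.

```python
import json
import socket
import time


def hotplug_cycle_messages(cycles, dev_id="net0", bus="pcie_extra_root_port_0"):
    """Build the QMP commands for `cycles` plug/unplug rounds of an e1000e."""
    messages = [{"execute": "qmp_capabilities"}]  # required QMP handshake
    for _ in range(cycles):
        messages.append({
            "execute": "device_add",
            "arguments": {"driver": "e1000e", "id": dev_id, "bus": bus},
        })
        messages.append({"execute": "device_del", "arguments": {"id": dev_id}})
    return messages


def read_reply(reader):
    """Skip asynchronous events and return the next command reply."""
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("QMP socket closed (QEMU may have crashed)")
        reply = json.loads(line)
        if "return" in reply or "error" in reply:
            return reply


def run_cycles(sock_path, cycles, settle_seconds=5.0):
    """Send the commands over a QMP UNIX socket, pausing after each
    plug or unplug so the guest has time to react (assumed delay)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        reader = sock.makefile("r")
        reader.readline()  # discard the QMP greeting banner
        for msg in hotplug_cycle_messages(cycles):
            sock.sendall((json.dumps(msg) + "\r\n").encode())
            print(msg["execute"], "->", read_reply(reader))
            if msg["execute"] != "qmp_capabilities":
                time.sleep(settle_seconds)
```

Calling run_cycles("/tmp/qmp.sock", 10) would attempt ten rounds; on an affected build the script is expected to stop with a ConnectionError once QEMU dies, which is only a rough signal and should be confirmed against the core dump.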

Actual results:

#0  0x0000557c83f62707 in filter_receive_iov (nc=0x557c8547b150, direction=NET_FILTER_DIRECTION_RX, sender=0x557c85463d00, flags=0, iov=0x557c862f57c0, iovcnt=4, sent_cb=0x0) at net/net.c:563
563	        QTAILQ_FOREACH_REVERSE(nf, &nc->filters, next) {
[Current thread is 1 (Thread 0x7f0a614f6700 (LWP 2493))]
(gdb) bt
#0  0x0000557c83f62707 in filter_receive_iov (nc=0x557c8547b150, direction=NET_FILTER_DIRECTION_RX, sender=0x557c85463d00, flags=0, iov=0x557c862f57c0, iovcnt=4, sent_cb=0x0) at net/net.c:563
#1  0x0000557c83f62d3d in qemu_sendv_packet_async (sender=0x557c85463d00, iov=0x557c862f57c0, iovcnt=4, sent_cb=0x0) at net/net.c:764
#2  0x0000557c83f62db7 in qemu_sendv_packet (nc=0x557c85463d00, iov=0x557c862f57c0, iovcnt=4) at net/net.c:780
#3  0x0000557c83e6a799 in net_tx_pkt_sendv (pkt=0x557c85eae2b0, nc=0x557c85463d00, iov=0x557c862f57c0, iov_cnt=4) at hw/net/net_tx_pkt.c:546
#4  0x0000557c83e6aa94 in net_tx_pkt_send (pkt=0x557c85eae2b0, nc=0x557c85463d00) at hw/net/net_tx_pkt.c:620
#5  0x0000557c83e7179b in e1000e_tx_pkt_send (core=0x557c8650d4e0, tx=0x557c8652d748, queue_index=0) at hw/net/e1000e_core.c:665
#6  0x0000557c83e71a91 in e1000e_process_tx_desc (core=0x557c8650d4e0, tx=0x557c8652d748, dp=0x7f0a614f5480, queue_index=0) at hw/net/e1000e_core.c:742
#7  0x0000557c83e720d5 in e1000e_start_xmit (core=0x557c8650d4e0, txr=0x7f0a614f54d0) at hw/net/e1000e_core.c:933
#8  0x0000557c83e756ce in e1000e_set_tdt (core=0x557c8650d4e0, index=3590, val=1) at hw/net/e1000e_core.c:2450
#9  0x0000557c83e762d0 in e1000e_core_write (core=0x557c8650d4e0, addr=14360, val=1, size=4) at hw/net/e1000e_core.c:3255
#10 0x0000557c83e6cd2b in e1000e_mmio_write (opaque=0x557c8650a810, addr=14360, val=1, size=4) at hw/net/e1000e.c:106
#11 0x0000557c83beb384 in memory_region_write_accessor (mr=0x557c8650d110, addr=14360, value=0x7f0a614f5658, size=4, shift=0, mask=4294967295, attrs=...) at /home/scm/qemu/memory.c:508
#12 0x0000557c83beb594 in access_with_adjusted_size (addr=14360, value=0x7f0a614f5658, size=4, access_size_min=4, access_size_max=4, access_fn=
    0x557c83beb29b <memory_region_write_accessor>, mr=0x557c8650d110, attrs=...) at /home/scm/qemu/memory.c:574
#13 0x0000557c83bee57e in memory_region_dispatch_write (mr=0x557c8650d110, addr=14360, data=1, size=4, attrs=...) at /home/scm/qemu/memory.c:1502
#14 0x0000557c83b8eac7 in flatview_write_continue
    (fv=0x7f0a400cfe00, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4, addr1=14360, l=4, mr=0x557c8650d110) at /home/scm/qemu/exec.c:3337
#15 0x0000557c83b8ec0c in flatview_write (fv=0x7f0a400cfe00, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4) at /home/scm/qemu/exec.c:3376
#16 0x0000557c83b8ef11 in address_space_write (as=0x557c849544c0 <address_space_memory>, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4)
    at /home/scm/qemu/exec.c:3466
#17 0x0000557c83b8ef63 in address_space_rw
    (as=0x557c849544c0 <address_space_memory>, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4, is_write=true) at /home/scm/qemu/exec.c:3477
#18 0x0000557c83c044df in kvm_cpu_exec (cpu=0x557c85703fe0) at /home/scm/qemu/accel/kvm/kvm-all.c:2286
#19 0x0000557c83bddd0f in qemu_kvm_cpu_thread_fn (arg=0x557c85703fe0) at /home/scm/qemu/cpus.c:1285
#20 0x0000557c8410904d in qemu_thread_start (args=0x557c854bd950) at util/qemu-thread-posix.c:502
#21 0x00007f0a73c1358e in start_thread () at /lib64/libpthread.so.0
#22 0x00007f0a73b42713 in clone () at /lib64/libc.so.6

Comment 2 Pei Zhang 2019-08-06 06:23:31 UTC
Thanks Gal for reporting this issue.

I tried to reproduce it. The bug can be triggered when only the e1000e device is hot-plugged, without a tap device. If we hot-plug a tap device first and then hot-plug the e1000e, QEMU works fine.

(qemu) netdev_add tap,id=hostnet1
(qemu) device_add e1000e,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=root.5
(qemu) 

A tap device is required for the e1000e when the guest communicates with external servers, so in our past testing we usually hot-plugged a tap device first and then an e1000e device.
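
For anyone scripting the tap-first ordering described in this comment, here is a minimal sketch of the same two commands expressed as QMP messages. It is an illustration only: the helper name is hypothetical, the default IDs, MAC address, and bus are simply copied from the monitor commands above, and sending the messages to a QMP socket is left out.

```python
import json


def tap_then_nic_messages(netdev_id="hostnet1", dev_id="net1",
                          mac="88:66:da:5f:dd:02", bus="root.5"):
    """Return QMP commands that add the tap backend before the e1000e NIC."""
    return [
        # Backend first, so the NIC has a peer when it is plugged.
        {"execute": "netdev_add",
         "arguments": {"type": "tap", "id": netdev_id}},
        # Then the NIC, bound to that backend through `netdev`.
        {"execute": "device_add",
         "arguments": {"driver": "e1000e", "netdev": netdev_id,
                       "id": dev_id, "mac": mac, "bus": bus}},
    ]


if __name__ == "__main__":
    for message in tap_then_nic_messages():
        print(json.dumps(message))
```

The ordering is the point of the sketch: the device_add message refers to the netdev ID, so the netdev_add message has to be issued first.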

Comment 6 Ademar Reis 2020-02-05 23:02:01 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 7 John Ferlan 2020-02-06 21:04:18 UTC
This looks like Yan forgot to set the Triaged keyword when changing the subcomponent and reassigning to virt-maint.

I'll place the needinfo on Yan to double check - Yan - if you want this back, then just reassign to yourself; otherwise, just clear the needinfo

Comment 8 Yvugenfi@redhat.com 2020-02-09 08:52:06 UTC
(In reply to John Ferlan from comment #7)
> This looks like Yan forgot to set the Triaged keyword when changing the
> subcomponent and reassigning to virt-maint.
> 
> I'll place the needinfo on Yan to double check - Yan - if you want this
> back, then just reassign to yourself; otherwise, just clear the needinfo

Sorry, John. Reassigning back to myself.

Comment 9 Philippe Mathieu-Daudé 2020-03-04 00:12:28 UTC
Likely fix posted: https://www.mail-archive.com/qemu-devel@nongnu.org/msg685060.html

Comment 10 Yvugenfi@redhat.com 2020-03-04 09:46:28 UTC
(In reply to Philippe Mathieu-Daudé from comment #9)
> Likely fix posted:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg685060.html

Yes, this is a fix.

Comment 11 John Ferlan 2020-04-21 16:49:28 UTC
Setting the ITR to 8.3.0 since having a bug in POST in the backlog (ITR = '---') doesn't make sense.   Also setting the devel_ack+.

NB: Fix is included in qemu-5.0 (commit id f22a57ac09abdd5afd8a974b52c19eda9347cffd)

Comment 12 Quan Wenli 2020-06-28 05:31:06 UTC
Reproduced it with qemu-kvm-4.2.0-26.module+el8.2.1+7079+67e06423.x86_64.

After hot-plugging and unplugging the e1000e NIC 6 times, QEMU crashed.

(qemu) device_add e1000e,id=net0,bus=pcie_extra_root_port_0 
(qemu) device_del net0
(qemu) 156579 Segmentation fault      (core dumped) /usr/libexec/qemu-kvm -m 2G -smp 2 -enable-kvm -M q35 -nodefaults -device pcie-root-port,id=pcie_extra_root_port_0 -monitor stdio -vga cirrus -device ide-drive,drive=drive-virtio0-0-0,id=virtio0-0-0,bootindex=1 -drive file=/root/rhel830-64-virtio.qcow2,if=none,id=drive-virtio0-0-0,format=qcow2,werror=stop,rerror=stop,cache=none


Retried 10 times with qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420; QEMU works well.


Once the bug is in ON_QA status, I will set it to VERIFIED.

Comment 13 Quan Wenli 2020-06-29 07:07:14 UTC
Based on comment #12, setting it to VERIFIED.

Comment 15 Jeff Nelson 2021-01-08 16:59:45 UTC
This BZ was not attached to an advisory and therefore was not closed when RHEL AV was shipped. Correcting this now by marking the BZ CLOSED CURRENTRELEASE.

