Bug 1737400 - QEMU coredump when hot-plugging an e1000e nic to a q35 machine
Summary: QEMU coredump when hot-plugging an e1000e nic to a q35 machine
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: Unspecified
OS: Linux
Priority: low
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Yan Vugenfirer
QA Contact: Quan Wenli
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-05 09:56 UTC by Gal Hammer
Modified: 2020-06-29 07:07 UTC

Fixed In Version: qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:



Description Gal Hammer 2019-08-05 09:56:35 UTC
Description of problem: QEMU coredump after "device_add e1000e,id=net0,bus=pcie_extra_root_port_0"

Version-Release number of selected component (if applicable): Upstream QEMU emulator version 4.0.93 (v4.1.0-rc3-3-g02ac2f7f61)

How reproducible: 100% (but might require several attempts).

Steps to Reproduce:

1. Start qemu with a Linux guest (I used Fedora 29):

qemu-system-x86_64 -m 2G -smp 2 -enable-kvm -M q35 -nodefaults \
        -device pcie-root-port,id=pcie_extra_root_port_0 \
        -monitor stdio \
        -vga cirrus \
        -device ide-drive,drive=drive-virtio0-0-0,id=virtio0-0-0,bootindex=1 \
        -drive file=fedora-29.qcow2,if=none,id=drive-virtio0-0-0,format=qcow2,werror=stop,rerror=stop,cache=none

2. Add an e1000e device: device_add e1000e,id=net0,bus=pcie_extra_root_port_0

3. If no crash has occurred, run "device_del net0", wait a few seconds until the device is removed, and try plugging it in again.
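
Because the crash may take several plug/unplug rounds to appear, the steps above can be scripted. The following is a minimal sketch, not part of the original report. It assumes QEMU was started with an extra QMP socket (for example by adding "-qmp unix:/tmp/qmp.sock,server,nowait" to the command line in step 1); the socket path, the helper names, and the 5-second settle delay are all hypothetical choices made for illustration.

```python
import json
import socket
import time

# Assumption: QEMU was started with "-qmp unix:/tmp/qmp.sock,server,nowait".
QMP_SOCKET = "/tmp/qmp.sock"


def device_add_cmd(dev_id="net0", bus="pcie_extra_root_port_0"):
    """QMP form of: device_add e1000e,id=net0,bus=pcie_extra_root_port_0"""
    return {"execute": "device_add",
            "arguments": {"driver": "e1000e", "id": dev_id, "bus": bus}}


def device_del_cmd(dev_id="net0"):
    """QMP form of: device_del net0"""
    return {"execute": "device_del", "arguments": {"id": dev_id}}


def hotplug_cycle(cycles):
    """Build the command list for `cycles` plug/unplug rounds."""
    cmds = []
    for _ in range(cycles):
        cmds.append(device_add_cmd())
        cmds.append(device_del_cmd())
    return cmds


def send(stream, cmd):
    """Send one command and return its reply, skipping asynchronous events."""
    stream.write(json.dumps(cmd) + "\n")
    stream.flush()
    while True:
        line = stream.readline()
        if not line:
            # The socket closed; with this bug that likely means QEMU crashed.
            raise ConnectionError("QMP connection closed")
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg


def run(cycles=10, settle_seconds=5.0):
    """Drive the hot-plug loop against a running QEMU instance."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(QMP_SOCKET)
        stream = sock.makefile("rw")
        stream.readline()  # discard the QMP greeting banner
        send(stream, {"execute": "qmp_capabilities"})
        for cmd in hotplug_cycle(cycles):
            print(cmd["execute"], "->", send(stream, cmd))
            # Give the guest time to finish the plug or unplug (step 3).
            time.sleep(settle_seconds)
```

Calling run() with QEMU running repeats steps 2 and 3; if QEMU dies mid-loop, the script stops with a connection error instead of hanging.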

Actual results:

#0  0x0000557c83f62707 in filter_receive_iov (nc=0x557c8547b150, direction=NET_FILTER_DIRECTION_RX, sender=0x557c85463d00, flags=0, iov=0x557c862f57c0, iovcnt=4, sent_cb=0x0) at net/net.c:563
563	        QTAILQ_FOREACH_REVERSE(nf, &nc->filters, next) {
[Current thread is 1 (Thread 0x7f0a614f6700 (LWP 2493))]
(gdb) bt
#0  0x0000557c83f62707 in filter_receive_iov (nc=0x557c8547b150, direction=NET_FILTER_DIRECTION_RX, sender=0x557c85463d00, flags=0, iov=0x557c862f57c0, iovcnt=4, sent_cb=0x0) at net/net.c:563
#1  0x0000557c83f62d3d in qemu_sendv_packet_async (sender=0x557c85463d00, iov=0x557c862f57c0, iovcnt=4, sent_cb=0x0) at net/net.c:764
#2  0x0000557c83f62db7 in qemu_sendv_packet (nc=0x557c85463d00, iov=0x557c862f57c0, iovcnt=4) at net/net.c:780
#3  0x0000557c83e6a799 in net_tx_pkt_sendv (pkt=0x557c85eae2b0, nc=0x557c85463d00, iov=0x557c862f57c0, iov_cnt=4) at hw/net/net_tx_pkt.c:546
#4  0x0000557c83e6aa94 in net_tx_pkt_send (pkt=0x557c85eae2b0, nc=0x557c85463d00) at hw/net/net_tx_pkt.c:620
#5  0x0000557c83e7179b in e1000e_tx_pkt_send (core=0x557c8650d4e0, tx=0x557c8652d748, queue_index=0) at hw/net/e1000e_core.c:665
#6  0x0000557c83e71a91 in e1000e_process_tx_desc (core=0x557c8650d4e0, tx=0x557c8652d748, dp=0x7f0a614f5480, queue_index=0) at hw/net/e1000e_core.c:742
#7  0x0000557c83e720d5 in e1000e_start_xmit (core=0x557c8650d4e0, txr=0x7f0a614f54d0) at hw/net/e1000e_core.c:933
#8  0x0000557c83e756ce in e1000e_set_tdt (core=0x557c8650d4e0, index=3590, val=1) at hw/net/e1000e_core.c:2450
#9  0x0000557c83e762d0 in e1000e_core_write (core=0x557c8650d4e0, addr=14360, val=1, size=4) at hw/net/e1000e_core.c:3255
#10 0x0000557c83e6cd2b in e1000e_mmio_write (opaque=0x557c8650a810, addr=14360, val=1, size=4) at hw/net/e1000e.c:106
#11 0x0000557c83beb384 in memory_region_write_accessor (mr=0x557c8650d110, addr=14360, value=0x7f0a614f5658, size=4, shift=0, mask=4294967295, attrs=...) at /home/scm/qemu/memory.c:508
#12 0x0000557c83beb594 in access_with_adjusted_size (addr=14360, value=0x7f0a614f5658, size=4, access_size_min=4, access_size_max=4, access_fn=
    0x557c83beb29b <memory_region_write_accessor>, mr=0x557c8650d110, attrs=...) at /home/scm/qemu/memory.c:574
#13 0x0000557c83bee57e in memory_region_dispatch_write (mr=0x557c8650d110, addr=14360, data=1, size=4, attrs=...) at /home/scm/qemu/memory.c:1502
#14 0x0000557c83b8eac7 in flatview_write_continue
    (fv=0x7f0a400cfe00, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4, addr1=14360, l=4, mr=0x557c8650d110) at /home/scm/qemu/exec.c:3337
#15 0x0000557c83b8ec0c in flatview_write (fv=0x7f0a400cfe00, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4) at /home/scm/qemu/exec.c:3376
#16 0x0000557c83b8ef11 in address_space_write (as=0x557c849544c0 <address_space_memory>, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4)
    at /home/scm/qemu/exec.c:3466
#17 0x0000557c83b8ef63 in address_space_rw
    (as=0x557c849544c0 <address_space_memory>, addr=4270077976, attrs=..., buf=0x7f0a70394028 <error: Cannot access memory at address 0x7f0a70394028>, len=4, is_write=true) at /home/scm/qemu/exec.c:3477
#18 0x0000557c83c044df in kvm_cpu_exec (cpu=0x557c85703fe0) at /home/scm/qemu/accel/kvm/kvm-all.c:2286
#19 0x0000557c83bddd0f in qemu_kvm_cpu_thread_fn (arg=0x557c85703fe0) at /home/scm/qemu/cpus.c:1285
#20 0x0000557c8410904d in qemu_thread_start (args=0x557c854bd950) at util/qemu-thread-posix.c:502
#21 0x00007f0a73c1358e in start_thread () at /lib64/libpthread.so.0
#22 0x00007f0a73b42713 in clone () at /lib64/libc.so.6

Comment 2 Pei Zhang 2019-08-06 06:23:31 UTC
Thanks Gal for reporting this issue.

I tried to reproduce this. The bug is triggered only when the e1000e device is hot-plugged without a tap netdev. If we hot-plug a tap device first and then hot-plug the e1000e, QEMU works well.

(qemu) netdev_add tap,id=hostnet1
(qemu) device_add e1000e,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=root.5
(qemu) 

A tap device is needed for the e1000e when the guest communicates with external servers, so in our past testing we usually hot-plugged a tap device first and then the e1000e device.

Comment 6 Ademar Reis 2020-02-05 23:02:01 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 7 John Ferlan 2020-02-06 21:04:18 UTC
This looks like Yan forgot to set the Triaged keyword when changing the subcomponent and reassigning to virt-maint.

I'll place the needinfo on Yan to double check - Yan - if you want this back, then just reassign to yourself; otherwise, just clear the needinfo

Comment 8 Yan Vugenfirer 2020-02-09 08:52:06 UTC
(In reply to John Ferlan from comment #7)
> This looks like Yan forgot to set the Triaged keyword when changing the
> subcomponent and reassigning to virt-maint.
> 
> I'll place the needinfo on Yan to double check - Yan - if you want this
> back, then just reassign to yourself; otherwise, just clear the needinfo

Sorry, John. Reassigning back to myself.

Comment 9 Philippe Mathieu-Daudé 2020-03-04 00:12:28 UTC
Likely fix posted: https://www.mail-archive.com/qemu-devel@nongnu.org/msg685060.html

Comment 10 Yan Vugenfirer 2020-03-04 09:46:28 UTC
(In reply to Philippe Mathieu-Daudé from comment #9)
> Likely fix posted:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg685060.html

Yes, this is a fix.

Comment 11 John Ferlan 2020-04-21 16:49:28 UTC
Setting the ITR to 8.3.0 since having a bug in POST in the backlog (ITR = '---') doesn't make sense.   Also setting the devel_ack+.

NB: Fix is included in qemu-5.0 (commit id f22a57ac09abdd5afd8a974b52c19eda9347cffd)

Comment 12 Quan Wenli 2020-06-28 05:31:06 UTC
Reproduced with qemu-kvm-4.2.0-26.module+el8.2.1+7079+67e06423.x86_64.

After hot-plugging and hot-unplugging the e1000e NIC 6 times, QEMU crashed.

(qemu) device_add e1000e,id=net0,bus=pcie_extra_root_port_0 
(qemu) device_del net0
(qemu) 156579 Segmentation fault      (core dumped) /usr/libexec/qemu-kvm -m 2G -smp 2 -enable-kvm -M q35 -nodefaults -device pcie-root-port,id=pcie_extra_root_port_0 -monitor stdio -vga cirrus -device ide-drive,drive=drive-virtio0-0-0,id=virtio0-0-0,bootindex=1 -drive file=/root/rhel830-64-virtio.qcow2,if=none,id=drive-virtio0-0-0,format=qcow2,werror=stop,rerror=stop,cache=none


Retried 10 times with qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420; QEMU works well.


Once the bug is in ON_QA status, I will set it to VERIFIED.

Comment 13 Quan Wenli 2020-06-29 07:07:14 UTC
Based on comment #12, setting it to VERIFIED.

