Bug 1945826 - [WRB][QEMU6.0] netdev_add cause qemu Segmentation fault
Summary: [WRB][QEMU6.0] netdev_add cause qemu Segmentation fault
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.5
Hardware: Unspecified
OS: All
Priority: high
Severity: high
Target Milestone: beta
Target Release: ---
Assignee: lulu@redhat.com
QA Contact: Lei Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-02 06:10 UTC by Yanan Fu
Modified: 2021-11-16 08:27 UTC (History)

Fixed In Version: qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-16 07:52:31 UTC
Type: ---
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2021:4684 | 0 | None | None | None | 2021-11-16 07:53:15 UTC

Description Yanan Fu 2021-04-02 06:10:53 UTC
Description of problem:
[WRB][QEMU6.0] netdev_add cause qemu Segmentation fault

Version-Release number of selected component (if applicable):
qemu-kvm-6.0.0-14rc0.scrmod+el8.5.0+10480+a8e067ae.wrb210325
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1550013

How reproducible:
100%

Steps to Reproduce:
1. Launch a VM.
2. Run netdev_add in the HMP monitor:
(qemu) netdev_add tap,id=tap0
(qemu) Segmentation fault (core dumped)



Actual results:
QEMU Segmentation fault

Expected results:
netdev_add succeeds.


Additional info:
Tested with both the pc and q35 machine types; both hit the same issue.

Comment 1 Yanan Fu 2021-04-02 06:21:34 UTC
core file: fileshare.englab.nay.redhat.com/pub/section2/coredump/yfu/bz1945826/

Comment 2 Yanan Fu 2021-04-02 06:42:10 UTC
Updated core file path: http://fileshare.englab.nay.redhat.com/pub/section2/coredump/yfu/bz1945826/

# gdb /usr/libexec/qemu-kvm core-qemu-kvm-487775-1617345326 
...
(gdb) bt
#0  0x0000555a9cfb3c7f in tap_send (opaque=0x555a9e303800) at ../net/tap.c:206
#1  0x0000555a9d236f19 in aio_dispatch_handler (ctx=ctx@entry=0x555a9e0209c0, node=0x555a9e22dcb0)
    at ../util/aio-posix.c:329
#2  0x0000555a9d23778c in aio_dispatch_handlers (ctx=0x555a9e0209c0) at ../util/aio-posix.c:372
#3  0x0000555a9d23778c in aio_dispatch (ctx=0x555a9e0209c0) at ../util/aio-posix.c:382
#4  0x0000555a9d249872 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>)
    at ../util/async.c:306
#5  0x00007f9a2f0fc77d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#6  0x0000555a9d242b10 in glib_pollfds_poll () at ../util/main-loop.c:231
#7  0x0000555a9d242b10 in os_host_main_loop_wait (timeout=<optimized out>) at ../util/main-loop.c:254
#8  0x0000555a9d242b10 in main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530
#9  0x0000555a9d0dfe29 in qemu_main_loop () at ../softmmu/runstate.c:725
#10 0x0000555a9cea22c2 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50

Comment 4 John Ferlan 2021-04-05 18:20:51 UTC
Moved to RHEL-AV and assigned directly to Jason, as the following commit appears to be the cause:


commit 969e50b61a285b0cc8dea6d4d2ade3f758d5ecc7
Author: Bin Meng <bmeng.cn>
Date:   Wed Mar 17 14:26:29 2021 +0800

    net: Pad short frames to minimum size before sending from SLiRP/TAP
    

...
@@ -189,6 +190,8 @@ static void tap_send(void *opaque)
 
     while (true) {
         uint8_t *buf = s->buf;
+        uint8_t min_pkt[ETH_ZLEN];
+        size_t min_pktsz = sizeof(min_pkt);
 
         size = tap_read_packet(s->fd, s->buf, sizeof(s->buf));
         if (size <= 0) {
@@ -200,6 +203,13 @@ static void tap_send(void *opaque)
             size -= s->host_vnet_hdr_len;
         }
 
+        if (!s->nc.peer->do_not_pad) {
+            if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) {
+                buf = min_pkt;
+                size = min_pktsz;
+            }
+        }
+
....

Hopefully this can be addressed before upstream qemu-6.0 is released.
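
For readers unfamiliar with the padding step that the commit above adds to tap_send(), the following is a minimal standalone sketch of what a helper with the calling convention seen in the diff might do: copy a short frame into a scratch buffer and zero-fill it up to the minimum Ethernet frame length. The name pad_short_frame and the local ETH_ZLEN definition are illustrative assumptions, not QEMU's actual implementation of eth_pad_short_frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimum Ethernet frame length without the FCS (assumed to be 60 here). */
#ifndef ETH_ZLEN
#define ETH_ZLEN 60
#endif

/*
 * Hypothetical stand-in for the helper called in the diff above.
 * If pkt is shorter than ETH_ZLEN, copy it into padded, zero-fill the
 * remainder, set *padded_len to ETH_ZLEN and return true.
 * Otherwise leave the outputs untouched and return false.
 */
bool pad_short_frame(uint8_t *padded, size_t *padded_len,
                     const void *pkt, size_t pkt_size)
{
    /* The caller must supply a scratch buffer of at least ETH_ZLEN bytes. */
    assert(padded_len != NULL && *padded_len >= ETH_ZLEN);

    if (pkt_size >= ETH_ZLEN) {
        return false;               /* frame is already long enough */
    }

    memcpy(padded, pkt, pkt_size);
    memset(padded + pkt_size, 0, ETH_ZLEN - pkt_size);
    *padded_len = ETH_ZLEN;
    return true;
}
```

Note that the padding helper itself is not what crashes in the backtrace from comment 2; the fault is in the condition that guards the call, which dereferences s->nc.peer.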

Comment 5 Yumei Huang 2021-04-29 03:08:48 UTC
Hit the same issue when booting a guest with the command below; QEMU dumped core immediately. The gdb backtrace is the same as in comment 2.

# /usr/libexec/qemu-kvm \
    --preconfig  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 4096 \
    -object memory-backend-ram,size=1024M,id=mem-mem0 \
    -object memory-backend-ram,size=1024M,id=mem-mem1 \
    -object memory-backend-ram,size=1024M,id=mem-mem2 \
    -object memory-backend-ram,size=1024M,id=mem-mem3  \
    -smp 8,maxcpus=8,cores=2,threads=1,dies=2,sockets=2  \
    -numa node,memdev=mem-mem0,nodeid=0  \
    -numa node,memdev=mem-mem1,nodeid=1  \
    -numa node,memdev=mem-mem2,nodeid=2  \
    -numa node,memdev=mem-mem3,nodeid=3  \
    -numa cpu,node-id=0,socket-id=0,die-id=0,core-id=0,thread-id=0  \
    -numa cpu,node-id=1,socket-id=0,die-id=1,core-id=0,thread-id=0  \
    -numa cpu,node-id=2,socket-id=1,die-id=0,core-id=0,thread-id=0  \
    -numa cpu,node-id=3,socket-id=1,die-id=1,core-id=0,thread-id=0  \
    -cpu 'EPYC-Rome'\
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:23:f5:4b:4f:66,id=idMSAk2k,netdev=idGXdg9o,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idGXdg9o \
    -blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
    -device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on  \
    -vnc :0  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5

Comment 6 Yumei Huang 2021-04-29 03:10:15 UTC
(In reply to Yumei Huang from comment #5)
> Hit same issue when boot guest with below cmd, qemu core dumped directly.
> The gdb bt is same to comment 2.
> 

QEMU version:
qemu-kvm-6.0.0-15rc4.scrmod+el8.4.0+10735+03b13f0b.wrb210422

Comment 7 Lei Yang 2021-04-29 03:34:47 UTC
This issue was not hit during RHEL 8.4 (qemu-kvm-5.2) testing, so I am adding the 'Regression' keyword.

Comment 8 Yanan Fu 2021-04-29 13:29:10 UTC
This issue is gone with rc5:
qemu-kvm-core-6.0.0-15rc5.scrmod+el8.5.0+10801+f1aef2c6.wrb210428.x86_64


I checked the changelog; it should be fixed by this commit:
commit bc38e31b4e0366f3a70c0939abde4c3dd6e0fa30
Author: Jason Wang <jasowang>
Date:   Fri Apr 23 11:18:03 2021 +0800

    net: check the existence of peer before trying to pad
    
    There could be case that peer is NULL. This can happen when during
    network device hot-add where net device needs to be added first. So
    the patch check the existence of peer before trying to do the pad.
    
    Fixes: 969e50b61a285 ("net: Pad short frames to minimum size before sending from SLiRP/TAP")
    Signed-off-by: Jason Wang <jasowang>
    Reviewed-by: Bin Meng <bmeng.cn>
    Reviewed-by: Stefan Weil <sw>
    Message-id: 20210423031803.1479-1-jasowang
    Signed-off-by: Peter Maydell <peter.maydell>



Let's wait for the official downstream build to double check it.
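
The commit message above says the peer can be NULL during hot-add, because the netdev is created before any device is attached to it. A minimal sketch of that situation and of a guarded check follows. The NetClient struct and the peer_needs_padding name are simplified, hypothetical stand-ins for illustration and should not be read as the exact upstream code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for QEMU's net client state. */
typedef struct NetClient {
    struct NetClient *peer;  /* NULL until a device is attached to the netdev */
    bool do_not_pad;         /* set by a peer that handles short frames itself */
} NetClient;

/*
 * The regression evaluated the equivalent of !nc->peer->do_not_pad
 * unconditionally, which dereferences NULL between netdev_add and
 * device_add. Checking the peer first avoids that.
 */
bool peer_needs_padding(const NetClient *nc)
{
    return nc->peer != NULL && !nc->peer->do_not_pad;
}
```

With this shape, a tap backend that receives a packet before device_add has run simply skips padding instead of crashing.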

Comment 9 Lei Yang 2021-05-07 03:06:36 UTC
Hi, Cindy

I tested with the latest version, 'qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64', and the issue no longer occurs. Therefore, I set ITM to 13. Could you help review and change the status of the bz?

Best Regards
Lei

Comment 10 lulu@redhat.com 2021-05-08 06:11:43 UTC
(In reply to Lei Yang from comment #9)
> Hi,Cindy
> 
> I tried to test it with the latest version -
> 'qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64',There is no the
> issue any more. Therefore, I set ITM to 13. Could you help me review and
> change the status of bz? 
> 
> Best Regards
> Lei

Hi Lei,
I agree with Yanan Fu: this commit should fix the issue. I have also verified it on my own system.

commit bc38e31b4e0366f3a70c0939abde4c3dd6e0fa30
Author: Jason Wang <jasowang>
Date:   Fri Apr 23 11:18:03 2021 +0800

    net: check the existence of peer before trying to pad
    
    There could be case that peer is NULL. This can happen when during
    network device hot-add where net device needs to be added first. So
    the patch check the existence of peer before trying to do the pad.
    
    Fixes: 969e50b61a285 ("net: Pad short frames to minimum size before sending from SLiRP/TAP")
    Signed-off-by: Jason Wang <jasowang>
    Reviewed-by: Bin Meng <bmeng.cn>
    Reviewed-by: Stefan Weil <sw>
    Message-id: 20210423031803.1479-1-jasowang
    Signed-off-by: Peter Maydell <peter.maydell>

Comment 11 Yanan Fu 2021-05-08 07:02:12 UTC
Hi Lulu,

Could you also help update the 'Fixed In Version' field?
Or, if you are not the right person, is the package maintainer (ddepaula) responsible for that?


Thanks!

Best regards
Yanan Fu

Comment 12 Yanan Fu 2021-05-08 08:21:47 UTC
qemu-6.0.0-rc5 is not a downstream version, so I updated 'Fixed In Version' to the downstream build NVR: qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64
Correct me if I am wrong, thanks!

Comment 13 lulu@redhat.com 2021-05-10 02:05:21 UTC
(In reply to Yanan Fu from comment #12)
> The qemu-6.0.0-rc5 is not a downstream version, I update the fixed in
> version to the downstream build nvr:
> qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64
> Correct me if i am wrong, thanks!

Thanks for your help, Yanan :-)

Comment 14 Lei Yang 2021-05-10 02:15:52 UTC
Hi, Ariel

Could you please help to set devel_ack+ for this bug?

Thanks
Lei

Comment 16 Yanan Fu 2021-05-11 09:34:50 UTC
Set Verified:Tested,SanityOnly as the gating/tier1 tests pass.

Comment 19 Lei Yang 2021-05-13 06:45:10 UTC
==> Test steps

1. Boot up a VM
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine q35 \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 7168  \
-smp 6,maxcpus=6,cores=3,threads=1,dies=1,sockets=2  \
-cpu 'Haswell-noTSX',+kvm_pv_unhalt \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel850-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-net-pci,mac=9a:51:78:f5:ae:a8,id=idDFfBtb,netdev=id9iAopo,bus=pcie-root-port-3,addr=0x0  \
-netdev tap,id=id9iAopo,vhost=on \
-vnc :0  \
-rtc base=utc,clock=host,driftfix=slew  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
-monitor stdio \
-qmp tcp:0:5555,server,nowait

2. Hot-plug a NIC
#telnet 10.73.225.40 5555
Trying 10.73.225.40...
Connected to 10.73.225.40.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 6}, "package": "qemu-kvm-6.0.0-14rc0.scrmod+el8.5.0+10480+a8e067ae.wrb210325"}, "capabilities": ["oob"]}}
{'execute': 'qmp_capabilities'}
{"return": {}}
{'execute': 'netdev_add', 'arguments': {'type': 'tap', 'id': 'idd6ICtb'}}
 [qemu output] /tmp/aexpect_6rkp81EJ/aexpect-cu2r0vbz.sh: line 1: 207159 Segmentation fault      (core dumped)

==Reproduced with qemu-kvm-6.0.0-14rc0.scrmod+el8.5.0+10480+a8e067ae.wrb210325

==Verified with qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64
1. Boot up a VM
2. Hot-plug a NIC
#telnet 10.73.225.40 5555
Trying 10.73.225.40...
Connected to 10.73.225.40.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 6}, "package": "qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d"}, "capabilities": ["oob"]}}
{'execute': 'qmp_capabilities'}
{"return": {}}
{'execute': 'netdev_add', 'arguments': {'type': 'tap', 'id': 'idd6ICtb'}}
{"return": {}}
{'execute': 'device_add', 'arguments': {'id':'idhjRMYp','driver':'virtio-net-pci','netdev':'idd6ICtb','mac':'9a:d5:67:68:05:f4','bus': 'pcie_extra_root_port_0','addr':'0x0'}}
{"return": {}}
{"timestamp": {"seconds": 1620887972, "microseconds": 834951}, "event": "NIC_RX_FILTER_CHANGED", "data": {"name": "idhjRMYp", "path": "/machine/peripheral/idhjRMYp/virtio-backend"}}

3. The guest works well, so this bug is fixed in qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64. Moving it to "VERIFIED".

Comment 21 errata-xmlrpc 2021-11-16 07:52:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4684
