Bug 1945826
Summary: | [WRB][QEMU6.0] netdev_add cause qemu Segmentation fault | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Yanan Fu <yfu> |
Component: | qemu-kvm | Assignee: | lulu <lulu> |
qemu-kvm sub component: | Networking | QA Contact: | Lei Yang <leiyang> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | | |
Priority: | high | CC: | aadam, chayang, ddepaula, juzhang, lulu, mrezanin, virt-maint, xfu, yanghliu, yfu, ymankad, yuhuang |
Version: | 8.5 | Keywords: | Regression, Triaged |
Target Milestone: | beta | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | All | | |
Whiteboard: | | | |
Fixed In Version: | qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-11-16 07:52:31 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Yanan Fu
2021-04-02 06:10:53 UTC
core file: fileshare.englab.nay.redhat.com/pub/section2/coredump/yfu/bz1945826/

Update the core file path: http://fileshare.englab.nay.redhat.com/pub/section2/coredump/yfu/bz1945826/

# gdb /usr/libexec/qemu-kvm core-qemu-kvm-487775-1617345326
...
(gdb) bt
#0  0x0000555a9cfb3c7f in tap_send (opaque=0x555a9e303800) at ../net/tap.c:206
#1  0x0000555a9d236f19 in aio_dispatch_handler (ctx=ctx@entry=0x555a9e0209c0, node=0x555a9e22dcb0) at ../util/aio-posix.c:329
#2  0x0000555a9d23778c in aio_dispatch_handlers (ctx=0x555a9e0209c0) at ../util/aio-posix.c:372
#3  0x0000555a9d23778c in aio_dispatch (ctx=0x555a9e0209c0) at ../util/aio-posix.c:382
#4  0x0000555a9d249872 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306
#5  0x00007f9a2f0fc77d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#6  0x0000555a9d242b10 in glib_pollfds_poll () at ../util/main-loop.c:231
#7  0x0000555a9d242b10 in os_host_main_loop_wait (timeout=<optimized out>) at ../util/main-loop.c:254
#8  0x0000555a9d242b10 in main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530
#9  0x0000555a9d0dfe29 in qemu_main_loop () at ../softmmu/runstate.c:725
#10 0x0000555a9cea22c2 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50

Moved to RHEL-AV and assigned directly to Jason, as it seems the following commit has an issue:

commit 969e50b61a285b0cc8dea6d4d2ade3f758d5ecc7
Author: Bin Meng <bmeng.cn>
Date:   Wed Mar 17 14:26:29 2021 +0800

    net: Pad short frames to minimum size before sending from SLiRP/TAP
...
@@ -189,6 +190,8 @@ static void tap_send(void *opaque)
     while (true) {
         uint8_t *buf = s->buf;
+        uint8_t min_pkt[ETH_ZLEN];
+        size_t min_pktsz = sizeof(min_pkt);
         size = tap_read_packet(s->fd, s->buf, sizeof(s->buf));
         if (size <= 0) {
@@ -200,6 +203,13 @@ static void tap_send(void *opaque)
             size -= s->host_vnet_hdr_len;
         }
+        if (!s->nc.peer->do_not_pad) {
+            if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) {
+                buf = min_pkt;
+                size = min_pktsz;
+            }
+        }
+
....

Hopefully this is something that can be addressed before qemu-6.0 upstream is released.

Hit the same issue when booting a guest with the command below; qemu core dumped directly. The gdb backtrace is the same as in comment 2.

# /usr/libexec/qemu-kvm \
    --preconfig \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine q35 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 4096 \
    -object memory-backend-ram,size=1024M,id=mem-mem0 \
    -object memory-backend-ram,size=1024M,id=mem-mem1 \
    -object memory-backend-ram,size=1024M,id=mem-mem2 \
    -object memory-backend-ram,size=1024M,id=mem-mem3 \
    -smp 8,maxcpus=8,cores=2,threads=1,dies=2,sockets=2 \
    -numa node,memdev=mem-mem0,nodeid=0 \
    -numa node,memdev=mem-mem1,nodeid=1 \
    -numa node,memdev=mem-mem2,nodeid=2 \
    -numa node,memdev=mem-mem3,nodeid=3 \
    -numa cpu,node-id=0,socket-id=0,die-id=0,core-id=0,thread-id=0 \
    -numa cpu,node-id=1,socket-id=0,die-id=1,core-id=0,thread-id=0 \
    -numa cpu,node-id=2,socket-id=1,die-id=0,core-id=0,thread-id=0 \
    -numa cpu,node-id=3,socket-id=1,die-id=1,core-id=0,thread-id=0 \
    -cpu 'EPYC-Rome' \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:23:f5:4b:4f:66,id=idMSAk2k,netdev=idGXdg9o,bus=pcie-root-port-3,addr=0x0 \
    -netdev tap,id=idGXdg9o \
    -blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
    -device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5

(In reply to Yumei Huang from comment #5)
> Hit the same issue when booting a guest with the command below; qemu core
> dumped directly. The gdb backtrace is the same as in comment 2.
>

Add qemu version: qemu-kvm-6.0.0-15rc4.scrmod+el8.4.0+10735+03b13f0b.wrb210422

This issue was not hit during RHEL 8.4 (qemu-kvm-5.2) testing, so adding the keyword 'Regression'.

This issue is gone with rc5:
qemu-kvm-core-6.0.0-15rc5.scrmod+el8.5.0+10801+f1aef2c6.wrb210428.x86_64

Checked the changelog; it should be fixed by commit:

commit bc38e31b4e0366f3a70c0939abde4c3dd6e0fa30
Author: Jason Wang <jasowang>
Date:   Fri Apr 23 11:18:03 2021 +0800

    net: check the existence of peer before trying to pad

    There could be case that peer is NULL. This can happen when during
    network device hot-add where net device needs to be added first. So
    the patch check the existence of peer before trying to do the pad.
    Fixes: 969e50b61a285 ("net: Pad short frames to minimum size before sending from SLiRP/TAP")
    Signed-off-by: Jason Wang <jasowang>
    Reviewed-by: Bin Meng <bmeng.cn>
    Reviewed-by: Stefan Weil <sw>
    Message-id: 20210423031803.1479-1-jasowang
    Signed-off-by: Peter Maydell <peter.maydell>

Let's wait for the official downstream build to double-check it.

Hi Cindy,

I tested with the latest version, 'qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64', and the issue no longer occurs. Therefore, I set ITM to 13. Could you help review and change the status of the bz?

Best Regards
Lei

(In reply to Lei Yang from comment #9)
> Hi Cindy,
>
> I tested with the latest version,
> 'qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64', and the issue no
> longer occurs. Therefore, I set ITM to 13. Could you help review and
> change the status of the bz?
>
> Best Regards
> Lei

Hi Lei,

I agree with Yanan Fu: this commit should fix the issue. I also verified it on my own system.

commit bc38e31b4e0366f3a70c0939abde4c3dd6e0fa30
Author: Jason Wang <jasowang>
Date:   Fri Apr 23 11:18:03 2021 +0800

    net: check the existence of peer before trying to pad

    There could be case that peer is NULL. This can happen when during
    network device hot-add where net device needs to be added first. So
    the patch check the existence of peer before trying to do the pad.

    Fixes: 969e50b61a285 ("net: Pad short frames to minimum size before sending from SLiRP/TAP")
    Signed-off-by: Jason Wang <jasowang>
    Reviewed-by: Bin Meng <bmeng.cn>
    Reviewed-by: Stefan Weil <sw>
    Message-id: 20210423031803.1479-1-jasowang
    Signed-off-by: Peter Maydell <peter.maydell>

Hi Lulu,

Could you help update the 'Fixed In Version' field as well? Or, if you are not the right person, is the package maintainer (ddepaula) responsible for that? Thanks!
Best regards
Yanan Fu

qemu-6.0.0-rc5 is not a downstream version, so I updated 'Fixed In Version' to the downstream build NVR:
qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64

Correct me if I am wrong, thanks!

(In reply to Yanan Fu from comment #12)
> qemu-6.0.0-rc5 is not a downstream version, so I updated 'Fixed In Version'
> to the downstream build NVR:
> qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64
> Correct me if I am wrong, thanks!

Thanks for your help, Yanan :-)

Hi Ariel,

Could you please help to set devel_ack+ for this bug?

Thanks
Lei

Set Verified:Tested,SanityOnly as gating/tier1 tests pass.

==> Test steps

1. Boot up a vm

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine q35 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 7168 \
    -smp 6,maxcpus=6,cores=3,threads=1,dies=1,sockets=2 \
    -cpu 'Haswell-noTSX',+kvm_pv_unhalt \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel850-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:51:78:f5:ae:a8,id=idDFfBtb,netdev=id9iAopo,bus=pcie-root-port-3,addr=0x0 \
    -netdev tap,id=id9iAopo,vhost=on \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
    -monitor stdio \
    -qmp tcp:0:5555,server,nowait

2. Hot plug nic

# telnet 10.73.225.40 5555
Trying 10.73.225.40...
Connected to 10.73.225.40.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 6}, "package": "qemu-kvm-6.0.0-14rc0.scrmod+el8.5.0+10480+a8e067ae.wrb210325"}, "capabilities": ["oob"]}}
{'execute': 'qmp_capabilities'}
{"return": {}}
{'execute': 'netdev_add', 'arguments': {'type': 'tap', 'id': 'idd6ICtb'}}

[qemu output]
/tmp/aexpect_6rkp81EJ/aexpect-cu2r0vbz.sh: line 1: 207159 Segmentation fault (core dumped)

==Reproduced with qemu-kvm-6.0.0-14rc0.scrmod+el8.5.0+10480+a8e067ae.wrb210325

==Verified with qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64

1. Boot up a vm

2. Hot plug nic

# telnet 10.73.225.40 5555
Trying 10.73.225.40...
Connected to 10.73.225.40.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 6}, "package": "qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d"}, "capabilities": ["oob"]}}
{'execute': 'qmp_capabilities'}
{"return": {}}
{'execute': 'netdev_add', 'arguments': {'type': 'tap', 'id': 'idd6ICtb'}}
{"return": {}}
{'execute': 'device_add', 'arguments': {'id':'idhjRMYp','driver':'virtio-net-pci','netdev':'idd6ICtb','mac':'9a:d5:67:68:05:f4','bus': 'pcie_extra_root_port_0','addr':'0x0'}}
{"return": {}}
{"timestamp": {"seconds": 1620887972, "microseconds": 834951}, "event": "NIC_RX_FILTER_CHANGED", "data": {"name": "idhjRMYp", "path": "/machine/peripheral/idhjRMYp/virtio-backend"}}

3. Guest works well, so this bug has been fixed in qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64.
Move it to "VERIFIED".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4684
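For readers following the root cause discussed in this report: the padding code dereferenced `s->nc.peer->do_not_pad` while the TAP netdev had no peer yet (netdev_add had run, device_add had not), and the fix commit message says the patch checks that the peer exists before padding. The following is a minimal, self-contained sketch of that idea and is not the actual QEMU patch. `NetClient`, `pad_short_frame`, `should_pad` and `prepare_frame` are hypothetical names made up for illustration; only `peer`, `do_not_pad` and `ETH_ZLEN` appear in the diff quoted in this report, and the value 60 for `ETH_ZLEN` is the usual minimum Ethernet frame length without FCS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ZLEN 60 /* minimum Ethernet frame length, excluding FCS */

/* Hypothetical, stripped-down stand-in for a QEMU net client. */
typedef struct NetClient {
    struct NetClient *peer; /* NULL until a device is attached to the netdev */
    bool do_not_pad;        /* set by peers that handle short frames themselves */
} NetClient;

/*
 * Hypothetical helper: copy a short frame into padded[] and zero-fill it
 * up to ETH_ZLEN. Returns true only if padding was applied.
 */
static bool pad_short_frame(uint8_t *padded, size_t *padded_len,
                            const uint8_t *pkt, size_t pkt_len)
{
    assert(*padded_len >= ETH_ZLEN);
    if (pkt_len >= ETH_ZLEN) {
        return false;
    }
    memcpy(padded, pkt, pkt_len);
    memset(padded + pkt_len, 0, ETH_ZLEN - pkt_len);
    *padded_len = ETH_ZLEN;
    return true;
}

/* The guard: never touch peer->do_not_pad unless a peer exists. */
static bool should_pad(const NetClient *nc)
{
    return nc->peer != NULL && !nc->peer->do_not_pad;
}

/*
 * Pick the buffer to deliver: the original packet, or the padded copy in
 * min_pkt when a peer exists, wants padding, and the frame is short.
 * Returns the length of the frame that *out points to.
 */
static size_t prepare_frame(const NetClient *nc, const uint8_t *pkt,
                            size_t len, uint8_t min_pkt[ETH_ZLEN],
                            const uint8_t **out)
{
    size_t min_len = ETH_ZLEN;

    *out = pkt;
    if (should_pad(nc) && pad_short_frame(min_pkt, &min_len, pkt, len)) {
        *out = min_pkt;
        return min_len;
    }
    return len;
}
```

In this sketch, calling `prepare_frame` on a client whose `peer` is NULL returns the packet unchanged instead of crashing, which corresponds to the window between `netdev_add` and `device_add` in the test steps. Once a peer is attached, a 42-byte frame comes back as a 60-byte zero-padded copy.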