Bug 1938042
Summary: | [virtual network][windows2012 vm]hotplug/hot-unplug virtio nics in a loop cause qemu process segfault | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Lei Yang <leiyang> | |
Component: | qemu-kvm | Assignee: | Yvugenfi <yvugenfi> | |
qemu-kvm sub component: | Networking | QA Contact: | Lei Yang <leiyang> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | high | |||
Priority: | high | CC: | aadam, chayang, jinzhao, juzhang, mdean, smitterl, virt-maint, yvugenfi | |
Version: | 8.4 | Keywords: | Regression, TestOnly, Triaged | |
Target Milestone: | rc | |||
Target Release: | 8.4 | |||
Hardware: | Unspecified | |||
OS: | Windows | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-6.0.0-17.module+el8.5.0+11173+c9fce0bb | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1964343 | Environment: | ||
Last Closed: | 2021-11-16 07:52:17 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1948358, 1964343 |
Description
Lei Yang
2021-03-12 02:06:03 UTC
Hi Ariel,

Could your team please have a look? It is a regression on RHEL 8.4, and it seems that only Windows guests, especially win2012, are affected.

Additional info:
1. qemu-kvm-5.2.0-1.module+el8.4.0+9091+650b220a.x86_64 -> Did not reproduce; Windows and RHEL guests work well.
2. qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.x86_64 -> Windows 2012 guest reproduces: 2/2; RHEL guest works well. I guess the problem was introduced when qemu-kvm-5.2.0-1 was rebased to qemu-kvm-5.2.0-2.
3. I tested a RHEL 8.4 guest with the qemu-kvm-5.2.0-1, qemu-kvm-5.2.0-2, and qemu-kvm-5.2.0-11 packages; all work well.

Therefore added 'Windows' to the OS.

Hi Yan,

I hit the same issue when running the virtio-net function test on a RHEL 9 beta host. Do I need to clone a new bug for RHEL 9?

Test Version:
kernel-5.12.0-1.el9.x86_64
qemu-kvm-6.0.0-2.el9.x86_64
virtio-win-prewhql-0.1-199.iso

Guest:
Windows Server 2016

How reproducible:
1/10

dmesg:
qemu-kvm[52372]: segfault at d0 ip 00005644d40a43bf sp 00007f211ac2d3c0 error 4 in qemu-kvm[5644d4053000+473000]

Thanks
Lei

(In reply to Lei Yang from comment #4)

Hi Lei,

Please clone.

Thanks,
Yan.

A similar bug is BZ#1743098.

Hi Yan,

Are there plans to fix this bug in RHEL 8.5? If it will be fixed in RHEL 8.5, could you set the DTM and ITR? Thanks in advance.

Best Regards
Lei

Hi Lei,

The fix was merged upstream: https://bugzilla.redhat.com/show_bug.cgi?id=1743098#c19
Depending on priority, it can be backported or we can wait for the rebase.

Best regards,
Yan.
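The dmesg line quoted in comment 4 can be decoded a little further before a core dump is available. The sketch below is only an illustration, with hypothetical helper names (`fault_offset`, `fault_kind`) that are not part of the original report. It subtracts the mapping base printed in `qemu-kvm[base+size]` from the instruction pointer and interprets the x86 page-fault error code bits (bit 0: protection violation, bit 1: write, bit 2: user mode). The resulting offset is relative to the start of the mapping the kernel printed. Resolving it to a symbol would additionally need that mapping's file offset and debuginfo matching the exact qemu-kvm build, neither of which is in this report.

```shell
#!/bin/sh
# Decode a kernel "segfault at ..." log line (illustrative helpers, not from the report).

# Print ip - base as hex, where base is the first number in the trailing "[base+size]".
fault_offset() {
    line=$1
    ip=$(printf '%s\n' "$line" | sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p')
    base=$(printf '%s\n' "$line" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*].*/\1/p')
    printf '0x%x\n' $(( 0x$ip - 0x$base ))
}

# Describe an x86 page-fault error code: mode, access type, and cause.
fault_kind() {
    err=$1
    if [ $(( err & 4 )) -ne 0 ]; then mode=user; else mode=kernel; fi
    if [ $(( err & 2 )) -ne 0 ]; then access=write; else access=read; fi
    if [ $(( err & 1 )) -ne 0 ]; then cause=protection-violation; else cause=page-not-present; fi
    echo "$mode $access $cause"
}
```

For the line in comment 4, `fault_offset` yields `0x513bf` and `fault_kind 4` yields `user read page-not-present`. A read at the very low address `d0` is often what a NULL pointer plus a structure field offset looks like, but that interpretation is an assumption until a backtrace confirms it.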
(In reply to Yvugenfi from comment #8)

Hi Yan,

1. The bug you mentioned above is from the slow train, while the current bug is from the fast train. I found the corresponding fast train bug: https://bugzilla.redhat.com/show_bug.cgi?id=1690256.
2. I tried many times to verify it with the fixed version, and the issue did not reproduce. According to the test results, from the QE point of view, this problem has been fixed on the fast train. Could you help me double-check whether the current bug is fixed on the fast train? Maybe this bug can be verified on rhel8.5-av.
3. Based on the above, I set ITM=20; please correct me if I'm wrong.

=> Test Version:
qemu-kvm-6.0.0-17.module+el8.5.0+11173+c9fce0bb.x86_64
kernel-4.18.0-316.el8.x86_64
virtio-win-prewhql-0.1-202.iso

=> Test steps
1. Boot a win2012 guest
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine q35,memory-backend=mem-machine_mem \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 7168 \
-object memory-backend-ram,size=7168M,id=mem-machine_mem \
-smp 6,maxcpus=6,cores=3,threads=1,dies=1,sockets=2 \
-cpu 'Haswell-noTSX',hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,+kvm_pv_unhalt \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/win2012-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-net-pci,mac=9a:1b:f3:b1:31:e9,id=idNZy0cs,netdev=idCbwdpW,bus=pcie-root-port-3,addr=0x0 \
-netdev tap,id=idCbwdpW,vhost=on \
-blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
-device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on \
-vnc :0 \
-rtc base=localtime,clock=host,driftfix=slew \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
-monitor stdio \
-monitor unix:/tmp/monitor2,server,nowait

2. Hotplug and hot-unplug virtio-net-pci with this script.
i=1
while [ $i -lt 2000 ]
do
    echo "**************$i**************"
    sleep 2
    echo "netdev_add type=tap,id=net$i,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown" | nc -U /tmp/monitor2
    sleep 1
    echo "device_add driver=virtio-net-pci,netdev=net$i,mac=9a:d5:d6:d7:d8:d9,id=dev$i,bus=pcie_extra_root_port_0" | nc -U /tmp/monitor2
    sleep 10
    echo "device_del dev$i" | nc -U /tmp/monitor2
    sleep 5
    echo "netdev_del net$i" | nc -U /tmp/monitor2
    sleep 5
    echo "info network" | nc -U /tmp/monitor2
    sleep 3
    i=$(($i+1))
done

3. Guest works well, no qemu core dump.
Best Regards
Lei

(In reply to Lei Yang from comment #9)

https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg02239.html was not in QEMU 6.0, and I also don't see the patches applied to the downstream build.

The failure was not always reproducible in the development environment.

(In reply to Yvugenfi from comment #11)

https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg02239.html is an additional fix on top of the fix mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1743098#c19

The BZ#1743098 fix is in QEMU 6.0. The first fix (mentioned in BZ#1743098) is the one that covers the most probable cases.

Hi Meirav,

Based on comments 9, 10, and 11, it may be fixed automatically by the RHEL-8.6 rebase. Could you help set the ITR accordingly?

Best Regards
Lei

The commit referenced in comment 8 points at upstream commit https://github.com/qemu/qemu/commit/c3fd706165e9875a10606453ee2785dd51e987a5

This commit was in qemu-6.0, which was used to rebase RHEL-AV 8.5.0, so let's move this to ON_QA for verification. I've updated the Devel Whiteboard with the commit information, set the Fixed In Version to the package, set the ITR to 8.5.0, and placed the bug ON_QA.

Based on comment 9, set 'Verified:Tested'.

==> Test Steps

Test Version:
kernel-4.18.0-296.el8.x86_64
qemu-kvm-5.2.0-11.module+el8.4.0+10268+62bcbbed.x86_64
virtio-win-prewhql-0.1-196.iso

1. Boot a win2012 vm
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine q35,memory-backend=mem-machine_mem \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 7168 \
-object memory-backend-ram,size=7168M,id=mem-machine_mem \
-smp 6,maxcpus=6,cores=3,threads=1,dies=1,sockets=2 \
-cpu 'Haswell-noTSX',hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,+kvm_pv_unhalt \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/win2012-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-net-pci,mac=9a:67:40:9e:f3:0b,id=idXmC7IZ,netdev=id79CLcJ,bus=pcie-root-port-3,addr=0x0 \
-netdev tap,id=id79CLcJ,vhost=on \
-blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
-device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on \
-vnc :0 \
-rtc base=localtime,clock=host,driftfix=slew \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
-monitor stdio \
-monitor unix:/tmp/monitor2,server,nowait

2. Hotplug and hot-unplug virtio-net-pci with this script.
i=1
while [ $i -lt 10000 ]
do
    echo "**************$i**************"
    sleep 2
    echo "netdev_add type=tap,id=net$i,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown" | nc -U /tmp/monitor2
    sleep 1
    echo "device_add driver=virtio-net-pci,netdev=net$i,mac=9a:d5:d6:d7:d8:d9,id=dev$i,bus=pcie_extra_root_port_0" | nc -U /tmp/monitor2
    sleep 10
    echo "device_del dev$i" | nc -U /tmp/monitor2
    sleep 5
    echo "netdev_del net$i" | nc -U /tmp/monitor2
    sleep 5
    echo "info network" | nc -U /tmp/monitor2
    sleep 3
    i=$(($i+1))
done

3. QEMU core dumps.

==Reproduced with qemu-kvm-5.2.0-11.module+el8.4.0+10268+62bcbbed.x86_64

==Verified with qemu-kvm-6.0.0-17.module+el8.5.0+11173+c9fce0bb

1. Boot a win2012 vm
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine q35,memory-backend=mem-machine_mem \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 7168 \
-object memory-backend-ram,size=7168M,id=mem-machine_mem \
-smp 6,maxcpus=6,cores=3,threads=1,dies=1,sockets=2 \
-cpu 'Haswell-noTSX',hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,+kvm_pv_unhalt \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/win2012-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-net-pci,mac=9a:67:40:9e:f3:0b,id=idXmC7IZ,netdev=id79CLcJ,bus=pcie-root-port-3,addr=0x0 \
-netdev tap,id=id79CLcJ,vhost=on \
-blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
-device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on \
-vnc :0 \
-rtc base=localtime,clock=host,driftfix=slew \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
-monitor stdio \
-monitor unix:/tmp/monitor2,server,nowait

2. Hotplug and hot-unplug virtio-net-pci with this script.
i=1
while [ $i -lt 10000 ]
do
    echo "**************$i**************"
    sleep 2
    echo "netdev_add type=tap,id=net$i,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown" | nc -U /tmp/monitor2
    sleep 1
    echo "device_add driver=virtio-net-pci,netdev=net$i,mac=9a:d5:d6:d7:d8:d9,id=dev$i,bus=pcie_extra_root_port_0" | nc -U /tmp/monitor2
    sleep 10
    echo "device_del dev$i" | nc -U /tmp/monitor2
    sleep 5
    echo "netdev_del net$i" | nc -U /tmp/monitor2
    sleep 5
    echo "info network" | nc -U /tmp/monitor2
    sleep 3
    i=$(($i+1))
done

3. Guest works well.

Move it to "VERIFIED".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4684
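The hotplug/hot-unplug loop used in the test steps above keeps sending monitor commands even after qemu-kvm has crashed, so the failing iteration has to be found by reading the output afterwards. A possible variant, offered only as a sketch, stops at the first command that cannot be delivered. The function names (`hmp_command`, `send_hmp`, `run_cycles`) and the `MONITOR` and `BUS` variables are hypothetical. The HMP command strings, socket path, bus name, and delays are copied from the script in this report. It assumes the installed `nc` exits non-zero when it cannot connect to the UNIX socket, which may differ between `nc` variants.

```shell
#!/bin/sh
# Sketch: same plug/unplug cycle as the report's script, aborting once the monitor is unreachable.
MONITOR=${MONITOR:-/tmp/monitor2}      # HMP socket from "-monitor unix:/tmp/monitor2,server,nowait"
BUS=${BUS:-pcie_extra_root_port_0}     # spare root port used as the hotplug target

# Print the HMP command for step $1 of iteration $2.
hmp_command() {
    case $1 in
        netdev_add) echo "netdev_add type=tap,id=net$2,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown" ;;
        device_add) echo "device_add driver=virtio-net-pci,netdev=net$2,mac=9a:d5:d6:d7:d8:d9,id=dev$2,bus=$BUS" ;;
        device_del) echo "device_del dev$2" ;;
        netdev_del) echo "netdev_del net$2" ;;
        *) return 1 ;;
    esac
}

# Send one command line to the monitor; the return status is that of nc.
send_hmp() {
    printf '%s\n' "$1" | nc -U "$MONITOR"
}

# Run $1 cycles; each "step:delay" pair mirrors a command and the sleep that follows it.
run_cycles() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        echo "**************$i**************"
        for step in netdev_add:1 device_add:10 device_del:5 netdev_del:5; do
            if ! send_hmp "$(hmp_command "${step%%:*}" "$i")"; then
                echo "monitor unreachable at iteration $i (${step%%:*})" >&2
                return 1
            fi
            sleep "${step##*:}"
        done
        i=$((i + 1))
    done
}
```

Sourcing the file and calling `run_cycles 2000` would run roughly the same sequence as the original loop, minus the `info network` query, and the message on stderr would name the iteration and step at which the monitor stopped responding.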