Bug 2128929
Summary: | [rhel9.2] hotplug/hotunplug mlx vdpa device to the occupied addr port, then qemu core dump occurs after shutdown guest | | | |
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Lei Yang <leiyang> | |
Component: | qemu-kvm | Assignee: | Ani Sinha <anisinha> | |
qemu-kvm sub component: | PCI | QA Contact: | Yiqian Wei <yiwei> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | high | |||
Priority: | high | CC: | anisinha, chayang, coli, imammedo, jinzhao, jusual, juzhang, lvivier, mrezanin, virt-maint, ybendito, yiwei, ymankad | |
Version: | 9.1 | Keywords: | Regression, Triaged, ZStream | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-8.0.0-7.el9 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2227721 (view as bug list) | Environment: | ||
Last Closed: | 2023-11-07 08:26:38 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2227721 |
Description
Lei Yang
2022-09-22 01:58:47 UTC
Additional info:

I tried to use the following qemu cmd, then repeated the above test steps, and could not reproduce this bug.

qemu cmd:

/usr/libexec/qemu-kvm \
-name guest=rhel9.2,debug-threads=on \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/rhel9.2_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel9.0.0,usb=off,vmport=off,smm=on,kernel_irqchip=split,dump-guest-core=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu Cascadelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,tsc-deadline=on,pmu=off \
-global driver=cfi.pflash01,property=secure,value=on \
-m 8192 \
-overcommit mem-lock=off \
-smp 6,sockets=6,dies=1,cores=1,threads=1 \
-nodefaults \
-boot strict=on \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device pcie-root-port,id=pcie-root-port-6,port=0x6,addr=0x1.0x6,bus=pcie.0,chassis=7 \
-blockdev '{"driver":"file","filename":"/home/images_nfv-virt-rt-kvm/rhel9.2.qcow2","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device '{"driver":"virtio-blk-pci","bus":"pcie-root-port-2","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1,"write-cache":"on"}' \
-device VGA,bus=pcie.0,addr=0x2 \
-monitor stdio \
-qmp tcp:0:5555,server,nowait \
-vnc :0 \

(In reply to Lei Yang from comment #1)
> Additional info:
>
> I tried to use the following qemu cmd, then repeated the above test
> steps, and could not reproduce this bug.

Please ignore comment 1.

I don't think the card is plugged into an occupied addr port. The port that contains a device is on bus pcie.0:

-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}'

and you plug the virtio-net device on bus pci.6 (i.e. into the pcie-root-port that you put on bus pcie.0):

{"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet1","mac":"00:11:22:33:44:03","id": "net1","bus":"pci.6","addr":"0x2.0x5"}}

Moreover, this behavior is not specific to vdpa or virtio-net; I've been able to reproduce it with driver=e1000 and driver=virtio-scsi.

The real bug is the crash when QEMU exits, and it is specific to vdpa. This is an assertion that fails:

qemu-system-x86_64: .../hw/net/vhost_net.c:483: get_vhost_net: Assertion `vhost_net' failed.

462 VHostNetState *get_vhost_net(NetClientState *nc)
463 {
464     VHostNetState *vhost_net = 0;
...
480 #ifdef CONFIG_VHOST_NET_VDPA
481     case NET_CLIENT_DRIVER_VHOST_VDPA:
482         vhost_net = vhost_vdpa_get_vhost_net(nc);
483         assert(vhost_net);
484         break;
485 #endif
486     default:
487         break;
488     }
489
490     return vhost_net;
491 }

I've been able to reproduce the problem with upstream QEMU (c48c9c6b33d7).

The problem seems to be with the netdev_add that registers two instances:

QMP:
{"execute":"netdev_add","arguments":{"type":"vhost-vdpa","id":"hostnet1","vhostdev":"/dev/vhost-vdpa-0"}}

and in the monitor we can then see:

(qemu) info network
hostnet1: index=0,type=vhost-vdpa,vhost-vdpa
hostnet1: index=0,type=vhost-vdpa,vhost-vdpa
(qemu)

That looks like the cause of the problem.

(In reply to Laurent Vivier from comment #5)
> The problem seems to be with the netdev_add that registers two instances:
[...]

The same occurs without QMP:

(qemu) netdev_add vhost-vdpa,id=hostnet1,vhostdev=/dev/vhost-vdpa-0
(qemu) info network
hostnet1: index=0,type=vhost-vdpa,vhost-vdpa
hostnet1: index=0,type=vhost-vdpa,vhost-vdpa

I reproduce the crash when I exit from the QMP console, and bisecting gives me:

commit a00e37a4be88a043fea3e8be3ee3a85f6c4939cf
Author: Alex Bennée <alex.bennee>
Date:   Tue Oct 26 11:22:24 2021 +0100

    chardev: don't exit() straight away on C-a x

    While there are a number of uses in the code-base of the exit(0)
    pattern it gets in the way of clean exit which can do all of it's
    house-keeping. In particular it was reported that you can crash
    plugins this way because TCG can still be running on other threads
    when the atexit callback is called.

    Use qmp_quit() instead which takes care of some housekeeping before
    triggering the shutdown.

    Signed-off-by: Alex Bennée <alex.bennee>
    Reported-by: Lukas Jünger <lukas.junger>
    Reviewed-by: Marc-André Lureau <marcandre.lureau>
    Reviewed-by: Philippe Mathieu-Daudé <philmd>
    Acked-by: Paolo Bonzini <pbonzini>
    Message-Id: <20211026102234.3961636-19-alex.bennee>

 chardev/char-mux.c | 3 ++-
 stubs/meson.build  | 1 +
 stubs/qmp-quit.c   | 8 ++++++++

The multiple instances of the netdev are introduced by the multiqueue support:

commit 402378407dbdce79ce745a13f5c84815f929cfdd
Author: Jason Wang <jasowang>
Date:   Wed Oct 20 12:56:00 2021 +0800

    vhost-vdpa: multiqueue support

    This patch implements the multiqueue support for vhost-vdpa. This is
    done simply by reading the number of queue pairs from the config space
    and initialize the datapath and control path net client.

    Signed-off-by: Jason Wang <jasowang>
    Message-Id: <20211020045600.16082-11-jasowang>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>

because it calls net_vhost_vdpa_init() several times.

Based on comment 7, this bug should get the "Regression" keyword and the priority should be changed to "High". Please correct me if I'm wrong.

The problem happens because the address provided in device_add is not valid:

{"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet1","mac":"00:11:22:33:44:03","id": "net1","bus":"pci.6","addr":"0x2.0x5"}}

thus the card is not really plugged by the kernel and then QEMU cannot really remove it. We can remove the instance of the netdev because for QEMU the device is removed, but when we exit QEMU does a cleanup and, as the card is not really removed, it tries to clean it up. The instance of the netdev doesn't exist anymore, so it triggers a crash.

Having multiple instances of the netdev doesn't seem to be a problem with a correct address.
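The duplicated "hostnet1" entries seen in "info network" above come from the multiqueue code registering one net client per queue pair, all sharing the netdev id. A minimal sketch of that behavior (the flat client table and all names here are ours, not QEMU code):

```c
#include <string.h>

/* Toy model: each queue pair gets its own net client entry, but every
 * entry carries the same netdev id, so a listing shows the id N times. */
#define MAX_CLIENTS 8
static const char *clients[MAX_CLIENTS];
static int n_clients;

/* Mimics net_vhost_vdpa_init() being called once per queue pair. */
static void netdev_add_vdpa(const char *id, int queue_pairs)
{
    for (int i = 0; i < queue_pairs && n_clients < MAX_CLIENTS; i++)
        clients[n_clients++] = id;
}

/* How many client entries share this netdev id (what "info network" walks). */
static int count_id(const char *id)
{
    int n = 0;
    for (int i = 0; i < n_clients; i++)
        if (strcmp(clients[i], id) == 0)
            n++;
    return n;
}
```

With 2 queue pairs, one netdev_add produces two entries under the same id, matching the doubled monitor output.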
The following command works fine:

{"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet1","mac":"00:11:22:33:44:03","id": "net1","bus":"pci.6","addr":"0x0"}}

The problem seems to be introduced by the switch to ACPI PCI hotplug by default:

commit 17858a169508609ca9063c544833e5a1adeb7b52
Author: Julia Suvorova <jusual>
Date:   Tue Jul 13 02:42:04 2021 +0200

    hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35

    Q35 has three different types of PCI devices hot-plug: PCIe Native,
    SHPC Native and ACPI hot-plug. This patch changes the default choice
    for cold-plugged bridges from PCIe Native to ACPI Hot-plug with
    ability to use SHPC and PCIe Native for hot-plugged bridges.

    This is a list of the PCIe Native hot-plug issues that led to this
    change:

    * no racy behavior during boot (see 110c477c2ed)
    * no delay during deleting - after the actual power off software
      must wait at least 1 second before indicating about it. This case
      is quite important for users, it even has its own bug:
      https://bugzilla.redhat.com/show_bug.cgi?id=1594168
    * no timer-based behavior - in addition to the previous example,
      the attention button has a 5-second waiting period, during which
      the operation can be canceled with a second press. While this
      looks fine for manual button control, automation will result in
      the need to queue or drop events, and the software receiving
      events in all sort of unspecified combinations of attention/power
      indicator states, which is racy and uppredictable.
    * fixes:
      * https://bugzilla.redhat.com/show_bug.cgi?id=1752465
      * https://bugzilla.redhat.com/show_bug.cgi?id=1690256

    To return to PCIe Native hot-plug:
    -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off

    Known issue: older linux guests need the following flag to allow
    hotplugged pci express devices to use io:
    -device pcie-root-port,io-reserve=4096. io is unusual for pci
    express so this seems minor. We'll fix this by a follow up patch.
    Signed-off-by: Julia Suvorova <jusual>
    Reviewed-by: Igor Mammedov <imammedo>
    Message-Id: <20210713004205.775386-6-jusual>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
    Reviewed-by: David Gibson <david.id.au>

 hw/acpi/ich9.c | 2 +-
 hw/i386/pc.c   | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

If I use "-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off" the card can be removed correctly and there is no crash.

The problem is specific neither to virtio nor to vdpa; it can be reproduced with any PCI card:

(qemu) info network
(qemu) netdev_add user,id=hostnet1
(qemu) info network
hostnet1: index=0,type=user,net=10.0.2.0,restrict=off
(qemu) device_add e1000e,netdev=hostnet1,mac=00:11:22:33:44:03,id=net1,bus=pci.6,addr=0x2.0x5
(qemu) info network
net1: index=0,type=nic,model=e1000e,macaddr=00:11:22:33:44:03
 \ hostnet1: index=0,type=user,net=10.0.2.0,restrict=off
(qemu) device_del net1
(qemu) info network
net1: index=0,type=nic,model=e1000e,macaddr=00:11:22:33:44:03
 \ hostnet1: index=0,type=user,net=10.0.2.0,restrict=off
(qemu) netdev_del hostnet1
(qemu) info network
net1: index=0,type=nic,model=e1000e,macaddr=00:11:22:33:44:03
 \ hostnet1: index=0,type=user,net=10.0.2.0,restrict=off

It doesn't crash because, unlike vdpa, there is no assert to check the consistency of the system, but the problem is still there: the card is not really unplugged after being plugged at an invalid port address.

We can reproduce it with a SCSI card:

(qemu) info pci
  Bus 0, device 0, function 0:
    Host bridge: PCI device 8086:29c0
      PCI subsystem 1af4:1100
      id ""
...
  Bus 0, device 2, function 5:
    PCI bridge: PCI device 1b36:000c
      IRQ 11, pin A
      BUS 0.
      secondary bus 3.
      subordinate bus 3.
      IO range [0x3000, 0x3fff]
      memory range [0xfde00000, 0xfdffffff]
      prefetchable memory range [0xfe600000, 0xfe7fffff]
      BAR0: 32 bit memory at 0xfe402000 [0xfe402fff].
      id "pci.6"
...
(qemu) device_add virtio-scsi,id=net1,bus=pci.6,addr=0x2.0x5
(qemu) info pci
  Bus 0, device 0, function 0:
    Host bridge: PCI device 8086:29c0
      PCI subsystem 1af4:1100
      id ""
...
  Bus 0, device 2, function 5:
    PCI bridge: PCI device 1b36:000c
      IRQ 11, pin A
      BUS 0.
      secondary bus 3.
      subordinate bus 3.
      IO range [0x3000, 0x3fff]
      memory range [0xfde00000, 0xfdffffff]
      prefetchable memory range [0xfe600000, 0xfe7fffff]
      BAR0: 32 bit memory at 0xfe402000 [0xfe402fff].
      id "pci.6"
  Bus 3, device 2, function 5:
    SCSI controller: PCI device 1af4:1048
      PCI subsystem 1af4:1100
      IRQ 0, pin A
      BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
      BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe].
      id "net1"
...
(qemu) device_del net1
(qemu) info pci
  [identical output: the SCSI controller at Bus 3, device 2, function 5 ("net1") is still listed]
The PCI card is not removed after the device_del.

Moving to PCI subcomponent

To summarize

Reproducer:

/usr/libexec/qemu-kvm \
-machine q35 \
-cpu host -enable-kvm \
-m 8192 \
-smp 6,sockets=6,dies=1,cores=1,threads=1 \
-nodefaults \
-boot strict=on \
-device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=true,addr=0x2 \
-device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-blockdev file,filename=/var/lib/libvirt/images/rhel9.1.0.qcow2,aio=threads,node-name=libvirt-1-storage \
-blockdev node-name=libvirt-1-format,driver=qcow2,file=libvirt-1-storage \
-device virtio-blk-pci,bus=pci.2,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-serial mon:stdio \
-nographic

In the QEMU monitor, for a networking card that can be listed by "info network":

(qemu) info network
(qemu) netdev_add user,id=hostnet1
(qemu) info network
hostnet1: index=0,type=user,net=10.0.2.0,restrict=off
(qemu) device_add e1000e,netdev=hostnet1,mac=00:11:22:33:44:03,id=net1,bus=pci.6,addr=0x2.0x5
(qemu) info network
net1: index=0,type=nic,model=e1000e,macaddr=00:11:22:33:44:03
 \ hostnet1: index=0,type=user,net=10.0.2.0,restrict=off
(qemu) device_del net1
(qemu) info network
net1: index=0,type=nic,model=e1000e,macaddr=00:11:22:33:44:03
 \ hostnet1: index=0,type=user,net=10.0.2.0,restrict=off
(qemu) netdev_del hostnet1
(qemu) info network
net1: index=0,type=nic,model=e1000e,macaddr=00:11:22:33:44:03
 \ hostnet1: index=0,type=user,net=10.0.2.0,restrict=off

The netdev and the cards are not removed.

Or for any PCI card:

(qemu) info pci
  Bus 0, device 0, function 0:
    Host bridge: PCI device 8086:29c0
      PCI subsystem 1af4:1100
      id ""
...
  Bus 0, device 2, function 5:
    PCI bridge: PCI device 1b36:000c
      IRQ 11, pin A
      BUS 0.
      secondary bus 3.
      subordinate bus 3.
      IO range [0x3000, 0x3fff]
      memory range [0xfde00000, 0xfdffffff]
      prefetchable memory range [0xfe600000, 0xfe7fffff]
      BAR0: 32 bit memory at 0xfe402000 [0xfe402fff].
      id "pci.6"
...
(qemu) device_add virtio-scsi,id=net1,bus=pci.6,addr=0x2.0x5
(qemu) info pci
  [as above, plus:]
  Bus 3, device 2, function 5:
    SCSI controller: PCI device 1af4:1048
      PCI subsystem 1af4:1100
      IRQ 0, pin A
      BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
      BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe].
      id "net1"
...
(qemu) device_del net1
(qemu) info pci
  [identical output: the SCSI controller at Bus 3, device 2, function 5 ("net1") is still listed]

The problem is that the PCI card is not removed after the device_del.
The problem is introduced by 17858a169508 "hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35".
The workaround is to use "-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off".
The problem is triggered by using an invalid port address.
The problem generates a crash with vDPA because vDPA has an assert that checks the state of the netdev.
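The "invalid port address" in the reproducer can be checked mechanically: addr=0x2.0x5 encodes slot 2, function 5, while a pcie-root-port exposes only slot 0 on its secondary bus. A minimal sketch using the usual devfn encoding (the validity-check helper is ours, illustrating the constraint, not QEMU code):

```c
#include <stdbool.h>

/* A devfn packs a 5-bit slot (device) number and a 3-bit function number. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Hypothetical check: behind a PCIe root port only slot 0 exists, so an
 * address like 0x2.0x5 can never correspond to a pluggable device there. */
static bool root_port_addr_valid(int devfn)
{
    return PCI_SLOT(devfn) == 0;
}
```

Under this model, addr=0x2.0x5 is rejected while addr=0x0 is accepted, matching the observed working and failing device_add commands.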
Hi Laurent

This problem can still be reproduced on the latest qemu-kvm-7.2 version. Are there plans to fix this in the current release? If not, please move to ITR=9.3.0.

Test Version:
kernel-5.14.0-262.el9.x86_64
qemu-kvm-7.2.0-8.el9.x86_64
libvirt-9.0.0-3.el9.x86_64

Thanks
Lei

(In reply to Lei Yang from comment #13)
> This problem can still be reproduced on the latest qemu-kvm-7.2 version.
> Are there plans to fix this in the current release? If not, please move to
> ITR=9.3.0.

Yes, it seems too late to be fixed now.

Julia, any update (see comment #9)? Should we close as WONTFIX?

Thanks

(In reply to Laurent Vivier from comment #15)
> Julia,
>
> any update (see comment #9)?
>
> Should we close as WONTFIX?

Thanks for the investigation. There was protection against hotplug at the wrong address on the root ports. I'll take a look at what went wrong.

Assigning it to myself after talking to Julia. Will take a stab at it.

Even with the native hotplug case, we see that upon insertion of the ethernet device with a wrong devfn, the device is actually not functional:

(qemu) info pci
...
  Bus 0, device 2, function 5:
    PCI bridge: PCI device 1b36:000c
      IRQ 11, pin A
      BUS 0.
      secondary bus 3.
      subordinate bus 3.
      IO range [0x6000, 0x6fff]
      memory range [0xc0800000, 0xc09fffff]
      prefetchable memory range [0x800400000, 0x8005fffff]
      BAR0: 32 bit memory at 0xc0e11000 [0xc0e11fff].
      id "pci.6"
  Bus 3, device 2, function 5:
    Ethernet controller: PCI device 8086:10d3
      PCI subsystem 8086:0000
      IRQ 0, pin A
      BAR0: 32 bit memory at 0xffffffffffffffff [0x0001fffe].
      BAR1: 32 bit memory at 0xffffffffffffffff [0x0001fffe].
      BAR2: I/O at 0xffffffffffffffff [0x001e].
      BAR3: 32 bit memory at 0xffffffffffffffff [0x00003ffe].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
      id "net1"

The addresses are all wrong (with kvm accelerator).
When the devfn is correct, we see proper BAR addresses for this inserted device:

(qemu) netdev_add socket,id=hostnet1,listen=:1234
(qemu) device_add e1000e,netdev=hostnet1,mac=00:11:22:33:44:03,id=net1,bus=pci.6,addr=0x0.0
(qemu) info pci
...
  Bus 0, device 2, function 5:
    PCI bridge: PCI device 1b36:000c
      IRQ 11, pin A
      BUS 0.
      secondary bus 3.
      subordinate bus 3.
      IO range [0x6000, 0x6fff]
      memory range [0xc0800000, 0xc09fffff]
      prefetchable memory range [0x800400000, 0x8005fffff]
      BAR0: 32 bit memory at 0xc0e11000 [0xc0e11fff].
      id "pci.6"
  Bus 3, device 0, function 0:
    Ethernet controller: PCI device 8086:10d3
      PCI subsystem 8086:0000
      IRQ 0, pin A
      BAR0: 32 bit memory at 0xc0840000 [0xc085ffff].
      BAR1: 32 bit memory at 0xc0860000 [0xc087ffff].
      BAR2: I/O at 0xffffffffffffffff [0x001e].
      BAR3: 32 bit memory at 0xc0880000 [0xc0883fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
      id "net1"

(In reply to Laurent Vivier from comment #12)
> To summarize
>
> Reproducer:
[...]
> The problem is that the PCI card is not removed after the device_del.
> The problem is introduced by 17858a169508 "hw/acpi/ich9: Set ACPI PCI
> hot-plug as default on Q35"
> The workaround is to use "-global
> ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off"
> The problem is triggered by using an invalid port address.

I don't think the problem is with a device on a non-0 function; it is still present (device_del is just a request). I'd say something goes wrong with how we link/account for the backend and do cleanup. netdev_del shall fail if there is a device that uses it, or at least not leave the association in an inconsistent state.

> The problem generates a crash with vDPA because vDPA has an assert that
> checks the state of the netdev.
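Igor's point that netdev_del should refuse to remove a backend still referenced by a frontend can be sketched with a toy refcount. This models the behavior being argued for, not what QEMU currently does; all names here are ours:

```c
#include <stdbool.h>

/* A backend with a count of frontend devices still using it. */
struct netdev_model {
    int refs;
    bool deleted;
};

static void frontend_attach(struct netdev_model *nd) { nd->refs++; }
static void frontend_detach(struct netdev_model *nd) { nd->refs--; }

/* Refuse deletion while a device still uses the backend, so the
 * device/backend association can never be left half torn down. */
static bool netdev_del_model(struct netdev_model *nd)
{
    if (nd->refs > 0)
        return false;
    nd->deleted = true;
    return true;
}
```

With such a check, the reproducer's netdev_del would fail while "net1" still references "hostnet1", and the exit-time cleanup would still find a consistent state.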
(In reply to Igor Mammedov from comment #19)
> (In reply to Laurent Vivier from comment #12)
[...]
> I don't think the problem is with a device on a non-0 function;
> it is still present (device_del is just a request).

The problem is not the non-zero function. I did some experiments and it seems the issue is the wrong (non-zero) slot. Something like

device_add e1000e,netdev=hostnet1,mac=00:11:22:33:44:03,id=net1,bus=pci.6,addr=0x0.0x5

works fine and the device does get removed.

(In reply to Igor Mammedov from comment #19)
> (In reply to Laurent Vivier from comment #12)
> > The problem is that the PCI card is not removed after the device_del.
> > The problem is introduced by 17858a169508 "hw/acpi/ich9: Set ACPI PCI
> > hot-plug as default on Q35"
> > The workaround is to use "-global
> > ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off"
> > The problem is triggered by using an invalid port address.
>
> I don't think the problem is with a device on a non-0 function;
> it is still present (device_del is just a request). I'd say
> something goes wrong with how we link/account for the backend
> and do cleanup. netdev_del shall fail if there is a device
> that uses it, or at least not leave the association in an inconsistent
> state.

What I have found from my investigation is this:

When we are using native hotplug, the unplug request goes through this chain:

pcie_cap_slot_unplug_request_cb()
 -> pcie_unplug_device()
 -> pcie_cap_slot_unplug_cb()
 -> qdev_unrealize()
 -> object_unparent()

This eventually cleans up the device. So this is a direct cleanup path.

For acpi hotplug, the unplug request goes through this:

ich9_pm_device_unplug_request_cb()
 -> acpi_pcihp_device_unplug_request_cb()
 -> acpi_send_event()

Just as Igor mentioned above, this sends an unplug event to the guest OS.

What I am observing is that when the slot ID != 0, the guest OS seems to ignore this and we never seem to hit ich9_pm_device_unplug_cb(). So the following path is not triggered:

ich9_pm_device_unplug_cb()
 -> acpi_pcihp_device_unplug_cb()
 -> qdev_unrealize()

When the slot ID is 0 for the plugged interface card, this does get hit (like I mentioned in the above comment).

So the cleanup that is supposed to happen from ich9_pm_device_unplug_cb() never happens, the card remains plugged in, and only upon guest shutdown does QEMU try to clean this up; with a vdpa device this causes a crash.

I do believe that the patch I posted on the mailing list, https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg03066.html, is in the right direction, and slot values other than 0 should be disallowed when a device is plugged into the pcie root ports.

(In reply to Ani Sinha from comment #21)
[...]
> When the slot ID is 0 for the plugged interface card, this does get hit
> (like I mentioned in the above comment).

Another data point is that acpi_pcihp_eject_slot() also does not get hit when the slot ID is non-zero. When the slot is 0, it is hit and cleans up the device.

(In reply to Ani Sinha from comment #21)
[...]
> So the following path is not triggered:
>
> ich9_pm_device_unplug_cb()
>  -> acpi_pcihp_device_unplug_cb()
>  -> qdev_unrealize()

The full path is:

memory_region_write_accessor()
 -> pci_write()
 -> acpi_pcihp_eject_slot()
 -> hotplug_handler_unplug()
 -> ich9_pm_device_unplug_cb()
 -> acpi_pcihp_device_unplug_cb()
 -> qdev_unrealize()
 -> object_unparent()

acpi_pcihp_eject_slot() triggers device removal on behalf of a guest eject action only. Think in more generic terms: does it work if we never unplug the nic (and if yes, then why)? So far I'm convinced that the unplug which didn't happen is a trigger that exposes a cleanup issue elsewhere.

(In reply to Igor Mammedov from comment #24)
> acpi_pcihp_eject_slot() triggers device removal on behalf of a guest eject
> action only.
> Think in more generic terms: does it work if we never unplug the nic (and if
> yes, then why)?

This bug is about hot unplugging a device that was plugged in with a wrong slot address. The way I see this is that the cleanup happens differently for native hotplug vs acpi. For native, we directly clean up the device. For acpi, we depend on the guest triggering the cleanup upon receiving the acpi unplug notification, which in our case never happens.
If we never unplugged the nic, the qemu cleanup code at guest shutdown would clean up both the plugged-in device and the netdev.

> > So far I'm convinced that unplug which didn't happen is a trigger that > exposes cleanup issue elsewhere.

elaborate?

(In reply to Ani Sinha from comment #25) > (In reply to Igor Mammedov from comment #24) > > acpi_pcihp_eject_slot() triggers device removal on behalf of guest eject > > action only. > > Think in more generic terms, does it work if we never unplug nic (if yes, > > then why). > > This bug is about hot unplugging a device that was plugged in with a wrong > slot address. The way I see this is that the cleanup happens differently for

Where is it said that it's a wrong address?

> native hotplug vs acpi. > For native, we directly cleanup the device. > For acpi we depend on the guest triggering the cleanup upon receiving the > acpi unplug notification > which in our case never happens.

Once again, an unplug request is just a request; the guest doesn't have to unplug the device (it's fine for the guest to ignore the request). In this case QEMU shall do the cleanup on its own, when the device is actually destroyed during /machine tree deconstruction, and in the right order.

> If we never unplugged the nic, the qemu cleanup code at guest shutdown would > cleanup both the plugged in device and the netdev.

we asked it to be unplugged but it did not happen => so we never unplugged the nic

> > > > So far I'm convinced that unplug which didn't happen is a trigger that > > exposes cleanup issue elsewhere. > > elaborate?

see above

(In reply to Ani Sinha from comment #25) > > > > So far I'm convinced that unplug which didn't happen is a trigger that > > exposes cleanup issue elsewhere. > > elaborate?

OK I think I understand what you mean. There are two issues here:

(a) device_del is not removing the device. We explained that by observing that slot unplug is not getting triggered by the guest.

(b) the crash on vm shutdown with vdpa/virtio-net.
Even if the device was not cleaned up from (a), the shutdown should be able to clean it up without crashes. Let me know if I am wrong here.

(In reply to Igor Mammedov from comment #26) > (In reply to Ani Sinha from comment #25) > > (In reply to Igor Mammedov from comment #24) > > > acpi_pcihp_eject_slot() triggers device removal on behalf of guest eject > > > action only. > > > Think in more generic terms, does it work if we never unplug nic (if yes, > > > then why). > > > > This bug is about hot unplugging a device that was plugged in with a wrong > > slot address. The way I see this is that the cleanup happens differently for > > Where it's said that it's wrong address?

See comment #9 above https://bugzilla.redhat.com/show_bug.cgi?id=2128929#c9 and my experiments validate it.

(In reply to Ani Sinha from comment #27) > (In reply to Ani Sinha from comment #25) > > > > > > > So far I'm convinced that unplug which didn't happen is a trigger that > > > exposes cleanup issue elsewhere. > > > > elaborate? > > OK I think I understand what you mean. There are two issues here: > > (a) device_del is not removing the device. We explained that by observing > that slot unplug is not getting triggered by the guest.

And that's a normal situation; there is nothing wrong with it in theory. In practice it might be fine to take your patch, but see (b).

> (b) the crash on vm shutdown with vdpa/virtio-net. Even if the device was > not cleaned up from (a), the shutdown should be able to clean it up without > crashes.

Yep, the device can be removed when the guest tells QEMU to do so, but if the guest does not (like in this case; or imagine a guest in the process of being shut down with the acpiphp module already deactivated, which would race the SCI signal against the guest shutdown process), then QEMU should do the cleanup on its own, like for any other present PCI device.

So far I haven't seen a clear explanation of where it goes wrong.

> Let me know if I am wrong here.
> (In reply to Igor Mammedov from comment #26) > > (In reply to Ani Sinha from comment #25) > > > (In reply to Igor Mammedov from comment #24) > > > > acpi_pcihp_eject_slot() triggers device removal on behalf of guest eject > > > > action only. > > > > Think in more generic terms, does it work if we never unplug nic (if yes, > > > > then why). > > > > > > This bug is about hot unplugging a device that was plugged in with a wrong > > > slot address. The way I see this is that the cleanup happens differently for > > > > Where it's said that it's wrong address? > > See comment #9 above https://bugzilla.redhat.com/show_bug.cgi?id=2128929#c9 > and my experiments validate it.

I wouldn't treat comments (including mine) as the absolute source of truth.

Quote from comment 9:

" The problem happens because the address provided in device_add is not valid: "

QEMU might not support multifunction hotplug, but that doesn't make the address 'not valid' per se.

" {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet1","mac":"00:11:22:33:44:03","id": "net1","bus":"pci.6","addr":"0x2.0x5"}} thus the card is not really plugged by the kernel and then QEMU cannot really remove it. "
^^^^^^^^^^

" We can remove the instance of the netdev because for QEMU the device is removed, but when we exit QEMU does a cleanup and as the card is not really removed it tries to cleanup it. "

The above statement contradicts the previous (highlighted) one, so has the device been removed or not?

" The instance of the netdev doesn't exist anymore so it triggers a crash. "

(In reply to Ani Sinha from comment #28)

The above shall not happen while a device referencing it still exists.

(In reply to Igor Mammedov from comment #29) > (In reply to Ani Sinha from comment #27) > > (In reply to Ani Sinha from comment #25) > > > > > > > > > > So far I'm convinced that unplug which didn't happen is a trigger that > > > > exposes cleanup issue elsewhere. > > > > > > elaborate? > > > > OK I think I understand what you mean.
There are two issues here: > > > > (a) device_del is not removing the device. We explained that by observing > > that slot unplug is not getting triggered by the guest. > > and it's normal situation, there is nothing wrong with it in theory. > In practice it might be fine to take your patch but see (b) > > > (b) the crash on vm shutdown with vdpa/virtio-net. Even if the device was > not cleaned up from (a), the shutdown should be able to clean it up without > crashes. > > yep, the device can be removed when guest tells QEMU to do, but if guest > does not > (like in this case or imagine guest in process of being shutdown > and acpiphp module being deactivated already, then it would race between > SCI signal > and guest shutdown process) > then QEMU should do cleanup on its own like for any other present PCI device. > > So far I haven't seen an a clear explanation about where is goes wrong. >

OK, let me try again. The issue happens with acpi hotplug when we do *both* a device_del on the nic and then a netdev_del on the net device. We have already established why device_del actually does not remove the device (point (a) above). So in netdev_del we have this call chain:

hmp_netdev_del()
-> qmp_netdev_del()
-> qemu_del_net_client()
-> qemu_cleanup_net_client()
-> vhost_vdpa_cleanup()

vhost_vdpa_cleanup() sets s->vhost_net = NULL; (see line 193 in net/vhost-vdpa.c). Therefore, when the vm shuts down and qemu tries to clean up the nic card, we have the following call chain:

virtio_set_status()
-> get_vhost_net()
-> vhost_vdpa_get_vhost_net()

get_vhost_net() calls vhost_vdpa_get_vhost_net(), which returns s->vhost_net; that is NULL, and hence the assertion fails, causing the crash. I am not sure how to fix this issue. The crash seems to be specific to vhost-net. This is orthogonal to why the nic was not removed on device_del previously.

Ok, one more thing. qemu_del_net_client() checks if there are any attached network interfaces.
In the event that the network interface is cleanly removed (slot 0 was passed as its address), nc->peer is NULL and hence the following block is skipped:

/* If there is a peer NIC, delete and cleanup client, but do not free. */
if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_NIC) {
    NICState *nic = qemu_get_nic(nc->peer);
    if (nic->peer_deleted) {
        return;
    }
    nic->peer_deleted = true;

    for (i = 0; i < queues; i++) {
        ncs[i]->peer->link_down = true;
    }

    if (nc->peer->info->link_status_changed) {
        nc->peer->info->link_status_changed(nc->peer);
    }

    for (i = 0; i < queues; i++) {
        qemu_cleanup_net_client(ncs[i]);
    }

    return;
}

and it goes straight to freeing the clients, calling qemu_cleanup_net_client() and then qemu_free_net_client(). Thus, during shutdown there is no vdpa backend left.

When the interface is not deleted, nc->peer is non-null and the above block is executed, which eventually calls vhost_vdpa_cleanup() through qemu_cleanup_net_client(). vhost_vdpa_cleanup() sets s->vhost_net = NULL; the vdpa netdev backend is not freed.

Maybe if we removed the assertion from get_vhost_net(), it would "just work", since virtio_net_vhost_status() checks for a null return value when calling it. Another observation is that get_vhost_net() does not assert on the return value of tap_get_vhost_net(), so it's not guaranteed to always return non-null or assert.

> Maybe if we removed the assertion from get_vhost_net(), it should "just work" since virtio_net_vhost_status() checks for a null return value when calling the former.
I was completely mistaken here. Lots of code paths in vhost-net and virtio-net simply assume that the pointer returned from get_vhost_net() is non-null. The assertion there is justified.
I am beginning to think that this needs to be resolved at a different level. Perhaps qemu_del_net_client() is calling qemu_cleanup_net_client() too early?
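The failure mode traced in the last few comments can be sketched as a toy Python model: netdev_del's cleanup clears the vhost_net pointer while the still-plugged NIC keeps referencing the backend, and the shutdown path then asserts on the NULL pointer. All names below are illustrative stand-ins for the QEMU C code, not real QEMU APIs.

```python
# Toy model of the crash discussed above (illustrative names, not QEMU code).

class VhostVDPAState:
    def __init__(self):
        self.vhost_net = object()   # stands in for a live VHostNetState pointer

def vhost_vdpa_cleanup(s):
    # ~ the cleanup run from qemu_del_net_client() on netdev_del:
    # it tears down the vhost state but keeps the backend object around
    # because the NIC (never ejected by the guest) still references it.
    s.vhost_net = None

def get_vhost_net(s):
    vhost_net = s.vhost_net
    assert vhost_net is not None    # ~ the assert in hw/net/vhost_net.c
    return vhost_net

s = VhostVDPAState()
vhost_vdpa_cleanup(s)               # netdev_del ran; device_del never took effect

try:
    get_vhost_net(s)                # ~ virtio_set_status() at guest shutdown
    crashed = False
except AssertionError:
    crashed = True

print(crashed)  # True
```

The AssertionError here plays the role of the `Assertion 'vhost_net' failed` abort seen in the QEMU core dump.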
(In reply to Igor Mammedov from comment #30) > > (In reply to Igor Mammedov from comment #26) > > > (In reply to Ani Sinha from comment #25) > > > > (In reply to Igor Mammedov from comment #24) > > > > > acpi_pcihp_eject_slot() triggers device removal on behalf of guest eject > > > > > action only. > > > > > Think in more generic terms, does it work if we never unplug nic (if yes, > > > > > then why). > > > > > > > > This bug is about hot unplugging a device that was plugged in with a wrong > > > > slot address. The way I see this is that the cleanup happens differently for > > > > > > Where it's said that it's wrong address? > > > > See comment #9 above https://bugzilla.redhat.com/show_bug.cgi?id=2128929#c9 > > and my experiments validate it. > > I wouldn't treat comments (including mine as absolute source of truth) > > quote from comment 9: > " > The problem happens because the address provided in device_add is not > valid: > " > > QEMU migh not support multifunction hotplug, but it doesn't make 'not valid' > per se. > > " > > {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev": > "hostnet1","mac":"00:11:22:33:44:03","id": > "net1","bus":"pci.6","addr":"0x2.0x5"}} > > thus the card is not really plugged by the the kernel and then QEMU cannot > really remove it. > " > > ^^^^^^^^^^ In that particular comment #12 example, what do you think pci address of device net1 and pci structure in general would be in the end, and what it should be? > " > We can remove the instance of the netdev because for QEMU the device is > removed, but when we exit QEMU does a cleanup and as the card is not really > removed it tries to cleanup it. > " > above statement contradicts with previous (highlighted) one, so has > device been removed or not? > > " > The instance of the netdev doesn't exist anymore so it triggers a > crash.(In reply to Ani Sinha from comment #28) > " > > above shall not happen until device referencing it still exists. 
(In reply to Julia Suvorova from comment #34) > (In reply to Igor Mammedov from comment #30) > > > (In reply to Igor Mammedov from comment #26) > > > > (In reply to Ani Sinha from comment #25) > > > > > (In reply to Igor Mammedov from comment #24) > > > > > > acpi_pcihp_eject_slot() triggers device removal on behalf of guest eject > > > > > > action only. > > > > > > Think in more generic terms, does it work if we never unplug nic (if yes, > > > > > > then why). > > > > > > > > > > This bug is about hot unplugging a device that was plugged in with a wrong > > > > > slot address. The way I see this is that the cleanup happens differently for > > > > > > > > Where it's said that it's wrong address? > > > > > > See comment #9 above https://bugzilla.redhat.com/show_bug.cgi?id=2128929#c9 > > > and my experiments validate it. > > > > I wouldn't treat comments (including mine as absolute source of truth) > > > > quote from comment 9: > > " > > The problem happens because the address provided in device_add is not > > valid: > > " > > > > QEMU migh not support multifunction hotplug, but it doesn't make 'not valid' > > per se. > > > > " > > > > {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev": > > "hostnet1","mac":"00:11:22:33:44:03","id": > > "net1","bus":"pci.6","addr":"0x2.0x5"}} > > > > thus the card is not really plugged by the the kernel and then QEMU cannot > > really remove it. > > " > > > > ^^^^^^^^^^ > > In that particular comment #12 example, what do you think pci address of > device net1 and pci structure in general would be in the end, and what it > should be? As mentioned in comment #9, > The following command works fine: > {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet1","mac":"00:11:22:33:44:03","id": "net1","bus":"pci.6","addr":"0x0"}} "0x0" is the right address. At least the device address should be 0. 
If the function is non-zero, even if this is technically correct for a multifunction bridge, OSPM won't detect a hotplug.

> > > " > > We can remove the instance of the netdev because for QEMU the device is > > removed, but when we exit QEMU does a cleanup and as the card is not really > > removed it tries to cleanup it. > > " > > above statement contradicts with previous (highlighted) one, so has > > device been removed or not? > > > > " > > The instance of the netdev doesn't exist anymore so it triggers a > > crash.(In reply to Ani Sinha from comment #28) > > " > > > > above shall not happen until device referencing it still exists.

I have sent two patches upstream in addition to the one I sent earlier for the address check (part (a) of the issue):

1. trivial - add a missing assertion - https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg03317.html
2. actual fix to part (b) of the issue - https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg03319.html

Hopefully 2) will be an acceptable solution after some discussions.

(In reply to Ani Sinha from comment #35) > (In reply to Julia Suvorova from comment #34) > > (In reply to Igor Mammedov from comment #30) > > > > (In reply to Igor Mammedov from comment #26) > > > > > (In reply to Ani Sinha from comment #25) > > > > > > (In reply to Igor Mammedov from comment #24) > > > > > > > acpi_pcihp_eject_slot() triggers device removal on behalf of guest eject > > > > > > > action only. > > > > > > > Think in more generic terms, does it work if we never unplug nic (if yes, > > > > > > > then why). > > > > > > > > > > > > This bug is about hot unplugging a device that was plugged in with a wrong > > > > > > slot address. The way I see this is that the cleanup happens differently for > > > > > > > > > > Where it's said that it's wrong address? > > > > > > > > See comment #9 above https://bugzilla.redhat.com/show_bug.cgi?id=2128929#c9 > > > > and my experiments validate it.
> > > > > > I wouldn't treat comments (including mine as absolute source of truth) > > > > > > quote from comment 9: > > > " > > > The problem happens because the address provided in device_add is not > > > valid: > > > " > > > > > > QEMU migh not support multifunction hotplug, but it doesn't make 'not valid' > > > per se. > > > > > > " > > > > > > {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev": > > > "hostnet1","mac":"00:11:22:33:44:03","id": > > > "net1","bus":"pci.6","addr":"0x2.0x5"}} > > > > > > thus the card is not really plugged by the the kernel and then QEMU cannot > > > really remove it. > > > " > > > > > > ^^^^^^^^^^ > > > > In that particular comment #12 example, what do you think pci address of > > device net1 and pci structure in general would be in the end, and what it > > should be? > > As mentioned in comment #9, > > > The following command works fine: > > {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet1","mac":"00:11:22:33:44:03","id": "net1","bus":"pci.6","addr":"0x0"}} > > > "0x0" is the right address. At least the device address should be 0. If the > function is non-zero, even if this is technically correct for a > multifinction bridge, OSPM won't detect a hotplug. The question was to Igor, as I did not understand his opposition to correct port check. > > > > > " > > > We can remove the instance of the netdev because for QEMU the device is > > > removed, but when we exit QEMU does a cleanup and as the card is not really > > > removed it tries to cleanup it. > > > " > > > above statement contradicts with previous (highlighted) one, so has > > > device been removed or not? > > > > > > " > > > The instance of the netdev doesn't exist anymore so it triggers a > > > crash.(In reply to Ani Sinha from comment #28) > > > " > > > > > > above shall not happen until device referencing it still exists. 
(In reply to Julia Suvorova from comment #37) > (In reply to Ani Sinha from comment #35) > > (In reply to Julia Suvorova from comment #34) > > > (In reply to Igor Mammedov from comment #30) > > > > > (In reply to Igor Mammedov from comment #26) > > > > > > (In reply to Ani Sinha from comment #25) > > > > > > > (In reply to Igor Mammedov from comment #24) > > > > > > > > acpi_pcihp_eject_slot() triggers device removal on behalf of guest eject > > > > > > > > action only. > > > > > > > > Think in more generic terms, does it work if we never unplug nic (if yes, > > > > > > > > then why). > > > > > > > > > > > > > > This bug is about hot unplugging a device that was plugged in with a wrong > > > > > > > slot address. The way I see this is that the cleanup happens differently for > > > > > > > > > > > > Where it's said that it's wrong address? > > > > > > > > > > See comment #9 above https://bugzilla.redhat.com/show_bug.cgi?id=2128929#c9 > > > > > and my experiments validate it. > > > > > > > > I wouldn't treat comments (including mine as absolute source of truth) > > > > > > > > quote from comment 9: > > > > " > > > > The problem happens because the address provided in device_add is not > > > > valid: > > > > " > > > > > > > > QEMU migh not support multifunction hotplug, but it doesn't make 'not valid' > > > > per se. > > > > > > > > " > > > > > > > > {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev": > > > > "hostnet1","mac":"00:11:22:33:44:03","id": > > > > "net1","bus":"pci.6","addr":"0x2.0x5"}} > > > > > > > > thus the card is not really plugged by the the kernel and then QEMU cannot > > > > really remove it. > > > > " > > > > > > > > ^^^^^^^^^^ > > > > > > In that particular comment #12 example, what do you think pci address of > > > device net1 and pci structure in general would be in the end, and what it > > > should be? 
> > As mentioned in comment #9, > > > > The following command works fine: > > > {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet1","mac":"00:11:22:33:44:03","id": "net1","bus":"pci.6","addr":"0x0"}} > > > > > > "0x0" is the right address. At least the device address should be 0. If the > > function is non-zero, even if this is technically correct for a > > multifinction bridge, OSPM won't detect a hotplug. > > The question was to Igor, as I did not understand his opposition to correct > port check. [...]

func != 0 is also a technically correct address (even if the guest won't re-enumerate it). Like Michael mentioned, potentially we can do multi-function hotplug (plug the !0 functions first and then plug function 0 last, which will trigger bus re-enumeration; though I'll resist any notion that it is supported if something doesn't work there).

My point was that it is not really a hotplug issue (hotplug is just a test case that triggers the problem). I'd hazard a guess that it might explode with cold-plugged multifunction as well, and likely it is not limited to PCIe only. (i.e., does skipping device_del and only doing netdev_del on a !0 function backend trigger the assert?)
> func != 0 is also technically correct address (even if guest won't
> re-enumerate it).
> Like Michael's mentioned potentially we can do multi-function hotplug
> (plug !0 functions 1st and then plug function 0 last, which will trigger bus
> re-enumeration. Though I'll resist any notion that it is supported if
> something doesn't work there)
>
> My point was that it is not really hotplug issue (it's just a test case that
> tigers the problem).
> I'd hazard a guess, It might explode with cold-plugged multifunction as well
> and likely is not limited to PCIe only.
> (i.e. Is skipping device_del and only doing netdev_del on !0 function
> backend triggers the assert?)
I think we are putting too much emphasis on the function. It's irrelevant. As long as the slot is non-zero, my experiments show that the guest won't eject the device.
That being said, Igor is right. If we skip device_del and just do netdev_del, the issue will trigger - I did a quick test. Function and slot are irrelevant here as we are deliberately not ejecting the nic.
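The asymmetry running through this whole thread can be sketched as a small Python model: native PCIe hotplug unrealizes the device directly, while ACPI hotplug only raises an event and relies on the guest to perform the eject, which (per the observations above) the guest only does for slot 0. All names below are illustrative, not real QEMU APIs.

```python
# Simplified model of the two unplug paths discussed in this thread
# (illustrative names only, not QEMU code).

class Device:
    def __init__(self, slot):
        self.slot = slot
        self.realized = True

def native_unplug_request(dev):
    # Native PCIe hotplug: QEMU unrealizes the device directly.
    dev.realized = False                # ~ pcie_cap_slot_unplug_cb -> qdev_unrealize

def acpi_unplug_request(dev, guest_ejects):
    # ACPI hotplug: QEMU only raises an event; removal happens when
    # (and only when) the guest acts on it and writes the eject register.
    if guest_ejects(dev):               # the guest may simply ignore the request
        dev.realized = False            # ~ acpi_pcihp_eject_slot -> qdev_unrealize

# A guest that, per the observation above, only ejects devices in slot 0:
guest = lambda dev: dev.slot == 0

d_native = Device(slot=2)
native_unplug_request(d_native)         # removed regardless of slot

d_slot0 = Device(slot=0)
acpi_unplug_request(d_slot0, guest)     # removed: guest ejects slot 0

d_slot2 = Device(slot=2)
acpi_unplug_request(d_slot2, guest)     # stays plugged: eject never happens

print(d_native.realized, d_slot0.realized, d_slot2.realized)  # False False True
```

The lingering `d_slot2`-style device is what QEMU later has to tear down on its own at shutdown.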
(In reply to Igor Mammedov from comment #38) > (In reply to Julia Suvorova from comment #37) > > (In reply to Ani Sinha from comment #35) > > > (In reply to Julia Suvorova from comment #34) > > > > (In reply to Igor Mammedov from comment #30) > > > > > > (In reply to Igor Mammedov from comment #26) > > > > > > > (In reply to Ani Sinha from comment #25) > > > > > > > > (In reply to Igor Mammedov from comment #24) > > > > > > > > > acpi_pcihp_eject_slot() triggers device removal on behalf of guest eject > > > > > > > > > action only. > > > > > > > > > Think in more generic terms, does it work if we never unplug nic (if yes, > > > > > > > > > then why). > > > > > > > > > > > > > > > > This bug is about hot unplugging a device that was plugged in with a wrong > > > > > > > > slot address. The way I see this is that the cleanup happens differently for > > > > > > > > > > > > > > Where it's said that it's wrong address? > > > > > > > > > > > > See comment #9 above https://bugzilla.redhat.com/show_bug.cgi?id=2128929#c9 > > > > > > and my experiments validate it. > > > > > > > > > > I wouldn't treat comments (including mine as absolute source of truth) > > > > > > > > > > quote from comment 9: > > > > > " > > > > > The problem happens because the address provided in device_add is not > > > > > valid: > > > > > " > > > > > > > > > > QEMU migh not support multifunction hotplug, but it doesn't make 'not valid' > > > > > per se. > > > > > > > > > > " > > > > > > > > > > {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev": > > > > > "hostnet1","mac":"00:11:22:33:44:03","id": > > > > > "net1","bus":"pci.6","addr":"0x2.0x5"}} > > > > > > > > > > thus the card is not really plugged by the the kernel and then QEMU cannot > > > > > really remove it. 
> > > > > " > > > > > > > > > > ^^^^^^^^^^ > > > > > > > > In that particular comment #12 example, what do you think pci address of > > > > device net1 and pci structure in general would be in the end, and what it > > > > should be? > > > > > > As mentioned in comment #9, > > > > > > > The following command works fine: > > > > {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet1","mac":"00:11:22:33:44:03","id": "net1","bus":"pci.6","addr":"0x0"}} > > > > > > > > > "0x0" is the right address. At least the device address should be 0. If the > > > function is non-zero, even if this is technically correct for a > > > multifinction bridge, OSPM won't detect a hotplug. > > > > The question was to Igor, as I did not understand his opposition to correct > > port check. > [...] > func != 0 is also technically correct address (even if guest won't > re-enumerate it). > Like Michael's mentioned potentially we can do multi-function hotplug > (plug !0 functions 1st and then plug function 0 last, which will trigger bus > re-enumeration. Though I'll resist any notion that it is supported if > something doesn't work there I don't say anything about multifunction devices, that side is absolutely valid. The device number is wrong there, it's #2 instead of #0 on the root port, which is weird because it shouldn't exist. > My point was that it is not really hotplug issue (it's just a test case that > tigers the problem). > I'd hazard a guess, It might explode with cold-plugged multifunction as well > and likely is not limited to PCIe only. > (i.e. Is skipping device_del and only doing netdev_del on !0 function > backend trigers assert?) 
(In reply to Julia Suvorova from comment #40) > (In reply to Igor Mammedov from comment #38) > > (In reply to Julia Suvorova from comment #37) > > > (In reply to Ani Sinha from comment #35) > > > > (In reply to Julia Suvorova from comment #34) > > > > > (In reply to Igor Mammedov from comment #30) > > > > > > > (In reply to Igor Mammedov from comment #26) > > > > > > > > (In reply to Ani Sinha from comment #25) > > > > > > > > > (In reply to Igor Mammedov from comment #24) > > > > > > > > > > acpi_pcihp_eject_slot() triggers device removal on behalf of guest eject > > > > > > > > > > action only. > > > > > > > > > > Think in more generic terms, does it work if we never unplug nic (if yes, > > > > > > > > > > then why). > > > > > > > > > > > > > > > > > > This bug is about hot unplugging a device that was plugged in with a wrong > > > > > > > > > slot address. The way I see this is that the cleanup happens differently for > > > > > > > > > > > > > > > > Where it's said that it's wrong address? > > > > > > > > > > > > > > See comment #9 above https://bugzilla.redhat.com/show_bug.cgi?id=2128929#c9 > > > > > > > and my experiments validate it. > > > > > > > > > > > > I wouldn't treat comments (including mine as absolute source of truth) > > > > > > > > > > > > quote from comment 9: > > > > > > " > > > > > > The problem happens because the address provided in device_add is not > > > > > > valid: > > > > > > " > > > > > > > > > > > > QEMU migh not support multifunction hotplug, but it doesn't make 'not valid' > > > > > > per se. > > > > > > > > > > > > " > > > > > > > > > > > > {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev": > > > > > > "hostnet1","mac":"00:11:22:33:44:03","id": > > > > > > "net1","bus":"pci.6","addr":"0x2.0x5"}} > > > > > > > > > > > > thus the card is not really plugged by the the kernel and then QEMU cannot > > > > > > really remove it. 
> > > > > > " > > > > > > > > > > > > ^^^^^^^^^^ > > > > > > > > > > In that particular comment #12 example, what do you think pci address of > > > > > device net1 and pci structure in general would be in the end, and what it > > > > > should be? > > > > > > > > As mentioned in comment #9, > > > > > > > > > The following command works fine: > > > > > {"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet1","mac":"00:11:22:33:44:03","id": "net1","bus":"pci.6","addr":"0x0"}} > > > > > > > > > > > > "0x0" is the right address. At least the device address should be 0. If the > > > > function is non-zero, even if this is technically correct for a > > > > multifinction bridge, OSPM won't detect a hotplug. > > > > > > The question was to Igor, as I did not understand his opposition to correct > > > port check. > > [...] > > func != 0 is also technically correct address (even if guest won't > > re-enumerate it). > > Like Michael's mentioned potentially we can do multi-function hotplug > > (plug !0 functions 1st and then plug function 0 last, which will trigger bus > > re-enumeration. Though I'll resist any notion that it is supported if > > something doesn't work there > > I don't say anything about multifunction devices, that side is absolutely > valid. The device number is wrong there, it's #2 instead of #0 on the root > port, which is weird because it shouldn't exist. > Yes! 100% agreed with Julia here. (In reply to Ani Sinha from comment #36) > I have sent two patches upstream in addition to the one I sent earlier for > address check (part (a) of the issue) : > > 1. trivial - add a missing assertion - > https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg03317.html > 2. actual fix to part (b) of the issue - > https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg03319.html > > Hopefully 2) will be an acceptable solution after some discussions. 
Part (a) fix + fixes for tests that were breaking because of the change:
https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg04502.html

QE bot (pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Raised https://bugzilla.redhat.com/show_bug.cgi?id=2222009 to track the invalid device addr use issue. An upstream patch has been pushed that emits a warning:
https://gitlab.com/qemu-project/qemu/-/commit/ca92eb5defcf9d1c2106341744a73a03cf26e824

==> Reproduced this problem on qemu-kvm-8.0.0-6.el9

=> Test Version:
kernel-5.14.0-336.el9.x86_64
qemu-kvm-8.0.0-6.el9.x86_64
edk2-ovmf-20230524-1.el9.noarch

=> Test Steps
1. Create a vdpa device
vdpa dev add name vdpa${i} mgmtdev pci/$pci_addr mac 00:11:22:33:44:${macaddr}

2. Boot a guest without a nic device
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' \
-blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \
-blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel930-64-virtio-scsi-ovmf_qcow2_filesystem_VARS.raw", "auto-read-only": true, "discard": "unmap"}' \
-blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \
-machine q35,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,memory-backend=mem-machine_mem \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-nodefaults \
-m 8096 \
-object '{"size": 8489271296, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}' \
-smp 28,maxcpus=28,cores=14,threads=1,dies=1,sockets=2 \
-cpu 'Icelake-Server-noTSX',+kvm_pv_unhalt \
-device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pci.2", "addr": "0x0"}' \
-blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel930-64-virtio-scsi-ovmf.qcow2", "cache": {"direct": true, "no-flush": false}}' \
-blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
-device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
-device '{"driver":"virtio-vga","id":"video0","bus":"pcie.0","addr":"0x1"}' \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device '{"id": "pcie_extra_root_port_0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3", "chassis": 5}' \
-monitor stdio \
-qmp tcp:0:5555,server,nowait \

3. Hotplug & unplug the vdpa nic
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute":"netdev_add","arguments":{"type":"vhost-vdpa","id":"hostnet1","vhostdev":"/dev/vhost-vdpa-0"}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet1","mac":"00:11:22:33:44:03","id": "net1","bus":"pci.6","addr":"0x2.0x5"}}
{"return": {}}
{"execute":"device_del","arguments":{"id": "net1"}}
{"return": {}}
{"execute":"netdev_del","arguments":{"id":"hostnet1"}}
{"return": {}}

4. Shutdown the guest
# shutdown -h now

5. The guest hits a qemu core dump
(qemu) qemu-kvm: ../hw/net/vhost_net.c:520: VHostNetState *get_vhost_net(NetClientState *): Assertion `vhost_net' failed.
==> So, reproduced this problem on qemu-kvm-8.0.0-6.el9

==> Verified this problem on qemu-kvm-8.0.0-7.el9

=> Test Version
kernel-5.14.0-336.el9.x86_64
qemu-kvm-8.0.0-7.el9.x86_64
edk2-ovmf-20230524-1.el9.noarch

=> Test Steps
Repeated the above test steps; there are no issues anymore, so this problem has been fixed on qemu-kvm-8.0.0-7.el9. Moving it to "VERIFIED".

Asking for a Z-Stream for RHEL-9.2.0 primarily because the same commit will fix RHEL-200 (https://issues.redhat.com/browse/RHEL-200) for rhel-9.2.0, and Z-stream is not available with JIRA and RHEL-9.2.0. The fix that needs to be backported is:

commit a0d7215e339b61c7d7a7b3fcf754954d80d93eb8
Author: Ani Sinha <anisinha>
Date:   Mon Jun 19 12:22:09 2023 +0530

    vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present

    When a peer nic is still attached to the vdpa backend, it is too early to
    free up the vhost-net and vdpa structures. If these structures are freed
    here, then QEMU crashes when the guest is being shut down. The following
    call chain would result in an assertion failure since the pointer returned
    from vhost_vdpa_get_vhost_net() would be NULL:

    do_vm_stop() -> vm_state_notify() -> virtio_set_status() ->
    virtio_net_vhost_status() -> get_vhost_net().

    Therefore, we defer freeing up the structures until guest shutdown time,
    when qemu_cleanup() calls net_cleanup(), which then calls
    qemu_del_net_client(), which would eventually call vhost_vdpa_cleanup()
    again to free up the structures. This time, the loop in net_cleanup()
    ensures that vhost_vdpa_cleanup() will be called one last time when all
    the peer nics are detached and freed.

    All unit tests pass with this change.

    CC: imammedo
    CC: jusual
    CC: mst
    Fixes: CVE-2023-3301
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
    Signed-off-by: Ani Sinha <anisinha>
    Message-Id: <20230619065209.442185-1-anisinha>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>

This patch should fix the QEMU core dump in both cases (BZ 2128929 and RHEL-200).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6368
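The ordering the commit message describes (skip freeing the vhost/vdpa state while a peer nic is still attached, and rely on the final net_cleanup() pass to free it once all peers are detached) can be illustrated with a toy model. This is plain Python with invented names, not QEMU code:

```python
# Toy model (not QEMU code) of the ordering the commit above describes:
# vhost/vdpa state must not be freed while a peer nic is still attached,
# or a later status callback hits a NULL vhost_net (the reported assert).

class VdpaClient:
    def __init__(self):
        self.vhost_net = object()   # stands in for the vhost-net/vdpa state
        self.peer = None            # the virtio-net nic using this backend

    def cleanup(self):
        # The fix: bail out while a peer is still attached; the final
        # net_cleanup() pass calls us again once the peer is gone.
        if self.peer is not None:
            return
        self.vhost_net = None       # actually free the state

def vm_state_notify(client):
    # Models virtio_net_vhost_status() -> get_vhost_net(): the state must
    # still exist while the nic does, else QEMU asserts and dumps core.
    assert client.vhost_net is not None, "premature free -> core dump"

def net_cleanup(client):
    client.peer = None              # peer nic detached and freed
    client.cleanup()                # last cleanup call frees the state

backend = VdpaClient()
backend.peer = "net1"
backend.cleanup()                   # early cleanup (device_del path): deferred
vm_state_notify(backend)            # shutdown notification: no assert failure
net_cleanup(backend)                # final pass frees the state
```

Without the `if self.peer is not None: return` guard, the early `cleanup()` would clear `vhost_net` and `vm_state_notify()` would trip the assertion, mirroring the `get_vhost_net(): Assertion 'vhost_net' failed` crash from the reproducer.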