Bug 2177620
Summary: | [mlx vhost_vdpa][rhel 9.2]qemu core dump when hot unplug then hotplug a vdpa interface with multi-queue setting | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Lei Yang <leiyang> | |
Component: | qemu-kvm | Assignee: | Laurent Vivier <lvivier> | |
qemu-kvm sub component: | Networking | QA Contact: | Lei Yang <leiyang> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | urgent | |||
Priority: | unspecified | CC: | aadam, chayang, eperezma, jinzhao, juzhang, lulu, lvivier, virt-maint, wquan, yalzhang, yama, ymankad | |
Version: | 9.2 | Keywords: | Regression, Triaged, ZStream | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-8.0.0-3.el9 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2213864 | Environment: | |
Last Closed: | 2023-11-07 08:27:12 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 2180898 | |||
Bug Blocks: | 2213864 |
Comment 9
Laurent Vivier
2023-03-15 09:02:01 UTC
==> Reproduced this problem on the latest RHEL 9.2 qemu-kvm version: qemu-kvm-7.2.0-14.el9_2.x86_64

=> Test version
qemu-kvm-7.2.0-14.el9_2.x86_64
kernel-5.14.0-313.el9.x86_64
iproute-6.2.0-1.el9.x86_64

# flint -d 0000:17:00.0 q
Image type: FS4
FW Version: 22.37.0154
FW Release Date: 17.3.2023
Product Version: 22.37.0154
Description: UID GuidsNumber
Base GUID: b8cef603000a11f0 4
Base MAC: b8cef60a11f0 4
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000000359
Security Attributes: N/A

=> Test steps
1. Create a multi-queue vdpa device
# vdpa dev add name vdpa0 mgmtdev pci/$pci_addr mac 00:11:22:33:44:00 max_vqp 8

2. Boot a guest with this vdpa device
-device '{"driver": "virtio-net-pci", "mac": "00:11:22:33:44:00", "id": "net0", "netdev": "hostnet0", "mq": true, "vectors": 18, "bus": "pcie-root-port-3", "addr": "0x0"}' \
-netdev vhost-vdpa,id=hostnet0,vhostdev=/dev/vhost-vdpa-0,queues=8 \

3. Hot unplug the device
{"execute": "device_del", "arguments": {"id": "net0"}}
{"return": {}}
{"timestamp": {"seconds": 1685085387, "microseconds": 922551}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net0/virtio-backend"}}
{"timestamp": {"seconds": 1685085387, "microseconds": 973046}, "event": "DEVICE_DELETED", "data": {"device": "net0", "path": "/machine/peripheral/net0"}}
{"execute": "netdev_del", "arguments": {"id": "hostnet0"}}
{"return": {}}

4. Hotplug the device again
{"execute":"netdev_add","arguments":{"type":"vhost-vdpa","id":"hostnet0","vhostdev":"/dev/vhost-vdpa-0","queues": 8}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet0","mac":"00:11:22:33:44:00","id": "net0","bus":"pcie-root-port-3","addr":"0x0","mq":true,"vectors": 18}}
{"return": {}}

5. After a few moments, the guest hits a qemu core dump.
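For anyone who wants to script the unplug/hotplug cycle from steps 3 and 4 rather than typing it into the monitor, a minimal Python sketch follows. It assumes the guest was started with a QMP Unix socket (the path passed to `replug` is hypothetical and not part of the original report), and the helper names `build_replug_sequence`, `QmpClient` and `replug` are illustrative only. The command payloads are copied from the steps above.

```python
import json
import socket


def build_replug_sequence(dev_id="net0", netdev_id="hostnet0",
                          vhostdev="/dev/vhost-vdpa-0", queues=8,
                          mac="00:11:22:33:44:00", bus="pcie-root-port-3",
                          vectors=18):
    """Return the QMP commands from steps 3 and 4, in the order they were issued."""
    return [
        {"execute": "device_del", "arguments": {"id": dev_id}},
        {"execute": "netdev_del", "arguments": {"id": netdev_id}},
        {"execute": "netdev_add",
         "arguments": {"type": "vhost-vdpa", "id": netdev_id,
                       "vhostdev": vhostdev, "queues": queues}},
        {"execute": "device_add",
         "arguments": {"driver": "virtio-net-pci", "netdev": netdev_id,
                       "mac": mac, "id": dev_id, "bus": bus, "addr": "0x0",
                       "mq": True, "vectors": vectors}},
    ]


class QmpClient:
    """Tiny line-oriented QMP client; enough for this reproducer only."""

    def __init__(self, sock_path):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(sock_path)
        self.stream = self.sock.makefile("rw")
        self.events = []
        self._read()                                   # greeting banner
        self.execute({"execute": "qmp_capabilities"})  # leave negotiation mode

    def _read(self):
        line = self.stream.readline()
        if not line:
            raise ConnectionError("QMP socket closed (qemu may have crashed)")
        return json.loads(line)

    def execute(self, command):
        """Send one command and return its reply, queueing events seen meanwhile."""
        self.stream.write(json.dumps(command) + "\n")
        self.stream.flush()
        while True:
            msg = self._read()
            if "event" in msg:
                self.events.append(msg)
            else:
                return msg

    def wait_event(self, name, device):
        """Block until the named event for the given device id has been seen."""
        while True:
            for ev in self.events:
                if ev["event"] == name and ev.get("data", {}).get("device") == device:
                    self.events.remove(ev)
                    return ev
            msg = self._read()
            if "event" in msg:
                self.events.append(msg)


def replug(sock_path):
    """Run the unplug/hotplug cycle, waiting for DEVICE_DELETED before netdev_del."""
    qmp = QmpClient(sock_path)
    device_del, netdev_del, netdev_add, device_add = build_replug_sequence()
    qmp.execute(device_del)
    qmp.wait_event("DEVICE_DELETED", "net0")
    for command in (netdev_del, netdev_add, device_add):
        qmp.execute(command)
```

Calling `replug()` with the socket path of the running guest replays the sequence once. On an affected build the core dump was reported "after a few moments", so a dropped connection may only show up on a later QMP read.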
==> So reproduced this problem on qemu-kvm-7.2.0-14.el9_2.x86_64

==> Verified it on qemu-kvm-8.0.0-3.el9.x86_64
=> Repeated the above test steps; the guest works well, so this bug has been fixed on qemu-kvm-8.0.0-3.el9.x86_64.

Hello Laurent

Based on the above test result, QE would like to confirm two questions. Could you please help review them? Thanks in advance:

1. This bug has been fixed on qemu-kvm-8.0.0-3.el9.x86_64. Can QE close the current bug as "CURRENTRELEASE"?
2. It can also be reproduced on the latest RHEL 9.2 qemu-kvm version; does it need to be backported?

Thanks
Lei

(In reply to Lei Yang from comment #11)
> Hello Laurent

Hi Lei,

> Based on the above test result, QE would like to confirm two questions.
> Could you please help review them? Thanks in advance:
>
> 1. This bug has been fixed on qemu-kvm-8.0.0-3.el9.x86_64. Can QE close
> the current bug as "CURRENTRELEASE"?

Yes

> 2. It can also be reproduced on the latest RHEL 9.2 qemu-kvm version; does
> it need to be backported?

It's a question for @eperezma.

And do you know which commits fix the problem?

Thanks

Hi Laurent

According to the https://issues.redhat.com/browse/RHEL-274 test result, it should be fixed by this patch:

commit 2e1a9de96b487cf818a22d681cad8d3f5d18dcca
Author: Eugenio Pérez <eperezma>
Date: Thu Feb 9 18:00:04 2023 +0100

    vdpa: stop all svq on device deletion

    Not stopping them leave the device in a bad state when virtio-net
    fronted device is unplugged with device_del monitor command.

    This is not triggable in regular poweroff or qemu forces shutdown
    because cleanup is called right after vhost_vdpa_dev_start(false). But
    devices hot unplug does not call vdpa device cleanups. This lead to all
    the vhost_vdpa devices without stop the SVQ but the last.

    Fix it and clean the code, making it symmetric with
    vhost_vdpa_svqs_start.

    Fixes: dff4426fa656 ("vhost: Add Shadow VirtQueue kick forwarding capabilities")
    Reported-by: Lei Yang <leiyang>
    Signed-off-by: Eugenio Pérez <eperezma>
    Message-Id: <20230209170004.899472-1-eperezma>
    Tested-by: Laurent Vivier <lvivier>
    Acked-by: Jason Wang <jasowang>

Thanks
Lei

(In reply to Lei Yang from comment #13)
> Hi Laurent
>
> According to the https://issues.redhat.com/browse/RHEL-274 test result, it
> should be fixed by this patch:
>
> commit 2e1a9de96b487cf818a22d681cad8d3f5d18dcca
> Author: Eugenio Pérez <eperezma>
> Date: Thu Feb 9 18:00:04 2023 +0100
>
> vdpa: stop all svq on device deletion
>
> Not stopping them leave the device in a bad state when virtio-net
> fronted device is unplugged with device_del monitor command.
>
> This is not triggable in regular poweroff or qemu forces shutdown
> because cleanup is called right after vhost_vdpa_dev_start(false). But
> devices hot unplug does not call vdpa device cleanups. This lead to all
> the vhost_vdpa devices without stop the SVQ but the last.
>
> Fix it and clean the code, making it symmetric with
> vhost_vdpa_svqs_start.
>
> Fixes: dff4426fa656 ("vhost: Add Shadow VirtQueue kick forwarding
> capabilities")
> Reported-by: Lei Yang <leiyang>
> Signed-off-by: Eugenio Pérez <eperezma>
> Message-Id: <20230209170004.899472-1-eperezma>
> Tested-by: Laurent Vivier <lvivier>
> Acked-by: Jason Wang <jasowang>
>
> Thanks
> Lei

But according to comment #9 this fix introduces a regression, so I think it is not enough.

Hi Laurent

According to QE's test result, the problem mentioned in comment 9 has also been fixed, but QE cannot determine which commit fixed it. For more details, please refer to the latest comment in https://issues.redhat.com/browse/RHEL-200.

Thanks
Lei

According to comment #11, it's been fixed in QEMU 8.0.0 and comes with the rebase in RHEL 9.3.0.

Moving to MODIFIED, and asking for Z-stream.

Based on the Comment 10 test result, move to "VERIFIED".
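The commit message quoted earlier in this thread says that, on hot unplug, every vhost_vdpa device except the last was left with its SVQ running. The toy Python model below only illustrates that statement. It is not derived from QEMU source: the class and function names are hypothetical, and modelling one device per queue pair is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class ToyVdpaDev:
    """Stand-in for one vhost_vdpa device; not a QEMU structure."""
    index: int
    svq_running: bool = False


def start_all(devs):
    """Start the SVQ on every device, one by one."""
    for dev in devs:
        dev.svq_running = True


def stop_on_unplug_before_fix(devs):
    """Behavior as described in the commit message: only the last device stops."""
    devs[-1].svq_running = False


def stop_on_unplug_after_fix(devs):
    """Symmetric teardown: every device that was started is stopped."""
    for dev in devs:
        dev.svq_running = False


def still_running(devs):
    """Indices of devices whose SVQ was left running after teardown."""
    return [dev.index for dev in devs if dev.svq_running]


if __name__ == "__main__":
    devs = [ToyVdpaDev(i) for i in range(8)]  # queues=8, as in the reproducer
    start_all(devs)
    stop_on_unplug_before_fix(devs)
    print("left running before fix:", still_running(devs))

    start_all(devs)
    stop_on_unplug_after_fix(devs)
    print("left running after fix:", still_running(devs))
```

The point of the sketch is the asymmetry: start touches every device while the pre-fix stop touches one, which matches the commit's stated goal of making teardown symmetric with vhost_vdpa_svqs_start.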
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6368