Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of the type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 2177620

Summary: [mlx vhost_vdpa][rhel 9.2] qemu core dump when hot unplug then hotplug a vdpa interface with multi-queue setting
Product: Red Hat Enterprise Linux 9
Reporter: Lei Yang <leiyang>
Component: qemu-kvm
Assignee: Laurent Vivier <lvivier>
qemu-kvm sub component: Networking
QA Contact: Lei Yang <leiyang>
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: unspecified
CC: aadam, chayang, eperezma, jinzhao, juzhang, lulu, lvivier, virt-maint, wquan, yalzhang, yama, ymankad
Version: 9.2
Keywords: Regression, Triaged, ZStream
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-8.0.0-3.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Cloned To: 2213864 (view as bug list)
Environment:
Last Closed: 2023-11-07 08:27:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2180898
Bug Blocks: 2213864

Comment 9 Laurent Vivier 2023-03-15 09:02:01 UTC
@eperezma 

I don't think we can push this fix ("vdpa: stop all svq on device deletion") while the race condition between virtio-net and vhost remains unfixed:
do you agree?

As reported by Lei, it doesn't fix RHEL-200 (https://issues.redhat.com/browse/RHEL-200), and moreover it introduces a regression.

Comment 10 Lei Yang 2023-05-26 07:33:22 UTC
==> Reproduced this problem on the latest RHEL 9.2 qemu-kvm version: qemu-kvm-7.2.0-14.el9_2.x86_64

=>Test Version
qemu-kvm-7.2.0-14.el9_2.x86_64
kernel-5.14.0-313.el9.x86_64
iproute-6.2.0-1.el9.x86_64

# flint -d 0000:17:00.0 q
Image type:            FS4
FW Version:            22.37.0154
FW Release Date:       17.3.2023
Product Version:       22.37.0154
Description:           UID                GuidsNumber
Base GUID:             b8cef603000a11f0        4
Base MAC:              b8cef60a11f0            4
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000359
Security Attributes:   N/A

=>Test steps
1. Create a multi-queue vdpa device
# vdpa dev add name vdpa0 mgmtdev pci/$pci_addr mac 00:11:22:33:44:00 max_vqp 8

2. Boot a guest with this vdpa device
-device '{"driver": "virtio-net-pci", "mac": "00:11:22:33:44:00", "id": "net0", "netdev": "hostnet0", "mq": true, "vectors": 18, "bus": "pcie-root-port-3", "addr": "0x0"}'  \
-netdev vhost-vdpa,id=hostnet0,vhostdev=/dev/vhost-vdpa-0,queues=8  \

3. Hot unplug the device
{"execute": "device_del", "arguments": {"id": "net0"}}
{"return": {}}
{"timestamp": {"seconds": 1685085387, "microseconds": 922551}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net0/virtio-backend"}}
{"timestamp": {"seconds": 1685085387, "microseconds": 973046}, "event": "DEVICE_DELETED", "data": {"device": "net0", "path": "/machine/peripheral/net0"}}
{"execute": "netdev_del", "arguments": {"id": "hostnet0"}}
{"return": {}}

4. Hotplug this device again
{"execute":"netdev_add","arguments":{"type":"vhost-vdpa","id":"hostnet0","vhostdev":"/dev/vhost-vdpa-0","queues": 8}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet0","mac":"00:11:22:33:44:00","id": "net0","bus":"pcie-root-port-3","addr":"0x0","mq":true,"vectors": 18}}
{"return": {}}

5. After a few moments, QEMU dumps core.
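The unplug/replug cycle in steps 3 and 4 can also be driven programmatically. The sketch below only builds the same QMP command payloads shown above and prints them one per line; the helper names (`unplug_cmds`, `replug_cmds`) are illustrative assumptions, and in practice the payloads would be sent over the guest's QMP socket (waiting for the DEVICE_DELETED events before netdev_del), not just printed.

```python
import json

# QMP payloads for step 3: hot unplug the virtio-net frontend, then the netdev.
# IDs mirror the reproduction steps above (net0 / hostnet0).
def unplug_cmds(dev_id="net0", netdev_id="hostnet0"):
    return [
        {"execute": "device_del", "arguments": {"id": dev_id}},
        {"execute": "netdev_del", "arguments": {"id": netdev_id}},
    ]

# QMP payloads for step 4: re-add the vhost-vdpa netdev, then the device,
# with the same multi-queue settings (queues=8, mq=true, vectors=18).
def replug_cmds(dev_id="net0", netdev_id="hostnet0", queues=8):
    return [
        {"execute": "netdev_add",
         "arguments": {"type": "vhost-vdpa", "id": netdev_id,
                       "vhostdev": "/dev/vhost-vdpa-0", "queues": queues}},
        {"execute": "device_add",
         "arguments": {"driver": "virtio-net-pci", "netdev": netdev_id,
                       "mac": "00:11:22:33:44:00", "id": dev_id,
                       "bus": "pcie-root-port-3", "addr": "0x0",
                       "mq": True, "vectors": 18}},
    ]

if __name__ == "__main__":
    for cmd in unplug_cmds() + replug_cmds():
        print(json.dumps(cmd))
```

Each printed line is a complete QMP command identical to the ones issued by hand in the transcript above.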

==> So the problem reproduces on qemu-kvm-7.2.0-14.el9_2.x86_64

==> Verified it on qemu-kvm-8.0.0-3.el9.x86_64
=> Repeated the above test steps; the guest works well, so this bug is fixed on qemu-kvm-8.0.0-3.el9.x86_64.

Comment 11 Lei Yang 2023-05-26 07:38:28 UTC
Hello Laurent

Based on the above test results, QE would like to confirm two questions; could you please help review them? Thanks in advance:

1. This bug has been fixed in qemu-kvm-8.0.0-3.el9.x86_64. Can QE close the current bug as "CURRENTRELEASE"?
2. It can also be reproduced on the latest RHEL 9.2 qemu-kvm version; does it need to be backported?

Thanks
Lei

Comment 12 Laurent Vivier 2023-05-26 08:31:20 UTC
(In reply to Lei Yang from comment #11)
> Hello Laurent

Hi Lei,

> Based on the above test result, QE would like to confirm two questions,
> could you please help review them, thanks in advance:
> 
> 1. This bug has been fixed in qemu-kvm-8.0.0-3.el9.x86_64. Can QE close
> the current bug as "CURRENTRELEASE"?

Yes

> 2. It can also be reproduced on the latest RHEL 9.2 qemu-kvm version; does
> it need to be backported?

It's a question for @eperezma.
And do you know which commits fix the problem?

Thanks

Comment 13 Lei Yang 2023-05-26 09:25:33 UTC
Hi Laurent

According to the test result in https://issues.redhat.com/browse/RHEL-274, it should be fixed by this patch:

commit 2e1a9de96b487cf818a22d681cad8d3f5d18dcca
Author: Eugenio Pérez <eperezma>
Date:   Thu Feb 9 18:00:04 2023 +0100

    vdpa: stop all svq on device deletion
    
    Not stopping them leave the device in a bad state when virtio-net
    fronted device is unplugged with device_del monitor command.
    
    This is not triggable in regular poweroff or qemu forces shutdown
    because cleanup is called right after vhost_vdpa_dev_start(false).  But
    devices hot unplug does not call vdpa device cleanups.  This lead to all
    the vhost_vdpa devices without stop the SVQ but the last.
    
    Fix it and clean the code, making it symmetric with
    vhost_vdpa_svqs_start.
    
    Fixes: dff4426fa656 ("vhost: Add Shadow VirtQueue kick forwarding capabilities")
    Reported-by: Lei Yang <leiyang>
    Signed-off-by: Eugenio Pérez <eperezma>
    Message-Id: <20230209170004.899472-1-eperezma>
    Tested-by: Laurent Vivier <lvivier>
    Acked-by: Jason Wang <jasowang>

Thanks
Lei

Comment 14 Laurent Vivier 2023-05-26 11:40:22 UTC
(In reply to Lei Yang from comment #13)
> Hi Laurent
> 
> According to the test result in https://issues.redhat.com/browse/RHEL-274,
> it should be fixed by this patch:
> 
> commit 2e1a9de96b487cf818a22d681cad8d3f5d18dcca
> Author: Eugenio Pérez <eperezma>
> Date:   Thu Feb 9 18:00:04 2023 +0100
> 
>     vdpa: stop all svq on device deletion
>     
>     Not stopping them leave the device in a bad state when virtio-net
>     fronted device is unplugged with device_del monitor command.
>     
>     This is not triggable in regular poweroff or qemu forces shutdown
>     because cleanup is called right after vhost_vdpa_dev_start(false).  But
>     devices hot unplug does not call vdpa device cleanups.  This lead to all
>     the vhost_vdpa devices without stop the SVQ but the last.
>     
>     Fix it and clean the code, making it symmetric with
>     vhost_vdpa_svqs_start.
>     
>     Fixes: dff4426fa656 ("vhost: Add Shadow VirtQueue kick forwarding
> capabilities")
>     Reported-by: Lei Yang <leiyang>
>     Signed-off-by: Eugenio Pérez <eperezma>
>     Message-Id: <20230209170004.899472-1-eperezma>
>     Tested-by: Laurent Vivier <lvivier>
>     Acked-by: Jason Wang <jasowang>
> 
> Thanks
> Lei

But according to comment #9, this fix introduces a regression; I think it is not enough on its own.

Comment 15 Lei Yang 2023-05-29 02:02:54 UTC
Hi Laurent

According to QE's test results, the problem mentioned in comment 9 has also been fixed; QE just cannot determine which commit fixed it. For more details, please refer to the latest comment on https://issues.redhat.com/browse/RHEL-200.

Thanks
Lei

Comment 19 Laurent Vivier 2023-06-09 08:42:35 UTC
According to comment #11, it's been fixed in QEMU 8.0.0 and comes with the rebase in RHEL 9.3.0.

Moving to MODIFIED, and asking for Z-stream.

Comment 24 Lei Yang 2023-06-14 00:35:20 UTC
Based on the comment 10 test result, moving to "VERIFIED".

Comment 26 errata-xmlrpc 2023-11-07 08:27:12 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6368