Bug 2177620 - [mlx vhost_vdpa][rhel 9.2]qemu core dump when hot unplug then hotplug a vdpa interface with multi-queue setting
Summary: [mlx vhost_vdpa][rhel 9.2]qemu core dump when hot unplug then hotplug a vdpa interface with multi-queue setting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: Lei Yang
URL:
Whiteboard:
Depends On: 2180898
Blocks: 2213864
 
Reported: 2023-03-13 07:33 UTC by Lei Yang
Modified: 2023-11-07 09:19 UTC
CC List: 12 users

Fixed In Version: qemu-kvm-8.0.0-3.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2213864 (view as bug list)
Environment:
Last Closed: 2023-11-07 08:27:12 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System                  ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   RHEL-274         0        None      None    None     2023-03-13 07:49:02 UTC
Red Hat Issue Tracker   RHELPLAN-151533  0        None      None    None     2023-03-13 07:34:20 UTC
Red Hat Product Errata  RHSA-2023:6368   0        None      None    None     2023-11-07 08:28:45 UTC

Comment 9 Laurent Vivier 2023-03-15 09:02:01 UTC
@eperezma 

I don't think we can push this fix ("vdpa: stop all svq on device deletion") while the race condition between virtio-net and vhost remains unfixed.
Do you agree?

As reported by Lei, it doesn't fix RHEL-200 (https://issues.redhat.com/browse/RHEL-200); moreover, it introduces a regression.

Comment 10 Lei Yang 2023-05-26 07:33:22 UTC
==> Reproduced this problem on the latest rhel 9.2 qemu-kvm version: qemu-kvm-7.2.0-14.el9_2.x86_64

=>Test Version
qemu-kvm-7.2.0-14.el9_2.x86_64
kernel-5.14.0-313.el9.x86_64
iproute-6.2.0-1.el9.x86_64

# flint -d 0000:17:00.0 q
Image type:            FS4
FW Version:            22.37.0154
FW Release Date:       17.3.2023
Product Version:       22.37.0154
Description:           UID                GuidsNumber
Base GUID:             b8cef603000a11f0        4
Base MAC:              b8cef60a11f0            4
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000359
Security Attributes:   N/A
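
(Aside, not part of the original report: the $pci_addr placeholder in step 1 below is the vdpa management device's PCI handle. Assuming the iproute2 vdpa utility from the test versions above, it can be looked up as shown here; the output is illustrative only and depends on the host.)

# vdpa mgmtdev show
pci/0000:17:00.2:
  supported_classes net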

=>Test steps
1. Create a multi-queue vdpa device
# vdpa dev add name vdpa0 mgmtdev pci/$pci_addr mac 00:11:22:33:44:00 max_vqp 8

2. Boot a guest with this vdpa device
-device '{"driver": "virtio-net-pci", "mac": "00:11:22:33:44:00", "id": "net0", "netdev": "hostnet0", "mq": true, "vectors": 18, "bus": "pcie-root-port-3", "addr": "0x0"}'  \
-netdev vhost-vdpa,id=hostnet0,vhostdev=/dev/vhost-vdpa-0,queues=8  \

3. Hot unplug the device
{"execute": "device_del", "arguments": {"id": "net0"}}
{"return": {}}
{"timestamp": {"seconds": 1685085387, "microseconds": 922551}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net0/virtio-backend"}}
{"timestamp": {"seconds": 1685085387, "microseconds": 973046}, "event": "DEVICE_DELETED", "data": {"device": "net0", "path": "/machine/peripheral/net0"}}
{"execute": "netdev_del", "arguments": {"id": "hostnet0"}}
{"return": {}}

4. Hotplug this device again
{"execute":"netdev_add","arguments":{"type":"vhost-vdpa","id":"hostnet0","vhostdev":"/dev/vhost-vdpa-0","queues": 8}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"hostnet0","mac":"00:11:22:33:44:00","id": "net0","bus":"pcie-root-port-3","addr":"0x0","mq":true,"vectors": 18}}
{"return": {}}

5. After a few moments, the guest hits a qemu core dump. (A combined script for steps 3-4 is sketched below.)
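
The unplug/replug sequence in steps 3 and 4 can also be driven by a small script. This is only a sketch, not part of the original reproducer: it assumes the guest was additionally started with "-qmp unix:/tmp/qmp.sock,server,nowait" (a hypothetical socket path, not present in the command line above) and that socat is available; each helper call opens a fresh QMP connection and negotiates capabilities before sending its command.

#!/bin/bash
# Sketch only: replay steps 3-4 over a QMP unix socket (path is a placeholder).
QMP_SOCK=/tmp/qmp.sock

qmp() {
    # Every new QMP connection must send qmp_capabilities before other commands.
    printf '{"execute":"qmp_capabilities"}\n%s\n' "$1" | socat - UNIX-CONNECT:"$QMP_SOCK"
}

# Step 3: hot unplug the virtio-net device, then remove the vhost-vdpa backend.
qmp '{"execute": "device_del", "arguments": {"id": "net0"}}'
sleep 2    # crude wait for the DEVICE_DELETED events shown in step 3
qmp '{"execute": "netdev_del", "arguments": {"id": "hostnet0"}}'

# Step 4: hotplug the same backend and device again.
qmp '{"execute": "netdev_add", "arguments": {"type": "vhost-vdpa", "id": "hostnet0", "vhostdev": "/dev/vhost-vdpa-0", "queues": 8}}'
qmp '{"execute": "device_add", "arguments": {"driver": "virtio-net-pci", "netdev": "hostnet0", "mac": "00:11:22:33:44:00", "id": "net0", "bus": "pcie-root-port-3", "addr": "0x0", "mq": true, "vectors": 18}}'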

==> So this problem is reproduced on qemu-kvm-7.2.0-14.el9_2.x86_64

==> Verified it on qemu-kvm-8.0.0-3.el9.x86_64
=> Repeated the above test steps; the guest works well, so this bug is fixed on qemu-kvm-8.0.0-3.el9.x86_64.

Comment 11 Lei Yang 2023-05-26 07:38:28 UTC
Hello Laurent

Based on the above test results, QE would like to confirm two questions; could you please help review them? Thanks in advance:

1. This bug has been fixed in qemu-kvm-8.0.0-3.el9.x86_64. Can QE close the current bug as "CURRENTRELEASE"?
2. It can also be reproduced on the latest rhel 9.2 qemu-kvm version; does it need to be backported?

Thanks
Lei

Comment 12 Laurent Vivier 2023-05-26 08:31:20 UTC
(In reply to Lei Yang from comment #11)
> Hello Laurent

Hi Lei,

> Based on the above test results, QE would like to confirm two questions;
> could you please help review them? Thanks in advance:
> 
> 1. This bug has been fixed in qemu-kvm-8.0.0-3.el9.x86_64. Can QE close
> the current bug as "CURRENTRELEASE"?

Yes

> 2. It can also be reproduced on the latest rhel 9.2 qemu-kvm version; does it
> need to be backported?

That's a question for @eperezma.
And do you know which commits fix the problem?

Thanks

Comment 13 Lei Yang 2023-05-26 09:25:33 UTC
Hi Laurent

According to the https://issues.redhat.com/browse/RHEL-274 test result, it should be fixed by this patch:

commit 2e1a9de96b487cf818a22d681cad8d3f5d18dcca
Author: Eugenio Pérez <eperezma>
Date:   Thu Feb 9 18:00:04 2023 +0100

    vdpa: stop all svq on device deletion
    
    Not stopping them leave the device in a bad state when virtio-net
    fronted device is unplugged with device_del monitor command.
    
    This is not triggable in regular poweroff or qemu forces shutdown
    because cleanup is called right after vhost_vdpa_dev_start(false).  But
    devices hot unplug does not call vdpa device cleanups.  This lead to all
    the vhost_vdpa devices without stop the SVQ but the last.
    
    Fix it and clean the code, making it symmetric with
    vhost_vdpa_svqs_start.
    
    Fixes: dff4426fa656 ("vhost: Add Shadow VirtQueue kick forwarding capabilities")
    Reported-by: Lei Yang <leiyang>
    Signed-off-by: Eugenio Pérez <eperezma>
    Message-Id: <20230209170004.899472-1-eperezma>
    Tested-by: Laurent Vivier <lvivier>
    Acked-by: Jason Wang <jasowang>

Thanks
Lei

Comment 14 Laurent Vivier 2023-05-26 11:40:22 UTC
(In reply to Lei Yang from comment #13)
> Hi Laurent
> 
> According to the https://issues.redhat.com/browse/RHEL-274 test result, it
> should be fixed by this patch:
> 
> commit 2e1a9de96b487cf818a22d681cad8d3f5d18dcca
> Author: Eugenio Pérez <eperezma>
> Date:   Thu Feb 9 18:00:04 2023 +0100
> 
>     vdpa: stop all svq on device deletion
>     
>     Not stopping them leave the device in a bad state when virtio-net
>     fronted device is unplugged with device_del monitor command.
>     
>     This is not triggable in regular poweroff or qemu forces shutdown
>     because cleanup is called right after vhost_vdpa_dev_start(false).  But
>     devices hot unplug does not call vdpa device cleanups.  This lead to all
>     the vhost_vdpa devices without stop the SVQ but the last.
>     
>     Fix it and clean the code, making it symmetric with
>     vhost_vdpa_svqs_start.
>     
>     Fixes: dff4426fa656 ("vhost: Add Shadow VirtQueue kick forwarding
> capabilities")
>     Reported-by: Lei Yang <leiyang>
>     Signed-off-by: Eugenio Pérez <eperezma>
>     Message-Id: <20230209170004.899472-1-eperezma>
>     Tested-by: Laurent Vivier <lvivier>
>     Acked-by: Jason Wang <jasowang>
> 
> Thanks
> Lei

But according to comment #9 this fix introduces a regression, so I think it is not enough.

Comment 15 Lei Yang 2023-05-29 02:02:54 UTC
Hi Laurent

According to QE's test results, the problem mentioned in comment 9 has also been fixed; QE just cannot determine which commit fixed it. For more details please refer to the latest comment in https://issues.redhat.com/browse/RHEL-200.

Thanks
Lei

Comment 19 Laurent Vivier 2023-06-09 08:42:35 UTC
According to comment #11, it's been fixed in QEMU 8.0.0 and comes with the rebase in RHEL 9.3.0.

Moving to MODIFIED and asking for a Z-stream.

Comment 24 Lei Yang 2023-06-14 00:35:20 UTC
Based on the Comment 10 test results, moving to "VERIFIED".

Comment 26 errata-xmlrpc 2023-11-07 08:27:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6368
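
(As a practical note, not part of the advisory text: on a subscribed RHEL 9 host the fixed build can be pulled in and checked roughly as follows; the expected version comes from the "Fixed In Version" field above.)

# dnf update -y qemu-kvm
# rpm -q qemu-kvm        (expect qemu-kvm-8.0.0-3.el9 or newer)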

