Bug 2182961 - virtqemud coredump when hotunplug a hostdev interface
Summary: virtqemud coredump when hotunplug a hostdev interface
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Peter Krempa
QA Contact: yalzhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-03-30 04:38 UTC by yalzhang@redhat.com
Modified: 2023-11-07 09:39 UTC

Fixed In Version: libvirt-9.2.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-07 08:31:17 UTC
Type: Bug
Target Upstream Version: 9.2.0
Embargoed:


Attachments
bisection (2.79 KB, application/gzip)
2023-03-30 12:39 UTC, Han Han


Links
System                  ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   RHELPLAN-153499  0        None      None    None     2023-03-30 04:40:19 UTC
Red Hat Product Errata  RHSA-2023:6409   0        None      None    None     2023-11-07 08:31:49 UTC

Description yalzhang@redhat.com 2023-03-30 04:38:48 UTC
Description of problem:
virtqemud dumps core when a hostdev interface is hot-unplugged

Version-Release number of selected component (if applicable):
libvirt-9.1.0-1.el9.x86_64
qemu-kvm-7.2.0-14.el9_2.x86_64
kernel-5.14.0-289.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM with a hostdev interface:
# virsh dumpxml avocado-vt-vm1 --xpath //interface
<interface type="hostdev" managed="yes">
  <mac address="52:54:00:aa:5c:5a"/>
  <source>
    <address type="pci" domain="0x0000" bus="0x3b" slot="0x10" function="0x2"/>
  </source>
  <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</interface>

2. After the VM boots successfully, hot-unplug the hostdev interface; virtqemud dumps core:
# virsh start avocado-vt-vm1
# pidof virtqemud
639569
# virsh detach-interface avocado-vt-vm1 hostdev 
error: Disconnected from qemu:///system due to end of file
error: Failed to detach interface
error: End of file while reading data: Input/output error

# coredumpctl list | grep 639569
Wed 2023-03-29 23:47:30 EDT 639569   0   0 SIGABRT present  /usr/sbin/virtqemud    1.0M

Some errors in the libvirtd log:
2023-03-30 03:47:32.772+0000: 640045: error : virPCIDeviceReset:1073 : internal error: Unable to reset PCI device 0000:3b:10.2: internal error: Active 0000:3b:00.0 devices on bus with 0000:3b:10.2, not doing bus reset
2023-03-30 03:47:32.772+0000: 640045: error : virHostdevResetAllPCIDevices:614 : Failed to reset PCI device: internal error: Unable to reset PCI device 0000:3b:10.2: internal error: Active 0000:3b:00.0 devices on bus with 0000:3b:10.2, not doing bus reset
2023-03-30 03:47:33.778+0000: 640110: error : virCgroupDenyDevicePath:2256 : Path '/dev/vfio/145' is not accessible: No such file or directory
2023-03-30 03:47:33.786+0000: 640110: error : virPCIDeviceTrySecondaryBusReset:838 : internal error: Active 0000:3b:00.0 devices on bus with 0000:3b:10.2, not doing bus reset

Actual results:
virtqemud dumps core when a hostdev interface is hot-unplugged

Expected results:
virtqemud should not dump core

Additional info:
Tested with libvirt-9.0.0-10.el9_2.x86_64 and the same qemu and kernel: the issue does not occur

Comment 3 Peter Krempa 2023-03-30 07:26:00 UTC
Looks like a double free:

#8  0x00007f261ab425ed in g_free (mem=0x7f26080432d0) at ../glib/gmem.c:199
#9  0x00007f261a6bdd66 in virBitmapFree (bitmap=0x7f260802ff70) at ../src/util/virbitmap.c:97
#10 virBitmapFree (bitmap=0x7f260802ff70) at ../src/util/virbitmap.c:94
#11 0x00007f261a75d153 in virDomainNetDefFree (def=0x7f2604028860) at ../src/conf/domain_conf.c:2749
#12 virDomainNetDefFree (def=def@entry=0x7f2604028860) at ../src/conf/domain_conf.c:2704
#13 0x00007f26140fe3f4 in qemuDomainRemoveHostDevice (driver=0x7f25cc022310, vm=0x7f25cc08e850, hostdev=<optimized out>) at ../src/qemu/qemu_hotplug.c:4564

The free call seems to correspond with:

virBitmapFree(def->source.subsys.u.pci.origstates);

Comment 4 Han Han 2023-03-30 12:39:35 UTC
Created attachment 1954672
bisection

Bisection shows the regression comes from:
d9e4075d4e9e4d699b5083572a534545f35a91b1 is the first bad commit
commit d9e4075d4e9e4d699b5083572a534545f35a91b1
Author: Peter Krempa <pkrempa>
Date:   Thu Oct 6 13:17:00 2022 +0200

    conf: Store 'origstates' of PCI hostdevs in a bitmap
    
    Refactor the code to use a bitmap with an enum.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>
    Reviewed-by: Martin Kletzander <mkletzan>

 src/conf/domain_conf.c      | 97 ++++++++++++++++++++++-----------------------
 src/conf/domain_conf.h      | 31 +++++----------
 src/conf/virconftypes.h     |  2 -
 src/hypervisor/virhostdev.c | 25 +++++++-----
 4 files changed, 72 insertions(+), 83 deletions(-)


To reproduce:
0. Clone the libvirt git tree to ~ and extract the attachment to ~. Prepare the disk image referenced by the domain XML rhel.xml, and set your VF's PCI address in inf.xml.
1. Run virtqemud-onece.sh to trigger the crash. After that, the buggy virtqemud version cannot start while the previous qemu-kvm process is still running.
2. Use virtqemud-abrt.sh as the script for bisection.

Comment 5 Peter Krempa 2023-03-30 13:20:53 UTC
Fixed by:

commit 0bfd11dd852335c1274b6dc1e771bd745d1fd94d 
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 30 11:42:31 2023 +0200

    conf: Clear pointer to freed bitmap holding hostdev's 'origstates'
    
    'virDomainHostdevDefClear' must clear the pointers too as it can be
    invoked multiple times on the same object e.g. inside
    qemuDomainRemoveHostDevice once via virDomainHostdevDefFree which skips
    freeing the object if it's used via <interface> and thus has a 'net'
    definition corresponding to it, and then subsequently via
    virDomainNetDefFree.
    
    Fix it by clearing the pointer along with freeing it.
    
    Fixes: d9e4075d4e9
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2182961
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>

v9.2.0-rc2-1-g0bfd11dd85
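
The pattern described in the commit message can be sketched in isolation. This is a minimal illustration only, not the libvirt code: the types Bitmap and HostdevDef and the functions bitmap_free, hostdev_clear, hostdev_new and clear_twice_demo are hypothetical stand-ins, and free_calls exists purely to make the behaviour observable. The point is that a "clear" function which may run twice on the same object has to reset the pointer it frees.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real structures; illustrative names only. */
typedef struct {
    unsigned long *bits;
} Bitmap;

typedef struct {
    Bitmap *origstates;   /* owned by the hostdev definition */
} HostdevDef;

static int free_calls;    /* counts real frees, for demonstration only */

static void bitmap_free(Bitmap *b)
{
    if (!b)
        return;           /* freeing NULL is a no-op, as with g_free() */
    free(b->bits);
    free(b);
    free_calls++;
}

/* May be called more than once on the same object, so it must leave
 * the object in a state where a repeated call does nothing. */
static void hostdev_clear(HostdevDef *def)
{
    assert(def != NULL);
    bitmap_free(def->origstates);
    def->origstates = NULL;   /* without this, the second call double-frees */
}

static HostdevDef *hostdev_new(void)
{
    HostdevDef *def = calloc(1, sizeof(*def));
    if (!def)
        return NULL;
    def->origstates = calloc(1, sizeof(*def->origstates));
    if (def->origstates)
        def->origstates->bits = calloc(1, sizeof(unsigned long));
    return def;
}

/* Clears the same object twice and reports how many frees really happened. */
static int clear_twice_demo(void)
{
    HostdevDef *def = hostdev_new();
    if (!def)
        return -1;
    free_calls = 0;
    hostdev_clear(def);   /* first path, e.g. via the hostdev free routine */
    hostdev_clear(def);   /* second path, e.g. via the net free routine */
    free(def);
    return free_calls;
}
```

Calling clear_twice_demo() from a main() should report a single free. Removing the NULL assignment in hostdev_clear turns the second call into a double free of the kind seen in the backtrace in comment 3.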

Comment 6 Han Han 2023-03-31 01:54:09 UTC
The test from comment 4 passes on v9.2.0-rc2-1-g0bfd11dd85:
+ timeout -s INT 3 /root/libvirt/build/src/virtqemud
2023-03-31 01:47:14.987+0000: 84639: info : libvirt version: 9.2.0
2023-03-31 01:47:14.987+0000: 84639: info : hostname: dell-per740xd-19.lab.eng.pek2.redhat.com
2023-03-31 01:47:14.987+0000: 84639: error : virCgroupDenyDevicePath:2256 : Path '/dev/vfio/137' is not accessible: No such file or directory
2023-03-31 01:47:14.987+0000: 84639: warning : qemuDomainRemoveHostDevice:4525 : Failed to remove host device cgroup ACL
+ '[' 124 -eq 134 ']'
+ exit 0
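
The final check in the trace compares the exit status 124, which GNU timeout returns when the command ran until the timeout, against 134. Bash and most shells report a process killed by a signal as 128 plus the signal number, and SIGABRT is 6 on Linux, so 134 most likely means "virtqemud aborted". The script itself is in the attachment and not reproduced here, so this reading is an assumption. A minimal sketch of the same mapping follows; shell_style_status and status_of_aborting_child are hypothetical helper names.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Convert a waitpid() status into the code a shell would typically report:
 * the plain exit status, or 128 + signal number for death by signal. */
static int shell_style_status(int wstatus)
{
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus);
    if (WIFSIGNALED(wstatus))
        return 128 + WTERMSIG(wstatus);
    return -1;   /* stopped or continued: not a final status */
}

/* Fork a child that calls abort() and return its shell-style status. */
static int status_of_aborting_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: disable core files so the demo leaves nothing behind. */
        struct rlimit rl = { 0, 0 };
        setrlimit(RLIMIT_CORE, &rl);
        abort();   /* raises SIGABRT, as glibc does on a detected double free */
    }
    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return shell_style_status(wstatus);
}
```

On Linux, status_of_aborting_child() should return 128 + SIGABRT, the value the trace tests for.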

Comment 7 yalzhang@redhat.com 2023-04-04 16:16:50 UTC
The issue does not occur in the automated functional test job with libvirt-9.2.0-1.el9.x86_64.

Comment 11 yalzhang@redhat.com 2023-05-19 01:53:13 UTC
Tested on libvirt-9.3.0-2.el9.x86_64; the issue is fixed.

Comment 13 errata-xmlrpc 2023-11-07 08:31:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6409

