Bug 2141364 - libvirt doesn't catch mdevs created thru sysfs [rhel-9.1.0.z]
Summary: libvirt doesn't catch mdevs created thru sysfs [rhel-9.1.0.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jonathon Jongsma
QA Contact: zhentang
URL:
Whiteboard:
Depends On: 2109450
Blocks:
 
Reported: 2022-11-09 15:37 UTC by RHEL Program Management Team
Modified: 2023-01-23 15:21 UTC
CC List: 18 users

Fixed In Version: libvirt-8.5.0-7.1.el9_1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2109450
Environment:
Last Closed: 2023-01-23 15:18:08 UTC
Type: ---
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Issue Tracker RHELPLAN-138852 (last updated 2022-11-09 16:05:20 UTC)
Red Hat Product Errata RHBA-2023:0311 (last updated 2023-01-23 15:18:17 UTC)

Comment 1 Jonathon Jongsma 2022-11-10 20:39:11 UTC
Posted for review: https://gitlab.com/redhat/rhel/src/libvirt/-/merge_requests/52

Comment 4 chhu 2022-11-22 03:24:11 UTC
Tested on OSP17.0 with libvirt packages:
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud rpm -qa | grep libvirt| grep driver
libvirt-daemon-driver-nwfilter-8.5.0-7.1.el9_1.x86_64
libvirt-daemon-driver-qemu-8.5.0-7.1.el9_1.x86_64
libvirt-daemon-driver-storage-core-8.5.0-7.1.el9_1.x86_64
libvirt-daemon-driver-nodedev-8.5.0-7.1.el9_1.x86_64
libvirt-daemon-driver-secret-8.5.0-7.1.el9_1.x86_64

Test steps:
1. Prepare the vGPU environment on OSP17.0 with RHEL 9.1 libvirt, qemu-kvm, and kernel
(undercloud) [stack@dell-per740-66 ~]$ ssh heat-admin.24.19
[heat-admin@compute-0 ~]$ lspci|grep VGA
03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller (rev 04)
3d:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
3e:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
[heat-admin@compute-0 ~]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/name
GRID M60-2Q
[heat-admin@compute-0 ~]$ uuid=$(uuidgen)
[heat-admin@compute-0 ~]$ cd /sys/class/mdev_bus/0000:3d:00.0/mdev_supported_types/nvidia-18
[heat-admin@compute-0 nvidia-18]$ sudo chmod 666 create
[heat-admin@compute-0 nvidia-18]$ sudo echo $uuid
10a750fe-e839-481b-96d5-54e9d3eaa786
[heat-admin@compute-0 nvidia-18]$ sudo echo $uuid > create
[heat-admin@compute-0 nvidia-18]$ ls ../../| grep $uuid
10a750fe-e839-481b-96d5-54e9d3eaa786
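
For regression testing, the sysfs creation in step 1 can also be scripted. A minimal sketch, to be run as root on the compute node (or after the chmod shown above); the parent PCI address and the nvidia-18 type are the ones from this environment, and the helper name is made up:

import os
import uuid

# Parent GPU and mdev type observed on this host (see step 1 above).
PARENT = "/sys/class/mdev_bus/0000:3d:00.0"
MDEV_TYPE = "nvidia-18"

def create_mdev_via_sysfs():
    """Create one mdev by writing a fresh UUID into the type's 'create' node."""
    mdev_uuid = str(uuid.uuid4())
    create_node = os.path.join(PARENT, "mdev_supported_types", MDEV_TYPE, "create")
    with open(create_node, "w") as f:
        f.write(mdev_uuid)
    # The new mdev shows up as a child directory of the parent PCI device.
    assert mdev_uuid in os.listdir(PARENT)
    return mdev_uuid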

2. Check in nova_virtqemud that the mdev device is present in the list of node devices
(undercloud) [stack@dell-per740-66 ~]$ ssh heat-admin.24.19
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
Python 3.9.10 (main, Feb  9 2022, 00:00:00) 
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
['mdev_10a750fe_e839_481b_96d5_54e9d3eaa786_0000_3d_00_0']
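
Instead of polling listDevices(), the behaviour under test here (the nodedev driver reacting to udev events for sysfs-created mdevs) can also be observed with libvirt's node-device lifecycle events. A sketch, assuming the libvirt Python bindings are available in the container:

import libvirt

# The default event loop must be registered before opening the connection.
libvirt.virEventRegisterDefaultImpl()
conn = libvirt.open('qemu:///system')

def lifecycle_cb(conn, dev, event, detail, opaque):
    # event is one of VIR_NODE_DEVICE_EVENT_CREATED / _DELETED / _DEFINED / _UNDEFINED
    print("node device:", dev.name(), "event:", event)

conn.nodeDeviceEventRegisterAny(
    None, libvirt.VIR_NODE_DEVICE_EVENT_ID_LIFECYCLE, lifecycle_cb, None)

# Creating or removing an mdev through sysfs should now trigger a
# CREATED / DELETED callback for the corresponding mdev_* node device.
while True:
    libvirt.virEventRunDefaultImpl()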

Comment 5 chhu 2022-11-22 03:35:55 UTC
Add more test results:
3. Delete the mdev device by using its uuid,
   then check that the available_instances and `virsh nodedev-list` outputs are correct
[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
3
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
mdev_10a750fe_e839_481b_96d5_54e9d3eaa786_0000_3d_00_0

[heat-admin@compute-0 nvidia-18]$ sudo chmod 666 /sys/bus/mdev/devices/10a750fe-e839-481b-96d5-54e9d3eaa786/remove
[heat-admin@compute-0 nvidia-18]$ sudo echo 1 > /sys/bus/mdev/devices/10a750fe-e839-481b-96d5-54e9d3eaa786/remove
[heat-admin@compute-0 nvidia-18]$ sudo ls /sys/class/mdev_bus/0000:3d:00.0| grep $uuid
[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
4
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
No output

Checking in nova_virtqemud:
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
Python 3.9.10 (main, Feb  9 2022, 00:00:00) 
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
[]
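
Because the nodedev driver processes the udev remove event asynchronously, an automated version of this check can poll briefly before asserting. A small sketch; the wait_gone helper name is made up:

import time
import libvirt

def wait_gone(conn, name, timeout=5.0):
    """Poll until the given mdev node device disappears from libvirt's list."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if name not in conn.listDevices('mdev'):
            return True
        time.sleep(0.2)
    return False

conn = libvirt.open('qemu:///system')
print(wait_gone(conn, 'mdev_10a750fe_e839_481b_96d5_54e9d3eaa786_0000_3d_00_0'))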

4. Create the mdev device with virsh commands,
   then check that the `virsh nodedev-create`, `nodedev-list`, and `nodedev-dumpxml` outputs are correct
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud cat mdev.xml
<device>
  <parent>pci_0000_3d_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-18'/>
    <uuid>c71395b9-0484-46af-9f01-7b00edfe5038</uuid>
  </capability>
</device>
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-create mdev.xml
Node device mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0 created from mdev.xml

[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances 
3
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0

[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-dumpxml mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0
<device>
  <name>mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0</name>
  <path>/sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/c71395b9-0484-46af-9f01-7b00edfe5038</path>
  <parent>pci_0000_3d_00_0</parent>
  <driver>
    <name>nvidia-vgpu-vfio</name>
  </driver>
  <capability type='mdev'>
    <type id='nvidia-18'/>
    <uuid>c71395b9-0484-46af-9f01-7b00edfe5038</uuid>
    <parent_addr>0000:3d:00.0</parent_addr>
    <iommuGroup number='138'/>
  </capability>
</device>

Checking in nova_virtqemud:
>>> conn.listDevices('mdev')
['mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0']
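
Step 4 has an equivalent in the Python bindings. A sketch using the same XML as mdev.xml above (transient creation, matching `virsh nodedev-create`):

import libvirt

MDEV_XML = """
<device>
  <parent>pci_0000_3d_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-18'/>
    <uuid>c71395b9-0484-46af-9f01-7b00edfe5038</uuid>
  </capability>
</device>
"""

conn = libvirt.open('qemu:///system')
dev = conn.nodeDeviceCreateXML(MDEV_XML, 0)   # like `virsh nodedev-create mdev.xml`
print(dev.name())      # mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0
print(dev.XMLDesc(0))  # same content as the nodedev-dumpxml output above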

5. Delete the mdev device with virsh commands,
   then check that the `virsh nodedev-destroy` and `nodedev-list` outputs are correct
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-destroy mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0
Destroyed node device 'mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0'

[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev

[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
4
Checking in nova_virtqemud:
>>> conn.listDevices('mdev')
[]
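
The destroy step likewise maps onto the node device API; a short sketch:

import libvirt

conn = libvirt.open('qemu:///system')
name = 'mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0'

dev = conn.nodeDeviceLookupByName(name)
dev.destroy()          # like `virsh nodedev-destroy <name>` for a transient mdev
assert name not in conn.listDevices('mdev')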

Comment 6 chhu 2022-11-22 03:47:20 UTC
6. Create the maximum number (available_instances) of mdev devices,
   then check in nova_virtqemud that all the mdev devices are present in the list of node devices
[heat-admin@compute-0 nvidia-18]$ sudo ls /sys/class/mdev_bus/0000:3d:00.0
0dd92cbb-07c1-4739-8832-f31c27cd1147  broken_parity_status		    device	     iommu_group	   modalias	reset	      resource5
8ac81344-447e-46b3-bfc5-4ef03e4aa742  class				    dma_mask_bits    irq		   msi_bus	reset_method  revision
8ed4d103-deee-464e-977d-b169617ff672  config				    driver	     link		   msi_irqs	resource      subsystem
......
boot_vga			      d4da684d-1f63-4916-937a-1d9e12525baf
[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
0
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
mdev_0dd92cbb_07c1_4739_8832_f31c27cd1147_0000_3d_00_0
mdev_8ac81344_447e_46b3_bfc5_4ef03e4aa742_0000_3d_00_0
mdev_8ed4d103_deee_464e_977d_b169617ff672_0000_3d_00_0
mdev_d4da684d_1f63_4916_937a_1d9e12525baf_0000_3d_00_0

Check in virtqemud:
>>> conn.listDevices('mdev')
['mdev_8ed4d103_deee_464e_977d_b169617ff672_0000_3d_00_0', 'mdev_8ac81344_447e_46b3_bfc5_4ef03e4aa742_0000_3d_00_0', 'mdev_0dd92cbb_07c1_4739_8832_f31c27cd1147_0000_3d_00_0', 'mdev_d4da684d_1f63_4916_937a_1d9e12525baf_0000_3d_00_0']
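
A sketch of how step 6 could be automated: keep creating transient mdevs, each with a freshly generated UUID, until available_instances for nvidia-18 drops to zero. Paths, parent, and type are taken from this environment:

import uuid
import libvirt

AVAIL = ("/sys/class/mdev_bus/0000:3d:00.0/"
         "mdev_supported_types/nvidia-18/available_instances")
XML_TEMPLATE = """
<device>
  <parent>pci_0000_3d_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-18'/>
    <uuid>{u}</uuid>
  </capability>
</device>
"""

conn = libvirt.open('qemu:///system')
created = []
# Create until the type reports no free instances left.
while True:
    with open(AVAIL) as f:
        if int(f.read()) == 0:
            break
    xml = XML_TEMPLATE.format(u=uuid.uuid4())
    created.append(conn.nodeDeviceCreateXML(xml, 0).name())
print(created)   # should match `virsh nodedev-list --cap mdev` and conn.listDevices('mdev')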

7. Delete all the mdev devices with `echo 1 > /sys/bus/mdev/devices/***/remove`,
   then check that the available_instances and `virsh nodedev-list` outputs are correct

[heat-admin@compute-0 0000:3d:00.0]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
4
[heat-admin@compute-0 0000:3d:00.0]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
No output

Check in virtqemud:
>>> conn.listDevices('mdev')
[]
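
Step 7's cleanup can be scripted by iterating over /sys/bus/mdev/devices; a minimal sketch, to be run as root on the compute node:

import os

MDEV_DIR = "/sys/bus/mdev/devices"

# Write 1 to each device's 'remove' node: the sysfs equivalent of
# removing every mdev on the host.
for dev in os.listdir(MDEV_DIR):
    with open(os.path.join(MDEV_DIR, dev, "remove"), "w") as f:
        f.write("1")

# Afterwards available_instances goes back to 4 and both
# `virsh nodedev-list --cap mdev` and conn.listDevices('mdev') are empty.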

Comment 8 zhentang 2022-11-28 04:01:48 UTC
Tested on OSP17.0 with libvirt packages:
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud rpm -qa | grep libvirt| grep driver
libvirt-daemon-driver-nwfilter-8.5.0-7.1.el9_1.x86_64
libvirt-daemon-driver-qemu-8.5.0-7.1.el9_1.x86_64
libvirt-daemon-driver-storage-core-8.5.0-7.1.el9_1.x86_64
libvirt-daemon-driver-nodedev-8.5.0-7.1.el9_1.x86_64
libvirt-daemon-driver-secret-8.5.0-7.1.el9_1.x86_64


Some udevadm monitor output is included below for reference:


1. Created mdev using virsh command

[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud cat mdev.xml
<device>
  <parent>pci_0000_3d_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-18'/>
    <uuid>c71395b9-0484-46af-9f01-7b00edfe5038</uuid>
  </capability>
</device>

[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-create mdev.xml
Node device mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0 created from mdev.xml


In another terminal tab:
[heat-admin@compute-0 ~]$ udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[516975.794758] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/c71395b9-0484-46af-9f01-7b00edfe5038 (mdev)
UDEV  [516975.797323] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/c71395b9-0484-46af-9f01-7b00edfe5038 (mdev)
KERNEL[516975.899282] add      /devices/virtual/vfio/139 (vfio)
KERNEL[516975.899332] bind     /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/c71395b9-0484-46af-9f01-7b00edfe5038 (mdev)
UDEV  [516975.900515] bind     /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/c71395b9-0484-46af-9f01-7b00edfe5038 (mdev)
UDEV  [516975.904917] add      /devices/virtual/vfio/139 (vfio)
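
The same uevents can also be watched programmatically. A sketch using the third-party pyudev package (not part of this setup, shown only as an alternative to udevadm monitor):

import pyudev

context = pyudev.Context()
monitor = pyudev.Monitor.from_netlink(context)
monitor.filter_by('mdev')   # only report mdev add/remove/bind events

# Blocks and prints each event, comparable to the udevadm output above.
for device in iter(monitor.poll, None):
    print(device.action, device.sys_path)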


2. virsh nodedev related commands

Scenario:
destroy the mdev and recreate it with the same XML
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-info mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0
Name:           mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0
Parent:         pci_0000_3d_00_0
Active:         yes
Persistent:     no
Autostart:      no

[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-destroy mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0
Destroyed node device 'mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0'

[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-create mdev.xml
Node device mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0 created from mdev.xml

Scenario:
define and then create the mdev with virsh nodedev-* commands
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-define mdev.xml
Node device 'mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0' defined from 'mdev.xml'

[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-create mdev.xml
Node device mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0 created from mdev.xml

[heat-admin@compute-0 ~]$ udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[521083.705462] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/c71395b9-0484-46af-9f01-7b00edfe5038 (mdev)
UDEV  [521083.708322] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/c71395b9-0484-46af-9f01-7b00edfe5038 (mdev)
KERNEL[521083.810082] add      /devices/virtual/vfio/138 (vfio)
KERNEL[521083.810117] bind     /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/c71395b9-0484-46af-9f01-7b00edfe5038 (mdev)
UDEV  [521083.811261] bind     /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/c71395b9-0484-46af-9f01-7b00edfe5038 (mdev)
UDEV  [521083.815385] add      /devices/virtual/vfio/138 (vfio)
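
The define/create distinction exercised in this scenario maps onto two different calls in the Python bindings (nodeDeviceDefineXML was added in newer libvirt releases and is present in this 8.5.0 build, as the successful nodedev-define above shows). A sketch, assuming the same mdev.xml is readable from where the script runs:

import libvirt

conn = libvirt.open('qemu:///system')
xml = open('mdev.xml').read()

# Persistent, inactive definition: like `virsh nodedev-define mdev.xml`
defined = conn.nodeDeviceDefineXML(xml, 0)

# Start an active instance from the same XML: like `virsh nodedev-create mdev.xml`
created = conn.nodeDeviceCreateXML(xml, 0)

print(created.name())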

Comment 13 errata-xmlrpc 2023-01-23 15:18:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0311

