Bug 2141365

Summary: libvirt doesn't catch mdevs created thru sysfs [rhel-9.0.0.z]
Product: Red Hat Enterprise Linux 9
Reporter: RHEL Program Management Team <pgm-rhel-tools>
Component: libvirt
Sub component: General
Assignee: Jonathon Jongsma <jjongsma>
QA Contact: zhentang <zhetang>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: bdobreli, bsawyers, camorris, chhu, dyuan, egallen, fjin, gveitmic, jdenemar, jsuchane, kchamart, lmen, smooney, virt-maint, xuzhang, yafu, ymankad, zhetang
Version: 9.0
Keywords: Regression, Triaged, ZStream
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-8.0.0-8.2.el9_0
Clone Of: 2109450
Last Closed: 2022-12-13 16:10:06 UTC
Bug Depends On: 2109450

Comment 1 Jonathon Jongsma 2022-11-10 20:35:45 UTC
posted for review: https://gitlab.com/redhat/rhel/src/libvirt/-/merge_requests/51

Comment 2 chhu 2022-11-18 02:57:40 UTC
Tested on OSP17.0 with the following libvirt packages:
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud rpm -qa | grep libvirt| grep driver
libvirt-daemon-driver-nwfilter-8.0.0-8.2.el9_0.x86_64
libvirt-daemon-driver-nodedev-8.0.0-8.2.el9_0.x86_64
libvirt-daemon-driver-qemu-8.0.0-8.2.el9_0.x86_64
libvirt-daemon-driver-secret-8.0.0-8.2.el9_0.x86_64
libvirt-daemon-driver-storage-core-8.0.0-8.2.el9_0.x86_64

Test steps:
1. Prepare the vGPU environment on OSP17.0
(undercloud) [stack@dell-per740-66 ~]$ ssh heat-admin.24.23
[heat-admin@compute-0 ~]$ lspci|grep VGA
03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller (rev 04)
3d:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
3e:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
[heat-admin@compute-0 ~]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/name
GRID M60-2Q
[heat-admin@compute-0 ~]$ uuid=$(uuidgen)
[heat-admin@compute-0 ~]$ cd /sys/class/mdev_bus/0000:3d:00.0/mdev_supported_types/nvidia-18
[heat-admin@compute-0 nvidia-18]$ sudo chmod 666 create
[heat-admin@compute-0 nvidia-18]$ sudo echo $uuid
2890cda7-21d3-4106-acee-c238004966b8
[heat-admin@compute-0 nvidia-18]$ sudo echo $uuid > create
[heat-admin@compute-0 nvidia-18]$ ls ../../| grep $uuid
2890cda7-21d3-4106-acee-c238004966b8
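
For reference, a minimal Python sketch of the same sysfs creation step (not part of the original log; it assumes the nvidia-18 type directory used above and must run with privileges to write to sysfs, which avoids the chmod/redirection workaround in the shell session):

import uuid

type_dir = '/sys/class/mdev_bus/0000:3d:00.0/mdev_supported_types/nvidia-18'
dev_uuid = str(uuid.uuid4())

# Writing a UUID to the type's "create" node asks the mdev core to
# instantiate a new mediated device under /sys/bus/mdev/devices/<uuid>.
with open(type_dir + '/create', 'w') as f:
    f.write(dev_uuid)
print(dev_uuid)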

2. Check in nova_virtqemud that the mdev device is present in the list of node devices
   (an event-based variant of this check is sketched after the session)
(undercloud) [stack@dell-per740-66 ~]$ ssh heat-admin.24.23
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
Python 3.9.10 (main, Feb  9 2022, 00:00:00) 
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
['mdev_2890cda7_21d3_4106_acee_c238004966b8_0000_3d_00_0']
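
Since the fix is about libvirt noticing mdevs created behind its back through sysfs, the polling check above can also be expressed with node-device lifecycle events. A minimal libvirt-python sketch (not part of the original test run; it only shows the registration and runs the event loop forever):

import libvirt

libvirt.virEventRegisterDefaultImpl()   # default event loop implementation
conn = libvirt.open('qemu:///system')

def lifecycle_cb(conn, dev, event, detail, opaque):
    # event is e.g. VIR_NODE_DEVICE_EVENT_CREATED or VIR_NODE_DEVICE_EVENT_DELETED
    print('node device %s: event %d' % (dev.name(), event))

conn.nodeDeviceEventRegisterAny(None,
                                libvirt.VIR_NODE_DEVICE_EVENT_ID_LIFECYCLE,
                                lifecycle_cb, None)

# With the fixed packages, echoing a UUID into the sysfs "create" node should
# produce a CREATED event for mdev_<uuid>_0000_3d_00_0 without restarting libvirtd.
while True:
    libvirt.virEventRunDefaultImpl()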

Comment 3 chhu 2022-11-18 03:12:32 UTC
Additional test results:
3. Delete the mdev device through sysfs using its UUID,
   and check that the available_instances and `virsh nodedev-list` outputs are correct
[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
3
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
mdev_2890cda7_21d3_4106_acee_c238004966b8_0000_3d_00_0

[heat-admin@compute-0 nvidia-18]$ sudo chmod 666 /sys/bus/mdev/devices/2890cda7-21d3-4106-acee-c238004966b8/remove
[heat-admin@compute-0 nvidia-18]$ sudo echo 1 > /sys/bus/mdev/devices/2890cda7-21d3-4106-acee-c238004966b8/remove
[heat-admin@compute-0 nvidia-18]$ sudo ls /sys/class/mdev_bus/0000:3d:00.0| grep $uuid
[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
4
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
No output

Checking in nova_virtqemud:
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
[]
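
The sysfs removal in step 3 can be scripted the same way; a small sketch (hypothetical helper, reusing the UUID from step 1, to be run with sysfs write privileges):

dev_uuid = '2890cda7-21d3-4106-acee-c238004966b8'

# Writing "1" to the device's "remove" node tears the mdev down and returns
# the instance to the type's available_instances pool.
with open('/sys/bus/mdev/devices/%s/remove' % dev_uuid, 'w') as f:
    f.write('1')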

4. Create the mdev device with virsh commands,
   and check that the `virsh nodedev-create`, `nodedev-list`, and `nodedev-dumpxml` outputs are correct
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud cat mdev.xml
<device>
  <parent>pci_0000_3d_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-18'/>
    <uuid>c71395b9-0484-46af-9f01-7b00edfe5038</uuid>
  </capability>
</device>
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-create mdev.xml
Node device mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0 created from mdev.xml

[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances 
3
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0

[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-dumpxml mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0
<device>
  <name>mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0</name>
  <path>/sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/c71395b9-0484-46af-9f01-7b00edfe5038</path>
  <parent>pci_0000_3d_00_0</parent>
  <driver>
    <name>nvidia-vgpu-vfio</name>
  </driver>
  <capability type='mdev'>
    <type id='nvidia-18'/>
    <uuid>c71395b9-0484-46af-9f01-7b00edfe5038</uuid>
    <iommuGroup number='138'/>
  </capability>
</device>

Checking in nova_virtqemud:
>>> conn.listDevices('mdev')
['mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0']
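
The `virsh nodedev-create` call in step 4 maps to the libvirt node-device API; a minimal libvirt-python sketch (assuming the same mdev.xml content as above):

import libvirt

mdev_xml = """
<device>
  <parent>pci_0000_3d_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-18'/>
    <uuid>c71395b9-0484-46af-9f01-7b00edfe5038</uuid>
  </capability>
</device>
"""

conn = libvirt.open('qemu:///system')
# Equivalent of `virsh nodedev-create mdev.xml`; returns the new node device object.
dev = conn.nodeDeviceCreateXML(mdev_xml, 0)
print(dev.name())   # mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0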

5. Delete the mdev device with virsh commands,
   and check that the `virsh nodedev-destroy` and `nodedev-list` outputs are correct
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-destroy mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0
Destroyed node device 'mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0'
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
No output
[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances 
4
Checking in nova_virtqemud:
>>> conn.listDevices('mdev')
[]
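
The step 5 teardown has a direct libvirt-python equivalent as well; a short sketch (reusing the device name from step 4):

import libvirt

conn = libvirt.open('qemu:///system')
dev = conn.nodeDeviceLookupByName('mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0')
dev.destroy()   # equivalent of `virsh nodedev-destroy <name>`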

Comment 4 chhu 2022-11-18 03:20:22 UTC
6. Create the maximum number of mdev devices allowed by available_instances,
   and check in nova_virtqemud that all of them are present in the list of node devices
   (the individual creation commands are not captured below; a hedged sketch follows the session)
[heat-admin@compute-0 nvidia-18]$ ls ../../
0ba71129-5db5-40a3-8c0d-a0b8ca15fccb  broken_parity_status                  device           iommu_group           modalias     reset         resource5
8b3cd491-f863-41f3-b4f9-9a0969fdf564  c86e9994-5f0d-4941-8efd-e4d41a2b3c0a  dma_mask_bits    irq                   msi_bus      reset_method  revision
8e3454b5-f19e-4447-88c2-73eab11f8797  class ......

[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances 
0

[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
Python 3.9.10 (main, Feb  9 2022, 00:00:00) 
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
['mdev_c86e9994_5f0d_4941_8efd_e4d41a2b3c0a_0000_3d_00_0', 'mdev_8b3cd491_f863_41f3_b4f9_9a0969fdf564_0000_3d_00_0', 'mdev_0ba71129_5db5_40a3_8c0d_a0b8ca15fccb_0000_3d_00_0', 'mdev_8e3454b5_f19e_4447_88c2_73eab11f8797_0000_3d_00_0']

[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
mdev_0ba71129_5db5_40a3_8c0d_a0b8ca15fccb_0000_3d_00_0
mdev_8b3cd491_f863_41f3_b4f9_9a0969fdf564_0000_3d_00_0
mdev_8e3454b5_f19e_4447_88c2_73eab11f8797_0000_3d_00_0
mdev_c86e9994_5f0d_4941_8efd_e4d41a2b3c0a_0000_3d_00_0
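
The individual creation commands for step 6 are not captured above; a hedged Python sketch of filling the type up to its available_instances limit (hypothetical helper, same paths as the earlier steps, to be run with sysfs write privileges):

import uuid

type_dir = '/sys/class/mdev_bus/0000:3d:00.0/mdev_supported_types/nvidia-18'

def create_all_instances():
    created = []
    while True:
        with open(type_dir + '/available_instances') as f:
            if int(f.read()) == 0:      # type exhausted
                break
        dev_uuid = str(uuid.uuid4())
        with open(type_dir + '/create', 'w') as f:
            f.write(dev_uuid)           # instantiate one more mdev
        created.append(dev_uuid)
    return created

print(create_all_instances())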

Comment 12 errata-xmlrpc 2022-12-13 16:10:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8982