Bug 2141365
| Summary: | libvirt doesn't catch mdevs created thru sysfs [rhel-9.0.0.z] | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | RHEL Program Management Team <pgm-rhel-tools> |
| Component: | libvirt | Assignee: | Jonathon Jongsma <jjongsma> |
| libvirt sub component: | General | QA Contact: | zhentang <zhetang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | bdobreli, bsawyers, camorris, chhu, dyuan, egallen, fjin, gveitmic, jdenemar, jsuchane, kchamart, lmen, smooney, virt-maint, xuzhang, yafu, ymankad, zhetang |
| Version: | 9.0 | Keywords: | Regression, Triaged, ZStream |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-8.0.0-8.2.el9_0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 2109450 | Environment: | |
| Last Closed: | 2022-12-13 16:10:06 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2109450 | | |
| Bug Blocks: | | | |
Comment 1
Jonathon Jongsma
2022-11-10 20:35:45 UTC
Tested on OSP17.0 with the following libvirt packages:
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud rpm -qa | grep libvirt| grep driver
libvirt-daemon-driver-nwfilter-8.0.0-8.2.el9_0.x86_64
libvirt-daemon-driver-nodedev-8.0.0-8.2.el9_0.x86_64
libvirt-daemon-driver-qemu-8.0.0-8.2.el9_0.x86_64
libvirt-daemon-driver-secret-8.0.0-8.2.el9_0.x86_64
libvirt-daemon-driver-storage-core-8.0.0-8.2.el9_0.x86_64
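
(Optionally, the library version the daemon reports can be cross-checked through the Python bindings; a minimal sketch, assuming the bindings are available inside the nova_virtqemud container, is shown below. It is an illustration, not part of the original verification run.)

```python
# Sketch: confirm the libvirt library version seen through the API matches
# the libvirt-8.0.0-8.2.el9_0 packages listed above.
import libvirt

conn = libvirt.open('qemu:///system')
ver = conn.getLibVersion()          # encoded as major * 1000000 + minor * 1000 + release
major, minor, release = ver // 1000000, (ver // 1000) % 1000, ver % 1000
print(f"libvirt {major}.{minor}.{release}")   # expected: libvirt 8.0.0
conn.close()
```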
Test steps:
1. Prepare the vGPU environment on OSP17.0
(undercloud) [stack@dell-per740-66 ~]$ ssh heat-admin.24.23
[heat-admin@compute-0 ~]$ lspci|grep VGA
03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller (rev 04)
3d:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
3e:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
[heat-admin@compute-0 ~]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/name
GRID M60-2Q
[heat-admin@compute-0 ~]$ uuid=$(uuidgen)
[heat-admin@compute-0 ~]$ cd /sys/class/mdev_bus/0000:3d:00.0/mdev_supported_types/nvidia-18
[heat-admin@compute-0 nvidia-18]$ sudo chmod 666 create
[heat-admin@compute-0 nvidia-18]$ sudo echo $uuid
2890cda7-21d3-4106-acee-c238004966b8
[heat-admin@compute-0 nvidia-18]$ sudo echo $uuid > create
[heat-admin@compute-0 nvidia-18]$ ls ../../| grep $uuid
2890cda7-21d3-4106-acee-c238004966b8
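For reference, the sysfs creation in step 1 can also be scripted; the sketch below wraps the same `echo $uuid > create` flow in Python. The 0000:3d:00.0 parent and nvidia-18 type are taken from the output above, the script needs root on the compute host, and it is illustrative only.

```python
# Sketch of step 1: create one mediated device through sysfs by writing a
# fresh UUID into the type's 'create' node, then re-read available_instances.
import uuid
from pathlib import Path

MDEV_TYPE = Path("/sys/class/mdev_bus/0000:3d:00.0/mdev_supported_types/nvidia-18")

def available_instances() -> int:
    """How many more instances of this mdev type can still be created."""
    return int((MDEV_TYPE / "available_instances").read_text())

def create_mdev() -> str:
    """Write a new UUID into the sysfs 'create' node and return it."""
    mdev_uuid = str(uuid.uuid4())
    (MDEV_TYPE / "create").write_text(mdev_uuid)
    return mdev_uuid

if __name__ == "__main__":
    print("available_instances before:", available_instances())
    print("created mdev:", create_mdev())
    print("available_instances after:", available_instances())
```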
2. Check in nova_virtqemud that the mdev device is present in the list of node devices
(undercloud) [stack@dell-per740-66 ~]$ ssh heat-admin.24.23
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
Python 3.9.10 (main, Feb 9 2022, 00:00:00)
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
['mdev_2890cda7_21d3_4106_acee_c238004966b8_0000_3d_00_0']
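
As a side note, the same check can be done with `listAllDevices()` and the mdev capability flag instead of the string-based `listDevices('mdev')` used above; a minimal sketch (assuming the same qemu:///system connection) follows.

```python
# Sketch: list mdev node devices via the capability flag and dump their XML.
import libvirt

conn = libvirt.open('qemu:///system')
for dev in conn.listAllDevices(libvirt.VIR_CONNECT_LIST_NODE_DEVICES_CAP_MDEV):
    print(dev.name())        # e.g. mdev_2890cda7_21d3_4106_acee_c238004966b8_0000_3d_00_0
    print(dev.XMLDesc(0))    # full nodedev XML (parent, type id, iommuGroup, ...)
conn.close()
```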
Add more test results:

3. Delete the mdev device by using its uuid, check the available_instances and `virsh nodedev-list` outputs are correct
[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
3
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
mdev_2890cda7_21d3_4106_acee_c238004966b8_0000_3d_00_0
[heat-admin@compute-0 nvidia-18]$ sudo chmod 666 /sys/bus/mdev/devices/2890cda7-21d3-4106-acee-c238004966b8/remove
[heat-admin@compute-0 nvidia-18]$ sudo echo 1 > /sys/bus/mdev/devices/2890cda7-21d3-4106-acee-c238004966b8/remove
[heat-admin@compute-0 nvidia-18]$ sudo ls /sys/class/mdev_bus/0000:3d:00.0| grep $uuid
[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
4
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
No output
Checking in nova_virtqemud:
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
[]

4. Create the mdev device by virsh commands, check the `virsh nodedev-create,list,dumpxml` outputs are correct
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud cat mdev.xml
<device>
  <parent>pci_0000_3d_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-18'/>
    <uuid>c71395b9-0484-46af-9f01-7b00edfe5038</uuid>
  </capability>
</device>
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-create mdev.xml
Node device mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0 created from mdev.xml
[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
3
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-dumpxml mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0
<device>
  <name>mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0</name>
  <path>/sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/c71395b9-0484-46af-9f01-7b00edfe5038</path>
  <parent>pci_0000_3d_00_0</parent>
  <driver>
    <name>nvidia-vgpu-vfio</name>
  </driver>
  <capability type='mdev'>
    <type id='nvidia-18'/>
    <uuid>c71395b9-0484-46af-9f01-7b00edfe5038</uuid>
    <iommuGroup number='138'/>
  </capability>
</device>
Checking in nova_virtqemud:
>>> conn.listDevices('mdev')
['mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0']

5. Delete the mdev device by virsh commands, check the `virsh nodedev-destroy,list` outputs are correct
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-destroy mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0
Destroyed node device 'mdev_c71395b9_0484_46af_9f01_7b00edfe5038_0000_3d_00_0'
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
No output
[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
4
Checking in nova_virtqemud:
>>> conn.listDevices('mdev')
[]
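For completeness, the `virsh nodedev-create`/`nodedev-destroy` flow from steps 4 and 5 can also be driven through the libvirt Python bindings; the sketch below reuses the mdev.xml content shown above and is illustrative only.

```python
# Sketch of steps 4 and 5 via the Python bindings: nodeDeviceCreateXML()
# mirrors 'virsh nodedev-create mdev.xml' and destroy() mirrors
# 'virsh nodedev-destroy'.
import libvirt

MDEV_XML = """
<device>
  <parent>pci_0000_3d_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-18'/>
    <uuid>c71395b9-0484-46af-9f01-7b00edfe5038</uuid>
  </capability>
</device>
"""

conn = libvirt.open('qemu:///system')
dev = conn.nodeDeviceCreateXML(MDEV_XML, 0)   # 'virsh nodedev-create'
print("created:", dev.name())
print(dev.XMLDesc(0))
dev.destroy()                                 # 'virsh nodedev-destroy'
conn.close()
```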
6. Create the maximum number (available_instances) of mdev devices, then check in nova_virtqemud that all the mdev devices are present in the list of node devices
[heat-admin@compute-0 nvidia-18]$ ls ../../
0ba71129-5db5-40a3-8c0d-a0b8ca15fccb broken_parity_status device iommu_group modalias reset resource5
8b3cd491-f863-41f3-b4f9-9a0969fdf564 c86e9994-5f0d-4941-8efd-e4d41a2b3c0a dma_mask_bits irq msi_bus reset_method revision
8e3454b5-f19e-4447-88c2-73eab11f8797 class ......
[heat-admin@compute-0 nvidia-18]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-18/available_instances
0
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
Python 3.9.10 (main, Feb 9 2022, 00:00:00)
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
['mdev_c86e9994_5f0d_4941_8efd_e4d41a2b3c0a_0000_3d_00_0', 'mdev_8b3cd491_f863_41f3_b4f9_9a0969fdf564_0000_3d_00_0', 'mdev_0ba71129_5db5_40a3_8c0d_a0b8ca15fccb_0000_3d_00_0', 'mdev_8e3454b5_f19e_4447_88c2_73eab11f8797_0000_3d_00_0']
[heat-admin@compute-0 nvidia-18]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
mdev_0ba71129_5db5_40a3_8c0d_a0b8ca15fccb_0000_3d_00_0
mdev_8b3cd491_f863_41f3_b4f9_9a0969fdf564_0000_3d_00_0
mdev_8e3454b5_f19e_4447_88c2_73eab11f8797_0000_3d_00_0
mdev_c86e9994_5f0d_4941_8efd_e4d41a2b3c0a_0000_3d_00_0
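
A quick way to confirm libvirt stays in sync with sysfs after such changes (the core issue in this bug) is to compare both views directly. The sketch below does that; the path assumes the 0000:3d:00.0 parent used throughout this test, and the UUID parsing of the node device names is crude and for illustration only.

```python
# Sketch: compare the mdev UUIDs visible in sysfs under the parent PCI device
# with the UUIDs libvirt reports via listDevices('mdev').
import os
import libvirt

PARENT = "/sys/class/mdev_bus/0000:3d:00.0"

# mdev instances appear as UUID-named directories under the parent PCI device
sysfs_uuids = {entry for entry in os.listdir(PARENT)
               if len(entry) == 36 and entry.count('-') == 4}

conn = libvirt.open('qemu:///system')
# names look like mdev_<uuid with underscores>_<parent pci address>
libvirt_uuids = {name.split('_0000_')[0].removeprefix('mdev_').replace('_', '-')
                 for name in conn.listDevices('mdev')}
conn.close()

print("sysfs:  ", sorted(sysfs_uuids))
print("libvirt:", sorted(libvirt_uuids))
assert sysfs_uuids == libvirt_uuids, "libvirt node device list is out of sync with sysfs"
```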
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (libvirt bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8982