Description of problem:
Failed to create VM with vGPU, hit the error: nova/virt/libvirt/utils.py "ValueError: badly formed hexadecimal UUID string"

Version-Release number of selected component (if applicable):
rhosp17-openstack-nova-compute:17.0_20220908.1
python3-nova-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
python3-novaclient-17.4.0-0.20210812172018.54d4da1.el9ost.noarch
openstack-nova-common-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
openstack-nova-compute-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
openstack-nova-migration-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
Use libvirt with the fix for Bug 2109450 - libvirt doesn't catch mdevs created thru sysfs

How reproducible:
100%

Steps to Reproduce:
1. Prepare the vGPU environment on OSP17.0
(undercloud) [stack@dell-per740-66 ~]$ ssh heat-admin.24.10
[heat-admin@compute-0 ~]$ lspci|grep VGA
03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller (rev 04)
3d:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
3e:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
[heat-admin@compute-0 ~]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-22/name
GRID M60-8Q
[heat-admin@compute-0 ~]$ uuid=$(uuidgen)
[heat-admin@compute-0 ~]$ cd /sys/class/mdev_bus/0000:3d:00.0/mdev_supported_types/nvidia-22
[heat-admin@compute-0 nvidia-22]$ sudo chmod 666 create
[heat-admin@compute-0 nvidia-22]$ sudo echo $uuid
b81a2fb4-1bcf-45b0-b61e-efba7f35b161
[heat-admin@compute-0 nvidia-22]$ sudo echo $uuid > create
[heat-admin@compute-0 nvidia-22]$ cd ../../
[heat-admin@compute-0 0000:3d:00.0]$ ls
b81a2fb4-1bcf-45b0-b61e-efba7f35b161 d3cold_allowed iommu mdev_supported_types rescan resource3_wc

2. Check in nova_virtqemud, mdev is present in the list of node devices
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
Python 3.9.10 (main, Feb 9 2022, 00:00:00)
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
['mdev_b81a2fb4_1bcf_45b0_b61e_efba7f35b161_0000_3d_00_0']

3. Try to start VM with the vGPU device, hit error
(overcloud) [stack@dell-per740-66 ~]$ openstack flavor create --vcpus 6 --ram 4196 --disk 20 m2
(overcloud) [stack@dell-per740-66 ~]$ openstack flavor set m2 --property "resources:VGPU=1"
(overcloud) [stack@dell-per740-66 ~]$ openstack network create default
(overcloud) [stack@dell-per740-66 ~]$ openstack network list
+--------------------------------------+---------+--------------------------------------+
| ID                                   | Name    | Subnets                              |
+--------------------------------------+---------+--------------------------------------+
| 1dba36ee-d473-4354-a5ab-d6c7b6e0e666 | default | 2d6a822b-a511-47c2-918e-37ee947c0a8d |
+--------------------------------------+---------+--------------------------------------+
(overcloud) [stack@dell-per740-66 ~]$ openstack subnet create default --network default --gateway 192.168.32.1 --subnet-range 192.168.32.0/24
(overcloud) [stack@dell-per740-66 ~]$ openstack image create r9-qcow2 --disk-format qcow2 --container-format bare --file RHEL-9.0-x86_64-latest.qcow2
(overcloud) [stack@dell-per740-66 ~]$ openstack volume create r9-qcow2-vol --size 20 --image r9-qcow2
(overcloud) [stack@dell-per740-66 ~]$ openstack volume list
+--------------------------------------+--------------+-----------+------+-------------+
| ID                                   | Name         | Status    | Size | Attached to |
+--------------------------------------+--------------+-----------+------+-------------+
| bc0c5b4d-40ff-490d-a186-a7450b53c85e | r9-qcow2-vol | available |   20 |             |
+--------------------------------------+--------------+-----------+------+-------------+
(overcloud) [stack@dell-per740-66 ~]$ openstack server create --flavor m2 --volume r9-qcow2-vol --nic net-id=1dba36ee-d473-4354-a5ab-d6c7b6e0e666 vm-r9-vol
(overcloud) [stack@dell-per740-66 ~]$ openstack server list
+--------------------------------------+-----------+--------+----------+--------------------------+--------+
| ID                                   | Name      | Status | Networks | Image                    | Flavor |
+--------------------------------------+-----------+--------+----------+--------------------------+--------+
| ff53ee99-69cc-4633-9434-fdf426a45929 | vm-r9-vol | ERROR  |          | N/A (booted from volume) | m2     |
+--------------------------------------+-----------+--------+----------+--------------------------+--------+

4. Check the error in nova-conductor.log on controller node
  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7500, in _count_mediated_devices
    mediated_devices = self._get_mediated_devices(types=enabled_vgpu_types)
  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7750, in _get_mediated_devices
    device = self._get_mediated_device_information(name)
  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7731, in _get_mediated_device_information
    "uuid": libvirt_utils.mdev_name2uuid(cfgdev.name),
  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/utils.py", line 583, in mdev_name2uuid
    return str(uuid.UUID(mdev_name[5:].replace('_', '-')))
  File "/usr/lib64/python3.9/uuid.py", line 177, in __init__
    raise ValueError('badly formed hexadecimal UUID string')
ValueError: badly formed hexadecimal UUID string

5. Check the code in nova/virt/libvirt/utils.py:
def mdev_name2uuid(mdev_name: str) -> str:
    """Convert an mdev name (of the form mdev_<uuid_with_underscores>) to a
    uuid (of the form 8-4-4-4-12).
    """
    return str(uuid.UUID(mdev_name[5:].replace('_', '-')))
=> We need to change this line to not include the pci address.

More details:
mdev_name
<= driver.py: _get_mediated_device_information, _get_mediated_devices: dev_names = self._host.list_mediated_devices() or []
<= host.py: _list_devices("mdev", flags=flags), _list_devices: self.get_connection().listDevices(cap, flags)

[heat-admin@compute-0 0000:3d:00.0]$ sudo podman exec -it nova_virtqemud python3
Python 3.9.10 (main, Feb 9 2022, 00:00:00)
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
['mdev_b81a2fb4_1bcf_45b0_b61e_efba7f35b161_0000_3d_00_0']
>>> import uuid
>>> uuid.UUID("b81a2fb4-1bcf-45b0-b61e-efba7f35b161").version
4
>>> uuid.UUID("b81a2fb4-1bcf-45b0-b61e-efba7f35b161-0000-3d-00-0").version
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.9/uuid.py", line 177, in __init__
    raise ValueError('badly formed hexadecimal UUID string')
ValueError: badly formed hexadecimal UUID string

Actual results:
1. Failed to create VM with vGPU, hit the error: nova/virt/libvirt/utils.py "ValueError: badly formed hexadecimal UUID string"

Expected results:
2. Create VM with the vGPU device successfully

Additional info:
- nova-conductor.log
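
A possible direction for the change in step 5, as a minimal sketch only and not the actual nova patch: keep just the first 36 characters (the 8-4-4-4-12 UUID) after stripping the "mdev_" prefix, so that a trailing PCI address is ignored. The helper name mdev_name_to_uuid and the constants are hypothetical, and the sketch assumes that the UUID always comes directly after the prefix, as in the listDevices() output above.

```python
import uuid

MDEV_PREFIX = "mdev_"   # prefix libvirt puts in front of the mdev UUID
UUID_LEN = 36           # 8-4-4-4-12 hex digits plus four hyphens


def mdev_name_to_uuid(mdev_name: str) -> str:
    """Hypothetical helper: extract the UUID from an mdev node device name.

    Accepts both mdev_<uuid_with_underscores> and
    mdev_<uuid_with_underscores>_<pci_address_with_underscores>.
    """
    if not mdev_name.startswith(MDEV_PREFIX):
        raise ValueError(f"not an mdev node device name: {mdev_name!r}")
    candidate = mdev_name[len(MDEV_PREFIX):].replace("_", "-")
    # Assumption: anything after the first 36 characters is the parent PCI
    # address appended by newer libvirt, so it is simply dropped.
    return str(uuid.UUID(candidate[:UUID_LEN]))


old_style = "mdev_b81a2fb4_1bcf_45b0_b61e_efba7f35b161"
new_style = "mdev_b81a2fb4_1bcf_45b0_b61e_efba7f35b161_0000_3d_00_0"
print(mdev_name_to_uuid(old_style))
print(mdev_name_to_uuid(new_style))
```

Both calls print b81a2fb4-1bcf-45b0-b61e-efba7f35b161. uuid.UUID() still validates the truncated string, so a malformed name raises ValueError as before.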
This is a known issue caused by a new libvirtd release (7.7) that changed the mdev names. Because we now ship this version with RHEL9 on OSP17.x, we are hit by the behavioural change without having seen it upstream first. The tracking BZ is https://bugzilla.redhat.com/show_bug.cgi?id=2109616, and we plan to backport the upstream changes down to 17.1 as soon as they are merged upstream in the Antelope release, hopefully within the next few weeks.
*** This bug has been marked as a duplicate of bug 2109616 ***