Bug 2109450 - libvirt doesn't catch mdevs created thru sysfs
Summary: libvirt doesn't catch mdevs created thru sysfs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Assignee: Jonathon Jongsma
QA Contact: zhentang
URL:
Whiteboard:
Depends On: 2124466
Blocks: 1761861 2109616 2109621 2123586 2141364 2141365
 
Reported: 2022-07-21 09:37 UTC by Sylvain Bauza
Modified: 2023-05-09 08:09 UTC (History)
CC List: 19 users

Fixed In Version: libvirt-8.7.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2141364 2141365 (view as bug list)
Environment:
Last Closed: 2023-05-09 07:26:34 UTC
Type: Bug
Target Upstream Version: 8.7.0
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-128596 0 None None None 2022-07-21 09:49:53 UTC
Red Hat Product Errata RHBA-2023:2171 0 None None None 2023-05-09 07:27:25 UTC

Internal Links: 2123586

Description Sylvain Bauza 2022-07-21 09:37:20 UTC
Description of problem:

With OpenStack Nova, we directly create mdevs using sysfs, as documented in:
https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#creating-legacy-vgpu-device-red-hat-el-kvm
(That being said, we don't use mdevctl for persisting mdev)
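For reference, a minimal Python sketch of the same sysfs write (the parent PCI address and mdev type are just the example values from the reproduction steps below; error handling omitted):

import uuid

parent = "0000:04:00.0"        # example parent GPU from this report
mdev_type = "nvidia-320"       # example vGPU type from this report
mdev_uuid = str(uuid.uuid4())

# equivalent to: echo "<uuid>" > /sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/create
create_path = f"/sys/class/mdev_bus/{parent}/mdev_supported_types/{mdev_type}/create"
with open(create_path, "w") as f:
    f.write(mdev_uuid)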

Previously, the mdev creation was noticed by libvirt and we were able to see the created mdev through the libvirt API.

Now, with OSP17, which ships RHEL 9.0 and libvirt-8.0.0, there is a regression: libvirt no longer seems to pick up the mdev creation from udev, leading to a Nova bug: https://bugzilla.redhat.com/show_bug.cgi?id=1761861#c14


Version-Release number of selected component (if applicable):
[root@computesriov-0 heat-admin]#  podman exec -it nova_virtqemud rpm -qa | grep libvirt
WARN[0000]  binary not found, container dns will not be enabled 
libvirt-libs-8.0.0-8.1.el9_0.x86_64
libvirt-daemon-8.0.0-8.1.el9_0.x86_64
libvirt-daemon-driver-secret-8.0.0-8.1.el9_0.x86_64
libvirt-daemon-driver-nwfilter-8.0.0-8.1.el9_0.x86_64
libvirt-daemon-driver-qemu-8.0.0-8.1.el9_0.x86_64
libvirt-daemon-driver-storage-core-8.0.0-8.1.el9_0.x86_64
libvirt-daemon-driver-nodedev-8.0.0-8.1.el9_0.x86_64
python3-libvirt-8.0.0-1.el9.x86_64
libvirt-client-8.0.0-8.1.el9_0.x86_64
libvirt-daemon-config-nwfilter-8.0.0-8.1.el9_0.x86_64


How reproducible:
Always

Steps to Reproduce:

1. Create an mdev through sysfs:
[root@computesriov-0 heat-admin]# echo "0ffa7ecc-8ee2-4bd5-a6b1-fe47304e6740" > /sys/class/mdev_bus/0000\:04\:00.0/mdev_supported_types/nvidia-320/create 

2. Check whether libvirt sees it (the newly created mdev is not in the list):
[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud python3
WARN[0000]  binary not found, container dns will not be enabled 
Python 3.9.10 (main, Feb  9 2022, 00:00:00) 
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
['mdev_34ddb0d9_d28e_4043_9211_9a6b8feac4c3_0000_82_00_0', 'mdev_bb306139_6392_478b_89cf_bab02a71c985_0000_04_00_0', 'mdev_838942a4_be3e_4c22_a563_4383a321ac55_0000_82_00_0']
>>> 


3. Restart the libvirt nodedev daemon to see whether it helps:
[root@computesriov-0 heat-admin]# systemctl restart tripleo_nova_virtnodedevd.service

4. Verify that after the daemon restart, the mdev appears:
[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud python3
WARN[0000]  binary not found, container dns will not be enabled 
Python 3.9.10 (main, Feb  9 2022, 00:00:00) 
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
['mdev_34ddb0d9_d28e_4043_9211_9a6b8feac4c3_0000_82_00_0', 'mdev_838942a4_be3e_4c22_a563_4383a321ac55_0000_82_00_0', 'mdev_0ffa7ecc_8ee2_4bd5_a6b1_fe47304e6740_0000_04_00_0', 'mdev_bb306139_6392_478b_89cf_bab02a71c985_0000_04_00_0']

Actual results:

mdev is not present in the list of node devices that have a 'mdev' capability.

Expected results:
the mdev should be seen.

Additional info:
We just noticed that the libvirt API now provides a way to define an mdev and create it directly: https://libvirt.org/drvnodedev.html#mediated-devices-mdevs
We'll work on modifying Nova to use this API instead of writing to sysfs, but this is an RFE that will be targeted for our next release.
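For illustration, a rough sketch of how that would look through the libvirt Python bindings, reusing the example parent/type/UUID from comment 2 below (note that, as comment 2 shows, this path currently hits the same problem):

import libvirt

MDEV_XML = """
<device>
  <parent>pci_0000_04_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-320'/>
    <uuid>728781db-ce0b-473e-aae0-e7ab2c5ece93</uuid>
  </capability>
</device>
"""

conn = libvirt.open('qemu:///system')
# transient mdev, analogous to `virsh nodedev-create`
dev = conn.nodeDeviceCreateXML(MDEV_XML, 0)
print(dev.name())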

Comment 2 Sylvain Bauza 2022-07-22 08:49:18 UTC
I tried using the libvirt createDevice API by providing a device XML in order to verify whether this regression was due to the use of sysfs.
Unfortunately, the behaviour remains the same:
http://pastebin.test.redhat.com/1066905

With a minimal definition of an mdev:
<device>
  <parent>pci_0000_04_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-320'/>
    <uuid>728781db-ce0b-473e-aae0-e7ab2c5ece93</uuid>
  </capability>
</device>

the virsh call fails, but the mdev is created:

[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud virsh nodedev-create test.xml
WARN[0000]  binary not found, container dns will not be enabled
error: Failed to create node device from test.xml
error: An error occurred, but the cause is unknown

[root@computesriov-0 heat-admin]# ll /sys/bus/mdev/devices/
total 0
lrwxrwxrwx. 1 root root 0 Jul 22 08:25 728781db-ce0b-473e-aae0-e7ab2c5ece93 -> ../../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/728781db-ce0b-473e-aae0-e7ab2c5ece93

That said, we don't see the mdev through the libvirt API until we restart the nodedev daemon:

[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
WARN[0000]  binary not found, container dns will not be enabled
 

[root@computesriov-0 heat-admin]# systemctl restart tripleo_nova_virtnodedevd.service
[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
WARN[0000]  binary not found, container dns will not be enabled
mdev_728781db_ce0b_473e_aae0_e7ab2c5ece93_0000_04_00_0



Also, please note this issue also impacts other upstream operators that *don't* use OSP17, so this isn't a deployment problem:
https://bugs.launchpad.net/nova/+bug/1981631

Comment 3 Sylvain Bauza 2022-07-22 08:51:08 UTC
I also noted that deleting an mdev through sysfs is automatically seen by libvirt, without requiring a nodedev daemon restart:

[root@computesriov-0 heat-admin]# echo 1 > /sys/bus/mdev/devices/728781db-ce0b-473e-aae0-e7ab2c5ece93/remove 
[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
WARN[0000]  binary not found, container dns will not be enabled 

[root@computesriov-0 heat-admin]# cat /sys/class/mdev_bus/0000\:04\:00.0/mdev_supported_types/nvidia-320/available_instances 
2

Comment 4 Sylvain Bauza 2022-07-22 08:57:24 UTC
Looks like mdevctl is correctly seeing the created mdev, but libvirt is not:

[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud virsh nodedev-create test.xml 
WARN[0000]  binary not found, container dns will not be enabled 
error: Failed to create node device from test.xml
error: An error occurred, but the cause is unknown

[root@computesriov-0 heat-admin]# cat /sys/class/mdev_bus/0000\:04\:00.0/mdev_supported_types/nvidia-320/available_instances 
1

[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud mdevctl list
WARN[0000]  binary not found, container dns will not be enabled 
728781db-ce0b-473e-aae0-e7ab2c5ece93 0000:04:00.0 nvidia-320 manual

[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
WARN[0000]  binary not found, container dns will not be enabled

Comment 5 Sylvain Bauza 2022-07-22 09:18:08 UTC
One last piece of information: I tried creating an mdev using the exact same XML dumped from the previously removed mdev, and it still fails (so the issue isn't due to a lack of detail in the XML):

## GET THE PREVIOUS XML FROM A MDEV
[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud virsh nodedev-dumpxml mdev_728781db_ce0b_473e_aae0_e7ab2c5ece93_0000_04_00_0
WARN[0000]  binary not found, container dns will not be enabled 
<device>
  <name>mdev_728781db_ce0b_473e_aae0_e7ab2c5ece93_0000_04_00_0</name>
  <path>/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/728781db-ce0b-473e-aae0-e7ab2c5ece93</path>
  <parent>pci_0000_04_00_0</parent>
  <driver>
    <name>vfio_mdev</name>
  </driver>
  <capability type='mdev'>
    <type id='nvidia-320'/>
    <uuid>728781db-ce0b-473e-aae0-e7ab2c5ece93</uuid>
    <iommuGroup number='109'/>
  </capability>
</device>


## DELETE THE OLD MDEV

[root@computesriov-0 heat-admin]# echo 1 > /sys/bus/mdev/devices/728781db-ce0b-473e-aae0-e7ab2c5ece93/remove 
[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
WARN[0000]  binary not found, container dns will not be enabled 

## UPDATE THE XML ##

[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud cat test.xml 
WARN[0000]  binary not found, container dns will not be enabled 
<device>
  <name>mdev_728781db_ce0b_473e_aae0_e7ab2c5ece93_0000_04_00_0</name>
  <path>/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/728781db-ce0b-473e-aae0-e7ab2c5ece93</path>
  <parent>pci_0000_04_00_0</parent>
  <driver>
    <name>vfio_mdev</name>
  </driver>
  <capability type='mdev'>
    <type id='nvidia-320'/>
    <uuid>728781db-ce0b-473e-aae0-e7ab2c5ece93</uuid>
    <iommuGroup number='109'/>
  </capability>
</device>
[root@computesriov-0 heat-admin]# cat /sys/class/mdev_bus/0000\:04\:00.0/mdev_supported_types/nvidia-320/available_instances 
2
[root@computesriov-0 heat-admin]# ll /sys/bus/mdev/devices/
total 0
[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud virsh nodedev-create test.xml 
WARN[0000]  binary not found, container dns will not be enabled 
error: Failed to create node device from test.xml
error: An error occurred, but the cause is unknown

[root@computesriov-0 heat-admin]# podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
WARN[0000]  binary not found, container dns will not be enabled

Comment 13 smooney 2022-08-08 21:05:48 UTC
I did some manual testing today.

Running udevadm monitor on the host, I can see the udev events when I add or remove a mediated device:

KERNEL[259776.790864] add      /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [259776.793732] add      /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
KERNEL[259776.998323] add      /devices/virtual/vfio/109 (vfio)
KERNEL[259776.998346] bind     /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [259776.999648] bind     /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [259777.004647] add      /devices/virtual/vfio/109 (vfio)
KERNEL[259908.099801] remove   /devices/virtual/vfio/109 (vfio)
UDEV  [259908.102185] remove   /devices/virtual/vfio/109 (vfio)
KERNEL[259908.312512] unbind   /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
KERNEL[259908.312533] remove   /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [259908.313050] unbind   /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [259908.313149] remove   /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
KERNEL[260459.337234] add      /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [260459.340162] add      /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
KERNEL[260459.590260] add      /devices/virtual/vfio/109 (vfio)
KERNEL[260459.590289] bind     /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [260459.591622] bind     /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [260459.596676] add      /devices/virtual/vfio/109 (vfio)
KERNEL[272070.995805] remove   /devices/virtual/vfio/109 (vfio)
UDEV  [272070.998217] remove   /devices/virtual/vfio/109 (vfio)
KERNEL[272071.207298] unbind   /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
KERNEL[272071.207323] remove   /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [272071.207824] unbind   /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [272071.207922] remove   /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)



udevadm cannot be run in the container any more, thanks to https://github.com/systemd/systemd/pull/11346/commits/b05a4c950a928b3407f983f4d9a8a2ff8cbc34f0,
so I can't actually run it in the Podman container:
[root@computesriov-1 /]# udevadm monitor
Running in chroot, ignoring request.


If I freshly restart the nodedevd container and list the nodedevs as root, then allocate an mdev and list again, it will not be in the list
returned by libvirt; but if I then list as the nova user, it generally will be listed, once.

So by swapping from the root session to the nova user session, the per-session nature of the cache allows the second request to work without restarting the container:

[heat-admin@computesriov-1 ~]$ sudo podman exec -it  nova_virtnodedevd  virsh nodedev-list | grep mdev
WARN[0000]  binary not found, container dns will not be enabled 
[heat-admin@computesriov-1 ~]$ sudo podman exec -it  -u nova nova_virtnodedevd  virsh nodedev-list | grep mdev
WARN[0000]  binary not found, container dns will not be enabled 
mdev_710766c7_994c_40d1_be44_1a670bdfece2_0000_04_00_0
[heat-admin@computesriov-1 ~]$ sudo podman exec -it  nova_virtnodedevd  virsh nodedev-list | grep mdev
WARN[0000]  binary not found, container dns will not be enabled 

That confirms that libvirt could see the device if it were not for the caching.

In the container, the device is also clearly visible in /sys, reinforcing the idea that this is in fact an issue with the udev events in some form.

Oddly, deleting the mdev does seem to propagate to libvirt's view:

[heat-admin@computesriov-1 ~]$ sudo systemctl restart tripleo_nova_virtnodedevd
[heat-admin@computesriov-1 ~]$ sudo podman exec -it  nova_virtnodedevd  virsh nodedev-list | grep mdev
WARN[0000]  binary not found, container dns will not be enabled 
mdev_710766c7_994c_40d1_be44_1a670bdfece2_0000_04_00_0
[heat-admin@computesriov-1 ~]$ echo 1 | sudo tee /sys/bus/mdev/devices/*/remove 
1
[heat-admin@computesriov-1 ~]$ sudo podman exec -it  nova_virtnodedevd  virsh nodedev-list | grep mdev
WARN[0000]  binary not found, container dns will not be enabled 
[heat-admin@computesriov-1 ~]$ 


We run the virtnodedevd container with /dev and /run mounted from the host, as well as --net=host, --pid=host, and --privileged.

So the udev control socket (srw-------. 1 root root 0 Aug  5 16:46 /run/udev/control) is available in the container, and libvirt should have all permissions to interact with it.

Since the remove works but the add does not, that implies at least some of the udev messages are being picked up.

Comment 14 smooney 2022-08-08 21:58:17 UTC
This is interesting:

2022-08-08 17:06:07.560+0000: 12351: debug : virNetlinkEventCallback:889 : event not handled.
2022-08-08 17:06:07.668+0000: 12415: error : udevProcessMediatedDevice:1038 : failed to wait for file '/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2/mdev_type' to appear: No such file or directory

That file does in fact exist and can be seen from inside the nodedevd container:
[root@computesriov-1 libvirt]# ls /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2/mdev_type -al
lrwxrwxrwx. 1 root root 0 Aug  8 21:48 /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2/mdev_type -> ../mdev_supported_types/nvidia-222

root      132387  0.0  0.0 1509456 22660 ?       Sl   20:59   0:00 /usr/sbin/virtnodedevd --config /etc/libvirt/virtnodedevd.conf

virtnodedevd is running as root inside the container, just in case you were wondering whether there was a permissions issue:

2022-08-08 21:00:23.756+0000: 132387: debug : virConnectClose:1316 : conn=0x7f1d78013690
2022-08-08 21:48:21.631+0000: 132387: debug : virNetlinkEventCallback:875 : dispatching to max 0 clients, called from event watch 6
2022-08-08 21:48:21.631+0000: 132387: debug : virNetlinkEventCallback:889 : event not handled.
2022-08-08 21:48:21.739+0000: 132440: error : udevProcessMediatedDevice:1038 : failed to wait for file '/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2/mdev_type' to appear: No such file or directory
2022-08-08 21:48:21.996+0000: 132387: debug : virNetlinkEventCallback:875 : dispatching to max 0 clients, called from event watch 6
2022-08-08 21:48:21.996+0000: 132387: debug : virNetlinkEventCallback:889 : event not handled.
2022-08-08 21:48:21.996+0000: 132387: debug : virNetlinkEventCallback:875 : dispatching to max 0 clients, called from event watch 6
2022-08-08 21:48:21.996+0000: 132387: debug : virNetlinkEventCallback:889 : event not handled.


I'm not seeing any SELinux denials or similar that would indicate why libvirt can't read this, but it almost looks like a race.

Comment 16 smooney 2022-08-08 22:09:43 UTC
 /* Because of a kernel uevent race, we might get the 'add' event prior to
     * the sysfs tree being ready, so any attempt to access any sysfs attribute
     * would result in ENOENT and us dropping the device, so let's work around
     * it by waiting for the attributes to become available.
     */

That is the doc comment for the function:

https://github.com/libvirt/libvirt/blob/3d5245e3ebd1e143ea858c8535474b681bc21a38/src/node_device/node_device_udev.c#L1031-L1035

So yeah, I would say there is a high probability that we are losing that race and 100 ms is not enough time:

KERNEL[260459.337234] add      /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [260459.340162] add      /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
KERNEL[260459.590260] add      /devices/virtual/vfio/109 (vfio)
KERNEL[260459.590289] bind     /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [260459.591622] bind     /devices/pci0000:00/0000:00:02.0/0000:04:00.0/710766c7-994c-40d1-be44-1a670bdfece2 (mdev)
UDEV  [260459.596676] add      /devices/virtual/vfio/109 (vfio)

The full add took about 260 ms, and about 230 ms for it to be bound.

It looks like libvirt is being optimistic.

virFileWaitForExists takes the time in milliseconds and the number of tries as its arguments:

https://github.com/libvirt/libvirt/commit/caf26412b691bf6a7cb34b9db837b92a4e6eb689

so 
if (virFileWaitForExists(linkpath, 1, 100) < 0) {
        virReportSystemError(errno,
                             _("failed to wait for file '%s' to appear"),
                             linkpath);
        return -1;
    }

is waiting at most 100ms and it took much longer for the device init to complete.
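To make the arithmetic concrete, here is a rough Python analogue of that wait loop (not libvirt's code, just a sketch of the same polling pattern): with a 1 ms interval and 100 tries it gives up after roughly 100 ms, well short of the ~250 ms observed above.

import os
import time

def wait_for_path(path, interval_ms, tries):
    # Poll until `path` exists, roughly mirroring virFileWaitForExists(path, ms, tries).
    for _ in range(tries):
        if os.path.exists(path):
            return True
        time.sleep(interval_ms / 1000.0)
    return False

# The call quoted above is effectively wait_for_path(linkpath, 1, 100):
# 100 tries of 1 ms each, i.e. ~100 ms total before giving up.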

Comment 17 Jonathon Jongsma 2022-08-09 14:49:31 UTC
So in this specific case, the delay is around 250 ms. But I wonder what sort of variation we can expect? I don't really want to propose changing it to another semi-arbitrary value only to find out that this value is also not sufficient...

Comment 18 Jonathon Jongsma 2022-08-15 16:23:58 UTC
Sean, any ideas about my last question? It would be nice to get an idea about what sort of ranges you're seeing for the delay time so that I can pick an appropriate value that will work.

Comment 19 smooney 2022-08-17 11:38:29 UTC
I honestly don't know.

The short hack would be to make the sleep time 10 ms instead of 1, allowing this to take up to 1 second instead of 100 ms.
The better fix, I think, would be to trigger off the bind udev event instead of the add event, or in addition to it.
That, I think, would eliminate the race, but I'm not 100% sure about that.

I don't have access to this env permanently (it's one of our QE test systems), so
just looking at the times from comment 13,
we have 208 ms and 259 ms, and 260 ms for comment 16,
so it seems to be around a quarter of a second.

So changing the sleep time to 10 ms and keeping 100 retries, extending the total time to 1 second, should likely be enough, but
also triggering on the bind udev event would likely be more robust.

Those are not mutually exclusive either, so you could do both to harden this further.
I don't think there should be any issue with updating the list twice if we process both the add and bind udev events successfully;
you just need to ensure the bind event checks that the device is not already present.
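For illustration only, a rough Python sketch of that idea using the pyudev bindings (libvirt's real handling lives in C in node_device_udev.c; the point here is reacting to both 'add' and 'bind' and keeping the handler idempotent):

import pyudev

context = pyudev.Context()
monitor = pyudev.Monitor.from_netlink(context)
monitor.filter_by(subsystem='mdev')
monitor.start()

known = set()

for device in iter(monitor.poll, None):
    if device.action in ('add', 'bind'):
        if device.sys_path not in known:   # idempotency: seeing a device twice is harmless
            known.add(device.sys_path)
            print('mdev appeared:', device.sys_path)
    elif device.action == 'remove':
        known.discard(device.sys_path)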

Comment 20 John Ferlan 2022-08-17 14:55:04 UTC
NB: Changing the ITR to 9.2.0 as we are getting late in the process for inclusion into 9.1.0 of an issue that, while easily fixed/resolved, perhaps needs some more design discussion (e.g. a hard-coded 100, 200, or 500 ms timeout vs. a customizable value)... All for a problem from another subsystem (udev), which doesn't guarantee instantaneous and/or synchronous creation of the device. In the long run, "where" the fix should live could be up for discussion. Should Nova be the place where the timeout exists, since it's the place that initiated the creation? Leaving the ZTR at 9.0.0 since that's where the z-stream would eventually need to be.

We can move it back to 9.1.0, but we're at the point of needing an exception... Since RHOS won't be consuming 9.1.0, it's unlikely to be granted.

Comment 28 Jonathon Jongsma 2022-11-02 19:05:06 UTC
The fix for this has been upstream since libvirt 8.7.0 (commit e4f9682ebc442bb5dfee807ba618c8863355776d). RHEL 9.2.0 currently ships libvirt 8.9.0, so this bug should already be fixed in 9.2.0 by bug #2124466.

Comment 30 Jaroslav Suchanek 2022-11-03 11:23:49 UTC
Already in libvirt-8.7.0-1.el9.

Comment 37 chhu 2022-11-10 12:54:32 UTC
Reproduced on OSP17.0 with libvirt packages:
$ sudo podman exec -it nova_virtqemud rpm -qa | grep libvirt| grep driver
libvirt-daemon-driver-secret-8.0.0-8.1.el9_0.x86_64
libvirt-daemon-driver-nwfilter-8.0.0-8.1.el9_0.x86_64
libvirt-daemon-driver-storage-core-8.0.0-8.1.el9_0.x86_64
libvirt-daemon-driver-nodedev-8.0.0-8.1.el9_0.x86_64
libvirt-daemon-driver-qemu-8.0.0-8.1.el9_0.x86_64

Reproduction steps:
1. Prepare the vGPU environment on OSP17.0
[heat-admin@compute-0 ~]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-22/name 
GRID M60-8Q
[heat-admin@compute-0 ~]$ uuid=$(uuidgen)
[heat-admin@compute-0 ~]$ cd /sys/class/mdev_bus/0000:3d:00.0/mdev_supported_types/nvidia-22
[heat-admin@compute-0 nvidia-22]$ sudo chmod 666 create
[heat-admin@compute-0 nvidia-22]$ sudo echo $uuid
5b5ed2cb-cd00-4afe-b975-1fe467a1757f
[heat-admin@compute-0 nvidia-22]$ sudo echo $uuid > create
[heat-admin@compute-0 nvidia-22]$ cd ../../
[heat-admin@compute-0 0000:3d:00.0]$ ls
5b5ed2cb-cd00-4afe-b975-1fe467a1757f  config  driver_override  local_cpulist ......

2. Check in nova_virtqemud, mdev is not present in the list of node devices
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
Python 3.9.10 (main, Feb  9 2022, 00:00:00) 
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
[]

3. Restart the nova_virtnodedevd
[heat-admin@compute-0 0000:3d:00.0]$ sudo systemctl restart tripleo_nova_virtnodedevd.service
[heat-admin@compute-0 ~]$ sudo podman ps|grep nova_virt
......
967b3774fcfd  dell-per740-66.ctlplane.localdomain:8787/rh-osbs/rhosp17-openstack-nova-libvirt:17.0_20220908.1                kolla_start           10 days ago     Up 6 seconds ago                         nova_virtnodedevd
4a07bd7374b2  dell-per740-66.ctlplane.localdomain:8787/rh-osbs/rhosp17-openstack-nova-libvirt:17.0_20220908.1                kolla_start           10 days ago     Up 40 minutes ago                        nova_virtstoraged
a5dc3c6fa7c2  dell-per740-66.ctlplane.localdomain:8787/rh-osbs/rhosp17-openstack-nova-libvirt:17.0_20220908.1                kolla_start           10 days ago     Up 40 minutes ago                        nova_virtqemud

4. Check in nova_virtqemud, mdev is present in the list of node devices
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
Python 3.9.10 (main, Feb  9 2022, 00:00:00) 
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
['mdev_5b5ed2cb_cd00_4afe_b975_1fe467a1757f_0000_3d_00_0']
>>>

Comment 38 Jonathon Jongsma 2022-11-10 16:26:02 UTC
(In reply to chhu from comment #37)
> Reproduced on OSP17.0 with libvirt packages:
> $ sudo podman exec -it nova_virtqemud rpm -qa | grep libvirt| grep driver
> libvirt-daemon-driver-secret-8.0.0-8.1.el9_0.x86_64
> libvirt-daemon-driver-nwfilter-8.0.0-8.1.el9_0.x86_64
> libvirt-daemon-driver-storage-core-8.0.0-8.1.el9_0.x86_64
> libvirt-daemon-driver-nodedev-8.0.0-8.1.el9_0.x86_64
> libvirt-daemon-driver-qemu-8.0.0-8.1.el9_0.x86_64


This problem is not expected to be fixed yet in libvirt 8.0.0-8.1.el9_0; this bug tracks the libvirt version in RHEL 9.2. The problem should be fixed in libvirt-8.7.0-1.el9 (RHEL 9.2) and is in the process of being backported to older releases. See the cloned bugs mentioned above for those releases.
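As a quick check, the daemon-side libvirt version can be read through the Python bindings; anything >= 8007000 (i.e. 8.7.0) should carry the upstream fix (a sketch, assuming a local qemu:///system connection):

import libvirt

conn = libvirt.open('qemu:///system')
ver = conn.getLibVersion()   # encoded as major*1000000 + minor*1000 + micro, e.g. 8007000 == 8.7.0
major, minor, micro = ver // 1000000, (ver // 1000) % 1000, ver % 1000
print(f"daemon libvirt {major}.{minor}.{micro}",
      "(>= 8.7.0, should have the fix)" if ver >= 8007000 else "(< 8.7.0, affected)")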

Comment 39 chhu 2022-11-15 05:42:15 UTC
Tested on OSP17.0 with libvirt packages:
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud rpm -qa | grep libvirt| grep driver
libvirt-daemon-driver-nwfilter-8.9.0-2.el9.x86_64
libvirt-daemon-driver-qemu-8.9.0-2.el9.x86_64
libvirt-daemon-driver-storage-core-8.9.0-2.el9.x86_64
libvirt-daemon-driver-nodedev-8.9.0-2.el9.x86_64
libvirt-daemon-driver-secret-8.9.0-2.el9.x86_64

Test steps:
1. Prepare the vGPU environment on OSP17.0
(undercloud) [stack@dell-per740-66 ~]$ ssh heat-admin.24.10
[heat-admin@compute-0 ~]$ lspci|grep VGA
03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller (rev 04)
3d:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
3e:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
[heat-admin@compute-0 ~]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-22/name 
GRID M60-8Q
[heat-admin@compute-0 ~]$ uuid=$(uuidgen)
[heat-admin@compute-0 ~]$ cd /sys/class/mdev_bus/0000:3d:00.0/mdev_supported_types/nvidia-22
[heat-admin@compute-0 nvidia-22]$ sudo chmod 666 create
[heat-admin@compute-0 nvidia-22]$ sudo echo $uuid
b81a2fb4-1bcf-45b0-b61e-efba7f35b161
[heat-admin@compute-0 nvidia-22]$ sudo echo $uuid > create
[heat-admin@compute-0 nvidia-22]$ cd ../../
[heat-admin@compute-0 0000:3d:00.0]$ ls
b81a2fb4-1bcf-45b0-b61e-efba7f35b161  d3cold_allowed            iommu            mdev_supported_types  rescan        resource3_wc

2. Check in nova_virtqemud, mdev is present in the list of node devices
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud python3
Python 3.9.10 (main, Feb  9 2022, 00:00:00) 
[GCC 11.2.1 20220127 (Red Hat 11.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')
>>> conn.listDevices('mdev')
['mdev_b81a2fb4_1bcf_45b0_b61e_efba7f35b161_0000_3d_00_0']

Comment 40 chhu 2022-11-15 08:51:35 UTC
Failed to start a VM with vGPU on OSP17.0; filed OpenStack Bug 2142768,
but it will not block this bug's verification.
Bug 2142768 - Failed to create VM with vGPU - Hit error "badly formed hexadecimal UUID string"

Comment 41 chhu 2022-11-18 03:26:24 UTC
Add more test results here:
1. Create and delete the mdev device by using uuid,
   check the available_instances and `virsh nodedev-list` outputs are correct
[heat-admin@compute-0 ~]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-22/available_instances
1
[heat-admin@compute-0 ~]$ uuid=$(uuidgen)
[heat-admin@compute-0 ~]$ sudo echo $uuid > /sys/class/mdev_bus/0000:3d:00.0/mdev_supported_types/nvidia-22/create
[heat-admin@compute-0 ~]$ sudo ls /sys/class/mdev_bus/0000:3d:00.0| grep $uuid
639913d4-247e-41e2-ac46-2e1eb4b32730
[heat-admin@compute-0 ~]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-22/available_instances
0
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
mdev_639913d4_247e_41e2_ac46_2e1eb4b32730_0000_3d_00_0
[heat-admin@compute-0 ~]$ sudo chmod 666 /sys/bus/mdev/devices/639913d4-247e-41e2-ac46-2e1eb4b32730/remove
[heat-admin@compute-0 ~]$ sudo echo 1 > /sys/bus/mdev/devices/639913d4-247e-41e2-ac46-2e1eb4b32730/remove
[heat-admin@compute-0 ~]$ sudo ls /sys/class/mdev_bus/0000:3d:00.0| grep $uuid
No output
[heat-admin@compute-0 ~]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-22/available_instances
1
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
No output

2. Create and delete the mdev device by virsh commands,
   check the `virsh nodedev-create,destroy,list,dumpxml` outputs are correct
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud cat mdev.xml
<device>
  <parent>pci_0000_3d_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-22'/>
    <uuid>d7277b0f-ef00-4fc9-bcc3-300b0b33a638</uuid>
  </capability>
</device>
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud virsh nodedev-create mdev.xml
Node device mdev_d7277b0f_ef00_4fc9_bcc3_300b0b33a638_0000_3d_00_0 created from mdev.xml

[heat-admin@compute-0 ~]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-22/available_instances
0
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
mdev_d7277b0f_ef00_4fc9_bcc3_300b0b33a638_0000_3d_00_0

[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud virsh nodedev-dumpxml mdev_d7277b0f_ef00_4fc9_bcc3_300b0b33a638_0000_3d_00_0
<device>
  <name>mdev_d7277b0f_ef00_4fc9_bcc3_300b0b33a638_0000_3d_00_0</name>
  <path>/sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:08.0/0000:3d:00.0/d7277b0f-ef00-4fc9-bcc3-300b0b33a638</path>
  <parent>pci_0000_3d_00_0</parent>
  <driver>
    <name>nvidia-vgpu-vfio</name>
  </driver>
  <capability type='mdev'>
    <type id='nvidia-22'/>
    <uuid>d7277b0f-ef00-4fc9-bcc3-300b0b33a638</uuid>
    <parent_addr>0000:3d:00.0</parent_addr>
    <iommuGroup number='138'/>
  </capability>
</device>
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud virsh nodedev-destroy mdev_d7277b0f_ef00_4fc9_bcc3_300b0b33a638_0000_3d_00_0
Destroyed node device 'mdev_d7277b0f_ef00_4fc9_bcc3_300b0b33a638_0000_3d_00_0'
[heat-admin@compute-0 ~]$ sudo podman exec -it nova_virtqemud virsh nodedev-list --cap mdev
No output
[heat-admin@compute-0 ~]$ cat /sys/class/mdev_bus/0000\:3d\:00.0/mdev_supported_types/nvidia-22/available_instances
1

Comment 43 errata-xmlrpc 2023-05-09 07:26:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171

