Description of problem:
Hotplug hostdev (type=vf or scsi) / interface (hostdev) failed after libvirtd restart in env with cgroup2 enabled
error: failed to count cgroup BPF map items: No such file or directory

Version:
qemu-kvm-4.2.0-20.module+el8.2.1+6467+49dc3278.x86_64
libvirt-daemon-6.0.0-18.module+el8.2.1+6456+a6d62e4e.x86_64

How reproducible:
100%

Steps:
In cgroup2 enabled host -
# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)

Scenario 1: scsi passthrough scenario:
1. Start a domain
# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

2. # systemctl restart libvirtd

3. # cat scsi.xml
<hostdev mode='subsystem' type='scsi'>
  <source>
    <adapter name='scsi_host15'/>
    <address bus='0' target='0' unit='0'/>
  </source>
</hostdev>

4. # virsh attach-device avocado-vt-vm1 scsi.xml
error: Failed to attach device from scsi.xml
error: failed to count cgroup BPF map items: No such file or directory

Scenario 2: In a host with SRIOV hba -
# lspci -v |grep -i ethernet
05:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
05:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
05:10.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:10.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:10.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:10.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:11.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:11.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:11.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:11.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:12.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:12.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:12.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:12.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:13.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:13.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:13.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:13.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:14.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:14.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:14.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:14.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:15.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:15.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:15.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:15.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:16.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:16.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:16.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:16.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:17.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:17.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:17.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:17.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

1. Start a domain
# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

2. Prepare a xml file -
<hostdev managed="yes" mode="subsystem" type="pci"><source><address bus="0x05" domain="0x0000" function="0x7" slot="0x17" /></source></hostdev>

# virsh attach-device avocado-vt-vm1 /var/tmp/xml_utils_temp_81ju7h16.xml
Device attached successfully

# virsh detach-device avocado-vt-vm1 /var/tmp/xml_utils_temp_81ju7h16.xml
Device detached successfully

3. # systemctl restart libvirtd
# virsh attach-device avocado-vt-vm1 /var/tmp/xml_utils_temp_81ju7h16.xml
error: Failed to attach device from /var/tmp/xml_utils_temp_81ju7h16.xml
error: failed to count cgroup BPF map items: No such file or directory

Expected results:
With libvirtd restart, the attach-device command still can succeed.

Actual results:
The attach-device command failed.

Additional info:
For the interface with "type=hostdev", the same issue is encountered.
# cat inter-hostdev-vfio.xml
<interface managed="yes" type="hostdev">
  <mac address="9a:7a:6d:dd:b0:b9"/>
  <source><address bus="0x05" domain="0x0000" function="0x1" slot="0x10" type="pci" />
  </source>
  <alias name="ua-EYJPB12xqYVkkVV-9ga4YRWNe5WRO7wpDmQRm5z6iR61TTNeZ_hZOPMI_LVuy2w5" />
</interface>
Part of the libvirt.log -
2020-05-08 12:38:03.034+0000: 553687: info : virObjectRef:386 : OBJECT_REF: obj=0x7fe6741bf420
2020-05-08 12:38:03.034+0000: 553687: info : virObjectUnref:348 : OBJECT_UNREF: obj=0x7fe660003040
2020-05-08 12:38:03.034+0000: 553687: info : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=11 events=13
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollInterruptLocked:723 : Skip interrupt, 1 140629459041280
2020-05-08 12:38:03.034+0000: 553687: info : virObjectUnref:348 : OBJECT_UNREF: obj=0x7fe660003040
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollDispatchHandles:487 : i=11 w=12
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCleanupTimeouts:520 : Cleanup 2
2020-05-08 12:38:03.034+0000: 553687: info : virEventPollCleanupTimeouts:533 : EVENT_POLL_PURGE_TIMEOUT: timer=4
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCleanupHandles:569 : Cleanup 12
2020-05-08 12:38:03.034+0000: 553687: debug : virEventRunDefaultImpl:350 : running default event implementation
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCleanupTimeouts:520 : Cleanup 1
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCleanupHandles:569 : Cleanup 12
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=0 w=1, f=8 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=1 w=2, f=10 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=2 w=3, f=5 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=3 w=4, f=3 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=4 w=5, f=4 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=5 w=6, f=13 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=6 w=7, f=14 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=7 w=8, f=18 e=0 d=0
2020-05-08 12:38:03.034+0000: 553786: debug : virThreadJobSetWorker:75 : Thread 553786 is running worker qemuProcessEventHandler
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=8 w=9, f=18 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=9 w=10, f=28 e=1 d=0
2020-05-08 12:38:03.034+0000: 553786: debug : qemuProcessEventHandler:4866 : vm=0x7fe6741bf420, event=2
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=10 w=11, f=30 e=25 d=0
2020-05-08 12:38:03.034+0000: 553786: info : virObjectRef:386 : OBJECT_REF: obj=0x7fe674145560
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=11 w=12, f=22 e=25 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCalculateTimeout:333 : Calculate expiry of 1 timers
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCalculateTimeout:341 : Got a timeout scheduled for 1588941601559
2020-05-08 12:38:03.034+0000: 553786: debug : processDeviceDeletedEvent:4283 : Removing device hostdev0 from domain 0x7fe6741bf420 avocado-vt-vm1
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCalculateTimeout:354 : Schedule timeout then=1588941601559 now=1588941483034
2020-05-08 12:38:03.034+0000: 553786: info : virObjectRef:386 : OBJECT_REF: obj=0x7fe674145560
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCalculateTimeout:364 : Timeout at 1588941601559 due in 118525 ms
2020-05-08 12:38:03.034+0000: 553786: debug : qemuDomainObjBeginJobInternal:9798 : Starting job: job=modify agentJob=none asyncJob=none (vm=0x7fe6741bf420 name=avocado-vt-vm1, current job=none agentJob=none async=none)
2020-05-08 12:38:03.034+0000: 553687: info : virEventPollRunOnce:635 : EVENT_POLL_RUN: nhandles=11 timeout=118525
2020-05-08 12:38:03.034+0000: 553786: debug : qemuDomainObjBeginJobInternal:9847 : Started job: modify (async=none vm=0x7fe6741bf420 name=avocado-vt-vm1)
2020-05-08 12:38:03.034+0000: 553786: info : virObjectUnref:348 : OBJECT_UNREF: obj=0x7fe674145560
2020-05-08 12:38:03.034+0000: 553786: debug : qemuDomainRemoveHostDevice:4426 : Removing host device hostdev0 from domain 0x7fe6741bf420 avocado-vt-vm1
2020-05-08 12:38:03.035+0000: 553786: debug : virFileClose:110 : Closed fd 34
2020-05-08 12:38:03.035+0000: 553786: debug : virFileClose:110 : Closed fd 34
2020-05-08 12:38:03.035+0000: 553786: debug : virPCIDeviceNew:1418 : 8086 10ed 0000:05:10.1: initialized
2020-05-08 12:38:03.035+0000: 553786: debug : virPCIDeviceFree:1449 : 8086 10ed 0000:05:10.1: freeing
2020-05-08 12:38:03.035+0000: 553786: debug : qemuTeardownHostdevCgroup:479 : Cgroup deny /dev/vfio/58
2020-05-08 12:38:03.035+0000: 553786: error : virCgroupV2DevicesDetectProg:423 : failed to count cgroup BPF map items: No such file or directory
2020-05-08 12:38:03.035+0000: 553786: debug : virFileClose:110 : Closed fd 34
2020-05-08 12:38:03.035+0000: 553786: warning : qemuDomainRemoveHostDevice:4480 : Failed to remove host device cgroup ACL
2020-05-08 12:38:03.035+0000: 553786: debug : virFileClose:110 : Closed fd 34
2020-05-08 12:38:03.035+0000: 553786: debug : virFileClose:110 : Closed fd 34
Pavel, is there something libvirt should fix or is it likely a kernel problem? Thanks.
I'll have to investigate whether it's a libvirt or kernel issue.
So the issue is in libvirt, in the code that loads the BPF map of a running QEMU process after the daemon has been restarted. I'll post a patch upstream and backport it downstream.
Upstream patch posted: https://www.redhat.com/archives/libvir-list/2020-August/msg00429.html
Upstream commit:

commit 7e574d1a079bd13aeeedb7024cc45f85b1843fcc
Author: Pavel Hrdina <phrdina>
Date:   Tue Aug 11 11:07:06 2020 +0200

    vircgroupv2devices: fix counting entries in BPF map
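For readers wondering what "counting entries in BPF map" involves: the sketch below is not the upstream patch, only an illustration under stated assumptions. The errno text in the reported error ("No such file or directory") is ENOENT, and ENOENT is also what the kernel's BPF_MAP_GET_NEXT_KEY command returns once the last key has been passed, so a counting loop has to treat it as the normal end of iteration and not as a failure. Whether that is exactly what the commit changed is an assumption here. The names `next_key_fn`, `count_map_entries`, `struct fake_map` and `fake_next_key` are hypothetical; the fake map only stands in for a real map fd so the loop can be run without root or BPF support.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator with the same contract as BPF_MAP_GET_NEXT_KEY:
 * prev == NULL asks for the first key; returns 0 and fills *next, or
 * returns -1 with errno = ENOENT when there is no further key. */
typedef int (*next_key_fn)(void *map, const uint64_t *prev, uint64_t *next);

/* Count the entries of a map by walking its keys.
 * ENOENT ends the walk normally; any other errno is a real failure. */
int count_map_entries(next_key_fn next, void *map)
{
    uint64_t key = 0;
    uint64_t prev = 0;
    const uint64_t *prevp = NULL;   /* NULL: start from the first key */
    int nitems = 0;

    assert(next != NULL);
    errno = 0;

    while (next(map, prevp, &key) == 0) {
        nitems++;
        prev = key;
        prevp = &prev;
    }

    if (errno != ENOENT)
        return -1;

    return nitems;
}

/* Hypothetical in-memory stand-in for a BPF hash map (illustration only). */
struct fake_map {
    const uint64_t *keys;
    size_t nkeys;
    int fail_errno;                 /* non-zero: every lookup fails with it */
};

int fake_next_key(void *opaque, const uint64_t *prev, uint64_t *next)
{
    struct fake_map *m = opaque;
    size_t i = 0;                   /* unknown or NULL prev: first key */

    if (m->fail_errno != 0) {
        errno = m->fail_errno;
        return -1;
    }

    if (prev != NULL) {
        size_t j;
        for (j = 0; j < m->nkeys; j++) {
            if (m->keys[j] == *prev) {
                i = j + 1;
                break;
            }
        }
    }

    if (i >= m->nkeys) {
        errno = ENOENT;             /* no key after prev */
        return -1;
    }

    *next = m->keys[i];
    return 0;
}
```

With a real map, the callback would wrap the bpf() syscall (or libvirt's own wrapper around it) in place of `fake_next_key`; the loop itself stays the same.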
Test with: libvirt-6.6.0-4.module+el8.3.0+7883+3d717aa8.x86_64
Result: PASS

1. prepare a scsi device on host
[root@dell-per740xd-11 ~]# lsscsi
[0:2:0:0] disk DELL PERC H730P Adp 4.30 /dev/sda
[17:0:0:0] disk LIO-ORG device.logical- 4.0 /dev/sdb

2. prepare a device xml to be attached
[root@dell-per740xd-11 ~]# cat scsi.xml
<hostdev mode='subsystem' type='scsi'>
  <source>
    <adapter name='scsi_host17'/>
    <address bus='0' target='0' unit='0'/>
  </source>
</hostdev>

3. start vm
[root@dell-per740xd-11 ~]# virsh start vm1
Domain vm1 started

4. restart libvirtd
[root@dell-per740xd-11 ~]# systemctl restart libvirtd

5. attach the device
[root@dell-per740xd-11 ~]# virsh attach-device vm1 scsi.xml
Device attached successfully

6. check that the device is actually attached to the vm
localhost login: root
Password:
Last login: Thu Sep 3 16:47:54 on ttyS0
[root@localhost ~]# lsscsi
[root@localhost ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdb 252:16 0 100M 0 disk
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5137