Bug 1833321 - [cgroup_v2] failed to count cgroup BPF map items: No such file or directory
Summary: [cgroup_v2] failed to count cgroup BPF map items: No such file or directory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.3
Assignee: Pavel Hrdina
QA Contact: yisun
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-08 12:31 UTC by Jing Qi
Modified: 2020-11-17 17:49 UTC (History)

Fixed In Version: libvirt-6.6.0-4.el8
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-17 17:48:34 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Jing Qi 2020-05-08 12:31:01 UTC
Description of problem:

Hot-plugging a hostdev (type=vf or scsi) or an interface (type=hostdev) fails after libvirtd is restarted on a host with cgroup2 enabled:
error: failed to count cgroup BPF map items: No such file or directory

Version
qemu-kvm-4.2.0-20.module+el8.2.1+6467+49dc3278.x86_64
libvirt-daemon-6.0.0-18.module+el8.2.1+6456+a6d62e4e.x86_64


How reproducible:
100%

Steps to reproduce:
On a host with cgroup2 enabled:
# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)
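
The precondition above (a unified cgroup2 hierarchy at /sys/fs/cgroup) can also be checked programmatically. This is a minimal sketch, not part of libvirt: the function name has_cgroup2_at is hypothetical, and it assumes the usual "<source> on <target> type <fstype> (<options>)" layout of mount output with no spaces in the mount point.

```python
def has_cgroup2_at(mount_output, target="/sys/fs/cgroup"):
    """Return True if `mount` output shows a cgroup2 filesystem at target."""
    for line in mount_output.splitlines():
        # Assumed shape: "<source> on <target> type <fstype> (<options>)"
        parts = line.split()
        if len(parts) >= 5 and parts[1] == "on" and parts[3] == "type":
            if parts[2] == target and parts[4] == "cgroup2":
                return True
    return False


# The mount line quoted in this report.
sample = (
    "cgroup2 on /sys/fs/cgroup type cgroup2 "
    "(rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)"
)
print(has_cgroup2_at(sample))
```

On a real host the input would be the text printed by the mount command shown above. A hybrid or cgroup v1 host would not match, because /sys/fs/cgroup is then a tmpfs.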

Scenario 1: SCSI passthrough
1. Start a domain
# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started
2. # systemctl restart libvirtd
3. #  cat scsi.xml
  <hostdev mode='subsystem' type='scsi'>
    <source>
      <adapter name='scsi_host15'/>
      <address bus='0' target='0' unit='0'/>
    </source>
  </hostdev>
4. # virsh attach-device avocado-vt-vm1 scsi.xml
error: Failed to attach device from scsi.xml
error: failed to count cgroup BPF map items: No such file or directory 

Scenario 2: PCI VF passthrough

On a host with an SR-IOV capable network adapter:
# lspci -v |grep -i ethernet
05:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
05:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
05:10.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:10.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:10.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:10.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:11.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:11.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:11.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:11.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:12.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:12.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:12.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:12.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:13.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:13.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:13.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:13.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:14.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:14.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:14.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:14.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:15.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:15.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:15.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:15.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:16.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:16.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:16.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:16.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:17.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:17.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:17.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
05:17.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

1. Start a domain
# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

2. Prepare an XML file, then attach and detach the device:
<hostdev managed="yes" mode="subsystem" type="pci"><source><address bus="0x05" domain="0x0000" function="0x7" slot="0x17" /></source></hostdev>
   # virsh attach-device avocado-vt-vm1 /var/tmp/xml_utils_temp_81ju7h16.xml
   Device attached successfully

   # virsh detach-device avocado-vt-vm1 /var/tmp/xml_utils_temp_81ju7h16.xml
   Device detached successfully

3. # systemctl restart libvirtd

4. # virsh attach-device avocado-vt-vm1 /var/tmp/xml_utils_temp_81ju7h16.xml
error: Failed to attach device from /var/tmp/xml_utils_temp_81ju7h16.xml
error: failed to count cgroup BPF map items: No such file or directory

Expected results:
 After libvirtd is restarted, the attach-device command should still succeed.

Actual results:
  The attach-device command fails.

Additional info:
The same issue occurs for an interface with type="hostdev":
# cat inter-hostdev-vfio.xml
<interface managed="yes" type="hostdev">
<mac address="9a:7a:6d:dd:b0:b9"/>
<source><address bus="0x05" domain="0x0000" function="0x1" slot="0x10" type="pci" />
</source>
<alias name="ua-EYJPB12xqYVkkVV-9ga4YRWNe5WRO7wpDmQRm5z6iR61TTNeZ_hZOPMI_LVuy2w5" />
</interface>

Comment 1 Jing Qi 2020-05-08 12:43:12 UTC
Excerpt from libvirt.log:

2020-05-08 12:38:03.034+0000: 553687: info : virObjectRef:386 : OBJECT_REF: obj=0x7fe6741bf420
2020-05-08 12:38:03.034+0000: 553687: info : virObjectUnref:348 : OBJECT_UNREF: obj=0x7fe660003040
2020-05-08 12:38:03.034+0000: 553687: info : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=11 events=13
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollInterruptLocked:723 : Skip interrupt, 1 140629459041280
2020-05-08 12:38:03.034+0000: 553687: info : virObjectUnref:348 : OBJECT_UNREF: obj=0x7fe660003040
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollDispatchHandles:487 : i=11 w=12
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCleanupTimeouts:520 : Cleanup 2
2020-05-08 12:38:03.034+0000: 553687: info : virEventPollCleanupTimeouts:533 : EVENT_POLL_PURGE_TIMEOUT: timer=4
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCleanupHandles:569 : Cleanup 12
2020-05-08 12:38:03.034+0000: 553687: debug : virEventRunDefaultImpl:350 : running default event implementation
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCleanupTimeouts:520 : Cleanup 1
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCleanupHandles:569 : Cleanup 12
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=0 w=1, f=8 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=1 w=2, f=10 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=2 w=3, f=5 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=3 w=4, f=3 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=4 w=5, f=4 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=5 w=6, f=13 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=6 w=7, f=14 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=7 w=8, f=18 e=0 d=0
2020-05-08 12:38:03.034+0000: 553786: debug : virThreadJobSetWorker:75 : Thread 553786 is running worker qemuProcessEventHandler
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=8 w=9, f=18 e=1 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=9 w=10, f=28 e=1 d=0
2020-05-08 12:38:03.034+0000: 553786: debug : qemuProcessEventHandler:4866 : vm=0x7fe6741bf420, event=2
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=10 w=11, f=30 e=25 d=0
2020-05-08 12:38:03.034+0000: 553786: info : virObjectRef:386 : OBJECT_REF: obj=0x7fe674145560
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollMakePollFDs:396 : Prepare n=11 w=12, f=22 e=25 d=0
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCalculateTimeout:333 : Calculate expiry of 1 timers
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCalculateTimeout:341 : Got a timeout scheduled for 1588941601559
2020-05-08 12:38:03.034+0000: 553786: debug : processDeviceDeletedEvent:4283 : Removing device hostdev0 from domain 0x7fe6741bf420 avocado-vt-vm1
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCalculateTimeout:354 : Schedule timeout then=1588941601559 now=1588941483034
2020-05-08 12:38:03.034+0000: 553786: info : virObjectRef:386 : OBJECT_REF: obj=0x7fe674145560
2020-05-08 12:38:03.034+0000: 553687: debug : virEventPollCalculateTimeout:364 : Timeout at 1588941601559 due in 118525 ms
2020-05-08 12:38:03.034+0000: 553786: debug : qemuDomainObjBeginJobInternal:9798 : Starting job: job=modify agentJob=none asyncJob=none (vm=0x7fe6741bf420 name=avocado-vt-vm1, current job=none agentJob=none async=none)
2020-05-08 12:38:03.034+0000: 553687: info : virEventPollRunOnce:635 : EVENT_POLL_RUN: nhandles=11 timeout=118525
2020-05-08 12:38:03.034+0000: 553786: debug : qemuDomainObjBeginJobInternal:9847 : Started job: modify (async=none vm=0x7fe6741bf420 name=avocado-vt-vm1)
2020-05-08 12:38:03.034+0000: 553786: info : virObjectUnref:348 : OBJECT_UNREF: obj=0x7fe674145560
2020-05-08 12:38:03.034+0000: 553786: debug : qemuDomainRemoveHostDevice:4426 : Removing host device hostdev0 from domain 0x7fe6741bf420 avocado-vt-vm1
2020-05-08 12:38:03.035+0000: 553786: debug : virFileClose:110 : Closed fd 34
2020-05-08 12:38:03.035+0000: 553786: debug : virFileClose:110 : Closed fd 34
2020-05-08 12:38:03.035+0000: 553786: debug : virPCIDeviceNew:1418 : 8086 10ed 0000:05:10.1: initialized
2020-05-08 12:38:03.035+0000: 553786: debug : virPCIDeviceFree:1449 : 8086 10ed 0000:05:10.1: freeing
2020-05-08 12:38:03.035+0000: 553786: debug : qemuTeardownHostdevCgroup:479 : Cgroup deny /dev/vfio/58
2020-05-08 12:38:03.035+0000: 553786: error : virCgroupV2DevicesDetectProg:423 : failed to count cgroup BPF map items: No such file or directory
2020-05-08 12:38:03.035+0000: 553786: debug : virFileClose:110 : Closed fd 34
2020-05-08 12:38:03.035+0000: 553786: warning : qemuDomainRemoveHostDevice:4480 : Failed to remove host device cgroup ACL
2020-05-08 12:38:03.035+0000: 553786: debug : virFileClose:110 : Closed fd 34
2020-05-08 12:38:03.035+0000: 553786: debug : virFileClose:110 : Closed fd 34
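
Most of the excerpt above is event-loop debug noise; the relevant entries are the error from virCgroupV2DevicesDetectProg and the warning from qemuDomainRemoveHostDevice. A minimal sketch for pulling such lines out of a log follows. The helper name problems and the regular expression are assumptions based only on the line layout visible in this excerpt ("timestamp: thread: level : function:line : message"), not on any libvirt API.

```python
import re

# Layout assumed from the excerpt in this report:
# "<date> <time>: <thread id>: <level> : <function>:<line> : <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+ \S+): (?P<tid>\d+): (?P<level>\w+) : "
    r"(?P<func>\S+?):(?P<line>\d+) : (?P<msg>.*)$"
)


def problems(log_text):
    """Return (level, function, message) for every error or warning line."""
    found = []
    for raw in log_text.splitlines():
        m = LOG_LINE.match(raw.strip())
        if m and m.group("level") in ("error", "warning"):
            found.append((m.group("level"), m.group("func"), m.group("msg")))
    return found


sample = """\
2020-05-08 12:38:03.035+0000: 553786: debug : qemuTeardownHostdevCgroup:479 : Cgroup deny /dev/vfio/58
2020-05-08 12:38:03.035+0000: 553786: error : virCgroupV2DevicesDetectProg:423 : failed to count cgroup BPF map items: No such file or directory
2020-05-08 12:38:03.035+0000: 553786: warning : qemuDomainRemoveHostDevice:4480 : Failed to remove host device cgroup ACL
"""
for level, func, msg in problems(sample):
    print(level, func, msg)
```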

Comment 2 Jaroslav Suchanek 2020-05-11 07:29:05 UTC
Pavel, is there something libvirt should fix or is it likely a kernel problem? Thanks.

Comment 3 Pavel Hrdina 2020-05-13 09:25:15 UTC
I'll have to investigate whether it's a libvirt or kernel issue.

Comment 4 Pavel Hrdina 2020-05-13 16:57:05 UTC
So the issue is in libvirt, in the code that loads the BPF map of a running QEMU process after the daemon has been restarted.  I'll post a patch upstream and backport it downstream.
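
For readers unfamiliar with BPF map iteration: the kernel's BPF_MAP_GET_NEXT_KEY command fails with ENOENT ("No such file or directory") once the last key has been passed, so a counting loop has to treat that errno as normal termination rather than as a failure. That matches the error text in this report, but the comment above does not spell out the exact defect, so the following is only an illustrative sketch of the iteration pattern and not libvirt's code. FakeBpfMap and count_entries are hypothetical names, and no real bpf(2) call is made.

```python
import errno


class FakeBpfMap:
    """Hypothetical in-memory stand-in for a BPF map; makes no bpf(2) calls."""

    def __init__(self, keys):
        self._keys = list(keys)

    def get_next_key(self, prev_key):
        # Mimics the documented BPF_MAP_GET_NEXT_KEY behaviour: an unknown
        # (or None) previous key yields the first key, and asking for the
        # key after the last one fails with ENOENT.
        if prev_key in self._keys:
            idx = self._keys.index(prev_key) + 1
        else:
            idx = 0
        if idx >= len(self._keys):
            raise OSError(errno.ENOENT, "No such file or directory")
        return self._keys[idx]


def count_entries(bpf_map):
    """Count map entries, treating ENOENT as the end of iteration."""
    count = 0
    prev = None
    while True:
        try:
            prev = bpf_map.get_next_key(prev)
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                return count  # ran past the last key: normal termination
            raise  # any other errno is a genuine failure
        count += 1


print(count_entries(FakeBpfMap([10, 20, 30])))
```

A loop that instead reported every failed lookup as an error would fail on every map, including an empty one, with exactly the ENOENT message seen here.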

Comment 6 Pavel Hrdina 2020-08-11 10:14:17 UTC
Upstream patch posted:

https://www.redhat.com/archives/libvir-list/2020-August/msg00429.html

Comment 7 Pavel Hrdina 2020-08-13 14:04:03 UTC
Upstream commit:

commit 7e574d1a079bd13aeeedb7024cc45f85b1843fcc
Author: Pavel Hrdina <phrdina>
Date:   Tue Aug 11 11:07:06 2020 +0200

    vircgroupv2devices: fix counting entries in BPF map

Comment 13 yisun 2020-09-03 10:32:29 UTC
Test with: libvirt-6.6.0-4.module+el8.3.0+7883+3d717aa8.x86_64
Result: PASS

1. prepare a scsi device on host
[root@dell-per740xd-11 ~]# lsscsi
[0:2:0:0]    disk    DELL     PERC H730P Adp   4.30  /dev/sda 
[17:0:0:0]   disk    LIO-ORG  device.logical-  4.0   /dev/sdb 

2. prepare a device xml to be attached
[root@dell-per740xd-11 ~]# cat scsi.xml 
  <hostdev mode='subsystem' type='scsi'>
    <source>
      <adapter name='scsi_host17'/>
      <address bus='0' target='0' unit='0'/>
    </source>
  </hostdev>

3. start vm
[root@dell-per740xd-11 ~]# virsh start vm1
Domain vm1 started

4. restart libvirtd
[root@dell-per740xd-11 ~]# systemctl restart libvirtd

5. attach the device
[root@dell-per740xd-11 ~]#  virsh attach-device vm1 scsi.xml
Device attached successfully

6. check that the device is actually attached to the vm
localhost login: root
Password: 
Last login: Thu Sep  3 16:47:54 on ttyS0
[root@localhost ~]# lsscsi
[root@localhost ~]# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
...
vdb           252:16   0  100M  0 disk

Comment 16 errata-xmlrpc 2020-11-17 17:48:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137

