Description of problem:

It is impossible to obtain CPU stats for existing instances after the tripleo_nova_libvirt service is restarted:

[root@compute-0 ~]# podman exec -it nova_libvirt virsh cpu-stats instance-0000001e
error: Failed to retrieve CPU statistics for domain 'instance-0000001e'
error: Requested operation is not valid: cgroup CPUACCT controller is not mounted

This looks similar to https://answers.launchpad.net/ubuntu/+question/665132

Version-Release number of selected component (if applicable):
RHOSP 16.2

Steps to Reproduce:
1. Start a fresh instance
2. Run virsh cpu-stats for the new instance to confirm that it works fine
3. Restart tripleo_nova_libvirt
4. Repeat the virsh cpu-stats command

Actual results:
The command fails to provide stats; the following errors are in the output:
error: Failed to retrieve CPU statistics for domain 'instance-0000001e'
error: Requested operation is not valid: cgroup CPUACCT controller is not mounted

Expected results:
The virsh cpu-stats command works fine after libvirt is restarted
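A minimal command-line sketch of the reproduction steps above, run on the compute node (the instance name instance-0000001e is just the example from this report; substitute any running instance, and note that later comments suggest a second restart may be needed to trigger the failure):

  podman exec -it nova_libvirt virsh list
  podman exec -it nova_libvirt virsh cpu-stats instance-0000001e   # works before the restart
  systemctl restart tripleo_nova_libvirt
  podman exec -it nova_libvirt virsh cpu-stats instance-0000001e   # fails after the restart(s)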
I have managed to work around this issue by appending the following lines to /etc/systemd/system/tripleo_nova_libvirt.service and running systemctl daemon-reload; new VMs were no longer affected by libvirt restarts.

[Service]
Delegate=yes

I am not sure whether this should be used as an official workaround until a fix is delivered.
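For reference, a rough sketch of how the workaround above could be applied (the unit file path is the one named in this comment; whether to edit the unit directly or use a systemd drop-in is a deployment choice not covered here):

  # append the Delegate setting to the service unit and reload systemd
  cat >> /etc/systemd/system/tripleo_nova_libvirt.service <<'EOF'
  [Service]
  Delegate=yes
  EOF
  systemctl daemon-reload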
One of the libvirt developers pointed me to this upstream libvirt fix:

  https://gitlab.com/libvirt/libvirt/-/commit/8865c42771600a4 -- qemu: Provide virDomainGetCPUStats() implementation for session connection

That said, the above fix is for the libvirt "session" connection (i.e. non-root), and the libvirt commit message says "system" connections (i.e. root) work fine.

The nova_libvirt container (which is running the libvirt daemon and QEMU instances) uses a "system" connection (i.e. privileged). So I'm not sure the above libvirt fix is the solution here.

@mprivzon: Hi, Michal; does the above make sense to you? (Including Alex's workaround in comment#2.)

Meanwhile, I learnt something interesting about libvirt 'system' connections inside a container from Andrea Bolognani from the libvirt team: libvirt's 'system' connection inside a container can sometimes behave more like a 'session' connection _outside_ the container. That's because, regardless of the shenanigans the container runtime performs to make processes believe they are running with elevated privileges, a bunch of the actual capabilities will be missing. We're not saying this is the case here, just a potential angle to consider.

(For those following along, here's a quick primer on system vs session operating modes: https://libvirt.org/daemons.html#operating-modes.)
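A quick way to confirm which connection mode the in-container virsh is actually using (not from the original report; 'virsh uri' simply prints the connection URI the client resolved):

  podman exec -it nova_libvirt virsh uri
  # a privileged/system connection prints: qemu:///system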
(In reply to Kashyap Chamarthy from comment #3)
> One of the libvirt developers pointed me to this upstream libvirt fix:
>
>   https://gitlab.com/libvirt/libvirt/-/commit/8865c42771600a4 -- qemu:
>   Provide virDomainGetCPUStats() implementation for session connection

This is unrelated, because OSP runs privileged libvirt, i.e. connects to qemu:///system. The commit above was made specifically for KubeVirt's needs, as they connect to qemu:///session. Though it may help, because the same code is used for both connections.

> That said, the above fix is for the libvirt "session" connection (i.e.
> non-root), and the libvirt commit message says "system" connections
> (i.e. root) work fine.
>
> The nova_libvirt container (which is running the libvirt daemon and QEMU
> instances) uses a "system" connection (i.e. privileged). So I'm not sure
> the above libvirt fix is the solution here.

You are correct and the commit is unrelated. The real problem here is why libvirt fails to detect CGroups after it is restarted. For that I'd need to see debug logs.
(In reply to Michal Privoznik from comment #5)
> (In reply to Kashyap Chamarthy from comment #3)
> > One of the libvirt developers pointed me to this upstream libvirt fix:
> >
> >   https://gitlab.com/libvirt/libvirt/-/commit/8865c42771600a4 -- qemu:
> >   Provide virDomainGetCPUStats() implementation for session connection
>
> This is unrelated, because OSP runs privileged libvirt, i.e. connects to
> qemu:///system. The commit above was made specifically for KubeVirt's
> needs, as they connect to qemu:///session. Though it may help, because
> the same code is used for both connections.

What I forgot to mention is: a plain 'virsh cpu-stats $domain' won't work for qemu:///session, because by default it tries to get per-host-CPU stats for the guest (i.e. how much time the guest consumed on host CPU#0, CPU#1, etc.). And without the cpu accounting controller there's (possibly) no way to get that information. BUT, what is possible is to get the overall statistics (i.e. how much time the guest consumed on all host CPUs combined). Long story short, use: 'virsh cpu-stats --total $domain'. But as I've said earlier, this is just a workaround; the real reason is that libvirt failed to detect CGroups after it was restarted.
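Applied to the environment from this report, the suggested invocation would look like this (instance name taken from the original description; --total restricts the output to the aggregate counters, as described above):

  podman exec -it nova_libvirt virsh cpu-stats --total instance-0000001e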
Created attachment 1983541 [details]
libvirtd and instance logs

Here are the libvirt logs and instance logs from my lab. The logs cover the following calls:

 57  podman exec -it nova_libvirt virsh cpu-stats instance-0000002a
 58  systemctl restart tripleo_nova_libvirt
 59  podman exec -it nova_libvirt virsh cpu-stats instance-0000002a
 60  podman restart nova_libvirt
 61  podman exec -it nova_libvirt virsh cpu-stats instance-0000002a

I was a bit wrong about the reproducer: on a clean compute node, restarting the tripleo_nova_libvirt systemd unit is not enough to reproduce; the nova_libvirt container has to be restarted with the podman command first. (It looks like the original compute node wasn't clean, so even though I hadn't used the podman command, the problem was still reproduced.)
And after another try on a fresh compute node, I figured out that the first "systemctl restart tripleo_nova_libvirt" doesn't break things, but the second one does:

[root@compute-2 ~]# podman exec -it nova_libvirt virsh list
 Id   Name                State
-----------------------------------
 3    instance-00000028   running

[root@compute-2 ~]# podman exec -it nova_libvirt virsh cpu-stats instance-00000028
CPU0:
        cpu_time       0.631494967 seconds
        vcpu_time      0.000000000 seconds

[root@compute-2 ~]# systemctl restart tripleo_nova_libvirt
[root@compute-2 ~]# podman exec -it nova_libvirt virsh cpu-stats instance-00000028
CPU0:
        cpu_time       0.650575681 seconds
        vcpu_time      0.000000000 seconds

[root@compute-2 ~]# systemctl restart tripleo_nova_libvirt
[root@compute-2 ~]# podman exec -it nova_libvirt virsh cpu-stats instance-00000028
error: Failed to retrieve CPU statistics for domain 'instance-00000028'
error: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Please let me know if anything else is needed.
So this is CGroups detection after the first restart:

2023-08-16 09:06:48.920+0000: 986222: debug : virGDBusIsServiceEnabled:387 : Service org.freedesktop.machine1 is unavailable
2023-08-16 09:06:48.920+0000: 986222: debug : virCgroupNewDetect:1152 : pid=982717 controllers=-1 group=0x7fb32fffe828
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupDetectPlacement:349 : Detecting placement for pid 982717 path
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 0:cpu at /sys/fs/cgroup/cpu,cpuacct in /machine/qemu-1-instance-0000002a.libvirt-qemu/emulator for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 1:cpuacct at /sys/fs/cgroup/cpu,cpuacct in /machine/qemu-1-instance-0000002a.libvirt-qemu/emulator for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 2:cpuset at /sys/fs/cgroup/cpuset in /machine/qemu-1-instance-0000002a.libvirt-qemu/emulator for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 3:memory at /sys/fs/cgroup/memory in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 4:devices at /sys/fs/cgroup/devices in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 5:freezer at /sys/fs/cgroup/freezer in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 6:blkio at /sys/fs/cgroup/blkio in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 7:net_cls at /sys/fs/cgroup/net_cls,net_prio in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 8:perf_event at /sys/fs/cgroup/perf_event in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 9:name=systemd at /sys/fs/cgroup/systemd in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:493 : Auto-detecting controllers
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'cpu' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'cpuacct' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'cpuset' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'memory' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'devices' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'freezer' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'blkio' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'net_cls' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'perf_event' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'name=systemd' present=yes

Please note the placement of controllers: /sys/fs/cgroup/$controller/machine/qemu-1-instance-0000002a.libvirt-qemu.

Now, this is CGroups detection after the second restart:

2023-08-16 09:07:19.041+0000: 986726: debug : virGDBusIsServiceEnabled:387 : Service org.freedesktop.machine1 is unavailable
2023-08-16 09:07:19.041+0000: 986726: debug : virCgroupNewDetect:1152 : pid=982717 controllers=-1 group=0x7f22f1137828
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupDetectPlacement:349 : Detecting placement for pid 982717 path
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 0:cpu at /sys/fs/cgroup/cpu,cpuacct in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 1:cpuacct at /sys/fs/cgroup/cpu,cpuacct in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 2:cpuset at /sys/fs/cgroup/cpuset in /machine/qemu-1-instance-0000002a.libvirt-qemu/emulator for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 3:memory at /sys/fs/cgroup/memory in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 4:devices at /sys/fs/cgroup/devices in /system.slice for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 5:freezer at /sys/fs/cgroup/freezer in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 6:blkio at /sys/fs/cgroup/blkio in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 7:net_cls at /sys/fs/cgroup/net_cls,net_prio in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 8:perf_event at /sys/fs/cgroup/perf_event in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 9:name=systemd at /sys/fs/cgroup/systemd in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:493 : Auto-detecting controllers
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'cpu' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'cpuacct' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'cpuset' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'memory' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'devices' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'freezer' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'blkio' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'net_cls' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'perf_event' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'name=systemd' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidateMachineGroup:161 : Name 'run-r22a2e1da04734a7aa857f08f8b7c4284.scope' for controller 'cpu' does not match 'instance-0000002a', 'qemu-1-instance-0000002a', 'instance-0000002a.libvirt-qemu', 'machine-qemu\x2dinstance\x2d0000002a.scope' or 'machine-qemu\x2d1\x2dinstance\x2d0000002a.scope'
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupNewDetectMachine:1203 : Failed to validate machine name for 'instance-0000002a' driver 'qemu'

Now, do you notice the change? SOME controllers are now in the unexpected /sys/fs/cgroup/$controller/system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope, which fails libvirt's validation, which in turn leads libvirt to think the CGroups were mangled behind its back, so it refuses to work with such controller(s). And unfortunately, the cpuacct controller is one of those affected.

I have no idea who mangled the placement or what is going on, but this is NOT a libvirt bug. If something is messing up the CGroup placement, it needs to stop doing that.
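For anyone who wants to see the same placement information without enabling libvirt debug logs, a minimal sketch (not from the original report; PID 982717 is the QEMU process from the logs above, and /proc/<pid>/cgroup is the kernel file listing the controller:path pairs that libvirt's placement detection is based on):

  # run on the compute node, before and after each restart of tripleo_nova_libvirt
  cat /proc/982717/cgroup
  # each line looks like "4:devices:/machine/qemu-1-instance-0000002a.libvirt-qemu";
  # after the second restart some controllers instead show
  # ".../system.slice/run-<id>.scope", which is what trips libvirt's validation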
Thank you very much Michal!
After a discussion with Kashyap we decided to ask Michal why the Delegate flag changes the systemd/cgroup behavior. I also decided to take a second look at the lab to confirm that there were no possible disparities. It turns out there are: as mentioned in comment #10, it takes two libvirt restarts using the "systemctl restart tripleo_nova_libvirt" command to reproduce the issue for a new VM, but originally I believed that only one was needed. So I got a false positive when testing the workaround: after a single restart the VM's CPU stats were still printed, while in reality the behavior is still the same.

As a result, we need to figure out why some controllers end up in the unexpected /sys/fs/cgroup/$controller/system.slice/run-*.scope after the second restart. I will try to ask support experts to help us understand this, but engineering engagement is possibly needed here.
Created attachment 1990585 [details]
dbus-monitor output + reproducer commands with timestamps

Adding the requested "dbus-monitor --system" output, plus the commands I used to reproduce, with timestamps.
*** Bug 2233089 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.2.6 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:1519