Bug 2228376 - "virsh cpu-stats" command fails after tripleo_nova_libvirt service is restarted
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.2 (Train)
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: async
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Bogdan Dobrelya
QA Contact: Jason Grosso
Duplicates: 2233089
Depends On:
Blocks: 2244501 2257419
 
Reported: 2023-08-02 08:49 UTC by Alex Stupnikov
Modified: 2024-09-09 16:10 UTC
CC List: 19 users

Fixed In Version: openstack-tripleo-heat-templates-11.6.1-2.20230808225219.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2244501 2257419
Environment:
Last Closed: 2024-03-26 12:25:32 UTC
Target Upstream Version:


Attachments
libvirtd and instance logs (1.13 MB, application/x-tar), uploaded 2023-08-16 09:15 UTC by Alex Stupnikov
dbus-monitor output + reproducer commands with timestamps (1.04 MB, application/x-tar), uploaded 2023-09-26 12:37 UTC by Alex Stupnikov


Links
OpenStack gerrit 897241: MERGED - Manage QEMU driver cgroups by monolithic libvirtd (last updated 2024-03-12 19:14:45 UTC)
Red Hat Issue Tracker OSP-27151 (last updated 2023-08-02 15:15:11 UTC)
Red Hat Knowledge Base (Solution) 7026599 (last updated 2023-08-02 08:52:52 UTC)
Red Hat Product Errata RHBA-2024:1519 (last updated 2024-03-26 12:25:48 UTC)

Description Alex Stupnikov 2023-08-02 08:49:44 UTC
Description of problem:
It is impossible to obtain CPU stats for existing instances after the tripleo_nova_libvirt service is restarted:

[root@compute-0 ~]# podman exec -it nova_libvirt virsh cpu-stats instance-0000001e
error: Failed to retrieve CPU statistics for domain 'instance-0000001e'
error: Requested operation is not valid: cgroup CPUACCT controller is not mounted

This looks similar to https://answers.launchpad.net/ubuntu/+question/665132


Version-Release number of selected component (if applicable):
RHOSP 16.2


Steps to Reproduce:
1. Start a fresh instance
2. Run virsh cpu-stats for the new instance to confirm that it works fine
3. Restart tripleo_nova_libvirt
4. Repeat the virsh cpu-stats command

Actual results:
The command fails to provide stats; the following errors appear in the output:
error: Failed to retrieve CPU statistics for domain 'instance-0000001e'
error: Requested operation is not valid: cgroup CPUACCT controller is not mounted

Expected results:
The virsh cpu-stats command works fine after libvirt is restarted.

Comment 2 Alex Stupnikov 2023-08-02 09:10:19 UTC
I have managed to work around this issue by appending the following lines to /etc/systemd/system/tripleo_nova_libvirt.service and running systemctl daemon-reload: new VMs were no longer affected by libvirt restarts.

[Service]
Delegate=yes

I am not sure if this should be used as the official workaround until a fix is delivered.
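
For reference, a minimal sketch of the same override applied as a systemd drop-in instead of editing the unit file directly (the drop-in directory and file name are illustrative, not shipped configuration):

    mkdir -p /etc/systemd/system/tripleo_nova_libvirt.service.d
    printf '[Service]\nDelegate=yes\n' > /etc/systemd/system/tripleo_nova_libvirt.service.d/delegate.conf
    systemctl daemon-reload
    systemctl show -p Delegate tripleo_nova_libvirt   # should now report Delegate=yes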

Comment 3 Kashyap Chamarthy 2023-08-09 13:54:54 UTC
One of the libvirt developers pointed me to this upstream libvirt fix:

    https://gitlab.com/libvirt/libvirt/-/commit/8865c42771600a4 -- qemu:
    Provide virDomainGetCPUStats() implementation for session connection

That said, the above fix is for the libvirt "session" connection (i.e.
non-root), and the libvirt commit message says "system" connections
(i.e. root) work fine. 

The nova_libvirt container (which is running the libvirt daemon and QEMU
instances) runs with a "system" connection (i.e. privileged).  So I'm not
sure whether the above libvirt fix is the solution here.

@mprivzon: Hi, Michal; does the above make sense to you?  (Including 
Alex's workaround in comment#2.)

Meanwhile, I learnt something interesting about libvirt 'system' 
connections inside a container from Andrea Bolognani from the libvirt
team:

    libvirt's 'system' connection inside a container can sometimes 
    behave somewhat more similar to 'session' connections _outside_ the
    container.  That's because, regardless of the shenanigans the
    container runtime will perform to make processes believe that they
    are running with elevated privileges, a bunch of the actual
    capabilities will be missing.

We're not saying this is the case here, just a potential angle to
consider.

(For those following along, here's a quick primer on system vs session
operating modes: https://libvirt.org/daemons.html#operating-modes.)

Comment 5 Michal Privoznik 2023-08-14 12:59:36 UTC
(In reply to Kashyap Chamarthy from comment #3)
> One of the libvirt developers pointed me to this upstream libvirt fix:
> 
>     https://gitlab.com/libvirt/libvirt/-/commit/8865c42771600a4 -- qemu:
>     Provide virDomainGetCPUStats() implementation for session connection

This is unrelated, because OSP runs privileged libvirt, i.e. connects to qemu:///system. The commit above was made specifically for KubeVirt needs as they connect to qemu:///session. Though it may help, because the same code is used for both connections.

> 
> That said, the above fix is for the libvirt "session" connection (i.e.
> non-root), and the libvirt commit message says "system" connections
> (i.e. root) work fine. 
> 
> The nova_libvirt container (which is running the libvirt daemon and QEMU
> instances) runs with "system" connection (i.e.  privileged).  So I'm not
> sure if it's the above libvirt fix is the solution here.

You are correct and the commit is unrelated. The real problem here is why libvirt fails to detect CGroups after it was restarted. For that I'd need to see debug logs.

Comment 8 Michal Privoznik 2023-08-15 08:40:59 UTC
(In reply to Michal Privoznik from comment #5)
> (In reply to Kashyap Chamarthy from comment #3)
> > One of the libvirt developers pointed me to this upstream libvirt fix:
> > 
> >     https://gitlab.com/libvirt/libvirt/-/commit/8865c42771600a4 -- qemu:
> >     Provide virDomainGetCPUStats() implementation for session connection
> 
> This is unrelated, because OSP runs priviledged libvirt, i.e. connects to
> qemu:///system. The commit above was made specifically for KubeVirt needs as
> they connect to qemu:///session. Though it may help, because the same code
> is used for both connections.

What I forgot to mention is: plain 'virsh cpu-stats $domain' won't work for qemu:///session, because by default it tries to get per-host-cpu stats for the guest (i.e. how much time the guest consumed on host CPU#0, CPU#1, etc.). And without cpu accounting controller there's (possibly) no way to get that information. BUT, what is possible is to get overall statistics (i.e. how much time the guest consumed on all host CPUs combined). Long story short, use: 'virsh cpu-stats --total $domain'.

But as I've said earlier, this is just a workaround; the real problem is that libvirt failed to detect CGroups after it was restarted.
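
For completeness, the command form Michal suggests, applied to the instance from the original report (the output values below are illustrative, not captured from the affected compute):

    podman exec -it nova_libvirt virsh cpu-stats --total instance-0000001e
    Total:
            cpu_time          12.345678901 seconds
            user_time          1.230000000 seconds
            system_time        0.450000000 seconds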

Comment 9 Alex Stupnikov 2023-08-16 09:15:13 UTC
Created attachment 1983541 [details]
libvirtd and instance logs

Here are libvirt logs and instance logs from my lab. Logs cover the following calls:

   57  podman exec -it nova_libvirt virsh cpu-stats instance-0000002a
   58  systemctl restart tripleo_nova_libvirt
   59  podman exec -it nova_libvirt virsh cpu-stats instance-0000002a
   60  podman restart nova_libvirt
   61  podman exec -it nova_libvirt virsh cpu-stats instance-0000002a

I was a bit wrong about the reproducer: on a clean compute node, restarting the tripleo_nova_libvirt systemd unit alone doesn't reproduce the issue; the nova_libvirt container has to be restarted with the podman command first. (It looks like the original compute node wasn't clean, which is why the problem still reproduced even though I hadn't used the podman command.)

Comment 10 Alex Stupnikov 2023-08-16 09:18:32 UTC
After another try on a fresh compute node I figured out that the first "systemctl restart tripleo_nova_libvirt" doesn't break things, but the second one does:

[root@compute-2 ~]# podman exec -it nova_libvirt virsh list
 Id   Name                State
-----------------------------------
 3    instance-00000028   running

[root@compute-2 ~]# podman exec -it nova_libvirt virsh cpu-stats instance-00000028
CPU0:
        cpu_time             0.631494967 seconds
        vcpu_time            0.000000000 seconds


[root@compute-2 ~]# systemctl restart tripleo_nova_libvirt
[root@compute-2 ~]# podman exec -it nova_libvirt virsh cpu-stats instance-00000028
CPU0:
        cpu_time             0.650575681 seconds
        vcpu_time            0.000000000 seconds


[root@compute-2 ~]# systemctl restart tripleo_nova_libvirt
[root@compute-2 ~]# podman exec -it nova_libvirt virsh cpu-stats instance-00000028
error: Failed to retrieve CPU statistics for domain 'instance-00000028'
error: Requested operation is not valid: cgroup CPUACCT controller is not mounted

Comment 11 Alex Stupnikov 2023-08-16 09:18:54 UTC
Please let me know if anything else is needed.

Comment 12 Michal Privoznik 2023-08-16 11:55:23 UTC
So this is CGroups detection after the first restart:

2023-08-16 09:06:48.920+0000: 986222: debug : virGDBusIsServiceEnabled:387 : Service org.freedesktop.machine1 is unavailable
2023-08-16 09:06:48.920+0000: 986222: debug : virCgroupNewDetect:1152 : pid=982717 controllers=-1 group=0x7fb32fffe828
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupDetectPlacement:349 : Detecting placement for pid 982717 path 
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 0:cpu at /sys/fs/cgroup/cpu,cpuacct in /machine/qemu-1-instance-0000002a.libvirt-qemu/emulator for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 1:cpuacct at /sys/fs/cgroup/cpu,cpuacct in /machine/qemu-1-instance-0000002a.libvirt-qemu/emulator for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 2:cpuset at /sys/fs/cgroup/cpuset in /machine/qemu-1-instance-0000002a.libvirt-qemu/emulator for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 3:memory at /sys/fs/cgroup/memory in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 4:devices at /sys/fs/cgroup/devices in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 5:freezer at /sys/fs/cgroup/freezer in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 6:blkio at /sys/fs/cgroup/blkio in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 7:net_cls at /sys/fs/cgroup/net_cls,net_prio in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 8:perf_event at /sys/fs/cgroup/perf_event in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 9:name=systemd at /sys/fs/cgroup/systemd in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:493 : Auto-detecting controllers
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'cpu' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'cpuacct' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'cpuset' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'memory' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'devices' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'freezer' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'blkio' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'net_cls' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'perf_event' present=yes
2023-08-16 09:06:48.921+0000: 986222: debug : virCgroupV1DetectControllers:503 : Controller 'name=systemd' present=yes


Please note the placement of controllers: /sys/fs/cgroup/$controller/machine/qemu-1-instance-0000002a.libvirt-qemu.

Now, this is CGroups detection after the second restart:

2023-08-16 09:07:19.041+0000: 986726: debug : virGDBusIsServiceEnabled:387 : Service org.freedesktop.machine1 is unavailable
2023-08-16 09:07:19.041+0000: 986726: debug : virCgroupNewDetect:1152 : pid=982717 controllers=-1 group=0x7f22f1137828
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupDetectPlacement:349 : Detecting placement for pid 982717 path 
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 0:cpu at /sys/fs/cgroup/cpu,cpuacct in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 1:cpuacct at /sys/fs/cgroup/cpu,cpuacct in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 2:cpuset at /sys/fs/cgroup/cpuset in /machine/qemu-1-instance-0000002a.libvirt-qemu/emulator for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 3:memory at /sys/fs/cgroup/memory in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 4:devices at /sys/fs/cgroup/devices in /system.slice for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 5:freezer at /sys/fs/cgroup/freezer in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 6:blkio at /sys/fs/cgroup/blkio in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 7:net_cls at /sys/fs/cgroup/net_cls,net_prio in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 8:perf_event at /sys/fs/cgroup/perf_event in /machine/qemu-1-instance-0000002a.libvirt-qemu for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidatePlacement:423 : Detected mount/mapping 9:name=systemd at /sys/fs/cgroup/systemd in /system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope for pid 982717
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:493 : Auto-detecting controllers
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'cpu' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'cpuacct' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'cpuset' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'memory' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'devices' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'freezer' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'blkio' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'net_cls' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'perf_event' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1DetectControllers:503 : Controller 'name=systemd' present=yes
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupV1ValidateMachineGroup:161 : Name 'run-r22a2e1da04734a7aa857f08f8b7c4284.scope' for controller 'cpu' does not match 'instance-0000002a', 'qemu-1-instance-0000002a', 'instance-0000002a.libvirt-qemu', 'machine-qemu\x2dinstance\x2d0000002a.scope' or 'machine-qemu\x2d1\x2dinstance\x2d0000002a.scope'
2023-08-16 09:07:19.042+0000: 986726: debug : virCgroupNewDetectMachine:1203 : Failed to validate machine name for 'instance-0000002a' driver 'qemu'


Now, do you notice the change? SOME controllers are now in the unexpected /sys/fs/cgroup/$controller/system.slice/run-r22a2e1da04734a7aa857f08f8b7c4284.scope, which then fails libvirt validation, which in turn leads libvirt to think the CGroups were mangled behind its back, and thus it refuses to work with such controller(s). And unfortunately, the cpuacct controller is one of those affected. I have no idea who mangled the placement and what is going on, but this is NOT a libvirt bug. If something is messing up CGroup placement, it needs to stop doing that.
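
A quick way to see the same placement outside the libvirt debug logs is to read the QEMU process's cgroup file directly (982717 is the QEMU PID from the logs above); controllers listed under system.slice/run-*.scope instead of /machine/qemu-1-instance-0000002a.libvirt-qemu are the ones libvirt rejects:

    cat /proc/982717/cgroup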

Comment 13 Alex Stupnikov 2023-08-16 12:07:33 UTC
Thank you very much Michal!

Comment 14 Alex Stupnikov 2023-08-16 14:47:51 UTC
After a discussion with Kashyap we decided to ask Michal why the Delegate flag changes systemd/cgroup behavior. I also decided to take a second look at the lab to confirm that there were no disparities. It turns out that there are: as mentioned in comment #10, it takes two libvirt restarts with the "systemctl restart tripleo_nova_libvirt" command to reproduce the issue for a new VM, but originally I believed that only one was needed. So I got a false-positive outcome when testing the workaround: after a single restart the VM's CPU stats were still printed, while in reality the behavior is still the same.

As a result, we need to figure out why some controllers end up in the unexpected /sys/fs/cgroup/$controller/system.slice/run-*.scope after the second restart. I will try to ask support experts to help us understand this, but engineering engagement is possibly needed here.

Comment 24 Alex Stupnikov 2023-09-26 12:37:50 UTC
Created attachment 1990585 [details]
dbus-monitor output + reproducer commands with timestamps

Adding the requested "dbus-monitor --system" output plus the commands I used to reproduce, with timestamps.
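
For anyone repeating the capture, a sketch of the sequence (the reproducer is the one from comment #10; the redirect target is illustrative):

    dbus-monitor --system > /tmp/dbus-monitor.log 2>&1 &
    systemctl restart tripleo_nova_libvirt
    systemctl restart tripleo_nova_libvirt
    podman exec -it nova_libvirt virsh cpu-stats instance-00000028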

Comment 32 Bogdan Dobrelya 2023-09-26 15:19:31 UTC
*** Bug 2233089 has been marked as a duplicate of this bug. ***

Comment 101 errata-xmlrpc 2024-03-26 12:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.2.6 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:1519

