Bug 2148266 - Backport the qemuDomainGetStatsCpu fallback Implementation
Summary: Backport the qemuDomainGetStatsCpu fallback Implementation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Michal Privoznik
QA Contact: Luyao Huang
URL:
Whiteboard:
Depends On:
Blocks: 2157094 2157095
Reported: 2022-11-24 18:40 UTC by Aviv Litman
Modified: 2023-05-31 03:43 UTC
CC List: 15 users

Fixed In Version: libvirt-9.0.0-3.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2157094 2157095
Environment:
Last Closed: 2023-05-09 07:27:43 UTC
Type: Bug
Target Upstream Version: 8.8.0
Embargoed:


Links
System                  ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   CNV-22934        0        None      None    None     2022-11-28 11:19:38 UTC
Red Hat Issue Tracker   LIBVIRTAT-14023  0        None      None    None     2023-03-09 02:24:01 UTC
Red Hat Issue Tracker   RHELPLAN-140417  0        None      None    None     2022-12-18 11:38:40 UTC
Red Hat Product Errata  RHBA-2023:2171   0        None      None    None     2023-05-09 07:29:09 UTC

Description Aviv Litman 2022-11-24 18:40:34 UTC
Description of problem:
I'm working on adding CPU metrics to kubevirt: https://github.com/kubevirt/kubevirt/pull/8774, and we think there is a bug in libvirt, since we can't see all CPU metrics when running `virsh domstats 1`. The missing metrics are:
cpu.time
cpu.user
cpu.system

and when running `virsh cpu-stats 1`:
virsh #  cpu-stats 1     
error: Failed to retrieve CPU statistics for domain 'default_vm-cirros'
error: Requested operation is not valid: cgroup CPUACCT controller is not mounted


Version-Release number of selected component (if applicable):
bash-5.1$ virsh version
Authorization not available. Check if polkit service is running or see debug message for more information.
Compiled against library: libvirt 8.7.0
Using library: libvirt 8.7.0
Using API: QEMU 8.7.0
Running hypervisor: QEMU 7.1.0

How reproducible:
100%

Steps to Reproduce:
1. Create a kubevirt cluster:
    - git clone https://github.com/kubevirt/kubevirt.git
    - cd kubevirt
    - make cluster-up
    - make cluster-sync
2. Create a VM:
    - k apply -f examples/vm-cirros.yaml (or other vm)
3. Exec into the launcher:
    - k get pods
    - k exec -it virt-launcher-vm-cirros-6kzfd -- /bin/bash
4. Query libvirt metrics with virsh:
    - virsh domstats 1
    - virsh cpu-stats 1

Actual results:
Stage 4 missing cpu metrics mentioned here: https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectGetAllDomainStats:~:text=%22cpu.time%22%20%2D%20total%20cpu%20time%20spent%20for%20this%20domain%20in%20nanoseconds%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20as%20unsigned%20long%20long.%0A%20%20%22cpu.user%22%20%2D%20user%20cpu%20time%20spent%20in%20nanoseconds%20as%20unsigned%20long%20long.%0A%20%20%22cpu.system%22%20%2D%20system%20cpu%20time%20spent%20in%20nanoseconds%20as%20unsigned%20long%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20long.

and we get the error:
error: Failed to retrieve CPU statistics for domain 'default_vm-cirros'
error: Requested operation is not valid: cgroup CPUACCT controller is not mounted

Expected results:
Stage 4 will show all cpu metrics mentioned here: https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectGetAllDomainStats:~:text=VIR_DOMAIN_STATS_CPU_TOTAL%3A%20Return%20CPU,cache%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20bank%20%3Cindex%3E

And CPUACCT will be mounted.

Additional info:
Luboslav Pivarc suggested that the issue is related to the fact there is cpu and cpuacct:
https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_driver.c#L16266
https://github.com/libvirt/libvirt/blob/master/src/util/vircgroupv1.c#L292

I'm not sure how to attach the XML of the VM and the libvirt debug logs. Can you share the steps?

Comment 2 Luyao Huang 2022-11-25 08:46:47 UTC
Hi Aviv,

I think the root cause may be a wrong configuration in qemu.conf.

I checked the qemu.conf file in the launcher container:

bash-5.1$ cat /etc/libvirt/qemu.conf 
stdio_handler = "logd"
vnc_listen = "0.0.0.0"
vnc_tls = 0
vnc_sasl = 0
user = "qemu"
group = "qemu"
dynamic_ownership = 1
remember_owner = 0
namespaces = [ ]
cgroup_controllers = [ ]

cgroup_controllers is empty, which means that no cgroup controllers will be used for QEMU guests.

To make domstats (cpu.time, cpu.user, cpu.system) and cpu-stats work, you need to set "cpu" and "cpuacct" in cgroup_controllers.

BTW, here is the documentation of cgroup_controllers for your reference:

# What cgroup controllers to make use of with QEMU guests
#
#  - 'cpu' - use for scheduler tunables
#  - 'devices' - use for device access control
#  - 'memory' - use for memory tunables
#  - 'blkio' - use for block devices I/O tunables
#  - 'cpuset' - use for CPUs and memory nodes
#  - 'cpuacct' - use for CPUs statistics.
#
# NB, even if configured here, they won't be used unless
# the administrator has mounted cgroups, e.g.:
#
#  mkdir /dev/cgroup
#  mount -t cgroup -o devices,cpu,memory,blkio,cpuset none /dev/cgroup
#
# They can be mounted anywhere, and different controllers
# can be mounted in different locations. libvirt will detect
# where they are located.
#
#cgroup_controllers = [ "cpu", "devices", "memory", "blkio", "cpuset", "cpuacct" ]
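To check quickly which controllers a given qemu.conf enables, a minimal Python sketch follows. The helper name `configured_cgroup_controllers` is hypothetical, the parsing assumes the list is written on a single line, and this is not libvirt's own config parser.

```python
import re

def configured_cgroup_controllers(conf_text):
    """Return the cgroup_controllers list from qemu.conf text, or None if unset.

    None means the setting is absent or commented out.
    An empty list means the setting is present but enables no controllers.
    """
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out settings do not count
        match = re.match(r"cgroup_controllers\s*=\s*\[(.*)\]", stripped)
        if match:
            # Collect every double-quoted name inside the brackets.
            return re.findall(r'"([^"]*)"', match.group(1))
    return None

# Excerpt of the launcher's qemu.conf shown above.
launcher_conf = 'user = "qemu"\nnamespaces = [ ]\ncgroup_controllers = [ ]\n'
print(configured_cgroup_controllers(launcher_conf))
```

For the launcher's file this yields an empty list, which is different from the setting being absent or commented out (where the helper returns None).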

Comment 3 Peter Krempa 2022-11-25 09:01:25 UTC
The cpu.time/cpu.user/cpu.system fields as reported by 'virsh domstats' are fetched from cgroups, so disabling cgroups will indeed make the statistics unavailable.

Comment 4 Aviv Litman 2022-11-27 20:25:13 UTC
Hi Luyao, and Peter!
Thanks for the quick and detailed response, your help is greatly appreciated!

I cannot edit the /etc/libvirt/qemu.conf file; it is read-only.
Is there a way to edit the file? Or can `cgroup_controllers` be defined in the yaml file of the VM?

Additional info:
Maybe related to this bug https://bugzilla.redhat.com/show_bug.cgi?id=1985670?

Comment 5 Luyao Huang 2022-11-28 08:18:47 UTC
Hi Aviv,

I am not familiar with KubeVirt, so I think it's better to ask a KubeVirt expert. My guess is that the file's permissions were changed to 0555 when the container image was built. Maybe you can change the file cmd/virt-launcher/qemu.conf (https://github.com/kubevirt/kubevirt/blob/main/cmd/virt-launcher/qemu.conf) to override the value of cgroup_controllers and rebuild the related container image.

I also noticed that all the cgroup directories are mounted read-only. That also breaks libvirt's cgroup functions, since libvirt requires write permission to create files and directories and to write PIDs. I think that is another, more complex issue that needs to be fixed.

Comment 6 Aviv Litman 2022-11-28 10:49:09 UTC
Thanks Luyao,
I'm adding Luboslav from kubevirt @lpivarc.

Luboslav, can you please take a look at the discussion here and let me know if I can move this bug to kubevirt?

Comment 7 Shirly Radco 2022-11-28 11:15:32 UTC
@שהן

Comment 8 Shirly Radco 2022-11-28 11:18:57 UTC
(In reply to Shirly Radco from comment #7)
> @שהן

Sorry for that.

@alitman I'm moving this to the CNV virt team for updating the qemu.conf file in the launcher container.

Comment 9 lpivarc 2022-11-28 11:45:47 UTC
Hi Luyao, Peter,

Is the configuration really necessary? We run in a container and there Libvirt doesn't have the necessary permissions. Because we run in a container we are constrained by cgroups and the cpuacct controller is available.

Do you suggest that libvirt detects the availability based purely on the configuration? From what I could see, libvirt should be querying /proc/mounts (where it should find out that cpuacct is enabled).

Comment 10 Peter Krempa 2022-11-28 12:36:05 UTC
Cgroup controllers are auto-detected, based on the contents of the 'cgroup.controllers' file in the auto-detected mount point of the 'cgroup' filesystem. That is, unless you explicitly disable them in the config file, as is the case above. In that case we honour the configuration and don't enable any controllers which were not configured.

Note that with cgroupsv2 the cpuacct controller doesn't exist any more but libvirt preserves the logic as the cgroupv1 driver requires it. Thus enabling the cpuacct controller in the config should be sufficient to make the feature work if cgroups are available inside the container.

I don't know though why kubevirt decided to explicitly disable cgroups, but it might have been a deliberate decision so I will not comment on whether simply removing the configuration is a good idea in your case.
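The rule described in this comment can be pictured with a small Python sketch. It is only an illustration of "auto-detect, then honour the configuration", not libvirt's code: the function name is hypothetical, the detected list is a made-up sample of what a 'cgroup.controllers' file might contain, and the special handling of cpuacct on cgroups v2 is deliberately left out.

```python
def effective_controllers(detected, configured):
    """Combine auto-detected controllers with an optional configured list.

    detected:   controller names found by auto-detection
    configured: list from the config file, or None when nothing is configured
    """
    if configured is None:
        # Nothing configured: everything that was detected may be used.
        return set(detected)
    # A configured list restricts the result to controllers named in it.
    return set(detected) & set(configured)

# Made-up sample of a 'cgroup.controllers' file's contents.
detected = "cpuset cpu io memory pids".split()

print(sorted(effective_controllers(detected, None)))
print(sorted(effective_controllers(detected, [])))
```

With an empty configured list, as in the launcher's qemu.conf, the result is an empty set even though controllers were detected.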

Comment 11 lpivarc 2022-11-28 15:24:17 UTC
Hi Peter,
I just forgot to mention we are running in session mode. Does your advice still hold? Note: Libvirt should be able to query cgroup fs for the stats. The problem is it refuses to do so. Meanwhile, I will check your recommendation.

Comment 12 Peter Krempa 2022-11-28 16:18:07 UTC
(In reply to lpivarc from comment #11)
> Hi Peter,
> I just forgot to mention we are running in session mode. Does your advice
> still hold? Note: Libvirt should be able to query cgroup fs for the stats.
> The problem is it refuses to do so. Meanwhile, I will check your
> recommendation.

Stats work for session-mode VMs too as long as you don't explicitly disable the required cgroup controllers:

~ $ virsh -c qemu:///session domstats --domain cd --cpu-total --vcpu
Domain: 'cd'
  cpu.time=880000000
  cpu.user=480000000
  cpu.system=400000000
  cpu.cache.monitor.count=0
  cpu.haltpoll.success.time=0
  cpu.haltpoll.fail.time=0
  vcpu.current=1
  vcpu.maximum=8
  vcpu.0.state=1
  vcpu.0.time=280000000
  vcpu.0.wait=0
 [snipped]

Comment 14 lpivarc 2022-11-30 15:51:43 UTC
```
bash-5.1$ cat /etc/libvirt/qemu.conf 
stdio_handler = "logd"
vnc_listen = "0.0.0.0"
vnc_tls = 0
vnc_sasl = 0
user = "qemu"
group = "qemu"
dynamic_ownership = 1
remember_owner = 0
namespaces = [ ]
cgroup_controllers = ["cpu", "cpuacct"]
```

```
bash-5.1$ virsh cpu-stats 1
Authorization not available. Check if polkit service is running or see debug message for more information.
error: Failed to retrieve CPU statistics for domain 'default_vmi-ephemeral'
error: Requested operation is not valid: cgroup CPUACCT controller is not mounted
```

```
bash-5.1$ virsh -c qemu:///session domstats --domain default_vmi-ephemeral --cpu-total --vcpu
Authorization not available. Check if polkit service is running or see debug message for more information.
Domain: 'default_vmi-ephemeral'
  cpu.cache.monitor.count=0
  cpu.haltpoll.success.time=1204288
  cpu.haltpoll.fail.time=2247141
  vcpu.current=1
  vcpu.maximum=1
  vcpu.0.state=1
  vcpu.0.time=7770000000
  vcpu.0.wait=0
  vcpu.0.delay=45179773
```

Comment 15 lpivarc 2022-11-30 15:53:04 UTC
I tried the suggested configuration, but the problem is still there. Is there anything else I could be missing?

Comment 16 Peter Krempa 2022-11-30 16:09:59 UTC
So as I thought, you don't have the proper cgroups present in the container.

But in fact that is no longer a problem. As of libvirt-8.8 (note that you report using libvirt-8.7, so you'll have to upgrade), Michal Privoznik also implemented a fallback mechanism:

commit 044b8744d65f8571038f85685b3c4b241162977b
Author: Michal Prívozník <mprivozn>
Date:   Tue Aug 9 16:16:09 2022 +0200

    qemu: Implement qemuDomainGetStatsCpu fallback for qemu:///session
    
    For domains started under session URI, we don't set up CGroups
    (well, how could we since we're not running as root anyways).
    Nevertheless, fetching CPU statistics exits early because of
    lacking cpuacct controller. But with recent extension to
    virProcessGetStatInfo() we can get the values we need from the
    proc filesystem. Implement the fallback for the session URI as
    some of virt tools rely on cpu.* stats to be reported (virt-top,
    virt-manager).
    
    Resolves: https://gitlab.com/libvirt/libvirt/-/issues/353
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1693707
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

It obviously worked for me as I was using the fixed version and forgot that this thing was so recent.

Please update to libvirt-8.8 to fix the issue.
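To illustrate what "getting the values from the proc filesystem" can look like, here is a minimal Python sketch that derives user and system CPU time from one line of /proc/<pid>/stat. It is not the libvirt implementation; the function name is hypothetical, the sample line is made up and truncated, and 100 clock ticks per second is an assumption (on a live system the value comes from `os.sysconf("SC_CLK_TCK")`).

```python
def cpu_times_ns(stat_line, ticks_per_second):
    """Return (user_ns, system_ns) from one line of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split only the part after the last ')'.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15
    # of the full line, which makes them indexes 11 and 12 here.
    utime_ticks = int(fields[11])
    stime_ticks = int(fields[12])
    ns_per_tick = 1_000_000_000 // ticks_per_second
    return utime_ticks * ns_per_tick, stime_ticks * ns_per_tick

# Made-up, truncated sample line; the command name contains a space on purpose.
SAMPLE = "1234 (qemu kvm) S 1 1234 1234 0 -1 0 0 0 0 0 48 40 0 0"

user_ns, system_ns = cpu_times_ns(SAMPLE, 100)  # assumes 100 ticks per second
print(f"cpu.user={user_ns} cpu.system={system_ns} cpu.time={user_ns + system_ns}")
```

Because the kernel reports these counters in clock ticks, values obtained this way are multiples of the tick length (10 ms under the assumption above).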

Comment 17 lpivarc 2022-11-30 16:21:12 UTC
I am not so sure that we don't have the cgroups.

See:

bash-5.1$ cat /proc/mounts 
***hide***
cgroup /sys/fs/cgroup/pids cgroup ro,seclabel,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup ro,seclabel,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/devices cgroup ro,seclabel,nosuid,nodev,noexec,relatime,devices 0 0
***hide***

Also, I can do the following which I assume libvirt is doing when one does "virsh cpu-stats 1"

bash-5.1$ cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage
19987827479
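The same check can be scripted. The Python sketch below scans /proc/mounts-style text for cgroup v1 mounts and reports, per controller, the mount point and whether it is read-only. The function name is hypothetical and the set of controller names is limited to those mentioned in this report; it is not how libvirt performs its detection.

```python
# Controller names mentioned in this report; an assumption, not a full list.
KNOWN_CONTROLLERS = {"cpu", "cpuacct", "cpuset", "memory", "devices", "blkio", "pids"}

def cgroup_v1_mounts(mounts_text):
    """Map each cgroup v1 controller to (mount point, read_only)."""
    result = {}
    for line in mounts_text.splitlines():
        parts = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(parts) < 4 or parts[2] != "cgroup":
            continue
        mount_point, options = parts[1], parts[3].split(",")
        for option in options:
            if option in KNOWN_CONTROLLERS:
                result[option] = (mount_point, "ro" in options)
    return result

# The three lines quoted from /proc/mounts above.
MOUNTS = """\
cgroup /sys/fs/cgroup/pids cgroup ro,seclabel,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup ro,seclabel,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/devices cgroup ro,seclabel,nosuid,nodev,noexec,relatime,devices 0 0
"""

for name, (path, read_only) in sorted(cgroup_v1_mounts(MOUNTS).items()):
    print(name, path, "ro" if read_only else "rw")
```

For the quoted lines, cpuacct shows up under /sys/fs/cgroup/cpu,cpuacct and is flagged read-only.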

Comment 18 Peter Krempa 2022-12-02 12:22:41 UTC
Cgroups is a bit of a red herring in this BZ as the original report mentioned it.

For session mode you should be fine with the fallback code that Michal implemented. Update to libvirt-8.8 to address the issue for session mode.

Comment 19 lpivarc 2022-12-02 13:06:08 UTC
Hey Aviv and Shirly,
updating libvirt to 8.8 would mean this gets to RHEL 9.2 (my best guess is OpenShift 4.14/15). Would this be sufficient? Otherwise, you need to ask for a backport.

Peter,
This is a more general problem for us. While we are going away from system mode I can tell that this is a problem also there. Running libvirt in a container means you have ro cgroup filesystem and so you don't get to modify it from within the container. (understandably...) Therefore even with system mode, we fail the same way as with session mode. 

cc Stu, Kedar for awareness.

Comment 22 Aviv Litman 2022-12-13 14:27:09 UTC
Hi libvirt team,
Can you point me to the RPM with the fix?

Thanks!

Comment 37 Luyao Huang 2023-01-12 06:32:46 UTC
Verified this bug on libvirt-8.10.0-2.el9.x86_64 with the same steps as in bug 2157094 comment 5.

Comment 41 Luyao Huang 2023-02-07 09:49:05 UTC
Verified this bug with libvirt-9.0.0-3.el9.x86_64:

1. Switch to an unprivileged user:
# su - test

2. Start a guest:
$ virsh start vm1
Domain 'vm1' started

3. Use domstats to get the VM's cpu.time, cpu.user and cpu.system, and compare the values with /proc/<pid>/stat:

$ virsh domstats vm1; cat /proc/`pidof qemu-kvm`/stat
Domain: 'vm1'
  state.state=1
  state.reason=1
  cpu.time=470000000
  cpu.user=260000000
  cpu.system=210000000
...

33940 (qemu-kvm) S 1 33939 33939 0 -1 138428544 11821 0 0 0 26 21 0 0 20 0 6 0 2841590 2672160768 23974 18446744073709551615 94348725125120 94348732038125 140732301096400 0 0 0 268444224 4096 16963 0 0 0 17 19 0 0 0 10 0 94348734432720 94348739158896 94348768882688 140732301104066 140732301106959 140732301106959 140732301107170 0

4. Use cpu-stats --total to get the VM's cpu.time, cpu.user and cpu.system, and compare the values with /proc/<pid>/stat:
$ virsh cpu-stats vm1 --total; cat /proc/`pidof qemu-kvm`/stat
Total:
	cpu_time             0.530000000 seconds
	user_time            0.290000000 seconds
	system_time          0.240000000 seconds

33940 (qemu-kvm) S 1 33939 33939 0 -1 138428544 11821 0 0 0 29 24 0 0 20 0 6 0 2841590 2672160768 24405 18446744073709551615 94348725125120 94348732038125 140732301096400 0 0 0 268444224 4096 16963 0 0 0 17 19 0 0 0 11 0 94348734432720 94348739158896 94348768882688 140732301104066 140732301106959 140732301106959 140732301107170 0
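The comparison in step 3 can be made explicit with a short Python sketch. It parses the domstats output quoted above and checks it against utime and stime (fields 14 and 15) of the first /proc/<pid>/stat line. The helper name is hypothetical, and 100 clock ticks per second is an assumption that happens to be consistent with the numbers shown.

```python
def parse_domstats(text):
    """Parse 'key=value' lines of `virsh domstats` output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and value.isdigit():
            stats[key] = int(value)
    return stats

# domstats output quoted in step 3.
DOMSTATS = """\
Domain: 'vm1'
  state.state=1
  state.reason=1
  cpu.time=470000000
  cpu.user=260000000
  cpu.system=210000000
"""

# utime and stime (fields 14 and 15) from the first /proc/<pid>/stat line above.
UTIME_TICKS, STIME_TICKS = 26, 21
NS_PER_TICK = 1_000_000_000 // 100  # assumes 100 clock ticks per second

stats = parse_domstats(DOMSTATS)
print(stats["cpu.user"] == UTIME_TICKS * NS_PER_TICK)
print(stats["cpu.system"] == STIME_TICKS * NS_PER_TICK)
print(stats["cpu.time"] == (UTIME_TICKS + STIME_TICKS) * NS_PER_TICK)
```

The same arithmetic applies to step 4, where 29 and 24 ticks correspond to the 0.29 and 0.24 seconds reported by cpu-stats.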

Comment 43 errata-xmlrpc 2023-05-09 07:27:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171

