Description of problem:
Sometimes in v4.4.3 the cpu.current.guest statistics values returned by
GET https://{{host}}/ovirt-engine/api/vms/{{vm_id}}/statistics
are negative.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.3.2-0.19.el8ev.noarch
vdsm-http-4.40.28-1.el8ev.noarch
python3-libvirt-6.6.0-1.module+el8.3.0+7572+bcbf6b90.x86_64
ovirt-engine-4.4.3.1-0.7.el8ev.noarch
qemu-kvm-5.1.0-4.module+el8.3.0+7846+ae9b566f.x86_64
libvirt-6.6.0-4.module+el8.3.0+7883+3d717aa8.x86_64

How reproducible:
Sometimes

Steps to Reproduce:
1. Configure a VM created from the last infra template: pin to host and cpu topology 0#0 (Host Resources tab)
2. Load the CPU of the VM (could be done with a while loop)
3. Send GET https://{{host}}/ovirt-engine/api/vms/{{vm_id}}/statistics

Actual results:
Negative value for cpu.current.guest:

<statistic href="/ovirt-engine/api/vms/1d7c99b8-c636-4780-92ec-e3c56132de75/statistics/ef802239-b74a-329f-9955-be8fea6b50a4" id="ef802239-b74a-329f-9955-be8fea6b50a4">
    <name>cpu.current.guest</name>
    <description>CPU used by guest</description>
    <kind>gauge</kind>
    <type>decimal</type>
    <unit>percent</unit>
    <values>
        <value>
            <datum>-0.010</datum>
        </value>
    </values>
    <vm href="/ovirt-engine/api/vms/1d7c99b8-c636-4780-92ec-e3c56132de75" id="1d7c99b8-c636-4780-92ec-e3c56132de75"/>

The whole response: http://pastebin.test.redhat.com/901923

Expected results:

Additional info:
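For anyone trying to spot the problem in a saved response, a minimal sketch follows. It assumes the response body is a `<statistics>` element wrapping `<statistic>` entries shaped like the excerpt above; the wrapper element, the trimmed sample payload, and the helper name `negative_statistics` are illustrative assumptions, not taken from the report.

```python
import xml.etree.ElementTree as ET

# Sample payload trimmed from the excerpt in the report. The <statistics>
# wrapper and the closing tags are assumptions added so the fragment parses.
SAMPLE = """
<statistics>
  <statistic id="ef802239-b74a-329f-9955-be8fea6b50a4">
    <name>cpu.current.guest</name>
    <unit>percent</unit>
    <values><value><datum>-0.010</datum></value></values>
  </statistic>
</statistics>
"""


def negative_statistics(xml_text):
    """Map each statistic name to the list of its negative datum values."""
    root = ET.fromstring(xml_text)
    found = {}
    for stat in root.iter("statistic"):
        name = stat.findtext("name")
        negatives = [float(d.text) for d in stat.iter("datum") if float(d.text) < 0]
        if negatives:
            found[name] = negatives
    return found


print(negative_statistics(SAMPLE))
```

Running this against the body returned by the GET request in step 3 would list every statistic that currently reports a negative value.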
We saw that this is possible in our VDSM code; it depends on the ratio between the values. The values are monotonically increasing.
An example given by Arik:
Sample1:
cpu.time = 1000
cpu.user = 200
cpu.system = 700
so cpu.guest = 100

Sample2:
cpu.time = 2000
cpu.user = 300
cpu.system = 1650
so cpu.guest = 50

In the VDSM calculation:
cpuUsage = last sys + last user = 1950.
cpu_sys = (last user - first user) + (last sys - first sys) = 1050.
cpuUser = last time - first time - cpu_sys = -50

This is not new. The question that arises: do we document that the value may be negative (although it doesn't make sense) and should be treated as 0, or do we return 0 ourselves?
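The arithmetic above can be replayed with a short sketch. It mirrors only the delta formulas quoted in this comment, not the actual VDSM source; the function name `cpu_deltas` and the dictionary layout are hypothetical, and the normalization by sampling interval and conversion to percent are left out.

```python
def cpu_deltas(first, last):
    """Replay the delta formulas from the comment on two samples.

    Hypothetical helper: only the two subtractions described above are
    reproduced, without interval normalization or percent conversion.
    """
    cpu_sys = (last["user"] - first["user"]) + (last["system"] - first["system"])
    cpu_user = (last["time"] - first["time"]) - cpu_sys
    return cpu_sys, cpu_user


# Sample values from Arik's example.
sample1 = {"time": 1000, "user": 200, "system": 700}
sample2 = {"time": 2000, "user": 300, "system": 1650}

cpu_sys, cpu_user = cpu_deltas(sample1, sample2)
print(cpu_sys, cpu_user)  # 1050 -50

# One of the two options raised above: clamp the result at zero ourselves.
print(max(cpu_user, 0))  # 0
```

The result goes negative whenever user plus system time grows by more than total CPU time between the two samples, which is the case in this example (1050 versus 1000).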
(In reply to Liran Rotenberg from comment #1)
> We saw it possible in our VDSM code, depends on the ratio. The values are
> monotonic increasing.
> An example given by Arik:
> Sample1:
> cpu.time = 1000
> cpu.user = 200
> cpu.system=700
> so cpu.guest = 100
>
> Sample2:
> cpu.time = 2000
> cpu.user =300
> cpu.sys = 1650
> so cpu.guest=50

I fail to understand what the cpu.user and cpu.system values mean. The libvirt documentation is unclear about how the values relate to the VM.

> In VDSM calculation:
> cpuUsage = last sys + last user = 1950.
> cpu_sys = (last user - first user) + (last sys - first sys) = 1050.
> cpuUser = last time - first time - cpu_sys = -50

The fact that cpuUser is negative for a loaded VM and cpuSys is 100 makes me suspect that the computation in Vdsm is wrong. I can't reproduce the problem, though: I get high cpuUser values and low cpuSys values, as expected (with an older libvirt version).

> It's not new, the question rising:
> Do we tell that it may be negative (although it doesn't make sense) and
> should treat as 0 or to return 0 ourselves.

A negative value makes no sense, but before trying to fix it, we should understand the exact meaning of the values and the circumstances under which the problem can be reproduced.
(In reply to Milan Zamazal from comment #2)
> The fact that cpuUser is negative for a loaded VM and cpuSys is 100 makes me
> to suspect that the computation in Vdsm is wrong. I can't reproduce the
> problem though and I get high cpuUser values and low cpuSys values as
> expected (with an older libvirt version).

After upgrading libvirt and QEMU, I can reproduce the error. So I suspect a platform regression.
(In reply to Milan Zamazal from comment #3)
> After upgrading libvirt and QEMU, I can reproduce the error. So I suspect a
> platform regression.

Indeed.

*** This bug has been marked as a duplicate of bug 1876937 ***