Bug 2026263 - getStats should report if the data is real or initial
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.5.0
Target Release: 4.50.0.4
Assignee: Milan Zamazal
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-24 08:10 UTC by Yedidyah Bar David
Modified: 2022-04-20 06:33 UTC

Fixed In Version: vdsm-4.50.0.4
Clone Of:
Environment:
Last Closed: 2022-04-20 06:33:59 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.5?


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHV-44072 | 0 | None | None | None | 2021-11-24 08:13:36 UTC
oVirt gerrit | 117869 | 0 | master | MERGED | virt: Indicate whether VM CPU stats are real or initial | 2021-12-10 10:24:56 UTC

Description Yedidyah Bar David 2021-11-24 08:10:24 UTC
Description of problem:

The data returned by getStats is first set to some initial values, and once libvirt reports real data it's updated, starting from the next report.

Right now, there is no simple way to know if the returned data is real or initial.

This caused bug 1993957 in ovirt-hosted-engine-ha, which relied on the CPU usage data being accurate. To fix it, we relied on what seems like a bug in vdsm: cpuUsage is initialized to '0.00', whereas its real values from libvirt are always integers. If vdsm fixes or changes this, the fix for bug 1993957 will stop working.
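
The workaround described above can be sketched roughly as follows. This is only an illustration, not the actual ovirt-hosted-engine-ha code: the function name looks_like_initial_cpu_usage is hypothetical, and the sketch assumes that stats is the dictionary of VM stats returned by getStats.

```python
def looks_like_initial_cpu_usage(stats):
    """Guess whether cpuUsage still holds the placeholder value.

    Relies on the quirk described in this report: the initial value is
    the string '0.00', while values derived from libvirt are integers.
    A missing key is not treated as a placeholder here.
    """
    # str() keeps the comparison safe if the value is not a string.
    return str(stats.get("cpuUsage", "")) == "0.00"
```

An idle VM reporting a real usage of "0" is then told apart from the placeholder "0.00", which is exactly the distinction that would be lost if vdsm changed the initial value.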

Version-Release number of selected component (if applicable):
Current 4.4

How reproducible:
Not sure, perhaps always

Steps to Reproduce:
1. Start a VM
2. Call VDSM's client.connect(host='localhost').VM.getStats on the VM, in a loop

Actual results:
cpuUser, cpuSys, cpuUsage are 0 (or 0.00) in the beginning, and after some time they represent the actual usage. If this usage is still 0, because the VM is idle, there is no way to differentiate between the states.

Expected results:
getStats reports, perhaps per item or per group of items (grouped by functionality or by source, since some items come from sources other than libvirt), whether the values are initial or actual.

Additional info:
In principle getStats can decide to not return these items at all if it does not know them. This is already what it does with other items. Perhaps this will break some clients, so perhaps not an option - didn't check. HE-HA is already prepared for this, so from its POV that's the best option.
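
If getStats omitted items it does not know, as suggested above, a client would have to treat a missing key as "not known yet" rather than as zero. A minimal sketch of that handling, assuming stats is the dictionary returned by getStats and that cpuUsage, when present, is an integer (the helper name cpu_usage_or_none is hypothetical):

```python
def cpu_usage_or_none(stats):
    """Return cpuUsage as an int, or None when it is not reported.

    Assumes that vdsm leaves the key out until libvirt has provided
    real data, so a present value is always a real one.
    """
    raw = stats.get("cpuUsage")
    if raw is None:
        # Unknown is not the same as idle: do not substitute 0 here.
        return None
    return int(raw)
```

Callers can then skip any decision based on CPU load while the function returns None.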

When this bug is fixed, please verify also that HE-HA continues to work reliably, and open a bug on it otherwise. I am not opening one right now, because it's not clear it will be needed - this depends on the actual design/fix for current bug.

In certain tests, it took up to around 90 seconds to get actual cpu usage on a VM after starting it.

Comment 1 Arik 2021-11-29 12:27:15 UTC
Summarizing an offline discussion on this: it would be best to remove the fields from the stats altogether when we don't receive them from libvirt.
But before doing that, we should check MoM and ovirt-engine.

Comment 2 Milan Zamazal 2021-11-29 21:59:38 UTC
As for MoM, these values aren't used anywhere so nothing bad should happen if they are not available. And MoM doesn't log any error if they are not provided.

A bit of confusion arises from the fact that the presence of at least cpuUsage was ensured with a specific reference to MoM. But a little more gerrit archeology reveals the reason: "But what happens is that getAllVmStats crashes [when cpuUsage is unavailable] and MOM gets exception instead of a missing value." So it should be all right with respect to MoM as long as everything works correctly on the Vdsm side.

Comment 3 Liran Rotenberg 2021-11-30 12:32:52 UTC
The engine uses these values mostly to present the data in the API.
The only one that differs is cpuUsage, which is also used in other flows (HA, policies, scheduling). But we check whether it's null first and use another calculation when it's unavailable.
I think we can try to omit those and see if we encounter any problem. Initially the engine sets it to 0 in the DB.
From comment #0 it sounds like HE-HA uses VDSM directly, so I think it should be fine for all parties.

Comment 4 Arik 2021-12-09 22:24:11 UTC
No need to verify this, it's done for the hosted engine ha-agent

Comment 5 Polina 2022-03-20 12:57:56 UTC
Verified on ovirt-engine-4.5.0.1-601.f26e9ea8cac5.3.el8ev.noarch, vdsm-4.50.0.10-1.el8ev.x86_64

Start a VM (tested for a regular VM and an HE VM) and run the following command:

[root@ocelot06 ~]# for i in {1..20}; do vdsm-client VM getStats vmID="64465398-6d7c-475d-8a70-729e2440fc92" |grep cpu;sleep 0.5; done
        "cpuActual": false,
        "cpuSys": "0.00",
        "cpuUsage": "0.00",
        "cpuUser": "0.00",
        "vcpuPeriod": 100000,
        "vcpuQuota": "-1",
        ...
        ...
        ...
        "cpuActual": true,
        "cpuSys": "0.00",
        "cpuUsage": "480000000",
        "cpuUser": "0.23",
        "vcpuCount": "1",
        "vcpuPeriod": 100000,
        "vcpuQuota": "-1",
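
Given the cpuActual flag shown in the output above, a client could poll until the CPU stats are reported as real. The sketch below is an assumption-laden illustration, not code from vdsm or HE-HA: wait_for_actual_cpu_stats and its get_stats parameter are hypothetical names, get_stats is expected to return the stats dictionary of one VM, and the default timeout of 120 seconds is chosen only because this report mentions delays of up to around 90 seconds.

```python
import time


def wait_for_actual_cpu_stats(get_stats, timeout=120.0, interval=0.5,
                              sleep=time.sleep):
    """Poll get_stats() until it reports cpuActual as true.

    Returns the first stats dictionary whose cpuActual is True, or
    None if the timeout expires first. get_stats is any callable that
    returns the stats dictionary of a single VM.
    """
    deadline = time.monotonic() + timeout
    while True:
        stats = get_stats()
        if stats.get("cpuActual") is True:
            return stats
        if time.monotonic() >= deadline:
            return None
        sleep(interval)
```

The get_stats callable could wrap the VM.getStats call from the reproduction steps; the exact shape of that call's return value is not shown in this report and would need to be checked before relying on it.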

Comment 6 Sandro Bonazzola 2022-04-20 06:33:59 UTC
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

