Bug 2018925 - Metric kubevirt_vmi_memory_used_total_bytes is not reporting correct value
Summary: Metric kubevirt_vmi_memory_used_total_bytes is not reporting correct value
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Metrics
Version: 4.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Shirly Radco
QA Contact: Satyajit Bulage
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-11-01 09:34 UTC by Shirly Radco
Modified: 2022-03-16 15:56 UTC (History)

Fixed In Version: v4.10.0-604
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-16 15:56:33 UTC
Target Upstream Version:
Embargoed:
eerol: needinfo-




Links
System ID Private Priority Status Summary Last Updated
Github https://github.com/kubevirt kubevirt pull 6973 0 None None None 2022-01-07 08:42:56 UTC
Github kubevirt kubevirt pull 6973 0 None open Fix misleading domain memory metrics 2021-12-23 17:05:55 UTC
Github kubevirt kubevirt pull 7107 0 None open [release-0.49] Fix misleading domain memory metrics 2022-01-19 10:02:30 UTC
Red Hat Product Errata RHSA-2022:0947 0 None None None 2022-03-16 15:56:49 UTC

Description Shirly Radco 2021-11-01 09:34:10 UTC
Description of problem:
Metric kubevirt_vmi_memory_used_total_bytes seems to report total vm memory and not the used memory.



Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Check vm memory usage from within the vm and compare to the metric in Prometheus
2. Load the vm, check vm memory usage from within the vm and compare to the metric in Prometheus
3.

Actual results:
kubevirt_vmi_memory_used_total_bytes returns a static result that seems like the total memory allocated to the vm.

Expected results:
kubevirt_vmi_memory_used_total_bytes should report the used memory.
Memory from within the vms with tools like top/sar/mpstat should provide similar results to the kubevirt_vmi_memory_used_total_bytes metric
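
A minimal sketch of one way to obtain an in-guest figure to compare against the metric. It assumes a Linux guest and treats "used" as MemTotal minus MemAvailable from /proc/meminfo; the function name and the sample text are hypothetical.

```python
def guest_used_bytes(meminfo_text):
    """Return MemTotal - MemAvailable in bytes from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # /proc/meminfo reports kB
    # Assumption: the kB values are multiples of 1024 bytes.
    return (fields["MemTotal"] - fields["MemAvailable"]) * 1024


# Hypothetical sample; inside a guest, pass the contents of /proc/meminfo instead.
sample = (
    "MemTotal:        2000000 kB\n"
    "MemFree:          500000 kB\n"
    "MemAvailable:    1200000 kB\n"
)
print(guest_used_bytes(sample))
```

Inside the VM, the same function can be fed the text read from /proc/meminfo before and after loading the guest.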

Additional info:

Comment 1 Erkan Erol 2021-11-24 10:07:30 UTC
I checked all details about memory metrics and here are my findings.

First of all, "kubevirt_vmi_memory_used_total_bytes" refers to the amount of memory allocated to the domain in the libvirt domain XML file. That is why it is a constant value for a VMI and is not supposed to change over time. The description in the kubevirt repo is "The amount of memory in bytes used by the domain." [1] and it seems correct.


Secondly, we have some other VMI metrics regarding memory. All of them are provided by libvirt through the virDomainMemoryStatTags enum [2]. I checked their descriptions; they are similar to the descriptions in libvirt's documentation. The ones related to swap and page faults are clear and dynamic. The following ones are a little ambiguous.

- kubevirt_vmi_memory_available_bytes 
  -> DOMAIN_MEMORY_STAT_AVAILABLE in virDomainMemoryStatTags
  -> "total" memory inside virtual machine. "total" column in output of "free" command
  -> Static value since ballooning is not active and total memory doesn't change over time
  -> Metric description: "amount of `usable` memory as seen by the domain."
  -> Description in libvirt doc: The total amount of usable memory as seen by the domain. This value may be less than the amount of memory assigned to the domain if a balloon driver is in use or if the guest OS does not initialize all assigned pages. This value is expressed in kB.


- kubevirt_vmi_memory_unused_bytes 
  -> DOMAIN_MEMORY_STAT_UNUSED  in virDomainMemoryStatTags
  -> unused memory inside virtual machine. "free" column in output of "free" command
  -> Metric description: "amount of `unused` memory as seen by the domain."
  -> Description in libvirt doc: "The amount of memory left completely unused by the system. Memory that is available but used for reclaimable caches should NOT be reported as free. This value is expressed in kB."

- kubevirt_vmi_memory_usable_bytes 
  -> DOMAIN_MEMORY_STAT_USABLE  in virDomainMemoryStatTags
  -> available memory inside virtual machine. "available" column in output of "free" command
  -> Metric description: "The amount of memory which can be reclaimed by balloon without causing host swapping in bytes."
  -> Description in libvirt doc: "How much the balloon can be inflated without pushing the guest system to swap, corresponds to 'Available' in /proc/meminfo"
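
The mapping above can be sketched as follows. This is only an illustration, not KubeVirt's actual code: the dictionary keys and the helper name are hypothetical, and the factor of 1024 assumes that libvirt's "kB" values are multiples of 1024 bytes.

```python
KIB = 1024  # assumption: libvirt "kB" means 1024 bytes

# Stat name -> metric name, following the list above (keys are hypothetical).
STAT_TO_METRIC = {
    "available": "kubevirt_vmi_memory_available_bytes",  # "total" in free
    "unused": "kubevirt_vmi_memory_unused_bytes",        # "free" in free
    "usable": "kubevirt_vmi_memory_usable_bytes",        # "available" in free
}


def to_metrics(stats_kib):
    """Convert kB-valued stats into byte-valued metrics, ignoring other stats."""
    return {
        STAT_TO_METRIC[name]: value * KIB
        for name, value in stats_kib.items()
        if name in STAT_TO_METRIC
    }


print(to_metrics({"available": 2048, "unused": 512, "usable": 1024}))
```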



In short:
kubevirt_vmi_memory_used_total_bytes -> total memory used by the whole domain
kubevirt_vmi_memory_available_bytes -> total memory the VM really has
kubevirt_vmi_memory_unused_bytes -> free/unused memory -> not so important
kubevirt_vmi_memory_usable_bytes -> available memory -> important since it is the amount of memory that new applications can use.

percentage of memory usage =  (kubevirt_vmi_memory_available_bytes-kubevirt_vmi_memory_usable_bytes)/kubevirt_vmi_memory_available_bytes
percentage of remaining memory = kubevirt_vmi_memory_usable_bytes/kubevirt_vmi_memory_available_bytes
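
As a worked example of the two formulas, here is a small sketch that scales the ratios by 100 to express them as percentages. The function name and the sample values are hypothetical.

```python
def memory_usage_percent(available_bytes, usable_bytes):
    """Return (used %, remaining %) from the available and usable metrics."""
    if available_bytes <= 0:
        raise ValueError("available_bytes must be positive")
    used = (available_bytes - usable_bytes) / available_bytes * 100
    remaining = usable_bytes / available_bytes * 100
    return used, remaining


# Hypothetical VM that sees 4 GiB in total, of which 1 GiB is still usable.
used, remaining = memory_usage_percent(4 * 1024**3, 1 * 1024**3)
print(used, remaining)
```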

@shirly what do you think? Which points should we improve? 


[1] https://github.com/kubevirt/kubevirt/blob/debf29e7bf0df011248b3662c46bfa55cf3f6750/pkg/monitoring/domainstats/prometheus/prometheus.go#L155
[2] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStatTags

Comment 2 Shirly Radco 2021-12-01 10:55:01 UTC
I don't think the amount of memory allocated to the domain in domain xml file for libvirt is an interesting metric.
It was not what I intended.

I think we should consider replacing this metric
with a recording rule with the same name that will calculate the memory the VM uses.
Expr: kubevirt_vmi_memory_available_bytes-kubevirt_vmi_memory_usable_bytes

Comment 3 Erkan Erol 2021-12-01 11:20:14 UTC
@rmohr Are you ok with Shirly's proposal?

Comment 4 Roman Mohr 2021-12-13 09:48:24 UTC
I guess all three values are interesting depending on what you are looking for, but used and free memory are probably the most pressing ones for answering these questions:

 * do VMs potentially run out of memory?
 * which VMs are consuming all my node memory?
 * for custom horizontal autoscaling where one could potentially use low/high cpu and memory usage to scale a set of VMs

Getting the total amount of memory may be interesting for the following:

 * trying to understand weird node characteristics which may be influenced indirectly by the size of the scheduled VMs (sounds plausible, but I just made that up; not sure if this would ever be relevant)
 * possibly vertical autoscaling and custom autoscale logic based on metrics

Do you have other use-cases? 

Since we don't have ballooning active right now, I would expect `kubevirt_vmi_memory_available_bytes` to be more or less static too, so I guess we can still cover all the cases.
But what if we start using ballooning? Do I then lose information?

If we don't lose anything when ballooning is enabled, I think we are good. Otherwise I would consider keeping it.

Comment 5 Erkan Erol 2021-12-14 08:57:35 UTC
kubevirt_vmi_memory_used_total_bytes -> the amount of memory stated in domain xml file for libvirt -> it contains overhead of virtualization
kubevirt_vmi_memory_available_bytes  -> the amount of memory the VM really has. it is what you see in the VM

The source of kubevirt_vmi_memory_used_total_bytes is not virDomainMemoryStatTags [1]; it is virDomainInfo [2].


1. https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStatTags
2. https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainInfo


Also, we have another metric, `kubevirt_vmi_memory_actual_balloon_bytes`, which gives you the current balloon value in KB.


When we enable ballooning, kubevirt_vmi_memory_actual_balloon_bytes will be a non-zero value and kubevirt_vmi_memory_available_bytes will change over time. However, I guess kubevirt_vmi_memory_used_total_bytes will not be dynamic.


Me+Shirly's new proposal:
- Rename `kubevirt_vmi_memory_used_total_bytes` to `kubevirt_vmi_memory_domain_total_bytes` and otherwise keep it as it is. It will always be static and give you the amount of memory consumed by the whole domain.
- Introduce a new recording rule `kubevirt_vmi_memory_used_bytes=kubevirt_vmi_memory_available_bytes - kubevirt_vmi_memory_usable_bytes`. It will be affected by ballooning, but that is expected behavior.
- After these two changes, we will not lose any data we had previously.
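
To illustrate what the proposed recording rule would compute, here is a minimal sketch that subtracts the two series per VMI. It is not the rule itself; keying the samples by a (namespace, name) tuple is an assumption made for this example, as are the sample values.

```python
def used_bytes(available, usable):
    """Compute available - usable for every label set present in both series."""
    shared = available.keys() & usable.keys()
    return {labels: available[labels] - usable[labels] for labels in shared}


# Hypothetical samples keyed by (namespace, name).
available = {("default", "vm-a"): 4096, ("default", "vm-b"): 2048}
usable = {("default", "vm-a"): 1024, ("default", "vm-c"): 512}

# Only vm-a has both series, so only vm-a gets a used value.
print(used_bytes(available, usable))
```

Label sets that appear in only one of the two series produce no result, which mirrors how a one-to-one binary operation between two metrics behaves in Prometheus.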

@rmohr What do you think?

Comment 6 Roman Mohr 2021-12-15 13:52:48 UTC
(In reply to Erkan Erol from comment #5)
> kubevirt_vmi_memory_used_total_bytes -> the amount of memory stated in
> domain xml file for libvirt -> it contains overhead of virtualization
> kubevirt_vmi_memory_available_bytes  -> the amount of memory the VM really
> has. it is what you see in the VM
> 
> source of kubevirt_vmi_memory_used_total_bytes is not
> virDomainMemoryStatTags[1]. It is virDomainInfo[2]. 
> 
> 
> 1.
> https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStatTags
> 2. https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainInfo
> 
> 
> Also, we have another metric `kubevirt_vmi_memory_actual_balloon_bytes`
> which gives you current balloon value in KB.
> 
> 
> When we enable ballooning, kubevirt_vmi_memory_actual_balloon_bytes will be
> non-zero value and kubevirt_vmi_memory_available_bytes will change in time.
> However, I guess kubevirt_vmi_memory_used_total_bytes will not be dynamic. 
> 
> 
> Me+Shirly's new proposal:
> - Renaming `kubevirt_vmi_memory_used_total_bytes` as
> `kubevirt_vmi_memory_domain_total_bytes` and keeping it as it is. It will be
> always static and give you the amount of memory which is consumed by the
> whole domain.

The rename sounds reasonable. Apart from the possibility that some people may already use it, no objections.

> - Introducing a new recording rule
> `kubevirt_vmi_memory_used_bytes=kubevirt_vmi_memory_available_bytes -
> kubevirt_vmi_memory_usable_bytes`. It will be affected by ballooning but it
> is expected behavior.
> - After these two changes, we will not lose any data we had previously.
> 

If we need it, sounds great to add it.

> @rmohr What do you think?

I am convinced now that no collected information gets lost. No objections from my side.

Comment 7 Erkan Erol 2021-12-20 10:19:42 UTC
PR: https://github.com/kubevirt/kubevirt/pull/6973

Comment 9 Shirly Radco 2022-01-19 09:29:50 UTC
kubevirt_vmi_memory_used_total_bytes refers to the amount of
memory declared in the libvirt domain XML file. It is misleading.
We decided to rename it as "kubevirt_vmi_memory_domain_total_bytes".

We also think it is valuable to have a metric which gives
the amount of memory used in the VM. We define a new metric
"kubevirt_vmi_memory_used_bytes" for this purpose. It is computed as
"kubevirt_vmi_memory_available_bytes-kubevirt_vmi_memory_usable_bytes"

Comment 10 Shirly Radco 2022-01-19 09:31:02 UTC
Note for QE: Please verify that you can see:
1. kubevirt_vmi_memory_domain_total_bytes
2. kubevirt_vmi_memory_used_bytes
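
One possible way to script this check is through the Prometheus HTTP API instant query endpoint (/api/v1/query). The sketch below only builds the query URL and inspects a response body; the base URL, helper names, and sample response are hypothetical, and fetching the URL (including any authentication the cluster requires) is left out.

```python
import json
from urllib.parse import urlencode


def query_url(base_url, metric):
    """Build an instant-query URL for the given metric name."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": metric})


def has_samples(response_body):
    """Return True if the query succeeded and returned at least one series."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return False
    return len(payload.get("data", {}).get("result", [])) > 0


# Hypothetical response body for illustration only.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{
            "metric": {"__name__": "kubevirt_vmi_memory_used_bytes", "name": "vm-a"},
            "value": [1644400000, "1073741824"],
        }],
    },
})

for metric in ("kubevirt_vmi_memory_domain_total_bytes", "kubevirt_vmi_memory_used_bytes"):
    print(query_url("http://prometheus.example:9090", metric))
print(has_samples(sample_body))
```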

Comment 11 Satyajit Bulage 2022-02-09 08:34:10 UTC
Metrics: "kubevirt_vmi_memory_used_bytes" is verified successfully with the help from Shirly. 
Metric: "kubevirt_vmi_memory_used_bytes" is in progress.

Comment 12 Satyajit Bulage 2022-02-09 08:35:02 UTC
*Typo: "kubevirt_vmi_memory_domain_total_bytes" instead of "kubevirt_vmi_memory_used_bytes"

Comment 13 Satyajit Bulage 2022-02-09 10:39:02 UTC
I am able to see the new metrics with correct values.

Verifying it.

Comment 18 errata-xmlrpc 2022-03-16 15:56:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947

