Description of problem:
Metric kubevirt_vmi_memory_used_total_bytes seems to report the total VM memory and not the used memory.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Check the VM memory usage from within the VM and compare it to the metric in Prometheus.
2. Load the VM, check the VM memory usage from within the VM, and compare it to the metric in Prometheus.
3.

Actual results:
kubevirt_vmi_memory_used_total_bytes returns a static result that seems to be the total memory allocated to the VM.

Expected results:
kubevirt_vmi_memory_used_total_bytes should report the used memory. Memory usage measured from within the VMs with tools like top/sar/mpstat should give results similar to the kubevirt_vmi_memory_used_total_bytes metric.

Additional info:
I checked all the details about the memory metrics, and here are my findings.

First of all, "kubevirt_vmi_memory_used_total_bytes" refers to the amount of memory allocated to the domain in the libvirt domain XML file. That is why it is a constant value for a VMI and is not supposed to change over time. The description in the kubevirt repo is "The amount of memory in bytes used by the domain." [1] and it seems correct.

Secondly, we have some other VMI metrics regarding memory. All of them are provided by libvirt via virDomainMemoryStatTags [2]. I checked their descriptions. They are similar to the descriptions in libvirt's doc. The ones related to swap and page faults are clear and dynamic. These are a little ambiguous:

- kubevirt_vmi_memory_available_bytes
  -> DOMAIN_MEMORY_STAT_AVAILABLE in virDomainMemoryStatTags
  -> "total" memory inside the virtual machine; the "total" column in the output of the "free" command
  -> Static value, since ballooning is not active and the total memory doesn't change over time
  -> Metric description: "amount of `usable` memory as seen by the domain."
  -> Description in libvirt doc: "The total amount of usable memory as seen by the domain. This value may be less than the amount of memory assigned to the domain if a balloon driver is in use or if the guest OS does not initialize all assigned pages. This value is expressed in kB."

- kubevirt_vmi_memory_unused_bytes
  -> DOMAIN_MEMORY_STAT_UNUSED in virDomainMemoryStatTags
  -> unused memory inside the virtual machine; the "free" column in the output of the "free" command
  -> Metric description: "amount of `unused` memory as seen by the domain."
  -> Description in libvirt doc: "The amount of memory left completely unused by the system. Memory that is available but used for reclaimable caches should NOT be reported as free. This value is expressed in kB."

- kubevirt_vmi_memory_usable_bytes
  -> DOMAIN_MEMORY_STAT_USABLE in virDomainMemoryStatTags
  -> available memory inside the virtual machine; the "available" column in the output of the "free" command
  -> Metric description: "The amount of memory which can be reclaimed by balloon without causing host swapping in bytes."
  -> Description in libvirt doc: "How much the balloon can be inflated without pushing the guest system to swap, corresponds to 'Available' in /proc/meminfo"

In short:
kubevirt_vmi_memory_used_total_bytes -> total memory used by the whole domain
kubevirt_vmi_memory_available_bytes -> total memory the VM really has
kubevirt_vmi_memory_unused_bytes -> free/unused memory -> not so important
kubevirt_vmi_memory_usable_bytes -> available memory -> important, since it is the amount of memory that new applications can use

Memory usage ratio = (kubevirt_vmi_memory_available_bytes - kubevirt_vmi_memory_usable_bytes) / kubevirt_vmi_memory_available_bytes
Remaining memory ratio = kubevirt_vmi_memory_usable_bytes / kubevirt_vmi_memory_available_bytes
(Multiply either ratio by 100 to get a percentage.)

@shirly what do you think? Which points should we improve?

[1] https://github.com/kubevirt/kubevirt/blob/debf29e7bf0df011248b3662c46bfa55cf3f6750/pkg/monitoring/domainstats/prometheus/prometheus.go#L155
[2] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStatTags
I don't think the amount of memory allocated to the domain in the libvirt domain XML file is an interesting metric. It is not what I intended. I think we should consider replacing this metric with a recording rule of the same name that calculates the memory the VM uses.

Expr: kubevirt_vmi_memory_available_bytes - kubevirt_vmi_memory_usable_bytes
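To cross-check the proposed expression against what the guest itself reports, the same subtraction can be done on /proc/meminfo inside the VM. The sketch below is only an illustration: it assumes that MemTotal, MemFree and MemAvailable line up with the available, unused and usable metrics (following the "free" column mapping discussed earlier in this thread), and the sample values are made up.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: bytes} dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue
        amount = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024  # meminfo reports kB; the metrics are in bytes
        fields[name.strip()] = amount
    return fields


def guest_memory_view(fields):
    """Guest-side equivalents of the metrics (assumed mapping, see lead-in)."""
    available = fields["MemTotal"]      # ~ kubevirt_vmi_memory_available_bytes
    usable = fields["MemAvailable"]     # ~ kubevirt_vmi_memory_usable_bytes
    used = available - usable           # the proposed expression
    return {
        "available_bytes": available,
        "unused_bytes": fields["MemFree"],  # ~ kubevirt_vmi_memory_unused_bytes
        "usable_bytes": usable,
        "used_bytes": used,
        "used_ratio": used / available,
    }


# Made-up sample; on a real guest, read the text from /proc/meminfo instead.
SAMPLE_MEMINFO = """\
MemTotal:        4000000 kB
MemFree:         1000000 kB
MemAvailable:    3000000 kB
"""

if __name__ == "__main__":
    print(guest_memory_view(parse_meminfo(SAMPLE_MEMINFO)))
```

Small differences between the guest-side numbers and the Prometheus values are to be expected, since the two are sampled at different times.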
@rmohr Are you ok with Shirly's proposal?
I guess all three values are interesting depending on what you are looking for, but used and free memory are probably the most pressing ones for answering these questions:

* do VMs potentially run out of memory?
* which VMs are consuming all my node memory?
* for custom horizontal autoscaling, where one could potentially use low/high CPU and memory usage to scale a set of VMs

Getting the total amount of memory may be interesting for the following:

* trying to understand weird node characteristics which may be influenced indirectly by the size of the scheduled VMs (sounds plausible, but I just made that up; not sure if this would ever be relevant)
* possibly vertical autoscaling and custom autoscale logic based on metrics

Do you have other use cases?

Since we don't have ballooning active right now, I would expect `kubevirt_vmi_memory_available_bytes` to be kind of static too. Because of this, I guess we can still cover all the cases. But what if we start using ballooning, do I then lose information? If we don't lose anything when ballooning is enabled, I think we are good. Otherwise I would consider keeping it.
kubevirt_vmi_memory_used_total_bytes -> the amount of memory stated in the libvirt domain XML file -> it includes the virtualization overhead
kubevirt_vmi_memory_available_bytes -> the amount of memory the VM really has; it is what you see inside the VM

The source of kubevirt_vmi_memory_used_total_bytes is not virDomainMemoryStatTags [1]. It is virDomainInfo [2].

1. https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStatTags
2. https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainInfo

Also, we have another metric, `kubevirt_vmi_memory_actual_balloon_bytes`, which gives you the current balloon value in KB.

When we enable ballooning, kubevirt_vmi_memory_actual_balloon_bytes will be a non-zero value and kubevirt_vmi_memory_available_bytes will change over time. However, I guess kubevirt_vmi_memory_used_total_bytes will not be dynamic.

New proposal from Shirly and me:
- Rename `kubevirt_vmi_memory_used_total_bytes` to `kubevirt_vmi_memory_domain_total_bytes` and keep it as it is. It will always be static and give you the amount of memory consumed by the whole domain.
- Introduce a new recording rule `kubevirt_vmi_memory_used_bytes = kubevirt_vmi_memory_available_bytes - kubevirt_vmi_memory_usable_bytes`. It will be affected by ballooning, but that is expected behavior.
- After these two changes, we will not lose any data we had previously.

@rmohr What do you think?
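A note on how the proposed rule expression would be evaluated: with PromQL's default one-to-one vector matching, a subtraction pairs samples from both sides that carry identical label sets, and a series without a partner produces no result. The Python sketch below only mimics that behavior for illustration; it is not how Prometheus is implemented, and the label names and values are hypothetical.

```python
def labels(**pairs):
    """Build a hashable label set (order-independent)."""
    return frozenset(pairs.items())


def subtract_vectors(left, right):
    """Mimic `left - right` with default one-to-one label matching."""
    result = {}
    for label_set, value in left.items():
        if label_set in right:  # unmatched series are dropped
            result[label_set] = value - right[label_set]
    return result


# Hypothetical samples for two VMIs; vm-b has no "usable" sample yet.
available = {
    labels(namespace="default", name="vm-a"): 4096.0,
    labels(namespace="default", name="vm-b"): 2048.0,
}
usable = {
    labels(name="vm-a", namespace="default"): 3072.0,
}

# Stands in for kubevirt_vmi_memory_used_bytes.
used = subtract_vectors(available, usable)

if __name__ == "__main__":
    for label_set, value in used.items():
        print(dict(label_set), value)
```

In this sketch only vm-a gets a value, which mirrors the fact that a VMI needs both input metrics for the derived one to exist.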
(In reply to Erkan Erol from comment #5)
> kubevirt_vmi_memory_used_total_bytes -> the amount of memory stated in
> domain xml file for libvirt -> it contains overhead of virtualization
> kubevirt_vmi_memory_available_bytes -> the amount of memory the VM really
> has. it is what you see in the VM
>
> source of kubevirt_vmi_memory_used_total_bytes is not
> virDomainMemoryStatTags[1]. It is virDomainInfo[2].
>
> 1. https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStatTags
> 2. https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainInfo
>
> Also, we have another metric `kubevirt_vmi_memory_actual_balloon_bytes`
> which gives you current balloon value in KB.
>
> When we enable ballooning, kubevirt_vmi_memory_actual_balloon_bytes will be
> non-zero value and kubevirt_vmi_memory_available_bytes will change in time.
> However, I guess kubevirt_vmi_memory_used_total_bytes will not be dynamic.
>
> Me+Shirly's new proposal:
> - Renaming `kubevirt_vmi_memory_used_total_bytes` as
> `kubevirt_vmi_memory_domain_total_bytes` and keeping it as it is. It will be
> always static and give you the amount of memory which is consumed by the
> whole domain.

The rename sounds reasonable. Apart from the possibility that some people may already use it, no objections.

> - Introducing a new recording rule
> `kubevirt_vmi_memory_used_bytes=kubevirt_vmi_memory_available_bytes -
> kubevirt_vmi_memory_usable_bytes`. It will be affected by ballooning but it
> is expected behavior.
> - After these two changes, we will not lose any data we had previously.

If we need it, sounds great to add it.

> @rmohr What do you think?

I am convinced now that no collected information gets lost. No objections from my side.
PR: https://github.com/kubevirt/kubevirt/pull/6973
kubevirt_vmi_memory_used_total_bytes refers to the amount of memory declared in the libvirt domain XML file, so its name is misleading. We decided to rename it to "kubevirt_vmi_memory_domain_total_bytes". We also think it is valuable to have a metric that gives the amount of memory used in the VM. We define a new metric, "kubevirt_vmi_memory_used_bytes", for this purpose. It is computed as "kubevirt_vmi_memory_available_bytes - kubevirt_vmi_memory_usable_bytes".
Note for QE: Please verify that you can see:
1. kubevirt_vmi_memory_domain_total_bytes
2. kubevirt_vmi_memory_used_bytes
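One possible way to script this check is to list the metric names Prometheus knows via its HTTP API endpoint /api/v1/label/__name__/values and look for the two names. The sketch below only parses a response body. How the body is fetched (route, token) depends on the cluster and is left out, and the sample response is made up.

```python
import json

EXPECTED_METRICS = (
    "kubevirt_vmi_memory_domain_total_bytes",
    "kubevirt_vmi_memory_used_bytes",
)


def missing_metrics(response_body, expected=EXPECTED_METRICS):
    """Return the expected metric names absent from a label-values response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    present = set(payload.get("data", []))
    return [name for name in expected if name not in present]


# Made-up response body in the shape returned by
# GET /api/v1/label/__name__/values
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": [
        "kubevirt_vmi_memory_available_bytes",
        "kubevirt_vmi_memory_domain_total_bytes",
        "kubevirt_vmi_memory_usable_bytes",
    ],
})

if __name__ == "__main__":
    print(missing_metrics(SAMPLE_RESPONSE))
```

An empty list means both metrics are visible. Checking that the values are also correct still needs a comparison against the guest, as described in the reproduction steps.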
Metrics: "kubevirt_vmi_memory_used_bytes" is verified successfully with the help from Shirly. Metric: "kubevirt_vmi_memory_used_bytes" is in progress.
*Typo: "kubevirt_vmi_memory_domain_total_bytes" instead of "kubevirt_vmi_memory_used_bytes"
I am able to see the new metrics with correct values. Verifying it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0947