Bug 2091976 - "KubevirtVmHighMemoryUsage" alert does not return the namespace
Summary: "KubevirtVmHighMemoryUsage" alert does not return the namespace
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Metrics
Version: 4.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.11.4
Assignee: João Vilaça
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-31 12:47 UTC by Shirly Radco
Modified: 2023-05-30 15:38 UTC

Fixed In Version: hco-bundle-registry-v4.11.4-9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-30 15:37:44 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 7836 0 None Merged Fix KubevirtVmHighMemoryUsage alert 2023-01-25 10:32:21 UTC
Github kubevirt kubevirt pull 7909 0 None Merged [release-0.53] Fix KubevirtVmHighMemoryUsage alert 2023-01-25 10:32:21 UTC
Github kubevirt kubevirt pull 8599 0 None Merged Change KubevirtVmHighMemoryUsage threshold from 20MB to 50MB 2023-01-25 10:32:22 UTC
Github kubevirt kubevirt pull 9073 0 None open Fix incorrect KubevirtVmHighMemoryUsage description 2023-02-06 13:43:35 UTC
Github kubevirt kubevirt pull 9074 0 None Merged [release-0.53] Change KubevirtVmHighMemoryUsage threshold from 20MB to 50MB 2023-01-25 10:32:23 UTC
Red Hat Issue Tracker CNV-18740 0 None None None 2022-10-27 11:50:15 UTC
Red Hat Product Errata RHEA-2023:3352 0 None None None 2023-05-30 15:38:08 UTC

Description Shirly Radco 2022-05-31 12:47:01 UTC
Description of problem:
The "KubevirtVmHighMemoryUsage" alert does not return the namespace. If there are pods with the same name in different namespaces, the alert might fire when it shouldn't.

Version-Release number of selected component (if applicable):
4.11.0

How reproducible:
100%

Steps to Reproduce:
1. Create VMs with identical names in different namespaces.
2. Trigger the alert for one of the VMs and verify it includes the correct namespace.

Actual results:
Alert doesn't include a namespace

Expected results:
Alert includes the correct namespace for the pod.

Additional info:
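For reference, a minimal PrometheusRule sketch (the expression and metric name below are illustrative placeholders, not the actual upstream rule) of how keeping the container, pod, and namespace labels in the query result lets the alert annotation render {{ $labels.namespace }}:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubevirt-vm-memory-rules   # illustrative name
spec:
  groups:
  - name: kubevirt.vm.rules
    rules:
    - alert: KubevirtVmHighMemoryUsage
      # Illustrative expression: as long as the query does not aggregate away
      # the container/pod/namespace labels, the annotation below can render them.
      expr: kubevirt_vm_container_free_memory_bytes < 20971520
      for: 1m
      labels:
        severity: warning
      annotations:
        description: >-
          Container {{ $labels.container }} in pod {{ $labels.pod }} in namespace
          {{ $labels.namespace }} free memory is less than 20 MB and it is close
          to requested memory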

Comment 1 SATHEESARAN 2022-10-14 10:35:43 UTC
Tested with CNV-4.11.1-20 on OCP 4.11.8 with the following steps:

1. Created 2 VMs, each in its own namespace but with the same name 'test-vm'.
2. Used the 'stress-ng' tool to try to reach a state where free memory is less than 20 MB, so
that the 'KubevirtVmHighMemoryUsage' alert would be triggered.
3. The following stress commands were attempted:

a. stress-ng --vm 1 --vm-bytes=99% -t 30m
b. stress-ng --vm 2 --vm-bytes=106% -t 30m
c. stress-ng --vm 1 --vm-bytes=2048M -t 30m
d. stress-ng --malloc 1 --malloc-bytes=90% -t 30m
e. stress-ng --malloc 2 --malloc-bytes=90% -t 30m

In all the above attempts, the free memory seen inside the VM went as low as ~40-45 MB before the
stress process was automatically killed and the console connection was reset. Not sure what is
happening in this context. In this situation, the 'KubevirtVmHighMemoryUsage' alert could not be
triggered, which is blocking the verification of this bug.

In the web console, 'Observe' -> 'Alerting' -> 'Alerting rules' -> searching for 'KubevirtVmHighMemoryUsage'
shows the description for the alert as:

<snip>
Description
    Container {{ $labels.container }} in pod {{ $labels.pod }} in namespace {{ $labels.namespace }} free memory is less than 20 MB and it is close to requested memory
</snip>
This clearly shows that the alert will display the information with the 'namespace' details.
But the alert still couldn't be triggered to verify this, so moving this bug to the ASSIGNED state.

@shirly, do you have any suggestions?


Note:
On a separate point, triggering this alert only at a 20 MB margin of free memory is too late for the user/admin.
By the time the user/admin is alerted, the application would already have been killed (evicted).
A 10% free-memory margin is now suggested for this alert; this is taken care of in a different JIRA story.
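As a rough sketch only, a percentage-based margin in the rule could look like the fragment below (metric names are illustrative placeholders and the label matching between the two series is omitted; the actual change is tracked in the JIRA story mentioned above):

    - alert: KubevirtVmHighMemoryUsage
      # Illustrative only: fire when free memory drops below 10% of the
      # container's requested memory instead of a fixed byte threshold.
      expr: >-
        kubevirt_vm_container_free_memory_bytes
          < 0.1 * kube_pod_container_resource_requests{resource="memory"}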

Comment 2 Krzysztof Majcher 2022-10-27 10:13:12 UTC
There is some additional discussion in Jira: https://issues.redhat.com/browse/CNV-21883

Comment 6 SATHEESARAN 2023-01-09 15:06:55 UTC
Checked with CNV v4.11.2-30, and the fix is not yet available.

I still see the same description with free memory less than 20 MB, though
with the latest patch this was updated to 50 MB.

Moving this bug to ASSIGNED as the fix is not yet available in 4.11.2.

<snip>
    Container {{ $labels.container }} in pod {{ $labels.pod }} in namespace {{ $labels.namespace }} free memory is less than 20 MB and it is close to requested memory
</snip>

Comment 8 Krzysztof Majcher 2023-01-17 13:46:57 UTC
Joao, please make sure this gets backported once 4.11.3 builds are happening (please check with Simone or Dominik Holler for details)

Comment 10 Krzysztof Majcher 2023-01-31 13:56:27 UTC
It seems that we still have an upstream PR open - is it needed to consider the fix complete or not?

Comment 11 João Vilaça 2023-01-31 13:59:07 UTC
I don't think so; it's just a description update on the main branch. All PRs in release-0.53 are merged.

Comment 12 SATHEESARAN 2023-05-17 01:19:40 UTC
Verified with CNV build v4.11.4-106 with the following steps:

1. From the OpenShift web console, go to 'Observe' -> 'Alerting' -> 'Alerting rules', search for 'KubevirtVmHighMemoryUsage',
and click on that rule once it is displayed after filtering.

The description of the alert reads:
"Container {{ $labels.container }} in pod {{ $labels.pod }} in namespace {{ $labels.namespace }} free memory is less than 50 MB and it is close to requested memory"
This is the updated description.

2. Created 2 VMs with the same name but in different namespaces: one in a custom-named namespace, the other in 'default'.
3. Updated the VM resources and restarted the VM, setting 'requests' to 1Gi and 'limits' to 2Gi (see the manifest sketch after this list).
4. Installed 'stress-ng' in the VMs.
5. Selected one of the VMs (in a particular namespace) and ran the 'stress-ng' tool to stress the memory:
   # stress-ng --malloc 10 --malloc-bytes 120% -t10m
6. Watched for the 'KubevirtVmHighMemoryUsage' alert under 'Observe' -> 'Alerting' -> 'Alerts'.
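A minimal sketch of the VM resource settings used in step 3 (kubevirt.io/v1 VirtualMachine; the name and namespace below are illustrative):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: test-vm            # same VM name is used in both namespaces
  namespace: example-ns    # illustrative namespace
spec:
  running: true
  template:
    spec:
      domain:
        devices: {}
        resources:
          requests:
            memory: 1Gi
          limits:
            memory: 2Gi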

The alert displayed the high memory usage of the VM in that particular namespace when the free portion of the requested memory dropped below 50 MB.

ALERT displayed as:
"Container compute in pod virt-launcher-fedora-digital-lemur-6nmdx in namespace sas free memory is less than 50 MB and it is close to requested memory"

With this observation, verifying this bug.

Comment 18 errata-xmlrpc 2023-05-30 15:37:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.11.4 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:3352

