Description of problem:
The "KubevirtVmHighMemoryUsage" alert does not return the namespace. If there is a pod with the same name in different namespaces, the alert might fire when it shouldn't.

Version-Release number of selected component (if applicable):
4.11.0

How reproducible:
100%

Steps to Reproduce:
1. Create VMs with identical names in different namespaces.
2. Trigger the alert for one of the VMs and verify it includes the correct namespace.

Actual results:
The alert doesn't include a namespace.

Expected results:
The alert includes the correct namespace for the pod.

Additional info:
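A minimal sketch of where the missing label likely comes from (the openshift-cnv namespace and the metric/expression shown here are illustrative assumptions, not the shipped rule):

$ oc get prometheusrules -n openshift-cnv -o yaml | grep -B2 -A6 'KubevirtVmHighMemoryUsage'

# Illustrative PromQL only: an aggregation like
#   sum by (pod, container) (some_free_memory_metric) < 20 * 1024 * 1024
# drops the namespace label, so same-named pods in different namespaces
# collapse into one alert; adding namespace to the by() clause keeps
# them distinct:
#   sum by (pod, container, namespace) (some_free_memory_metric) < 20 * 1024 * 1024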
Tested with CNV-4.11.1-20 on OCP 4.11.8 with the following steps:
1. Created 2 VMs, each in its own namespace but with the same name, 'test-vm'.
2. Used the 'stress-ng' tool to try to reach the state where free memory is less than 20 MB, so that the 'KubevirtVmHighMemoryUsage' alert would be triggered.
3. The following stress commands were attempted:
a. stress-ng --vm 1 --vm-bytes=99% -t 30m
b. stress-ng --vm 2 --vm-bytes=106% -t 30m
c. stress-ng --vm 1 --vm-bytes=2048M -t 30m
d. stress-ng --malloc 1 --malloc-bytes=90% -t 30m
e. stress-ng --malloc 2 --malloc-bytes=90% -t 30m

In all of the above attempts, free memory as seen inside the VM went as low as ~40-45 MB, at which point the stress process was automatically killed and the console connection was reset. It is not clear what is happening here. As a result, the 'KubevirtVmHighMemoryUsage' alert could not be triggered, which is blocking the verification of this bug (a sketch for querying alerts directly via the API follows at the end of this comment).

In the web console, 'Observe' -> 'Alerting' -> 'Alerting rules' -> searching for 'KubevirtVmHighMemoryUsage' shows the description for the alert as:
<snip>
Description
Container {{ $labels.container }} in pod {{ $labels.pod }} in namespace {{ $labels.namespace }} free memory is less than 20 MB and it is close to requested memory
</snip>
This clearly shows that the alert will display the 'namespace' details, but the alert still could not be captured to verify this. So moving this bug to ASSIGNED state.

@shirly, do you have any suggestions?

Note: On the other hand, triggering this alert at a 20 MB margin of free memory is too late for the user/admin: by the time they are alerted, the application would already have been killed (evicted). A 10% free-memory margin is now suggested for this alert; this is being handled in a different JIRA story.
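One way to check whether the alert actually fired, and with which labels, is to query the cluster monitoring API directly. A sketch, assuming the default OpenShift monitoring setup (thanos-querier route in openshift-monitoring):

$ TOKEN=$(oc whoami -t)
$ HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
$ curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/alerts" \
    | jq '.data.alerts[] | select(.labels.alertname == "KubevirtVmHighMemoryUsage") | .labels'

If the alert is firing, the printed labels should include (or, per this bug, be missing) the namespace.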
There is some additional discussion in Jira: https://issues.redhat.com/browse/CNV-21883
Verified with CNV v4.11.2-30; the fix is not yet available. I still see the description saying free memory is less than 20 MB, though with the latest patch this was updated to 50 MB. Moving this bug to ASSIGNED as the fix is not yet available in 4.11.2.
<snip>
Container {{ $labels.container }} in pod {{ $labels.pod }} in namespace {{ $labels.namespace }} free memory is less than 20 MB and it is close to requested memory
</snip>
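To confirm which version of the rule a given build actually ships, rather than relying on the console rendering, the deployed rule text can be grepped directly (the openshift-cnv namespace is an assumption about a default CNV install):

$ oc get prometheusrules -n openshift-cnv -o yaml | grep -E 'free memory is less than (20|50) MB'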
Joao, please make sure this gets backported once 4.11.3 builds start (check with Simone or Dominik Holler for details).
It seems that the upstream PR is still open - does it need to be merged for the fix to be considered complete?
I don't think so; it's just a description update on the main branch. All PRs in release-0.53 are merged.
Verified with CNV build v4.11.4-106 with the following steps:
1. From the OpenShift web console, go to 'Observe' -> 'Alerting' -> 'Alerting Rules', search for 'KubevirtVmHighMemoryUsage', and click on the rule once it is displayed after filtering. The alert description reads:
"Container {{ $labels.container }} in pod {{ $labels.pod }} in namespace {{ $labels.namespace }} free memory is less than 50 MB and it is close to requested memory"
This is the updated description.
2. Created 2 VMs with the same name but in different namespaces: one in a custom-named namespace, the other in 'default'.
3. Updated the VM resources and restarted the VM, setting 'requests' to 1Gi and 'limits' to 2Gi (a CLI sketch follows at the end of this comment).
4. Installed 'stress-ng' in the VMs.
5. Selected one of the VMs (in a particular namespace) and ran 'stress-ng' to stress the memory:
# stress-ng --malloc 10 --malloc-bytes 120% -t10m
6. Watched for the 'KubevirtVmHighMemoryUsage' alert under 'Observe' -> 'Alerting' -> 'Alerts'.

The alert displayed the high memory usage of the VM in that particular namespace once the free portion of the requested memory dropped below 50 MB. The alert displayed as:
"Container compute in pod virt-launcher-fedora-digital-lemur-6nmdx in namespace sas free memory is less than 50 MB and it is close to requested memory"

With this observation, verifying this bug.
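For step 3, a minimal CLI sketch of setting the requests/limits and restarting the VM ('test-vm' and the 'sas' namespace are placeholders; the field paths follow the KubeVirt VirtualMachine API):

$ oc patch vm test-vm -n sas --type=merge -p \
    '{"spec":{"template":{"spec":{"domain":{"resources":{"requests":{"memory":"1Gi"},"limits":{"memory":"2Gi"}}}}}}}'
$ virtctl restart test-vm -n sas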
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 4.11.4 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:3352