Description of problem:
The "KubevirtVmHighMemoryUsage" alert does not return the namespace. If there is a pod with the same name in different namespaces, the alert might fire when it shouldn't.

Version-Release number of selected component (if applicable):
4.11.0

How reproducible:
100%

Steps to Reproduce:
1. Create VMs with identical names in different namespaces.
2. Trigger the alert for one of the VMs and verify it includes the correct namespace.

Actual results:
The alert doesn't include a namespace.

Expected results:
The alert includes the correct namespace for the pod.

Additional info:
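A minimal sketch of where the missing label likely comes from (the openshift-cnv namespace and the metric/expression shown here are illustrative assumptions, not the shipped rule):

$ oc get prometheusrules -n openshift-cnv -o yaml | grep -B2 -A6 'KubevirtVmHighMemoryUsage'

# Illustrative PromQL only: an aggregation like
#   sum by (pod, container) (some_free_memory_metric) < 20 * 1024 * 1024
# drops the namespace label, so same-named pods in different namespaces
# collapse into one alert; adding namespace to the by() clause keeps
# them distinct:
#   sum by (pod, container, namespace) (some_free_memory_metric) < 20 * 1024 * 1024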
Tested with CNV-4.11.1-20 on OCP 4.11.8 with the following steps:
1. Created 2 VMs, each in its own namespace but with the same name, 'test-vm'.
2. Used the 'stress-ng' tool to try to reach the state where free memory is less than 20 MB, so that the 'KubevirtVmHighMemoryUsage' alert would be triggered.
3. The following stress commands were attempted:
a. stress-ng --vm 1 --vm-bytes=99% -t 30m
b. stress-ng --vm 2 --vm-bytes=106% -t 30m
c. stress-ng --vm 1 --vm-bytes=2048M -t 30m
d. stress-ng --malloc 1 --malloc-bytes=90% -t 30m
e. stress-ng --malloc 2 --malloc-bytes=90% -t 30m

In all of the above attempts, free memory as seen inside the VM went as low as ~40-45 MB, at which point the stress process was automatically killed and the console connection was reset. It is not clear what is happening here. As a result, the 'KubevirtVmHighMemoryUsage' alert could not be triggered, which is blocking the verification of this bug (a sketch for querying alerts directly via the API follows at the end of this comment).

In the web console, 'Observe' -> 'Alerting' -> 'Alerting rules' -> searching for 'KubevirtVmHighMemoryUsage' shows the description for the alert as:
<snip>
Description
Container {{ $labels.container }} in pod {{ $labels.pod }} in namespace {{ $labels.namespace }} free memory is less than 20 MB and it is close to requested memory
</snip>
This clearly shows that the alert will display the 'namespace' details, but the alert still could not be captured to verify this. So moving this bug to ASSIGNED state.

@shirly, do you have any suggestions?

Note: On the other hand, triggering this alert at a 20 MB margin of free memory is too late for the user/admin: by the time they are alerted, the application would already have been killed (evicted). A 10% free-memory margin is now suggested for this alert; this is being handled in a different JIRA story.
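One way to check whether the alert actually fired, and with which labels, is to query the cluster monitoring API directly. A sketch, assuming the default OpenShift monitoring setup (thanos-querier route in openshift-monitoring):

$ TOKEN=$(oc whoami -t)
$ HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
$ curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/alerts" \
    | jq '.data.alerts[] | select(.labels.alertname == "KubevirtVmHighMemoryUsage") | .labels'

If the alert is firing, the printed labels should include (or, per this bug, be missing) the namespace.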
There is some additional discussion in Jira: https://issues.redhat.com/browse/CNV-21883
Verified with CNV v4.11.2-30; the fix is not yet available. I still see the description saying free memory is less than 20 MB, though with the latest patch this was updated to 50 MB. Moving this bug to ASSIGNED as the fix is not yet available in 4.11.2.
<snip>
Container {{ $labels.container }} in pod {{ $labels.pod }} in namespace {{ $labels.namespace }} free memory is less than 20 MB and it is close to requested memory
</snip>
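To confirm which version of the rule a given build actually ships, rather than relying on the console rendering, the deployed rule text can be grepped directly (the openshift-cnv namespace is an assumption about a default CNV install):

$ oc get prometheusrules -n openshift-cnv -o yaml | grep -E 'free memory is less than (20|50) MB'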
Joao, please make sure this gets backported once 4.11.3 builds start (check with Simone or Dominik Holler for details).
It seems that the upstream PR is still open - does it need to be merged for the fix to be considered complete?
I don't think so; it's just a description update on the main branch. All PRs in release-0.53 are merged.
Verified with CNV build v4.11.4-106 with the following steps:
1. From the OpenShift web console, go to 'Observe' -> 'Alerting' -> 'Alerting Rules', search for 'KubevirtVmHighMemoryUsage', and click on the rule once it is displayed after filtering. The alert description reads:
"Container {{ $labels.container }} in pod {{ $labels.pod }} in namespace {{ $labels.namespace }} free memory is less than 50 MB and it is close to requested memory"
This is the updated description.
2. Created 2 VMs with the same name but in different namespaces: one in a custom-named namespace, the other in 'default'.
3. Updated the VM resources and restarted the VM, setting 'requests' to 1Gi and 'limits' to 2Gi (a CLI sketch follows at the end of this comment).
4. Installed 'stress-ng' in the VMs.
5. Selected one of the VMs (in a particular namespace) and ran 'stress-ng' to stress the memory:
# stress-ng --malloc 10 --malloc-bytes 120% -t10m
6. Watched for the 'KubevirtVmHighMemoryUsage' alert under 'Observe' -> 'Alerting' -> 'Alerts'.

The alert displayed the high memory usage of the VM in that particular namespace once the free portion of the requested memory dropped below 50 MB. The alert displayed as:
"Container compute in pod virt-launcher-fedora-digital-lemur-6nmdx in namespace sas free memory is less than 50 MB and it is close to requested memory"

With this observation, verifying this bug.
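For step 3, a minimal CLI sketch of setting the requests/limits and restarting the VM ('test-vm' and the 'sas' namespace are placeholders; the field paths follow the KubeVirt VirtualMachine API):

$ oc patch vm test-vm -n sas --type=merge -p \
    '{"spec":{"template":{"spec":{"domain":{"resources":{"requests":{"memory":"1Gi"},"limits":{"memory":"2Gi"}}}}}}}'
$ virtctl restart test-vm -n sas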
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 4.11.4 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:3352