Description of problem:
This relates to the MemoryPressure window opened from the Nodes panel. I cannot determine how the memory usage of the "top pod consumers" is being calculated, as it does not match any other pod memory usage display; it is roughly 2x higher than the other panels. I will attach screenshots of what I am seeing.

Version-Release number of selected component (if applicable):
We have since moved to 4.8, but this was originally found on 4.7.

How reproducible:
Set a MachineConfigPool configuration that defines a memory hard-eviction value, then create memory stress that exceeds the usage allowed by that hard-eviction threshold.

Steps to Reproduce:
1. Set the MachineConfigPool for the worker node to have a hard eviction threshold for available memory (a sample KubeletConfig is sketched below this comment).
2. Create memory stress that exceeds the memory usage defined for hard eviction.
3. Wait for MemoryPressure to be alerted and click the MemoryPressure link on the Node panel. Check the info in the pop-up panel.

Actual results:
The top pod consumers appear to show incorrect memory usage.

Expected results:
I would expect the memory usage to match all other memory usage displays for that pod.

Additional info:
I still cannot get access to Red Hat to get the outline of the defect-opening policy, so I am not sure what logs you need. Please let me know and I will attach anything you require, as this is easily reproducible on my smaller KVM environment.
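For reference, a hard eviction threshold like the one described in step 1 is usually applied through a KubeletConfig targeted at the worker MachineConfigPool. The following is only a hedged sketch of that shape; the name, label, and threshold value are assumptions for illustration, not the exact configuration used in this report:

  apiVersion: machineconfiguration.openshift.io/v1
  kind: KubeletConfig
  metadata:
    name: worker-hard-eviction          # hypothetical name
  spec:
    machineConfigPoolSelector:
      matchLabels:
        custom-kubelet: hard-eviction   # assumes the worker MCP carries this label
    kubeletConfig:
      evictionHard:
        "memory.available": "2Gi"       # assumed threshold; pick a value the stress pods will exceed

The worker MachineConfigPool would need a matching label first, e.g. oc label machineconfigpool worker custom-kubelet=hard-eviction, the same pattern used in the verification steps later in this bug.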
Created attachment 1760291 [details] Screen Shots from console
Opened per Samuel Padgett's request from the related MemoryPressure defect.
Hi jhusta, the bug is reported against s390x hardware, and I doubt whether it is hardware related. By the way, could you share the image for the memstress pod shown in your screenshot?
Hi @yanpzhan, my repos and image are in IBM Git and Artifactory, which you will not have access to. We are simply using an Ubuntu container running stress-ng. Here is the container command:

  ["stress-ng", "-v", "--vm", "1", "--vm-bytes", "$ALLOCATION", "--vm-method", "all", "--verify", "--temp-path", "/tmp"]

with --vm-bytes set to some value. I chose s390x as that is what I am testing on; I don't have access to an x86 machine, so I make no assumptions. Here is my Dockerfile:

  FROM docker.io/ubuntu
  RUN apt-get update -y && apt-get install -y stress-ng iperf3
  USER 0
  CMD stress-ng --mmap 1

Thanks
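A hedged note for anyone rebuilding this image: with the Dockerfile above saved locally, something along these lines would typically build and publish it. The tag is an assumption borrowed from the image reference used in the verification comment further down, not the reporter's actual repository path:

  $ podman build -t quay.io/yanpzhan/memstress:latest .
  $ podman push quay.io/yanpzhan/memstress:latest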
Thanks jhusta, I built the image successfully with the Dockerfile. Checked on an OCP 4.8 cluster with payload 4.8.0-0.nightly-2021-06-02-025513; the bug still reproduces. The fix pr9030 is not contained in that payload. Waiting for a new build with the fix.
The fix is still not contained in payload 4.8.0-0.nightly-2021-06-06-164529.
@yanpzhan Thanks for keeping me posted!
Created attachment 1789785 [details] mem-pod-list
In the test, I created a deployment whose pods each consume 8G of memory, so that the node's memory is used up.
Tested on an OCP 4.11 cluster with payload 4.11.0-0.nightly-2022-02-16-211105.

1. Label the worker pool:

  $ oc label machineconfigpool worker custom-kubelet=small-pods

2. Create the kubeletconfig:

  apiVersion: machineconfiguration.openshift.io/v1
  kind: KubeletConfig
  metadata:
    name: set-allocatable
  spec:
    machineConfigPoolSelector:
      matchLabels:
        custom-kubelet: small-pods
    kubeletConfig:
      systemReserved:
        cpu: 1000m
        memory: 3Gi

3. Create a deployment whose pods consume a large amount of memory:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: memtest
    namespace: prozyp1
  spec:
    selector:
      matchLabels:
        app: httpd
    replicas: 3
    template:
      metadata:
        labels:
          app: httpd
      spec:
        containers:
        - name: httpd
          image: quay.io/yanpzhan/memstress:latest
          command: ["stress-ng", "-v", "--vm", "1", "--vm-bytes", "8G", "--vm-method", "all", "--verify", "--temp-path", "/tmp"]
          ports:
          - containerPort: 8080

4. Then check the nodes list page. When the node shows memory pressure, check the top pod info in the popover and compare it with the pod memory info on the pods list page.

The memory info is normal now. The bug is fixed.
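As an extra cross-check (not part of the verification above), the popover values can also be compared against what the metrics API reports from the CLI; the namespace is the one used in the deployment above:

  $ oc adm top pod -n prozyp1
  $ oc adm top node

If the popover, the pods list page, and these numbers line up, the "top pod consumers" calculation is consistent.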
Thank you @yanpzhan. I am still testing 4.10 but will verify this fix once we move to 4.11.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days