Bug 1934304

Summary: MemoryPressure Top Pod Consumers seems to be 2x expected value
Product: OpenShift Container Platform Reporter: jhusta <jhusta>
Component: Management Console Assignee: Bipul Adhikari <badhikar>
Status: CLOSED ERRATA QA Contact: Yanping Zhang <yanpzhan>
Severity: medium Docs Contact:
Priority: low    
Version: 4.7 CC: aos-bugs, badhikar, jhadvig, kdoberst, krmoser, nmukherj, spadgett, yanpzhan, yapei
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: s390x   
OS: Linux   
Whiteboard: Scrubbed
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 10:36:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2055290    
Attachments:
Description Flags
Screen Shots from console
none
mem-pod-list none

Description jhusta 2021-03-02 21:55:48 UTC
Description of problem:
This is related to the MemoryPressure popover opened from the Nodes panel. I cannot determine how the memory usage of the "top pod consumers" is being calculated, as it does not match any other per-pod memory usage display; it is roughly 2x the value shown in the other panels. I will attach screenshots of what I am seeing.
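For reference, a hypothetical way to cross-check a single pod's memory outside the console is to query the working-set metric directly in Prometheus (namespace and pod names are placeholders, and I am not claiming this is the exact query the console uses):

# Working-set memory per pod; the container!="" filter drops the pod-level
# cgroup series so containers are not counted twice.
sum(container_memory_working_set_bytes{namespace="<ns>", pod="<pod>", container!=""}) by (pod)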

Version-Release number of selected component (if applicable):
We have since moved to 4.8, but this was originally found in 4.7.

How reproducible:
Configure a MachineConfigPool with a KubeletConfig that sets a hard eviction value for memory. Then create memory-related stress that exceeds the memory usage allowed by that hard eviction threshold.


Steps to Reproduce:
1. Set a MachineConfigPool/KubeletConfig for the worker nodes with a hard eviction threshold for memory.available (see the sketch after these steps).
2. Create memory stress that exceeds the memory usage defined for hard eviction.
3. Wait for the MemoryPressure condition to be reported, then click the MemoryPressure link on the Node panel and check the information in the popover.
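A minimal sketch of the kind of KubeletConfig used in step 1 (the name, label, and threshold value are placeholders, not my exact configuration):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-memory-eviction        # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: hard-eviction   # label applied to the worker MachineConfigPool
  kubeletConfig:
    evictionHard:
      memory.available: "2Gi"         # placeholder hard eviction threshold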

Actual results:
The "top pod consumers" list appears to show incorrect memory usage (roughly 2x the expected value).


Expected results:
I would expect the memory usage to match the other memory usage displays for that pod.


Additional info:
I still cannot get access to Red Hat's outline of the defect-opening policy, so I am not sure which logs you need. Please let me know and I will attach anything you require, as this is easily reproducible on my smaller KVM environment.

Comment 1 jhusta 2021-03-02 21:58:58 UTC
Created attachment 1760291 [details]
Screen Shots from console

Comment 2 jhusta 2021-03-02 22:02:38 UTC
Opened per Samuel Padgett's request, from the related MemoryPressure defect.

Comment 8 Yanping Zhang 2021-06-01 03:52:29 UTC
Hi jhusta, the bug is reported against s390x hardware, and I am not sure whether it is hardware related. By the way, could you share the image for the memstress pod shown in your screenshot?

Comment 9 jhusta 2021-06-01 17:28:46 UTC
Hi @yanpzhan, my repos and image are in IBM Git and Artifactory, which you will not have access to. We are simply using an Ubuntu container running stress-ng. Here is the command: ["stress-ng", "-v", "--vm", "1", "--vm-bytes", "$ALLOCATION", "--vm-method", "all", "--verify", "--temp-path", "/tmp"], with --vm-bytes set to some value. I chose s390x because that is what I am testing on; I don't have access to an x86 machine, so I make no assumptions about other architectures.

Here is my dockerfile
FROM docker.io/ubuntu
RUN apt-get update -y && apt-get install -y stress-ng iperf3
USER 0 
CMD stress-ng --mmap 1
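
For completeness, this is roughly how the image is built and pushed (the registry path is a placeholder, not my actual Artifactory location):

# Build and push the stress image (hypothetical registry path)
podman build -t <registry>/memstress:latest .
podman push <registry>/memstress:latest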

Thanks

Comment 10 Yanping Zhang 2021-06-03 04:04:18 UTC
Thanks jhusta, I built the image successfully with the Dockerfile.
Checked on an OCP 4.8 cluster with payload 4.8.0-0.nightly-2021-06-02-025513; the bug is still reproduced. The fix (pr9030) is not contained in the payload. Waiting for a new build with the fix.

Comment 11 Yanping Zhang 2021-06-07 03:02:45 UTC
The fix is still not contained in payload 4.8.0-0.nightly-2021-06-06-164529.

Comment 12 jhusta 2021-06-07 12:57:33 UTC
@yanpzhan Thanks for keeping me posted!

Comment 14 Yanping Zhang 2021-06-10 09:47:19 UTC
Created attachment 1789785 [details]
mem-pod-list

Comment 16 Yanping Zhang 2021-06-10 09:50:26 UTC
In the test, I created a deployment whose pods each consume 8G of memory, so that node memory is used up.

Comment 26 Yanping Zhang 2022-02-17 08:02:37 UTC
Tested on ocp 4.11 cluster with payload 4.11.0-0.nightly-2022-02-16-211105.
1. $ oc label machineconfigpool worker custom-kubelet=small-pods
2. Create kubeletconfig:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-allocatable 
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: small-pods 
  kubeletConfig:
    systemReserved:
      cpu: 1000m
      memory: 3Gi
3. Create a deployment whose pods consume a large amount of memory.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memtest
  namespace: prozyp1
spec:
  selector:
    matchLabels:
      app: httpd
  replicas: 3
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
        - name: httpd
          image: quay.io/yanpzhan/memstress:latest
          command: ["stress-ng", "-v", "--vm", "1", "--vm-bytes", "8G", "--vm-method", "all", "--verify", "--temp-path", "/tmp"]
          ports:
            - containerPort: 8080
4. Then check the nodes list page. When a node shows memory pressure info, open the popover, check the top pod consumers, and compare them with the pod memory info on the pods list page. The memory info is correct now.
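As an extra cross-check outside the console, the popover values can also be compared with the metrics API output for the same namespace, e.g.:

# Per-pod memory usage reported by the metrics API
oc adm top pods -n prozyp1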
The bug is fixed.

Comment 29 jhusta 2022-02-23 17:29:21 UTC
Thank you @yanpzhan. I am still testing 4.10, but I will verify this fix once we move to 4.11.

Comment 32 errata-xmlrpc 2022-08-10 10:36:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 33 Red Hat Bugzilla 2023-09-15 01:33:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days