Description of problem:
This relates to the MemoryPressure window opened from the Nodes panel. I cannot determine how the memory usage of the "top pod consumers" is being calculated, as it does not match any other pod memory usage display; it is roughly 2x higher than the other panels. I will attach screenshots of what I am seeing.

Version-Release number of selected component (if applicable):
We have since moved to 4.8, but this was originally found on 4.7.

How reproducible:
Set a MachineConfigPool configuration that defines a memory hard-eviction value, then create memory stress that exceeds the usage allowed by that hard-eviction threshold.

Steps to Reproduce:
1. Set the MachineConfigPool for the worker node to have a hard eviction threshold for available memory (a sample KubeletConfig is sketched below this comment).
2. Create memory stress that exceeds the memory usage defined for hard eviction.
3. Wait for MemoryPressure to be alerted and click the MemoryPressure link on the Node panel. Check the info in the pop-up panel.

Actual results:
The top pod consumers appear to show incorrect memory usage.

Expected results:
I would expect the memory usage to match all other memory usage displays for that pod.

Additional info:
I still cannot get access to Red Hat to get the outline of the defect-opening policy, so I am not sure what logs you need. Please let me know and I will attach anything you require, as this is easily reproducible on my smaller KVM environment.
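For reference, a hard eviction threshold like the one described in step 1 is usually applied through a KubeletConfig targeted at the worker MachineConfigPool. The following is only a hedged sketch of that shape; the name, label, and threshold value are assumptions for illustration, not the exact configuration used in this report:

  apiVersion: machineconfiguration.openshift.io/v1
  kind: KubeletConfig
  metadata:
    name: worker-hard-eviction          # hypothetical name
  spec:
    machineConfigPoolSelector:
      matchLabels:
        custom-kubelet: hard-eviction   # assumes the worker MCP carries this label
    kubeletConfig:
      evictionHard:
        "memory.available": "2Gi"       # assumed threshold; pick a value the stress pods will exceed

The worker MachineConfigPool would need a matching label first, e.g. oc label machineconfigpool worker custom-kubelet=hard-eviction, the same pattern used in the verification steps later in this bug.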
Created attachment 1760291 [details] Screen Shots from console
Opened per Samuel Padgett's request from the related MemoryPressure defect.
Hi jhusta, the bug is reported against s390x hardware, and I doubt whether it is hardware related. By the way, could you share the image for the memstress pod shown in your screenshot?
Hi @yanpzhan, my repos and image are in IBM Git and Artifactory, which you will not have access to. We are simply using an Ubuntu container running stress-ng. Here is the container command:

  ["stress-ng", "-v", "--vm", "1", "--vm-bytes", "$ALLOCATION", "--vm-method", "all", "--verify", "--temp-path", "/tmp"]

with --vm-bytes set to some value. I chose s390x as that is what I am testing on; I don't have access to an x86 machine, so I make no assumptions. Here is my Dockerfile:

  FROM docker.io/ubuntu
  RUN apt-get update -y && apt-get install -y stress-ng iperf3
  USER 0
  CMD stress-ng --mmap 1

Thanks
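A hedged note for anyone rebuilding this image: with the Dockerfile above saved locally, something along these lines would typically build and publish it. The tag is an assumption borrowed from the image reference used in the verification comment further down, not the reporter's actual repository path:

  $ podman build -t quay.io/yanpzhan/memstress:latest .
  $ podman push quay.io/yanpzhan/memstress:latest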
Thanks jhusta, I built the image successfully with the Dockerfile. Checked on an OCP 4.8 cluster with payload 4.8.0-0.nightly-2021-06-02-025513; the bug still reproduces. The fix pr9030 is not contained in that payload. Waiting for a new build with the fix.
The fix is still not contained in payload 4.8.0-0.nightly-2021-06-06-164529.
@yanpzhan Thanks for keeping me posted!
Created attachment 1789785 [details] mem-pod-list
In the test, I created a deployment whose pods each consume 8G of memory, so that the node's memory is used up.
Tested on an OCP 4.11 cluster with payload 4.11.0-0.nightly-2022-02-16-211105.

1. Label the worker pool:

  $ oc label machineconfigpool worker custom-kubelet=small-pods

2. Create the kubeletconfig:

  apiVersion: machineconfiguration.openshift.io/v1
  kind: KubeletConfig
  metadata:
    name: set-allocatable
  spec:
    machineConfigPoolSelector:
      matchLabels:
        custom-kubelet: small-pods
    kubeletConfig:
      systemReserved:
        cpu: 1000m
        memory: 3Gi

3. Create a deployment whose pods consume a large amount of memory:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: memtest
    namespace: prozyp1
  spec:
    selector:
      matchLabels:
        app: httpd
    replicas: 3
    template:
      metadata:
        labels:
          app: httpd
      spec:
        containers:
        - name: httpd
          image: quay.io/yanpzhan/memstress:latest
          command: ["stress-ng", "-v", "--vm", "1", "--vm-bytes", "8G", "--vm-method", "all", "--verify", "--temp-path", "/tmp"]
          ports:
          - containerPort: 8080

4. Then check the nodes list page. When the node shows memory pressure, check the top pod info in the popover and compare it with the pod memory info on the pods list page.

The memory info is normal now. The bug is fixed.
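As an extra cross-check (not part of the verification above), the popover values can also be compared against what the metrics API reports from the CLI; the namespace is the one used in the deployment above:

  $ oc adm top pod -n prozyp1
  $ oc adm top node

If the popover, the pods list page, and these numbers line up, the "top pod consumers" calculation is consistent.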
Thank you @yanpzhan. I am still testing 4.10 but will verify this fix once we move to 4.11.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days