Bug 1934304

Summary: MemoryPressure Top Pod Consumers seems to be 2x expected value
Product: OpenShift Container Platform Reporter: jhusta <jhusta>
Component: Management Console Assignee: Bipul Adhikari <badhikar>
Status: CLOSED ERRATA QA Contact: Yanping Zhang <yanpzhan>
Severity: medium Docs Contact:
Priority: low    
Version: 4.7 CC: aos-bugs, badhikar, jhadvig, kdoberst, krmoser, nmukherj, spadgett, yanpzhan, yapei
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: s390x   
OS: Linux   
Whiteboard: Scrubbed
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 10:36:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2055290    
Attachments:
Description Flags
Screen Shots from console
none
mem-pod-list none

Description jhusta 2021-03-02 21:55:48 UTC
Description of problem:
This is related to the MemoryPressure popover opened from the Nodes panel. I cannot determine how the memory usage of the "top pod consumers" is being calculated, as it does not match any other per-pod memory usage display; it is roughly 2x the value shown in the other panels. I will attach screenshots of what I am seeing.
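For reference, a hypothetical way to cross-check a single pod's memory outside the console is to query the working-set metric directly in Prometheus (namespace and pod names are placeholders, and I am not claiming this is the exact query the console uses):

# Working-set memory per pod; the container!="" filter drops the pod-level
# cgroup series so containers are not counted twice.
sum(container_memory_working_set_bytes{namespace="<ns>", pod="<pod>", container!=""}) by (pod)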

Version-Release number of selected component (if applicable):
We have since moved to 4.8, but this was originally found in 4.7.

How reproducible:
Configure a MachineConfigPool with a KubeletConfig that sets a hard eviction value for memory. Then create memory-related stress that exceeds the memory usage allowed by that hard eviction threshold.


Steps to Reproduce:
1. Set a MachineConfigPool/KubeletConfig for the worker nodes with a hard eviction threshold for memory.available (see the sketch after these steps).
2. Create memory stress that exceeds the memory usage defined for hard eviction.
3. Wait for the MemoryPressure condition to be reported, then click the MemoryPressure link on the Node panel and check the information in the popover.
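A minimal sketch of the kind of KubeletConfig used in step 1 (the name, label, and threshold value are placeholders, not my exact configuration):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-memory-eviction        # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: hard-eviction   # label applied to the worker MachineConfigPool
  kubeletConfig:
    evictionHard:
      memory.available: "2Gi"         # placeholder hard eviction threshold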

Actual results:
The "top pod consumers" list appears to show incorrect memory usage (roughly 2x the expected value).


Expected results:
I would expect the memory usage to match the other memory usage displays for that pod.


Additional info:
I still cannot get access to Red Hat's outline of the defect-opening policy, so I am not sure which logs you need. Please let me know and I will attach anything you require, as this is easily reproducible on my smaller KVM environment.

Comment 1 jhusta 2021-03-02 21:58:58 UTC
Created attachment 1760291 [details]
Screen Shots from console

Comment 2 jhusta 2021-03-02 22:02:38 UTC
Opened per Samuel Padgett's request, from the related MemoryPressure defect.

Comment 8 Yanping Zhang 2021-06-01 03:52:29 UTC
Hi jhusta, the bug is reported against s390x hardware, and I am not sure whether it is hardware related. By the way, could you share the image for the memstress pod shown in your screenshot?

Comment 9 jhusta 2021-06-01 17:28:46 UTC
Hi @yanpzhan, my repos and image are in IBM Git and Artifactory, which you will not have access to. We are simply using an Ubuntu container running stress-ng. Here is the command: ["stress-ng", "-v", "--vm", "1", "--vm-bytes", "$ALLOCATION", "--vm-method", "all", "--verify", "--temp-path", "/tmp"], with --vm-bytes set to some value. I chose s390x because that is what I am testing on; I don't have access to an x86 machine, so I make no assumptions about other architectures.

Here is my dockerfile
FROM docker.io/ubuntu
RUN apt-get update -y && apt-get install -y stress-ng iperf3
USER 0 
CMD stress-ng --mmap 1
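
For completeness, this is roughly how the image is built and pushed (the registry path is a placeholder, not my actual Artifactory location):

# Build and push the stress image (hypothetical registry path)
podman build -t <registry>/memstress:latest .
podman push <registry>/memstress:latest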

Thanks

Comment 10 Yanping Zhang 2021-06-03 04:04:18 UTC
Thanks jhusta, I built the image successfully with the Dockerfile.
Checked on an OCP 4.8 cluster with payload 4.8.0-0.nightly-2021-06-02-025513; the bug is still reproduced. The fix (pr9030) is not contained in the payload. Waiting for a new build with the fix.

Comment 11 Yanping Zhang 2021-06-07 03:02:45 UTC
The fix is still not contained in payload 4.8.0-0.nightly-2021-06-06-164529.

Comment 12 jhusta 2021-06-07 12:57:33 UTC
@yanpzhan Thanks for keeping me posted!

Comment 14 Yanping Zhang 2021-06-10 09:47:19 UTC
Created attachment 1789785 [details]
mem-pod-list

Comment 16 Yanping Zhang 2021-06-10 09:50:26 UTC
In the test, I created a deployment whose pods each consume 8G of memory, so that node memory is used up.

Comment 26 Yanping Zhang 2022-02-17 08:02:37 UTC
Tested on ocp 4.11 cluster with payload 4.11.0-0.nightly-2022-02-16-211105.
1. $ oc label machineconfigpool worker custom-kubelet=small-pods
2. Create kubeletconfig:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-allocatable 
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: small-pods 
  kubeletConfig:
    systemReserved:
      cpu: 1000m
      memory: 3Gi
3. Create a deployment whose pods consume a large amount of memory.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memtest
  namespace: prozyp1
spec:
  selector:
    matchLabels:
      app: httpd
  replicas: 3
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
        - name: httpd
          image: quay.io/yanpzhan/memstress:latest
          command: ["stress-ng", "-v", "--vm", "1", "--vm-bytes", "8G", "--vm-method", "all", "--verify", "--temp-path", "/tmp"]
          ports:
            - containerPort: 8080
4. Then check the nodes list page. When a node shows memory pressure info, open the popover, check the top pod consumers, and compare them with the pod memory info on the pods list page. The memory info is correct now.
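As an extra cross-check outside the console, the popover values can also be compared with the metrics API output for the same namespace, e.g.:

# Per-pod memory usage reported by the metrics API
oc adm top pods -n prozyp1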
The bug is fixed.

Comment 29 jhusta 2022-02-23 17:29:21 UTC
Thank you @yanpzhan. I am still testing 4.10, but I will verify this fix once we move to 4.11.

Comment 32 errata-xmlrpc 2022-08-10 10:36:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 33 Red Hat Bugzilla 2023-09-15 01:33:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days