Bug 1810136 - [4.2] A pod that gradually leaks memory causes node to become unreachable for 10 minutes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.4
Hardware: s390x
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.z
Assignee: Ryan Phillips
QA Contact: MinLi
URL:
Whiteboard:
Duplicates: 1795185 1802639
Depends On: 1808429
Blocks: OCP/Z_4.2 1766237 1801826 1801829 1802687
Reported: 2020-03-04 15:31 UTC by Ryan Phillips
Modified: 2020-07-01 16:08 UTC (History)
CC List: 23 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1808429
Environment:
Last Closed: 2020-07-01 16:08:20 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 24631 | 0 | None | closed | [release-4.2] Bug 1810136: UPSTREAM: 88251: Partially fix incorrect configuration of kubepods.slice unit by kubelet | 2021-02-11 09:25:20 UTC
Red Hat Product Errata | RHBA-2020:2589 | 0 | None | None | None | 2020-07-01 16:08:47 UTC

Internal Links: 1795177

Comment 1 Ryan Phillips 2020-03-04 19:47:02 UTC
*** Bug 1795185 has been marked as a duplicate of this bug. ***

Comment 2 Ryan Phillips 2020-03-26 15:36:12 UTC
*** Bug 1802639 has been marked as a duplicate of this bug. ***

Comment 6 MinLi 2020-06-10 10:30:00 UTC
Verified with version 4.4.0-0.nightly-2020-06-08-083627.

The memory-hog-pod was evicted after 3m2s:

$ oc get pod -o wide 
NAME             READY   STATUS    RESTARTS   AGE    IP       NODE                                         NOMINATED NODE   READINESS GATES
memory-hog-pod   0/1     Evicted   0          6m4s   <none>   ip-10-0-210-116.us-east-2.compute.internal   <none>           <none>
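
The workload run by memory-hog-pod is not included in this report. Purely as an illustration of the "gradually leaks memory" behaviour named in the bug title, a process of that kind could look roughly like the sketch below. The function name, chunk size, and interval are assumptions made for the example, not values taken from the actual test pod.

```python
import time


def leak_memory(chunk_mib=10, interval_s=1.0, max_chunks=None):
    """Allocate chunk_mib MiB every interval_s seconds and never release it.

    Hypothetical helper: the real memory-hog-pod workload is not shown in
    this bug. max_chunks bounds the loop for local experiments; None means
    keep allocating until the process is killed or the pod is evicted.
    """
    hoard = []  # keeping references alive prevents the buffers being freed
    while max_chunks is None or len(hoard) < max_chunks:
        # bytearray(n) creates a zero-filled buffer of n bytes
        hoard.append(bytearray(chunk_mib * 1024 * 1024))
        time.sleep(interval_s)
    return hoard
```

Calling leak_memory() with the defaults grows the process by about 10 MiB per second without bound, so it should only be run inside a disposable container on a test cluster.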

$ oc describe node ip-10-0-210-116.us-east-2.compute.internal
Name:               ip-10-0-210-116.us-east-2.compute.internal
Roles:              worker
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         547m (36%)    100m (6%)
  memory                      2027Mi (29%)  537Mi (7%)
  ephemeral-storage           0 (0%)        0 (0%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:
  Type     Reason                     Age    From                                                 Message
  ----     ------                     ----   ----                                                 -------
  Warning  EvictionThresholdMet       2m40s  kubelet, ip-10-0-210-116.us-east-2.compute.internal  Attempting to reclaim memory
  Normal   NodeHasInsufficientMemory  2m33s  kubelet, ip-10-0-210-116.us-east-2.compute.internal  Node ip-10-0-210-116.us-east-2.compute.internal status is now: NodeHasInsufficientMemory
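
For repeated verification runs, the check for the two events above could be scripted. The following is a minimal sketch that assumes the plain-text Events table format shown in this comment; the function names are made up for the example, and the regular expression is only meant to match rows like the ones above.

```python
import re

# Matches rows of the "Events:" table printed by `oc describe node`, e.g.
# "  Warning  EvictionThresholdMet  2m40s  kubelet, <node>  <message>".
# The header and separator rows do not start with Normal/Warning, so they
# are skipped.
EVENT_ROW = re.compile(r"^\s*(Normal|Warning)\s+(\S+)\s+(\S+)\s+(.*)$")


def event_reasons(describe_output):
    """Return (type, reason, age) tuples for each event row in the text."""
    events = []
    for line in describe_output.splitlines():
        match = EVENT_ROW.match(line)
        if match:
            events.append((match.group(1), match.group(2), match.group(3)))
    return events


def memory_pressure_seen(describe_output):
    """True if the kubelet reported that an eviction threshold was met."""
    reasons = {reason for _, reason, _ in event_reasons(describe_output)}
    return "EvictionThresholdMet" in reasons
```

The text passed in would be the captured output of the `oc describe node` command shown above. Parsing human-readable output is fragile, so this is suited to a quick check rather than long-lived automation.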

Comment 8 errata-xmlrpc 2020-07-01 16:08:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2589

