Bug 1934019

Summary: High RAM usage of machine-api-termination-handler leading to node system oom
Product: OpenShift Container Platform Reporter: Alexander Niebuhr <alexander>
Component: Machine Config Operator Assignee: Yu Qi Zhang <jerzhang>
Status: CLOSED DUPLICATE QA Contact: Michael Nguyen <mnguyen>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.7   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: ---
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-03-02 10:00:37 UTC Type: Bug

Description Alexander Niebuhr 2021-03-02 09:55:03 UTC
Description of problem:
We are seeing extremely high RAM usage by the machine-api-termination-handler pods (>10 GB). This led to System OOM events on the nodes.

Version-Release number of selected component (if applicable):
4.7

Actual results:
node1 ---
Mar 02 08:47:31 ip-10-0-160-241 hyperkube[1228]: E0302 08:47:29.274245    1228 oomparser.go:149] exiting analyzeLines. OOM events will not be reported.
Mar 02 08:47:31 ip-10-0-160-241 hyperkube[1228]: E0302 08:47:28.939304    1228 oomparser.go:149] exiting analyzeLines. OOM events will not be reported.
Mar 02 08:47:31 ip-10-0-160-241 hyperkube[1228]: I0302 08:47:29.515650    1228 manager.go:1215] Created an OOM event in container "/" at 2021-03-02 08:39:51.798032232 +0000 UTC m=+949569.837181581
Mar 02 08:47:31 ip-10-0-160-241 hyperkube[1228]: I0302 08:47:30.114399    1228 event.go:291] "Event occurred" object="ip-10-0-160-241.eu-central-1.compute.internal" kind="Node" apiVersion="" type="Warning" reason="SystemOOM" message="System OOM encountered"

node2 ---
Mar 02 08:36:35 ip-10-0-159-11 hyperkube[1227]: I0302 08:36:34.703449    1227 manager.go:1215] Created an OOM event in container "/" at 2021-03-02 08:39:10.22251715 +0000 UTC m=+1722875.568383336
Mar 02 08:36:35 ip-10-0-159-11 hyperkube[1227]: I0302 08:36:35.455905    1227 event.go:291] "Event occurred" object="ip-10-0-159-11.eu-central-1.compute.internal" kind="Node" apiVersion="" type="Warning" reason="SystemOOM" message="System OOM encountered, victim process: opm, pid: 1083350"
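
For triage, the SystemOOM events can be pulled out of the kubelet journal with a small script. The following is a minimal sketch, not an official tool: the regular expressions are derived only from the log lines above, and the function name parse_system_oom is hypothetical. Other kubelet versions may format these lines differently.

```python
import re

# Matches the kubelet "Event occurred" line for a SystemOOM node event.
# Assumption: the field order (object, kind, ..., reason, message) is the
# one seen in the sample log lines in this report.
SYSTEM_OOM = re.compile(
    r'object="(?P<node>[^"]+)" kind="Node".*?'
    r'reason="SystemOOM" message="(?P<message>[^"]*)"'
)

# The victim process is only present in some messages (see the node2 sample).
VICTIM = re.compile(r"victim process: (?P<process>[^,]+), pid: (?P<pid>\d+)")


def parse_system_oom(line):
    """Return a dict describing a SystemOOM event, or None for other lines."""
    match = SYSTEM_OOM.search(line)
    if match is None:
        return None
    event = {"node": match.group("node"), "process": None, "pid": None}
    victim = VICTIM.search(match.group("message"))
    if victim is not None:
        event["process"] = victim.group("process")
        event["pid"] = int(victim.group("pid"))
    return event


if __name__ == "__main__":
    import sys

    # Example: journalctl -u kubelet | python3 this_script.py
    for journal_line in sys.stdin:
        event = parse_system_oom(journal_line)
        if event is not None:
            print(event)
```

Run against the node2 excerpt, this reports opm (pid 1083350) as the victim process. The node1 message names no victim, so process and pid stay None for that event.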

Comment 1 Alexander Niebuhr 2021-03-02 10:00:37 UTC

*** This bug has been marked as a duplicate of bug 1934021 ***