Bug 1814187

Summary: [RHV] Master becomes NotReady after several days running
Product: OpenShift Container Platform Reporter: Wei Sun <wsun>
Component: Node Assignee: Ryan Phillips <rphillips>
Status: CLOSED NOTABUG QA Contact: Jan Zmeskal <jzmeskal>
Severity: urgent Docs Contact:
Priority: low    
Version: 4.4 CC: aos-bugs, jcall, jokerman, jzmeskal, lsvaty, lxia, pelauter, rgolan, rphillips, schoudha, scuppett, wsun, wzheng, xxia
Target Milestone: --- Keywords: Reopened, TestBlocker, TestBlockerForLayeredProduct
Target Release: 4.4.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1811924 Environment:
Last Closed: 2020-04-15 18:20:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1802687, 1811924    
Bug Blocks:    
Attachments:
Description Flags
Journal logs from master-0 with kubelet log level 6 none

Comment 1 Wei Sun 2020-03-17 10:21:46 UTC
Cloning this bug, since it looks critical and is blocking the regression tests against IPI on RHV.

Comment 2 Ryan Phillips 2020-03-17 13:41:43 UTC
4.4.0-0.nightly-2020-02-27-020932 is too old. There is a fix in the attached bug that merged.

*** This bug has been marked as a duplicate of bug 1802687 ***

Comment 3 Xingxing Xia 2020-03-17 14:30:40 UTC
Bug 1802687 and bug 1800319 have the same title, and both have target release 4.5. A search at https://bugzilla.redhat.com/buglist.cgi?classification=Red%20Hat&j_top=OR&list_id=10920800&product=OpenShift%20Container%20Platform&product=OpenShift%20Online&query_format=advanced&short_desc=A%20pod%20that%20gradually%20leaks&short_desc_type=regexp did not find any bugs with target release 4.4. Could you clarify whether a 4.4 bug tracker exists? Thanks.

Comment 27 Roy Golan 2020-03-30 14:46:44 UTC
Can one of you increase the log level of kubelet to see if we can get more info there?

Comment 29 Sunil Choudhary 2020-03-31 07:56:34 UTC
Created attachment 1675002 [details]
Journal logs from master-0 with kubelet log level 6

Comment 41 Peter Lauterbach 2020-04-10 19:28:45 UTC
We are no longer seeing this issue since we redeployed the cluster with faster storage.
This issue is NOT a blocker for GA of OCP 4.4 on RHV IPI.

Comment 42 Lukas Svaty 2020-04-15 18:20:41 UTC
The problem was traced to performance: bumping the specs to (and beyond) the recommended IOPS, RAM, CPU, and disk size resolved these issues. This issue is not RHV-specific; if it is reproduced, it should be resolved with the OCP Performance team or documented as updated minimum requirements.

The only thing left to consider is proper error handling when requirements are insufficient, as the reporter's were; that should be tracked in a separate bug if necessary.