Bug 1814187 - [RHV] Master becomes NotReady after several days running
Summary: [RHV] Master becomes NotReady after several days running
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.4.z
Assignee: Ryan Phillips
QA Contact: Jan Zmeskal
Depends On: 1802687 1811924
Reported: 2020-03-17 10:16 UTC by Wei Sun
Modified: 2020-04-20 16:10 UTC (History)

Fixed In Version:
Doc Text:
Clone Of: 1811924
Last Closed: 2020-04-15 18:20:41 UTC
Target Upstream Version:

Attachments
Journal logs from master-0 with kubelet log level 6 (903.38 KB, application/x-xz)
2020-03-31 07:56 UTC, Sunil Choudhary

Comment 1 Wei Sun 2020-03-17 10:21:46 UTC
Cloning this bug because it looks critical and is blocking regression testing of IPI on RHV.

Comment 2 Ryan Phillips 2020-03-17 13:41:43 UTC
The 4.4.0-0.nightly-2020-02-27-020932 build is too old. A fix has since merged in the attached bug.

*** This bug has been marked as a duplicate of bug 1802687 ***

Comment 3 Xingxing Xia 2020-03-17 14:30:40 UTC
Bugs 1802687 and 1800319 have the same title, and both target release 4.5. A search at https://bugzilla.redhat.com/buglist.cgi?classification=Red%20Hat&j_top=OR&list_id=10920800&product=OpenShift%20Container%20Platform&product=OpenShift%20Online&query_format=advanced&short_desc=A%20pod%20that%20gradually%20leaks&short_desc_type=regexp did not find any bugs targeting release 4.4. Could you clarify whether a 4.4 tracker bug exists? Thanks.

Comment 27 Roy Golan 2020-03-30 14:46:44 UTC
Can one of you increase the log level of kubelet to see if we can get more info there?

Comment 29 Sunil Choudhary 2020-03-31 07:56:34 UTC
Created attachment 1675002
Journal logs from master-0 with kubelet log level 6

Comment 41 Peter Lauterbach 2020-04-10 19:28:45 UTC
We are no longer seeing this issue since we redeployed the cluster with faster storage.
This issue is NOT a blocker for GA of OCP 4.4 on RHV IPI.

Comment 42 Lukas Svaty 2020-04-15 18:20:41 UTC
The problem was identified as performance-related: bumping the specs to (and beyond) the recommended IOPS, RAM, CPU, and disk size solved these issues. This issue is not RHV-specific and, if reproduced, should be resolved with the OCP Performance team or documented as updated minimum requirements.

The only thing left to consider is proper error handling when the requirements are insufficient, as they were in the reporter's environment. That should be tracked in a separate bug if necessary.
