
Bug 1952137

Summary: Observing lot of Defunct processes
Product: OpenShift Container Platform
Component: Node
Sub component: CRI-O
Version: 4.6
Status: CLOSED DEFERRED
Severity: urgent
Priority: high
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Anandhu B Raj <abraj>
Assignee: Sascha Grunert <sgrunert>
QA Contact: Sunil Choudhary <schoudha>
CC: acardena, aos-bugs, avoigtma, bhershbe, bleanhar, bverschu, deliedit, ealcaniz, eminguez, fsimonce, gdiotte, hgomes, jteagno+bugzilla, mchebbi, miminar, minmli, nagrawal, ofalk, openshift-bugs-escalate, palshure, pducai, pehunt, pratshar, prdeshpa, pweil, rcarrier, rphillips, rupatel, schoudha, sgrunert, sparpate, wking
Flags: prdeshpa: needinfo-
Last Closed: 2021-08-17 09:46:27 UTC
Type: Bug

Description Anandhu B Raj 2021-04-21 15:12:42 UTC
Description of problem:

Pods are stuck in the ContainerCreating or Terminating state, and a lot of defunct processes are observed on the affected nodes.

Opening this bug report in connection with the ongoing issues in:

1) https://bugzilla.redhat.com/show_bug.cgi?id=1903228

2) https://bugzilla.redhat.com/show_bug.cgi?id=1942375


As per https://bugzilla.redhat.com/show_bug.cgi?id=1903228#c23 and
https://bugzilla.redhat.com/show_bug.cgi?id=1903228#c24, it might be connected with a closed erratum: https://bugzilla.redhat.com/show_bug.cgi?id=1848524


Version-Release number of selected component (if applicable):

4.6.21



Additional info:

Attaching the sosreport.

Comment 6 Sascha Grunert 2021-04-28 08:38:05 UTC
The sosreport indicates that there are defunct processes other than conmon, so I suspect this issue is related to:
https://bugzilla.redhat.com/show_bug.cgi?id=1932832
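
For anyone who wants to check a node for the same pattern (defunct processes that are not conmon), here is a minimal sketch in Python. It assumes a Linux node with /proc mounted and walks /proc/<pid>/stat, counting entries in state "Z" by command name. The helper names parse_stat and count_defunct are made up for this example and are not part of any OpenShift or CRI-O tooling.

```python
import os
from collections import Counter


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx].strip())
    comm = text[open_idx + 1:close_idx]
    rest = text[close_idx + 1:].split()
    return pid, comm, rest[0], int(rest[1])


def count_defunct(proc_root="/proc"):
    """Count zombie (state 'Z') processes under proc_root by command name."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                _, comm, state, _ = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away or stat was unreadable
        if state == "Z":
            counts[comm] += 1
    return counts
```

Calling count_defunct().most_common() on a node lists the command names with the most defunct processes first, which makes it easy to see whether conmon is the only one affected.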

I would really like to try the runc fix that was mentioned in the first comment.

Anandhu, do you think we can test it this way?

Comment 13 Sascha Grunert 2021-05-03 08:27:43 UTC
Peter, referring to https://bugzilla.redhat.com/show_bug.cgi?id=1942375#c10, which OpenShift version ships the runc-1.0.0-85.rhaos4.6.git77a6f3c package?

Comment 14 Peter Hunt 2021-05-03 19:33:30 UTC
Since it was bumped in https://releases-rhcos-art.cloud.privileged.psi.redhat.com/contents.html?stream=releases%2Frhcos-4.6&release=46.82.202104281641-0 I would guess 2.6.28 will have it

Comment 15 Sascha Grunert 2021-05-04 07:10:30 UTC
Anandhu, can we update the customer version to 2.6.28 to test the fix?

Comment 16 Sascha Grunert 2021-05-04 09:10:03 UTC
(In reply to Sascha Grunert from comment #15)
> Anandhu, can we update the customer version to 2.6.28 to test the fix?

I think Peter meant 4.6.28 :)

Comment 39 Sascha Grunert 2021-05-27 07:50:28 UTC
*** Bug 1932832 has been marked as a duplicate of this bug. ***

Comment 47 Sascha Grunert 2021-06-07 13:59:51 UTC
Unsetting the target release because this issue affects 4.6.

Comment 50 Sascha Grunert 2021-06-14 07:12:31 UTC
The upstream PR has been merged and should automatically land in the next OpenShift release.

Comment 54 Sascha Grunert 2021-06-25 07:02:48 UTC
Hey Anandhu, 4.6.35 should contain the fix.

Comment 61 Peter Hunt 2021-07-14 13:42:41 UTC
FYI, CRI-O needs the attached fix as well. For 4.6, it landed in 4.6.36.

Comment 62 Peter Hunt 2021-07-14 13:44:44 UTC
*** Bug 1980522 has been marked as a duplicate of this bug. ***

Comment 81 MinLi 2021-08-09 10:36:49 UTC
Tested on 4.6.42: created some pods with a liveness exec probe and did not find any defunct processes.
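
A rough sketch of how such test pods could be generated, for reference. This is not the exact manifest used in the test above. The pod names, the image (registry.access.redhat.com/ubi8/ubi-minimal), the probe command and the probe period are all assumptions picked for illustration. The script only builds a Kubernetes v1 List of pods, each with a livenessProbe of type exec, and prints it as JSON.

```python
import json


def liveness_exec_pod(name, image, probe_command, period_seconds=5):
    """Build a Pod manifest whose single container has an exec liveness probe."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "probe-test",  # hypothetical container name
                    "image": image,
                    "command": ["sleep", "3600"],
                    "livenessProbe": {
                        "exec": {"command": list(probe_command)},
                        "periodSeconds": period_seconds,
                    },
                }
            ]
        },
    }


def build_pod_list(count, image="registry.access.redhat.com/ubi8/ubi-minimal"):
    """Wrap `count` probe pods in a v1 List so they can be created in one go."""
    items = [
        liveness_exec_pod(f"liveness-exec-{i}", image, ["/bin/true"])
        for i in range(count)
    ]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Print the manifests as JSON on stdout.
    print(json.dumps(build_pod_list(10), indent=2))
```

If the script is saved as, say, gen_pods.py (a hypothetical file name), the output can be piped into "oc create -f -". Exec probes seem a reasonable choice for this check because every probe run makes the runtime start a short-lived process inside the container, so leftover defunct processes should show up on the node fairly quickly if the problem is still present.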

Comment 84 MinLi 2021-08-10 04:18:19 UTC
According to comment 82, this issue still exists in the production environment, so setting the status back to ASSIGNED.

Comment 115 Sascha Grunert 2021-08-17 09:46:27 UTC
Thanks Peter, as discussed yesterday I'm closing this bug in favor of BZ#1994444 to focus our observations on the currently open issue.