Bug 1885688

Summary: Switch to periodic process reaper for collecting zombie processes
Product: OpenShift Container Platform
Reporter: Stephen Greene <sgreene>
Component: Networking
Assignee: Stephen Greene <sgreene>
Networking sub component: router
QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: medium
CC: aiyengar, alchan, amcdermo, aos-bugs, apaladug, bbennett, fmarting, hongli, mjoseph, nmalik, rjamadar, sgreene, skanakal
Version: 4.6
Target Milestone: ---
Target Release: 4.5.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1859134
Environment:
Last Closed: 2020-11-05 12:46:58 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1859134
Bug Blocks:

Comment 1 Miciah Dashiel Butler Masters 2020-10-15 13:09:36 UTC
*** Bug 1888546 has been marked as a duplicate of this bug. ***

Comment 2 Miciah Dashiel Butler Masters 2020-10-19 14:48:22 UTC
*** Bug 1889019 has been marked as a duplicate of this bug. ***

Comment 4 Arvind iyengar 2020-10-23 05:35:37 UTC
Verified in the "4.5.0-0.nightly-2020-10-22-024721" release. The periodic reaper error does not occur with the patch in place.

Comment 7 errata-xmlrpc 2020-11-05 12:46:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.17 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4325

Comment 8 Anand Paladugu 2020-11-12 13:49:03 UTC
@steve greene

I have another customer seeing the error "931:E1103 15:36:29.512244       1 limiter.go:165] error reloading router: wait: no child processes" intermittently in router pod logs. They have not seen any functional impact so far, but are checking with app teams and end users.

Does seeing that error confirm that the race condition captured in this BZ exists, or would you recommend any other tests? The customer recently upgraded and does not want to upgrade again without confirming that this BZ definitely applies to their issue.

Thanks
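
Editorial note on the quoted error: "no child processes" is the text of the ECHILD errno, which a wait call returns when the child it asks about has already been collected by some other waiter. The sketch below is a minimal Python illustration of that general mechanism, not the router's Go code and not a diagnosis of the customer's case. All function names are hypothetical, and the reaper here is deliberately forced to win the race so the outcome is repeatable.

```python
import errno
import os
import time


def spawn_short_lived_child() -> int:
    """Fork a child that exits immediately; return its PID."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit at once, skipping parent cleanup handlers
    return pid


def reap_all_exited_children() -> list:
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def wait_for_child(pid: int) -> str:
    """Wait for one specific child and report whether a reaper got there first."""
    try:
        os.waitpid(pid, 0)
        return "collected"
    except ChildProcessError as exc:
        # ECHILD: the kernel no longer has an exit status to hand over.
        assert exc.errno == errno.ECHILD
        return "already reaped"


def demonstrate_race() -> str:
    """Let a generic reaper collect the child, then wait on it directly."""
    pid = spawn_short_lived_child()
    deadline = time.monotonic() + 5.0
    while pid not in reap_all_exited_children():
        if time.monotonic() > deadline:
            raise TimeoutError("child was never reaped")
        time.sleep(0.01)
    return wait_for_child(pid)


if __name__ == "__main__":
    print(demonstrate_race())
```

In a real program the two waiters run concurrently, so whichever one calls wait first gets the exit status and the other sees ECHILD. That would be consistent with an error that shows up only intermittently, though the sketch alone cannot confirm that this is what the customer is hitting.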

Comment 9 errata-xmlrpc 2020-11-13 09:07:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.17 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4325