Bug 1859134 - Switch to periodic process reaper for collecting zombie processes
Summary: Switch to periodic process reaper for collecting zombie processes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Stephen Greene
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On:
Blocks: 1885688
Reported: 2020-07-21 10:25 UTC by Andrew McDermott
Modified: 2021-01-21 14:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1885688
Environment:
Last Closed: 2020-10-27 16:16:18 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift router pull 190 | 0 | None | closed | Bug 1859134: Switch to periodic process reaper | 2021-02-10 15:11:30 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:16:46 UTC

Description Andrew McDermott 2020-07-21 10:25:29 UTC
We need this change https://github.com/openshift/router/pull/111 which will fix https://github.com/openshift/router/pull/78.

"We were getting races when reloading haproxy via the reload-haproxy script because we had our own process reaper (StartReaper). Occasionally the reload would report no child processes and this happened when StartReaper had already reaped the reload script that we were independently waiting on elsewhere."

Comment 1 Andrew McDermott 2020-07-30 10:10:47 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 2 mfisher 2020-08-18 19:55:58 UTC
Target reset to 4.7 while investigation is either ongoing or not yet started.  Will be considered for earlier release versions when diagnosed and resolved.

Comment 3 Andrew McDermott 2020-09-10 11:50:58 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 7 Arvind iyengar 2020-10-09 08:28:28 UTC
Verified in the "4.6.0-0.nightly-2020-10-08-210814" release. With this payload, the periodic reaper errors do not occur. For reference, the unpatched version logs messages such as the following in the router logs:
------
$ oc -n openshift-ingress logs router-default-689c5d5bb7-w5bbh --tail 10
E1009 05:15:33.741661       1 haproxy.go:416] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
E1009 05:30:42.298061       1 limiter.go:165] error reloading router: waitid: no child processes
------

Comment 10 errata-xmlrpc 2020-10-27 16:16:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

