1859134 – Switch to periodic process reaper for collecting zombie processes

Summary: Switch to periodic process reaper for collecting zombie processes

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Stephen Greene
QA Contact:	Arvind iyengar
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1885688

Reported:	2020-07-21 10:25 UTC by Andrew McDermott
Modified:	2024-03-25 16:11 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1885688
Environment:
Last Closed:	2020-10-27 16:16:18 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift router pull 190	0	None	closed	Bug 1859134: Switch to periodic process reaper	2021-02-10 15:11:30 UTC
Red Hat Product Errata	RHBA-2020:4196	0	None	None	None	2020-10-27 16:16:46 UTC

Description Andrew McDermott 2020-07-21 10:25:29 UTC

We need this change https://github.com/openshift/router/pull/111 which will fix https://github.com/openshift/router/pull/78.

"We were getting races when reloading haproxy via the reload-haproxy script because we had our own process reaper (StartReaper). Occasionally the reload would report no child processes and this happened when StartReaper had already reaped the reload script that we were independently waiting on elsewhere."

Comment 1 Andrew McDermott 2020-07-30 10:10:47 UTC

I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 2 mfisher 2020-08-18 19:55:58 UTC

Target reset to 4.7 while investigation is either ongoing or not yet started.  Will be considered for earlier release versions when diagnosed and resolved.

Comment 3 Andrew McDermott 2020-09-10 11:50:58 UTC

I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 7 Arvind iyengar 2020-10-09 08:28:28 UTC

Verified in the "4.6.0-0.nightly-2020-10-08-210814" release. With this payload, the periodic reaper errors do not occur. For reference, an unpatched version will have messages such as the following in the router logs:
------
$ oc -n openshift-ingress logs router-default-689c5d5bb7-w5bbh --tail 10
E1009 05:15:33.741661       1 haproxy.go:416] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
E1009 05:30:42.298061       1 limiter.go:165] error reloading router: waitid: no child processes
------

Comment 10 errata-xmlrpc 2020-10-27 16:16:18 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
