We need this change, https://github.com/openshift/router/pull/111, which will fix https://github.com/openshift/router/pull/78. Quoting the PR description: "We were getting races when reloading haproxy via the reload-haproxy script because we had our own process reaper (StartReaper). Occasionally the reload would report no child processes and this happened when StartReaper had already reaped the reload script that we were independently waiting on elsewhere."
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.
Target reset to 4.7 while investigation is either ongoing or not yet started. Will be considered for earlier release versions when diagnosed and resolved.
Verified in the "4.6.0-0.nightly-2020-10-08-210814" release. With this payload, the periodic reaper errors do not occur. For reference, the unpatched version shows messages like these in the router logs:
------
$ oc -n openshift-ingress logs router-default-689c5d5bb7-w5bbh --tail 10
E1009 05:15:33.741661 1 haproxy.go:416] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
E1009 05:30:42.298061 1 limiter.go:165] error reloading router: waitid: no child processes
------
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196