Description of problem:
Hit the error "E0417 08:40:19.657429 1 ratelimiter.go:52] error reloading router: wait: no child processes" once in the router pod logs while repeatedly creating and deleting a route.

Version-Release number of selected component (if applicable):
openshift version
openshift v3.6.27
kubernetes v1.5.2+43a9be4
etcd 3.1.0

How reproducible:
Hit once.

Steps to Reproduce:
1. Use a script that repeatedly creates a route, checks it, and deletes it, like:
    $ cat check.sh
    while true; do
      oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/unsecure/route_unsecure.json
      oc describe route route | grep expose
      if [ $? != 0 ]; then echo "the route is not loading to router" >> fail.route ; fi
      oc delete route route
      sleep 10
    done
2. Run the script in the background:
    nohup ./check.sh &
3. After the script had run for about 24 hours, 'the route is not loading to router' appeared once in fail.route.
4. Check the router logs.

Actual results:
    oc logs router-xxx
    ..
     - HAProxy port 1936 health check ok : 0 retry attempt(s).
    I0417 08:40:09.418421 1 router.go:508] Router reloaded:
     - Checking HAProxy /healthz on port 1936 ...
     - HAProxy port 1936 health check ok : 0 retry attempt(s).
    E0417 08:40:19.657429 1 ratelimiter.go:52] error reloading router: wait: no child processes
     - Checking HAProxy /healthz on port 1936 ...
     - HAProxy port 1936 health check ok : 0 retry attempt(s).
    I0417 08:40:19.978148 1 router.go:508] Router reloaded:
     - Checking HAProxy /healthz on port 1936 ...
    ..

Expected results:
No such error, and the route is loaded into the router.

Additional info:
This is just something HAProxy outputs when there are no child processes to kill yet. It is harmless and can be ignored. We should clean up the error to make it less scary (unless it repeats).
@Ben It's not only the error in the log: at the same time, the route also fails to load into the router. This part of the script checks whether the route has been loaded into the router:
    oc describe route route | grep expose; if [ $? != 0 ]; then echo "the route is not loading to router" >> fail.route ; fi
and I can see 'the route is not loading to router' in fail.route at the time the router logged 'error reloading router: wait: no child processes'.
We seem to be hitting this on our OCP 3.5 test cluster (with openshift3/ose-haproxy-router:v3.5.5.5). It seems that when this happens, the router stops being updated (we get 503 for all new routes). Also note that there are no additional errors in the haproxy logs like the ones we used to get with https://bugzilla.redhat.com/show_bug.cgi?id=1429823.
maschmid: You are probably hitting the router deadlock bug https://bugzilla.redhat.com/show_bug.cgi?id=1440977
The fix for the Pop() panic also includes a change that reduces the number of deleted routes kept in the database. This may help as well, since the problem took about 24 hours to appear. Could we retest this with the Pop() panic (bz1437441) fix?
Is this still an issue?
*** This bug has been marked as a duplicate of bug 1437441 ***
I don't think this bug is a dupe of the one above. The problem seems to be in the underlying golang itself; see the comment from tgross on joyent/containerpilot issue #178, "wait: no child processes" (https://github.com/joyent/containerpilot/issues/178):
"I've run into this intermittently. The code section in question is in utils/run.go ExecuteAndWait. If you check out the golang source code for cmd.Run you'll see a race condition. The process is started and then we wait for it. But if the process completes and exits before the wait happens (because, say, the go runtime decides to do a GC pause right then or the goroutine yields for the syscall), then we'll get an error there."
It's probably a non-issue in later versions, as we ship a different version of golang in later releases.
While the log message is not user-friendly, we do not currently believe it indicates a problem or will cause any harm to the system. We do not plan to address this log output specifically in any release. I apologize, but I am going to close this BZ as WONTFIX.