Description of problem:
When HAProxy router pods are first created, they do not properly load the routes configured for them. This happens when router pods are redeployed or deleted. The restarted proxy pods serve 503 errors until the next reload.

[router1.paas.qa.int.phx1.redhat.com] [11:37:00 AM]
[root@router1 ~]# docker logs -f $(docker ps | grep haproxy-router | cut -c1-12)
I1010 07:35:16.929616 1 router.go:153] Router is only using routes in namespaces matching region=phx1,servicephase=qa,zone=internal
I1010 07:35:17.131139 1 router.go:321] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
W1010 07:37:00.928813 1 router.go:617] a edge terminated route with host master-webarchi.int.paas.qa.redhat.com does not have the required certificates. The route will still be created but no certificates will be written
W1010 07:37:00.928857 1 router.go:617] a edge terminated route with host hello-world-syseng-validation.int.paas.qa.redhat.com does not have the required certificates. The route will still be created but no certificates will be written
I1010 07:37:00.987037 1 router.go:321] Router reloaded:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
The routes are not present after redeployment.

Expected results:
The routes should exist in the newly created routers.

Additional info:
I think this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1387714
Maru: Can you see if we can do something about not loading the router until we have processed the routes? My only concern about stalling is that if we don't bring something up in a timely manner, we may interfere with the health checks, get killed, and loop.
@bbennett: The liveness/readiness check for a template router targets the haproxy stats port. This precludes starting the router pod without starting haproxy, and it's not possible to change the liveness/readiness probes for a running pod. However, it should be possible to configure haproxy to avoid binding ports for http/tls traffic when it initially starts. Binding could be delayed until the route state had been read.
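To illustrate the idea above, here is a minimal, hypothetical haproxy config sketch (not the actual router template shipped with OpenShift): only the stats listener that the liveness/readiness probe targets is bound at startup, while the frontends for application traffic on :80/:443 would be rendered into the config only after the initial route sync completes.

```
global
    daemon

defaults
    mode    http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

# Health endpoint targeted by the pod's liveness/readiness probe.
# Binding only this port lets the probe pass while route state is
# still being read.
listen stats
    bind :1936
    mode http
    stats enable
    stats uri /
    monitor-uri /healthz

# Hypothetical: frontends binding :80 and :443 are appended to this
# config (and haproxy reloaded) only after the initial route sync,
# so no 503s are served for routes that have not been loaded yet.
```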
*** Bug 1387714 has been marked as a duplicate of this bug. ***
Hi, we have a similar case, but the customer is saying it happens after the second reload. Thanks
I think I found the cause of this issue, separate from the port binding issue. The GitHub PR has a fix.
Verified in 3.4.1.0; the issue has been fixed.

# oc logs router-2-badgl
I0122 05:18:00.655645 1 router.go:456] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0122 05:18:00.655951 1 router.go:221] Router is only using routes in namespaces matching team=red
E0122 05:18:00.697077 1 controller.go:169] a route in another namespace holds test-edge.example.com and is older than secured-edge-route
I0122 05:18:00.747442 1 router.go:456] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
*** Bug 1381584 has been marked as a duplicate of this bug. ***
*** Bug 1402488 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0218