Description of problem: If the backing pods of a service exposed via a route are unavailable (crashlooping, deleted, etc.), the router responds with a 503, but the haproxy_server_http_responses_total metric disappears for that route.
Version-Release number of selected component (if applicable): Tested in OSD 4.3.18
How reproducible: Reproducible
Steps to Reproduce:
1. Create an HTTP deployment and a service, and expose the service through a route
2. Create a client that queries the route (a shell curl loop should be enough)
3. Delete the deployment
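For anyone who prefers a scripted client for step 2 over a curl loop, here is a minimal Python sketch that queries a URL repeatedly and tallies the status codes it gets back. The function names and the route URL in the usage note are hypothetical and not taken from the cluster in this report. Connection-level failures (DNS, refused connections) are not handled and will raise.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code of a GET on url, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we count.
        return err.code


def tally_statuses(url, attempts, fetch=fetch_status):
    """Query url `attempts` times and count the responses per status code."""
    counts = Counter()
    for _ in range(attempts):
        counts[fetch(url)] += 1
    return counts
```

Calling something like `tally_statuses("http://demo-route.example.com/", 100)` after step 3 should show the 503s the client receives, which can then be compared against what the router metrics report.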
Actual results: The haproxy_server_http_responses_total metric for that route is no longer available, which means that monitoring on that route is no longer possible (e.g. to monitor for errors)
Expected results: haproxy_server_http_responses_total registers the 503s the client is getting
* I've taken a look at a few of the haproxy_server_* metrics and they are also not available.
* haproxy_backend_up metric returns 1, which looks wrong
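To check for the missing series outside the Prometheus UI, a rough sketch like the one below could be run against a saved scrape of the router metrics in Prometheus text format. It assumes the samples carry `route` and `code` labels; those label names, and the helper name itself, are assumptions for illustration and should be checked against the real scrape output.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def route_response_counts(metrics_text, metric, route):
    """Sum the samples of `metric` for `route`, grouped by the `code` label."""
    totals = {}
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _SAMPLE.match(line)
        if m is None or m.group("name") != metric:
            continue
        labels = dict(_LABEL.findall(m.group("labels") or ""))
        if labels.get("route") != route:
            continue
        code = labels.get("code", "")
        totals[code] = totals.get(code, 0.0) + float(m.group("value"))
    return totals
```

An empty result for a route that is still answering clients with 503s would correspond to the behaviour described in this report.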
The PR was merged into version "4.5.0-0.nightly-2020-05-26-224432". In the fixed version, the haproxy_backend_* metrics are populated properly, and haproxy_backend_http_responses_total now shows 5xx errors (the 503s) when the backend pods become unavailable, as expected.
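One way to confirm that the 503s are actually being registered is to compare the 5xx counter value from two scrapes taken while the client loop is running. The helper below is only a sketch of that comparison; its name and the use of `None` for a series missing from a scrape are assumptions made for illustration.

```python
def counter_increase(before, after):
    """Increase of a monotonically increasing counter between two scrapes.

    Pass None for a scrape in which the series was absent.
    Returns None when the series is missing from the later scrape.
    """
    if after is None:
        return None  # series disappeared: the failure mode in this report
    if before is None or after < before:
        return after  # new series or counter reset: count from zero
    return after - before
```

On a fixed cluster the increase should be positive while the backend pods are unavailable. A `None` result would mean the series is still disappearing.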
Created attachment 1693034
Prometheus graph data from patched cluster version
For future reference, this was backported to 4.4 in https://github.com/openshift/router/pull/141/commits
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.