1835845 – haproxy_server_http_responses_total metric disappears when backing pods are not available

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.3.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.5.0
Assignee:	Andrew McDermott
QA Contact:	Arvind iyengar
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1855852

Reported:	2020-05-14 15:20 UTC by Rafa Porres Molina
Modified:	2022-08-04 22:27 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Release Note
Doc Text:	If the backing pods of a service exposed via a route are unavailable (crashlooping, deleted, etc.), the router responds with a 503, but the haproxy_server_http_responses_total metric disappears for that route. We now always report all backend metrics so users can track when no pods are up (e.g., crashlooping).
Clone Of:
Clones:	1855852
Environment:
Last Closed:	2020-07-13 17:38:54 UTC
Target Upstream Version:
Embargoed:

Attachments
Prometheus graph data from patched cluster version (903.35 KB, image/png) 2020-05-28 12:18 UTC, Arvind iyengar

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift origin pull 24994	None	closed	Bug 1835845: Metrics test for router should allow backend metrics	2020-09-29 08:50:16 UTC
Github	openshift router pull 132	None	closed	Bug 1835845: Report all backend metrics for when there are no endpoints	2020-09-29 08:50:15 UTC
Red Hat Product Errata	RHBA-2020:2409	None	None	None	2020-07-13 17:39:10 UTC

Description Rafa Porres Molina 2020-05-14 15:20:50 UTC

Description of problem: If the backing pods of a service exposed via a route are unavailable (crashlooping, deleted, etc.), the router responds with a 503, but the haproxy_server_http_responses_total metric disappears for that route.


Version-Release number of selected component (if applicable): Tested in OSD 4.3.18


How reproducible: Reproducible


Steps to Reproduce:
1. Create an http deployment, a service and expose it through a route
2. Create a client that queries the route (a shell curl loop should be enough)
3. Delete deployment
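
The client in step 2 could be sketched as follows. This is only an illustration, not the reporter's actual script: the function names and the injectable `fetch` parameter are assumptions made for this example. It sends repeated GET requests to the route and tallies the status codes, so the switch from 200 to 503 after step 3 is visible on the client side.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code of one GET request, or None on a connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses, such as the router's 503, are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def tally_statuses(url, attempts, fetch=fetch_status):
    """Query the route `attempts` times and count the status codes seen.

    `fetch` is injectable (an assumption of this sketch) so the loop can be
    exercised without a live route.
    """
    counts = Counter()
    for _ in range(attempts):
        counts[fetch(url)] += 1
    return counts
```

Calling `tally_statuses("http://<route-host>/", attempts=10)` before and after deleting the deployment should show the counts move from 200 to 503, where `<route-host>` stands for the host of the route created in step 1.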


Actual results: The haproxy_server_http_responses_total metric for that route is no longer available, which means that monitoring that route is no longer possible (e.g., to watch for errors).


Expected results: The haproxy_server_http_responses_total metric remains available and registers the 503s the client is getting.


Additional info: 

* I looked at a few other haproxy_server_* metrics, and they are also unavailable.
* The haproxy_backend_up metric returns 1, which looks wrong.

Comment 6 Arvind iyengar 2020-05-28 12:17:06 UTC

The PR merged into the "4.5.0-0.nightly-2020-05-26-224432" build. In the fixed version, the "haproxy_backend_*_metrics" are populated properly, and "haproxy_backend_http_responses_total" now shows the "error 5xx" responses (503s) when the backend pods become unavailable, as expected.
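
One way to run a similar check outside the Prometheus UI is to scan a metrics scrape in the Prometheus text exposition format. The sketch below is an assumption-laden illustration, not part of the verification described above: the helper name `sum_samples` is hypothetical, and the label names `code` and `route` are assumed rather than confirmed by this report. It returns `None` when no matching sample exists, which distinguishes a metric that has disappeared from one that reports zero.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces: key="value".
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_samples(text, metric, **want):
    """Sum the samples of `metric` whose labels match `want`.

    Returns None if no sample matches, i.e. the metric is absent.
    """
    total = 0.0
    found = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        if all(labels.get(key) == value for key, value in want.items()):
            total += float(match.group("value"))
            found = True
    return total if found else None
```

For example, `sum_samples(scrape_text, "haproxy_backend_http_responses_total", code="5xx", route="hello")` would be expected to return a growing number on a fixed cluster while the pods are down, where `scrape_text` and the route name `hello` are placeholders.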

Comment 7 Arvind iyengar 2020-05-28 12:18:12 UTC

Created attachment 1693034
Prometheus graph data from patched cluster version

Comment 8 Stephen Greene 2020-07-10 20:13:42 UTC

For future reference, this was backported to 4.4 in https://github.com/openshift/router/pull/141/commits

Comment 9 errata-xmlrpc 2020-07-13 17:38:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
