Bug 1832539 - haproxy current sessions data in Prometheus keep increasing
Summary: haproxy current sessions data in Prometheus keep increasing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Andrew McDermott
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On:
Blocks: 1848687
Reported: 2020-05-06 19:36 UTC by jooho lee
Modified: 2020-07-13 17:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
haproxy_frontend_current_sessions and haproxy_server_current_sessions should show the number of active sessions. Previously, the counters were not being reset and would increase indefinitely. The values of these counters are no longer preserved across router restarts and now accurately reflect the number of active sessions.
Clone Of:
: 1848687 (view as bug list)
Environment:
Last Closed: 2020-07-13 17:35:31 UTC
Target Upstream Version:


Attachments (Terms of Use)
ocp 4.4 (171.19 KB, image/png)
2020-05-06 19:36 UTC, jooho lee
ocp 3.11 (15 million) (91.59 KB, image/png)
2020-05-06 19:37 UTC, jooho lee
Current sessions metric is now correct (165.70 KB, image/png)
2020-05-06 23:22 UTC, Clayton Coleman
Prometheus UI data for patched v4.5 cluster (1.71 MB, image/png)
2020-05-15 12:44 UTC, Arvind iyengar
Prometheus UI data for v4.4 cluster (846.88 KB, image/png)
2020-05-15 12:45 UTC, Arvind iyengar
Prometheus UI data for unpatched v4.5 cluster (846.77 KB, image/png)
2020-05-15 12:46 UTC, Arvind iyengar


Links
System | ID | Priority | Status | Summary | Last Updated
Github | openshift router pull 127 | None | closed | Bug 1832539: Only a subset of router metrics should be preserved across restarts | 2020-09-10 12:47:01 UTC
Red Hat Product Errata | RHBA-2020:2409 | None | None | None | 2020-07-13 17:35:48 UTC

Description jooho lee 2020-05-06 19:36:36 UTC
Created attachment 1685927 [details]
ocp 4.4

Description of problem:
In Prometheus, we can see the metrics from the haproxy exporter.

haproxy_frontend_current_sessions and haproxy_server_current_sessions should show the number of active sessions, but they do not. Instead, the number of sessions appears to keep increasing. My test environment has no background load, which is why I can see the session count decrease after load testing. On the customer's production clusters, however, the value reached 15 million with OCP 3.11 and 8k with OCP 4.4.

With 4.4 there is not much data, because the customer had just upgraded and the ingress pod restarted and lost the data. However, the graph appears to show the same issue as OCP 3.x.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Open Prometheus UI
2. Query "haproxy_frontend_current_sessions"
3. The value should be under 20000 * ingress nodes
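
The check in the steps above can also be scripted against the Prometheus HTTP API instead of the UI. The following is a minimal sketch, not a supported tool: the base URL, the helper names, and the sample response are made up for illustration, and the metric name is written as haproxy_frontend_current_sessions on the assumption that this is the exporter's spelling. It only builds the instant-query URL and evaluates an already-fetched response body, so authentication and the HTTP call itself are left out.

```python
import json
from urllib.parse import urlencode

# Assumed spelling of the exporter metric queried in step 2.
QUERY = "haproxy_frontend_current_sessions"


def build_query_url(base_url: str, query: str = QUERY) -> str:
    """Build a Prometheus instant-query URL (/api/v1/query) for the given expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def total_sessions(response_body: str) -> float:
    """Sum the sample values of every series in an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    # Each series carries "value": [unix_timestamp, "<value as string>"].
    return sum(float(series["value"][1]) for series in payload["data"]["result"])


def within_expected_bound(total: float, ingress_nodes: int, per_node_limit: int = 20000) -> bool:
    """Step 3: the value should stay under 20000 * ingress nodes."""
    return total < per_node_limit * ingress_nodes


# Made-up response body in the shape the API returns for an instant vector.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"frontend": "public"}, "value": [1588793796, "12"]},
            {"metric": {"frontend": "public_ssl"}, "value": [1588793796, "30"]},
        ],
    },
})

if __name__ == "__main__":
    print(build_query_url("https://prometheus.example.com"))
    total = total_sessions(SAMPLE_RESPONSE)
    print(total, within_expected_bound(total, ingress_nodes=2))
```

With three ingress nodes the bound is 60000, so a reading of 15 million like the one reported for OCP 3.11 fails the check immediately.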


Actual results:
The value keeps increasing

Expected results:
The value should show only active sessions so the session counts would be around 15k per router.

Additional info:

Comment 1 jooho lee 2020-05-06 19:37:18 UTC
Created attachment 1685928 [details]
ocp 3.11 (15 million)

Comment 3 Clayton Coleman 2020-05-06 23:19:33 UTC
The impact is that almost all router metrics are wrong after a sustained interval.
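
To illustrate why a point-in-time value drifts when it is carried across restarts, here is a small sketch. It is one plausible reading of the linked PR title ("Only a subset of router metrics should be preserved across restarts"), not the router's actual code: the function names, the CUMULATIVE set, the example counter name, and the sample numbers are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical set of metrics that are cumulative totals and may be carried over.
CUMULATIVE: Set[str] = {"haproxy_frontend_connections_total"}


def merge_all(saved: Dict[str, float], live: Dict[str, float]) -> Dict[str, float]:
    """Add the pre-restart baseline to every metric, point-in-time values included."""
    return {name: value + saved.get(name, 0.0) for name, value in live.items()}


def merge_cumulative_only(
    saved: Dict[str, float],
    live: Dict[str, float],
    cumulative: Set[str] = CUMULATIVE,
) -> Dict[str, float]:
    """Add the baseline only to cumulative totals; report other metrics as read."""
    return {
        name: value + (saved.get(name, 0.0) if name in cumulative else 0.0)
        for name, value in live.items()
    }


# Made-up numbers: values saved before a restart and values read afterwards.
saved = {"haproxy_frontend_connections_total": 1000.0, "haproxy_frontend_current_sessions": 40.0}
live = {"haproxy_frontend_connections_total": 5.0, "haproxy_frontend_current_sessions": 3.0}

if __name__ == "__main__":
    print(merge_all(saved, live))              # current sessions inflated by the old baseline
    print(merge_cumulative_only(saved, live))  # current sessions reflect only the live value
```

In the first merge every restart adds the previous session count to the new one, so the reported value can only grow; in the second, only the running total keeps its history.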

Comment 4 Clayton Coleman 2020-05-06 23:22:31 UTC
Created attachment 1685986 [details]
Current sessions metric is now correct

Uploaded a query from the PR that shows the session count is now correct.

Comment 7 Arvind iyengar 2020-05-15 12:43:12 UTC
The PR was merged and is available in "4.5.0-0.nightly-2020-05-11-084820". Verified that in this version the Prometheus UI shows the correct metric data for "haproxy_frontend_current_sessions".

Comment 8 Arvind iyengar 2020-05-15 12:44:48 UTC
Created attachment 1688909 [details]
Prometheus UI data for patched v4.5 cluster

Comment 9 Arvind iyengar 2020-05-15 12:45:52 UTC
Created attachment 1688910 [details]
Prometheus UI data for v4.4 cluster

Comment 10 Arvind iyengar 2020-05-15 12:46:35 UTC
Created attachment 1688912 [details]
Prometheus UI data for unpatched v4.5 cluster

Comment 11 errata-xmlrpc 2020-07-13 17:35:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

