Bug 1832539

Summary: haproxy current sessions data in Prometheus keep increasing
Product: OpenShift Container Platform Reporter: jooho lee <jlee>
Component: Networking Assignee: Andrew McDermott <amcdermo>
Networking sub component: router QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: aiyengar, alegrand, anpicker, aos-bugs, bbennett, bperkins, ccoleman, erooth, kakkoyun, lcosic, mjoseph, mloibl, pkrupa, surbania
Version: 4.4   
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
haproxy_frontend_current_session or haproxy_server_current_session should show the number of active sessions. Previously, the counters were not reset and would increase indefinitely. The values of these counters are no longer preserved across router restarts and now accurately reflect the number of active sessions.
Story Points: ---
Clone Of:
: 1848687 (view as bug list) Environment:
Last Closed: 2020-07-13 17:35:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1848687    
Attachments:
Description | Flags
ocp 4.4 | none
ocp 3.11 (15 million) | none
Current sessions metric is now correct | none
Prometheus UI data for patched v4.5 cluster | none
Prometheus UI data for v4.4 cluster | none
Prometheus UI data for unpatched v4.5 cluster | none

Description jooho lee 2020-05-06 19:36:36 UTC
Created attachment 1685927 [details]
ocp 4.4

Description of problem:
In Prometheus, we can see the metrics from the haproxy exporter.

haproxy_frontend_current_session or haproxy_server_current_session should show the active sessions, but they do not. The number of sessions appears to keep increasing. In my test environment there is no background load, which is why I can see the session count decrease after load testing. On the customer's production clusters, however, it reaches 15 million with ocp 3.11 and 8k with ocp 4.4.

With 4.4, there is not much data because the customer just upgraded, so the ingress pod restarted and the data was lost. However, the graph appears to show the same issue as with ocp 3.x.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Open Prometheus UI
2. Query "haproxy_frontend_current_sessions"
3. Check the value: it should be under 20000 * ingress nodes
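The steps above can also be run without the UI. The following is a minimal sketch, assuming the standard Prometheus HTTP API instant-query endpoint (/api/v1/query) and its usual JSON response shape. The base URL, the helper names, and any sample payload are hypothetical and not taken from this report; only the 20000 bound comes from step 3.

```python
import json
from urllib.parse import urlencode

# Upper bound per ingress node, taken from step 3 of the report.
MAX_SESSIONS_PER_NODE = 20000


def build_query_url(base_url: str, metric: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': metric})}"


def total_sessions(response_body: str) -> float:
    """Sum the sample values of an instant-vector response."""
    payload = json.loads(response_body)
    # Each result carries "value": [unix_timestamp, "sample_as_string"].
    return sum(float(item["value"][1]) for item in payload["data"]["result"])


def within_expected_bound(total: float, ingress_nodes: int) -> bool:
    """Apply the bound from step 3: under 20000 * ingress nodes."""
    return total < MAX_SESSIONS_PER_NODE * ingress_nodes
```

The response body could be fetched with urllib.request.urlopen(build_query_url(...)).read(). On a real cluster the request would likely need authentication, which is left out of this sketch.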


Actual results:
The value keeps increasing

Expected results:
The value should show only active sessions, so the session count should be around 15k per router.
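The difference between the actual and expected results can be checked on a list of samples for one router. This is only a heuristic sketch: the function names and the way the samples are obtained are assumptions, and the limit is passed in by the caller (for example the roughly 15k per router mentioned above).

```python
from typing import Sequence


def never_decreases(samples: Sequence[float]) -> bool:
    """Return True if no sample is lower than the one before it."""
    return all(later >= earlier for earlier, later in zip(samples, samples[1:]))


def looks_accumulated(samples: Sequence[float], limit: float) -> bool:
    """Heuristic for the reported symptom.

    Flags a series that has at least two samples, never drops, and ends
    above the expected per-router limit. A healthy current-sessions value
    is expected to fall again once the load goes away.
    """
    return len(samples) >= 2 and never_decreases(samples) and samples[-1] > limit
```

A series that rises and then falls back after a load test is not flagged, which matches the behaviour described for the test environment.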

Additional info:

Comment 1 jooho lee 2020-05-06 19:37:18 UTC
Created attachment 1685928 [details]
ocp 3.11 (15 million)

Comment 3 Clayton Coleman 2020-05-06 23:19:33 UTC
The impact is that almost all router metrics are wrong after a sustained interval.

Comment 4 Clayton Coleman 2020-05-06 23:22:31 UTC
Created attachment 1685986 [details]
Current sessions metric is now correct

Uploaded a query from the PR that shows the sessions being correct.

Comment 7 Arvind iyengar 2020-05-15 12:43:12 UTC
The PR was merged and made available in "4.5.0-0.nightly-2020-05-11-084820". It is verified that in this version the Prometheus UI now shows the correct metric data for "haproxy_frontend_current_session".

Comment 8 Arvind iyengar 2020-05-15 12:44:48 UTC
Created attachment 1688909 [details]
Prometheus UI data for patched v4.5 cluster

Comment 9 Arvind iyengar 2020-05-15 12:45:52 UTC
Created attachment 1688910 [details]
Prometheus UI data for v4.4 cluster

Comment 10 Arvind iyengar 2020-05-15 12:46:35 UTC
Created attachment 1688912 [details]
Prometheus UI data for unpatched v4.5 cluster

Comment 11 errata-xmlrpc 2020-07-13 17:35:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409