Bug 1752814

Summary: Counter metrics are decreasing which should not be allowed
Product: OpenShift Container Platform Reporter: Anshul Verma <ansverma>
Component: Networking Assignee: Stephen Greene <sgreene>
Networking sub component: router QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: agogala, aiyengar, alegrand, amcdermo, anpicker, aos-bugs, bshirren, dhansen, erooth, hongli, kakkoyun, lcosic, mcalizo, mjahangi, mloibl, mwasher, pkrupa, rheinzma, surbania
Version: 3.11.0 Keywords: Reopened
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: HAProxy is reloaded whenever the router changes the configuration file.
Consequence: Some Prometheus counter metrics decreased across reloads, which explicitly violates the definition of a counter metric.
Fix: The router's metrics reload code now notes the time of the last metrics scrape, to avoid scraping beyond the preserved counter values during a reload.
Result: Counter metrics no longer show a sudden increase followed by a decrease when the router is reloading.
Story Points: ---
Clone Of:
: 1890545 (view as bug list) Environment:
Last Closed: 2020-10-27 15:54:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1890534, 1890545    

Description Anshul Verma 2019-09-17 09:35:14 UTC
Description of problem:

Metrics that should act as counters, such as haproxy_frontend_http_responses_total, should only be allowed to increase or to start over again from zero; they should never decrease.
The definition of a Prometheus counter can be found here: https://prometheus.io/docs/concepts/metric_types/#counter

Decreasing counters cause a problem in the Prometheus rate() function, because rate() interprets any decrease as a counter restart, after which the counter starts again from zero. If the counter does not actually restart from zero and only decreases slightly, the rate() function produces a spike in the statistic.
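
To illustrate the spike, here is a minimal sketch in Python of reset-aware increase logic. It is a simplification for illustration only, not Prometheus's actual implementation (which also extrapolates to the window boundaries); the function name counter_increase and the sample values are made up for this example.

```python
def counter_increase(samples):
    """Total increase over consecutive counter samples.

    Any drop is treated as a restart from zero, so the whole
    post-drop value is counted as new increase (simplified model).
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            # Assumed reset: the counter is presumed to have restarted at 0.
            total += cur
        else:
            total += cur - prev
    return total


# Hypothetical scrape values for a counter such as
# haproxy_frontend_http_responses_total.
steady = [1000, 1010, 1020, 1030]      # normal growth
true_reset = [1000, 1010, 0, 10]       # genuine restart from zero
small_dip = [1000, 1010, 1005, 1015]   # counter only decreases a little

print(counter_increase(steady))      # 30.0
print(counter_increase(true_reset))  # 20.0
print(counter_increase(small_dip))   # 1025.0
```

In the small_dip case the drop from 1010 to 1005 is counted as 1005 new responses, which is the kind of artificial peak described above.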

Such counters have been observed decreasing, which should not be allowed.

Comment 7 Daneyon Hansen 2020-01-13 23:13:32 UTC
@Selim,

Now that [2] is done, we should be able to fix this bz using the preferred method [3]. We will address [3] during our next sprint planning on 1/16.

Comment 17 Arvind iyengar 2020-09-30 12:42:41 UTC
Verified in the "4.6.0-0.nightly-2020-09-30-052433" release version. With this payload, the metric counters continue to increase or reset to zero during reload conditions, with no decrements.
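
A check of this kind could be scripted roughly as follows. This is a hypothetical sketch, not the procedure actually used for verification: classify_steps and the sample values are assumptions, and the check is deliberately strict in that it only accepts a drop that lands exactly on zero.

```python
def classify_steps(samples):
    """Label each step between consecutive scraped counter values.

    "ok"        - value stayed the same or increased
    "reset"     - value dropped to exactly zero
    "decrement" - value dropped to a non-zero value (invalid for a counter)
    """
    labels = []
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            labels.append("ok")
        elif cur == 0:
            labels.append("reset")
        else:
            labels.append("decrement")
    return labels


# Hypothetical values scraped across a router reload.
print(classify_steps([5, 9, 0, 3]))  # ['ok', 'reset', 'ok']
print(classify_steps([5, 9, 7]))     # ['ok', 'decrement']
```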

Comment 24 errata-xmlrpc 2020-10-27 15:54:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196