Bug 1752814 - Counter metrics are decreasing which should not be allowed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.6.0
Assignee: Stephen Greene
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On:
Blocks: 1890534 1890545
Reported: 2019-09-17 09:35 UTC by Anshul Verma
Modified: 2020-10-27 15:54 UTC
19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: HAProxy is reloaded when the router changes the configuration file.
Consequence: Some Prometheus counter metrics decreased across reloads, which violates the definition of a counter metric.
Fix: The router's reload metrics code now notes the time of the last metrics scrape, to avoid scraping beyond the preserved counter values during a reload.
Result: Counter metrics no longer show a sudden increase followed by a decrease when the router reloads.
Clone Of:
: 1890545
Environment:
Last Closed: 2020-10-27 15:54:19 UTC
Target Upstream Version:
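The Doc Text describes the fix only at a high level: the router preserves counter values across an HAProxy reload. As a rough illustration of the general idea of preserving counters across a process restart, here is a minimal Python sketch. The class ReloadSafeCounter and its methods are hypothetical, the sketch assumes the exporter can read the old process's final totals before that process exits, and it is not the code from openshift router pull 179.

```python
class ReloadSafeCounter:
    """Hypothetical exporter-side counter that stays monotonic across reloads.

    A reloaded process starts its own counters from zero. This wrapper carries
    the totals of retired processes forward, so the exported value is
    carried + live value and never moves backwards.
    """

    def __init__(self):
        self._carried = 0.0    # totals preserved from processes replaced by reloads
        self._last_seen = 0.0  # latest raw value read from the live process

    def observe(self, raw_value):
        """Record a raw reading from the live process and return the exported value."""
        self._last_seen = raw_value
        return self.value()

    def on_reload(self, final_old_value):
        """Preserve the old process's final total; the new process restarts at zero."""
        # max() guards against a final reading lower than one already exported.
        self._carried += max(final_old_value, self._last_seen)
        self._last_seen = 0.0

    def value(self):
        return self._carried + self._last_seen


c = ReloadSafeCounter()
exported = [c.observe(100), c.observe(120)]
c.on_reload(125)               # old process served 5 more requests before exiting
exported.append(c.observe(3))  # new process has counted 3 requests so far
print(exported)  # [100.0, 120.0, 128.0]
```

The exported series goes 100, 120, 128 even though the new process itself reports only 3, so a consumer such as rate() sees ordinary growth rather than a reset.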



Links
System ID Private Priority Status Summary Last Updated
Github openshift router pull 179 0 None closed Bug 1752814: Fix decreasing counter metrics when reloading HAProxy 2020-12-07 09:39:11 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 15:54:52 UTC

Description Anshul Verma 2019-09-17 09:35:14 UTC
Description of problem:

Metrics that are supposed to act as counters, such as haproxy_frontend_http_responses_total, should only ever increase or start over from zero; they must never decrease.
The definition of a Prometheus counter can be found here: https://prometheus.io/docs/concepts/metric_types/#counter

Decreasing counters cause a problem for the Prometheus rate() function, because it interprets any decrease as a counter restart, i.e. it assumes the counter started again from zero. If the counter did not actually restart and only decreased slightly, rate() treats nearly the whole current value as new growth and produces a spike in the statistics.

These counters are observed decreasing, which should not be allowed.
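The reset handling described above can be illustrated with a simplified Python model. The function name counter_increase is hypothetical, and the model deliberately omits the extrapolation Prometheus applies at the edges of the query window; it only shows why a small dip turns into a spike.

```python
def counter_increase(samples):
    """Simplified model of how Prometheus increase()/rate() treat a counter.

    Any drop between two consecutive samples is read as a counter reset:
    the counter is assumed to have restarted at zero and grown to the new
    value, so the whole new value is added as fresh growth. Range
    extrapolation is omitted.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            # Interpreted as a reset: growth since the "restart" is cur - 0.
            total += cur
        else:
            total += cur - prev
    return total


# Healthy counter: steady growth of 10 per scrape.
print(counter_increase([1000, 1010, 1020, 1030]))  # 30.0
# Genuine restart: the counter really started over from zero.
print(counter_increase([1000, 1010, 5, 15]))       # 25.0
# Bug scenario: a small dip of 5 is misread as a reset.
print(counter_increase([1000, 1010, 1005, 1015]))  # 1025.0
```

In the last series the counter only dipped by 5 and its net change is 15, yet the computed increase is 1025, which is the kind of spike this report describes.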

Comment 7 Daneyon Hansen 2020-01-13 23:13:32 UTC
@Selim,

Now that [2] is done, we should be able to fix this bz using the preferred method [3]. We will address [3] during our next sprint planning on 1/16.

Comment 17 Arvind iyengar 2020-09-30 12:42:41 UTC
Verified in the "4.6.0-0.nightly-2020-09-30-052433" release version. With this payload, the metric counters continue to increase or reset to zero during reloads, with no decrements.

Comment 24 errata-xmlrpc 2020-10-27 15:54:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

