Bug 1891362 - Wrong metrics count for openshift_build_result_total
Summary: Wrong metrics count for openshift_build_result_total
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Adam Kaplan
QA Contact: wewang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-26 01:23 UTC by wewang
Modified: 2021-02-24 15:28 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:28:00 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift openshift-controller-manager pull 143 | 0 | None | closed | Bug 1891362: Remove the openshift_build_result_total metric | 2021-01-22 14:13:21 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:28:23 UTC

Description wewang 2020-10-26 01:23:33 UTC
Description of problem:
There are actually 3 successful docker builds, but the metric openshift_build_result_total{result="success",strategy="docker"} reports 4.


Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-10-22-175439

How reproducible:
always

Steps to Reproduce:
[wewang@wangwen work]$ oc -n openshift-controller-manager exec controller-manager-zc695 -- curl -k -H "Authorization: Bearer $token" 'https://10.129.0.5:8443/metrics' | grep "openshift_build_result_total"
# HELP openshift_build_result_total [ALPHA] Counts the total number of finished builds across all namespaces by result and strategy
# TYPE openshift_build_result_total counter
openshift_build_result_total{result="failed",strategy="docker"} 3
openshift_build_result_total{result="failed",strategy="source"} 2
openshift_build_result_total{result="success",strategy="docker"} 4
openshift_build_result_total{result="success",strategy="source"} 2
 
[wewang@wangwen work]$ oc get builds
NAME               TYPE     FROM          STATUS                       STARTED          DURATION
build-src1-1       Source   Git@57073c0   Complete                     26 minutes ago   1m31s
build-src2-1       Source   Git@57073c0   Complete                     26 minutes ago   1m24s
build-src3-1       Source   Git           Failed (FetchSourceFailed)   26 minutes ago   12s
build-docker-1-1   Docker   Git@57073c0   Complete                     25 minutes ago   1m5s
build-docker-2-1   Docker   Git@57073c0   Complete                     22 minutes ago   46s
build-docker-3-1   Docker   Git@57073c0   Complete                     22 minutes ago   49s
build-docker-4-1   Docker   Git           Failed (FetchSourceFailed)   22 minutes ago   3s
build-docker-5-1   Docker   Git           Failed (FetchSourceFailed)   22 minutes ago   3s

Actual results:
The metric count does not match the actual number of successful builds.

Expected results:
The metric count should match the actual number of successful builds.


Additional info:
The same issue occurs with openshift_build_result_total{result="failed",strategy="source"} and openshift_build_result_total{result="failed",strategy="docker"}.
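
The mismatch can be checked mechanically. The following is a minimal Go sketch, not part of any OpenShift tooling: it tallies the builds listed above by result and strategy and compares the tally with the metric values from the report. The type and function names (key, build, tally, overcount) are hypothetical, and the data is copied by hand from the output in this report.

```go
package main

import (
	"fmt"
	"strings"
)

// key is one label combination of openshift_build_result_total.
type key struct{ result, strategy string }

// build mirrors the NAME, TYPE and STATUS columns of `oc get builds`.
type build struct{ name, strategy, status string }

// tally counts finished builds per (result, strategy) pair.
// Builds that are neither Complete nor Failed are ignored.
func tally(builds []build) map[key]int {
	counts := make(map[key]int)
	for _, b := range builds {
		strategy := strings.ToLower(b.strategy)
		switch {
		case b.status == "Complete":
			counts[key{"success", strategy}]++
		case strings.HasPrefix(b.status, "Failed"):
			counts[key{"failed", strategy}]++
		}
	}
	return counts
}

// overcount returns reported minus actual for every label pair that differs.
func overcount(reported, actual map[key]int) map[key]int {
	diff := make(map[key]int)
	for k, v := range reported {
		if d := v - actual[k]; d != 0 {
			diff[k] = d
		}
	}
	return diff
}

func main() {
	// Copied from the `oc get builds` output in this report.
	builds := []build{
		{"build-src1-1", "Source", "Complete"},
		{"build-src2-1", "Source", "Complete"},
		{"build-src3-1", "Source", "Failed (FetchSourceFailed)"},
		{"build-docker-1-1", "Docker", "Complete"},
		{"build-docker-2-1", "Docker", "Complete"},
		{"build-docker-3-1", "Docker", "Complete"},
		{"build-docker-4-1", "Docker", "Failed (FetchSourceFailed)"},
		{"build-docker-5-1", "Docker", "Failed (FetchSourceFailed)"},
	}
	// Copied from the /metrics output in this report.
	reported := map[key]int{
		{"failed", "docker"}:  3,
		{"failed", "source"}:  2,
		{"success", "docker"}: 4,
		{"success", "source"}: 2,
	}
	for k, d := range overcount(reported, tally(builds)) {
		fmt.Printf("%s/%s: metric is %d higher than the build list\n", k.result, k.strategy, d)
	}
}
```

With the data from this report, three label pairs differ by one each (failed/docker, failed/source, success/docker), while success/source matches.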

Comment 1 Adam Kaplan 2020-10-27 18:41:23 UTC
The metric appears to work well for successful builds. However, failed builds can frequently hit the "completed build" method calls more than once in their lifecycle (at least 1/3 of the time). This wasn't an issue until this metric was introduced - most other operations in the completed build steps are idempotent.
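
To illustrate why a counter is sensitive to repeated "completed build" calls, here is a minimal Go sketch of an idempotent increment that counts each build at most once by remembering its UID. This is an assumption-laden illustration only: it is not the build controller's code and not the fix that was applied to this bug, and resultCounter, newResultCounter and observe are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// resultCounter is a hypothetical stand-in for the metric. It remembers which
// builds were already counted so that repeated calls do not inflate the total.
type resultCounter struct {
	mu     sync.Mutex
	seen   map[string]struct{} // build UIDs that have been counted
	counts map[string]int      // keyed by "result/strategy"
}

func newResultCounter() *resultCounter {
	return &resultCounter{
		seen:   make(map[string]struct{}),
		counts: make(map[string]int),
	}
}

// observe records a finished build at most once per UID.
// It returns true if the build was counted and false if it was a repeat.
func (c *resultCounter) observe(uid, result, strategy string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, dup := c.seen[uid]; dup {
		return false
	}
	c.seen[uid] = struct{}{}
	c.counts[result+"/"+strategy]++
	return true
}

func main() {
	c := newResultCounter()
	c.observe("uid-1", "failed", "docker")
	c.observe("uid-1", "failed", "docker") // completion handling runs a second time
	fmt.Println(c.counts["failed/docker"]) // prints 1
}
```

A sketch like this trades memory for correctness: the set of seen UIDs grows with every build and is lost when the process restarts, so it would not be a complete solution on its own.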

The "Failed" phase is unique in that the build container reports this phase transition, _not_ the build controller. We should enhance builds to use detailed failure conditions, which was alluded to in BUILD-73 [1]. Then we can have the build controller take over the "Running" -> "Failed" phase transition using the conditions reported by the build pod.


[1] https://issues.redhat.com/browse/BUILD-73

Comment 3 Adam Kaplan 2020-11-12 13:27:49 UTC
This metric has been removed in 4.7. It may be reintroduced in a future release.

Comment 4 wewang 2020-11-13 06:36:18 UTC
Got it.

Comment 7 errata-xmlrpc 2021-02-24 15:28:00 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

