Description of problem:
There are actually 3 successful docker builds, but the metric openshift_build_result_total{result="success",strategy="docker"} reports 4.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-10-22-175439

How reproducible:
Always

Steps to Reproduce:
[wewang@wangwen work]$ oc -n openshift-controller-manager exec controller-manager-zc695 -- curl -k -H "Authorization: Bearer $token" 'https://10.129.0.5:8443/metrics' |grep "openshift_build_result_total"
# HELP openshift_build_result_total [ALPHA] Counts the total number of finished builds across all namespaces by result and strategy
# TYPE openshift_build_result_total counter
openshift_build_result_total{result="failed",strategy="docker"} 3
openshift_build_result_total{result="failed",strategy="source"} 2
openshift_build_result_total{result="success",strategy="docker"} 4
openshift_build_result_total{result="success",strategy="source"} 2

[wewang@wangwen work]$ oc get builds
NAME              TYPE     FROM          STATUS                       STARTED          DURATION
build-src1-1      Source   Git@57073c0   Complete                     26 minutes ago   1m31s
build-src2-1      Source   Git@57073c0   Complete                     26 minutes ago   1m24s
build-src3-1      Source   Git           Failed (FetchSourceFailed)   26 minutes ago   12s
build-docker-1-1  Docker   Git@57073c0   Complete                     25 minutes ago   1m5s
build-docker-2-1  Docker   Git@57073c0   Complete                     22 minutes ago   46s
build-docker-3-1  Docker   Git@57073c0   Complete                     22 minutes ago   49s
build-docker-4-1  Docker   Git           Failed (FetchSourceFailed)   22 minutes ago   3s
build-docker-5-1  Docker   Git           Failed (FetchSourceFailed)   22 minutes ago   3s

Actual results:
The metric count does not match the actual number of successful builds.

Expected results:
The metric count should match the actual number of successful builds.

Additional info:
The same issue occurs with openshift_build_result_total{result="failed",strategy="source"} and openshift_build_result_total{result="failed",strategy="docker"}.
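To make the mismatch concrete, the Go sketch below tallies the `oc get builds` listing from this report by strategy and result and compares the totals with the scraped counter values. The helper names are hypothetical, and mapping STATUS `Complete` to result="success" and `Failed` to result="failed" is an assumption based on the label values in the scrape.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildsListing reproduces the `oc get builds` output from the report,
// without the header row.
const buildsListing = `build-src1-1 Source Git@57073c0 Complete 26 minutes ago 1m31s
build-src2-1 Source Git@57073c0 Complete 26 minutes ago 1m24s
build-src3-1 Source Git Failed (FetchSourceFailed) 26 minutes ago 12s
build-docker-1-1 Docker Git@57073c0 Complete 25 minutes ago 1m5s
build-docker-2-1 Docker Git@57073c0 Complete 22 minutes ago 46s
build-docker-3-1 Docker Git@57073c0 Complete 22 minutes ago 49s
build-docker-4-1 Docker Git Failed (FetchSourceFailed) 22 minutes ago 3s
build-docker-5-1 Docker Git Failed (FetchSourceFailed) 22 minutes ago 3s`

// scraped holds the openshift_build_result_total values from the metrics
// endpoint, keyed as "strategy/result".
var scraped = map[string]int{
	"docker/failed":  3,
	"source/failed":  2,
	"docker/success": 4,
	"source/success": 2,
}

// statusToResult maps the STATUS column to the metric's result label
// (an assumed mapping, based on the label values seen in the scrape).
var statusToResult = map[string]string{
	"Complete": "success",
	"Failed":   "failed",
}

// tallyBuilds counts finished builds per "strategy/result" key.
func tallyBuilds(listing string) map[string]int {
	counts := make(map[string]int)
	for _, line := range strings.Split(strings.TrimSpace(listing), "\n") {
		fields := strings.Fields(line)
		if len(fields) < 4 {
			continue
		}
		// fields[1] is TYPE, fields[3] is the first word of STATUS.
		result, ok := statusToResult[fields[3]]
		if !ok {
			continue // not a finished build; not counted
		}
		counts[strings.ToLower(fields[1])+"/"+result]++
	}
	return counts
}

func main() {
	actual := tallyBuilds(buildsListing)
	keys := make([]string, 0, len(scraped))
	for k := range scraped {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		status := "ok"
		if scraped[k] != actual[k] {
			status = "MISMATCH"
		}
		fmt.Printf("%-15s metric=%d actual=%d %s\n", k, scraped[k], actual[k], status)
	}
}
```

Run against the data from this report, it flags docker/success (4 vs 3), docker/failed (3 vs 2) and source/failed (2 vs 1) as mismatches, while source/success (2 vs 2) matches.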
The metric appears to work well for successful builds. However, failed builds frequently pass through the "completed build" method calls more than once in their lifecycle (at least 1/3 of the time). This was not an issue until this metric was introduced: most other operations in the completed-build steps are idempotent, but incrementing a counter is not, so every repeated call adds to the total. The "Failed" phase is unique in that the build container reports this phase transition, _not_ the build controller.

We should enhance builds to use detailed failure conditions, as alluded to in BUILD-73 [1]. The build controller could then take over the "Running" -> "Failed" phase transition using the conditions reported by the build pod.

[1] https://issues.redhat.com/browse/BUILD-73
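A minimal Go sketch of why the counter drifts, and why letting the controller own the "Running" -> "Failed" transition would keep it accurate. Everything here (the `Build` and `Phase` types, the handler names, and the plain map used in place of a Prometheus counter) is hypothetical and is not taken from the OpenShift controller code.

```go
package main

import "fmt"

// Phase is a simplified, hypothetical stand-in for a build phase.
type Phase string

const (
	PhaseRunning  Phase = "Running"
	PhaseComplete Phase = "Complete"
	PhaseFailed   Phase = "Failed"
)

func (p Phase) terminal() bool { return p == PhaseComplete || p == PhaseFailed }

// Build is a hypothetical, trimmed-down build object.
type Build struct {
	Name     string
	Strategy string
	Phase    Phase
}

// resultCounter stands in for openshift_build_result_total, keyed "strategy/result".
type resultCounter map[string]int

func resultLabel(p Phase) string {
	if p == PhaseComplete {
		return "success"
	}
	return "failed"
}

// naiveHandleCompletion increments on every call. When the build container
// has already set Failed and the handler runs twice, the build counts twice.
func naiveHandleCompletion(b *Build, c resultCounter) {
	c[b.Strategy+"/"+resultLabel(b.Phase)]++
}

// transitionHandleCompletion lets the controller apply the terminal phase
// reported by the pod and counts only when this call changed the phase.
func transitionHandleCompletion(b *Build, reported Phase, c resultCounter) {
	if b.Phase.terminal() || !reported.terminal() {
		return // already finished (repeat call) or not finished yet
	}
	b.Phase = reported
	c[b.Strategy+"/"+resultLabel(b.Phase)]++
}

func main() {
	naive := resultCounter{}
	failed := &Build{Name: "build-docker-4-1", Strategy: "docker", Phase: PhaseFailed}
	naiveHandleCompletion(failed, naive)
	naiveHandleCompletion(failed, naive) // repeat call is counted again
	fmt.Println("naive:", naive["docker/failed"])

	guarded := resultCounter{}
	b := &Build{Name: "build-docker-4-1", Strategy: "docker", Phase: PhaseRunning}
	transitionHandleCompletion(b, PhaseFailed, guarded)
	transitionHandleCompletion(b, PhaseFailed, guarded) // repeat call is a no-op
	fmt.Println("guarded:", guarded["docker/failed"])
}
```

Running it prints `naive: 2` and `guarded: 1`: tying the increment to the single transition the controller performs makes repeated completion handling harmless.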
This metric has been removed in 4.7. It may be reintroduced in a future release.
Got it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633