Bug 1891362
| Summary: | Wrong metrics count for openshift_build_result_total | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | wewang <wewang> |
| Component: | Build | Assignee: | Adam Kaplan <adam.kaplan> |
| Status: | CLOSED ERRATA | QA Contact: | wewang <wewang> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.7 | CC: | aos-bugs, wzheng |
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-24 15:28:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
The metric appears to work well for successful builds. However, failed builds frequently hit the "completed build" method calls more than once in their lifecycle (at least 1/3 of the time). This wasn't an issue until this metric was introduced, because most other operations in the completed-build steps are idempotent. The "Failed" phase is unique in that the build container reports this phase transition, _not_ the build controller.

We should enhance builds to use detailed failure conditions, as alluded to in BUILD-73 [1]. The build controller can then take over the "Running" -> "Failed" phase transition using the conditions reported by the build pod.

[1] https://issues.redhat.com/browse/BUILD-73

This metric has been removed in 4.7. It may be reintroduced in a future release.

Got it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
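The double counting described in the first comment comes from a counter increment that is not idempotent: if the "completed build" path runs twice for one failed build, the series goes up twice. The Go sketch below illustrates one way to make the increment idempotent by remembering which builds were already counted. This is *not* the fix proposed in the comments (moving the "Running" -> "Failed" transition into the build controller), and it is not the OpenShift controller code. `buildResultCounter`, `RecordCompletion`, and keying builds by `namespace/name` are all hypothetical, and a plain map stands in for a Prometheus counter vector.

```go
package main

import (
	"fmt"
	"sync"
)

// resultKey identifies one counter series, mirroring the result/strategy
// labels of openshift_build_result_total.
type resultKey struct {
	Result   string
	Strategy string
}

// buildResultCounter is a hypothetical, simplified stand-in for a Prometheus
// counter vector. It remembers which builds it has already counted so that
// repeated "completed build" calls do not inflate the totals.
type buildResultCounter struct {
	mu      sync.Mutex
	counted map[string]bool // keyed by "namespace/name" (assumption)
	totals  map[resultKey]int
}

func newBuildResultCounter() *buildResultCounter {
	return &buildResultCounter{
		counted: make(map[string]bool),
		totals:  make(map[resultKey]int),
	}
}

// RecordCompletion increments the series for a finished build only the first
// time that build is seen; later calls for the same build are no-ops.
// It reports whether this call actually incremented a series.
func (c *buildResultCounter) RecordCompletion(buildKey, result, strategy string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counted[buildKey] {
		return false
	}
	c.counted[buildKey] = true
	c.totals[resultKey{result, strategy}]++
	return true
}

// Value returns the current total for one result/strategy series.
func (c *buildResultCounter) Value(result, strategy string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.totals[resultKey{result, strategy}]
}

func main() {
	c := newBuildResultCounter()
	// A failed build whose completion handler runs twice, plus a second failed build.
	c.RecordCompletion("ns/build-docker-4-1", "failed", "docker")
	c.RecordCompletion("ns/build-docker-4-1", "failed", "docker")
	c.RecordCompletion("ns/build-docker-5-1", "failed", "docker")
	fmt.Println(c.Value("failed", "docker")) // prints 2, not 3
}
```

One design caveat: the `counted` set grows with every build ever seen, so a real implementation would need to prune it (for example when builds are deleted). That bookkeeping cost is one reason fixing the duplicate phase transition at its source, as the comment suggests, is attractive.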
Description of problem:
There are actually 3 successful Docker builds, but openshift_build_result_total{result="success",strategy="docker"} reports 4.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-10-22-175439

How reproducible:
Always

Steps to Reproduce:

    [wewang@wangwen work]$ oc -n openshift-controller-manager exec controller-manager-zc695 -- curl -k -H "Authorization: Bearer $token" 'https://10.129.0.5:8443/metrics' |grep "openshift_build_result_total"
    # HELP openshift_build_result_total [ALPHA] Counts the total number of finished builds across all namespaces by result and strategy
    # TYPE openshift_build_result_total counter
    openshift_build_result_total{result="failed",strategy="docker"} 3
    openshift_build_result_total{result="failed",strategy="source"} 2
    openshift_build_result_total{result="success",strategy="docker"} 4
    openshift_build_result_total{result="success",strategy="source"} 2

    [wewang@wangwen work]$ oc get builds
    NAME               TYPE     FROM           STATUS                       STARTED          DURATION
    build-src1-1       Source   Git@57073c0    Complete                     26 minutes ago   1m31s
    build-src2-1       Source   Git@57073c0    Complete                     26 minutes ago   1m24s
    build-src3-1       Source   Git            Failed (FetchSourceFailed)   26 minutes ago   12s
    build-docker-1-1   Docker   Git@57073c0    Complete                     25 minutes ago   1m5s
    build-docker-2-1   Docker   Git@57073c0    Complete                     22 minutes ago   46s
    build-docker-3-1   Docker   Git@57073c0    Complete                     22 minutes ago   49s
    build-docker-4-1   Docker   Git            Failed (FetchSourceFailed)   22 minutes ago   3s
    build-docker-5-1   Docker   Git            Failed (FetchSourceFailed)   22 minutes ago   3s

Actual results:
The metric count does not match the actual number of successful builds.

Expected results:
The metric count should match the actual number of successful builds.

Additional info:
The same issue occurs with openshift_build_result_total{result="failed",strategy="source"} and openshift_build_result_total{result="failed",strategy="docker"}.
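To make the mismatch concrete, the Go sketch below tallies the TYPE and STATUS columns of the `oc get builds` output in this report by result and strategy, and prints them next to the scraped counter values. The data is copied from the report; the `series` type, the `expectedCounts` helper, and the mapping of "Complete" to `success` and "Failed" to `failed` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// series identifies one openshift_build_result_total series.
type series struct {
	Result   string
	Strategy string
}

// builds holds the TYPE and STATUS columns of the `oc get builds` output.
var builds = [][2]string{
	{"Source", "Complete"}, {"Source", "Complete"}, {"Source", "Failed"},
	{"Docker", "Complete"}, {"Docker", "Complete"}, {"Docker", "Complete"},
	{"Docker", "Failed"}, {"Docker", "Failed"},
}

// scraped holds the counter values returned by the /metrics endpoint.
var scraped = map[series]int{
	{"failed", "docker"}:  3,
	{"failed", "source"}:  2,
	{"success", "docker"}: 4,
	{"success", "source"}: 2,
}

// expectedCounts counts each build exactly once in its (result, strategy)
// series. Mapping "Complete" -> "success" and anything else -> "failed"
// is an assumption based on the label values.
func expectedCounts(bs [][2]string) map[series]int {
	out := make(map[series]int)
	for _, b := range bs {
		result := "failed"
		if b[1] == "Complete" {
			result = "success"
		}
		out[series{result, strings.ToLower(b[0])}]++
	}
	return out
}

func main() {
	want := expectedCounts(builds)
	keys := make([]series, 0, len(scraped))
	for k := range scraped {
		keys = append(keys, k)
	}
	// Sort for stable output: by result, then by strategy.
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Result != keys[j].Result {
			return keys[i].Result < keys[j].Result
		}
		return keys[i].Strategy < keys[j].Strategy
	})
	for _, k := range keys {
		fmt.Printf("%s/%s: builds=%d metric=%d\n", k.Result, k.Strategy, want[k], scraped[k])
	}
}
```

Running it prints `failed/docker: builds=2 metric=3`, `failed/source: builds=1 metric=2`, `success/docker: builds=3 metric=4`, and `success/source: builds=2 metric=2`. Only the success/source series matches; each of the other three reads one higher than the number of builds listed.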