Bug 1755936

Summary: [GCP] [flake] [Feature:Prometheus][Conformance] Prometheus when installed on the cluster should provide ingress metrics [Suite:openshift/conformance/parallel/minimal]
Product: OpenShift Container Platform Reporter: David Eads <deads>
Component: Networking Assignee: Dan Mace <dmace>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: alegrand, anpicker, aos-bugs, ccoleman, erooth, kakkoyun, lcosic, lxia, mloibl, pkrupa, surbania
Version: 4.2.0 Keywords: Reopened
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1757807 (view as bug list) Environment:
Last Closed: 2020-05-13 21:25:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1757807    

Description David Eads 2019-09-26 13:17:38 UTC
The "[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should provide ingress metrics [Suite:openshift/conformance/parallel/minimal]" test is the most frequent flake in the GCP test jobs. It rarely fails twice in a single run, but it has failed numerous times.

One example job here: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/404

Test grid here: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.2-informing#canary-openshift-ocp-installer-e2e-gcp-4.2

Based on the pattern, I wonder if we're seeing an ingress problem. Please reach out if you find strong evidence that cluster ingress is failing. The OAuth server failure looks that way.

Comment 1 Frederic Branczyk 2019-09-26 13:37:10 UTC
I checked the Prometheus dump and it seems these two metrics are the ones that are not found: https://github.com/openshift/origin/blob/4b9f648354a2dcb2832e3765caa571028f99ce00/test/extended/prometheus/prometheus.go#L286-L287
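
For anyone repeating this kind of check, here is a minimal sketch of how the missing metrics could be picked out of Prometheus query results. It assumes each metric was queried through the standard instant-query endpoint (`/api/v1/query`) and that the JSON bodies are already parsed. The metric names are hypothetical placeholders, not the two names from the linked test, and `missing_metrics` is an illustrative helper, not something from the origin repository.

```python
import json


def missing_metrics(responses, expected):
    """Return the expected metric names whose instant query returned no samples.

    `responses` maps a metric name to the parsed JSON body of a Prometheus
    instant query for that name. A metric counts as missing when its query
    is absent, did not succeed, or came back with an empty result vector.
    """
    missing = []
    for name in expected:
        body = responses.get(name, {})
        succeeded = body.get("status") == "success"
        samples = body.get("data", {}).get("result", [])
        if not succeeded or not samples:
            missing.append(name)
    return missing


# Hypothetical placeholder names; substitute the metrics the test asserts on.
EXPECTED = ["example_ingress_metric_a", "example_ingress_metric_b"]

# Hand-written sample bodies in the instant-query response shape.
sample = {
    "example_ingress_metric_a": json.loads(
        '{"status": "success", "data": {"resultType": "vector", "result": ['
        '{"metric": {"__name__": "example_ingress_metric_a"},'
        ' "value": [1569503858, "1"]}]}}'
    ),
    "example_ingress_metric_b": json.loads(
        '{"status": "success", "data": {"resultType": "vector", "result": []}}'
    ),
}

print(missing_metrics(sample, EXPECTED))  # ['example_ingress_metric_b']
```

An empty result vector with `status: success` is the case to watch for: the query itself worked, but no series with that name existed at query time.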

We didn't write these tests (and I'd personally prefer that they lived in the component's test suite rather than this one since, as this example shows, the ownership is unclear). Moving to the routing component.

Comment 2 Dan Mace 2019-10-23 14:26:08 UTC
CI search and test grid data don't seem to indicate this is happening often enough to warrant our attention. If it recurs, we can open another bug.

Comment 4 Dan Mace 2019-10-24 19:44:54 UTC
*** Bug 1763679 has been marked as a duplicate of this bug. ***

Comment 5 Dan Mace 2019-10-29 13:24:26 UTC
I don't know what's going on with this bug, but we need the fix for CI so I'm moving it to verified myself.

Comment 7 errata-xmlrpc 2020-05-13 21:25:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062