Bug 1757807

Summary: [GCP] [flake] [Feature:Prometheus][Conformance] Prometheus when installed on the cluster should provide ingress metrics [Suite:openshift/conformance/parallel/minimal]
Product: OpenShift Container Platform Reporter: Dan Mace <dmace>
Component: Networking Assignee: Dan Mace <dmace>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: alegrand, anpicker, aos-bugs, ccoleman, deads, erooth, hongli, kakkoyun, lcosic, mloibl, pkrupa, surbania
Version: 4.2.0 Keywords: Reopened
Target Milestone: ---   
Target Release: 4.2.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1755936 Environment:
Last Closed: 2019-11-19 13:49:01 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1755936    
Bug Blocks:    

Description Dan Mace 2019-10-02 13:30:00 UTC
+++ This bug was initially created as a clone of Bug #1755936 +++

The "[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should provide ingress metrics [Suite:openshift/conformance/parallel/minimal]" test is the most frequent flake on GCP tests.  It rarely fails twice in a run, but it has failed numerous times.

One example job here: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/404

Test grid here: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.2-informing#canary-openshift-ocp-installer-e2e-gcp-4.2

Based on the pattern, I wonder if we're seeing an ingress problem.  Please reach out if you find strong evidence that cluster ingress is failing.  The OAuth server failure looks that way.

--- Additional comment from Frederic Branczyk on 2019-09-26 13:37:10 UTC ---

I checked the Prometheus dump and it seems these two metrics are the ones that are not found: https://github.com/openshift/origin/blob/4b9f648354a2dcb2832e3765caa571028f99ce00/test/extended/prometheus/prometheus.go#L286-L287

We didn't write these tests (and I'd personally prefer that they lived in the component's own test suite rather than this one, since, as this example shows, the ownership is unclear). Moving to routing component.
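
For anyone triaging a similar dump, here is a rough sketch of the kind of check described above: given the raw JSON bodies returned by the Prometheus HTTP API endpoint /api/v1/query, report which queries came back with no series. This is not the code from the origin test suite. The function name and the metric names in the sample are hypothetical placeholders, not the two metrics referenced in the linked test.

```python
import json


def missing_metrics(responses):
    """Return the queries whose Prometheus response contains no series.

    `responses` maps a PromQL query string to the raw JSON body that
    GET /api/v1/query returned for that query. A non-success status is
    treated the same as an empty result.
    """
    missing = []
    for query, body in responses.items():
        payload = json.loads(body)
        if payload.get("status") != "success":
            missing.append(query)
            continue
        # An instant query returns its series under data.result.
        if not payload.get("data", {}).get("result"):
            missing.append(query)
    return missing


# Hypothetical sample: the metric names are placeholders for illustration.
sample = {
    "example_router_metric_a": json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [{"metric": {"job": "example"}, "value": [0, "1"]}],
        },
    }),
    "example_router_metric_b": json.dumps({
        "status": "success",
        "data": {"resultType": "vector", "result": []},
    }),
}

print(missing_metrics(sample))
```

With a real dump, the same function could be fed one saved response per query to see quickly which metrics were absent at the time of the failure.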

Comment 1 Dan Mace 2019-10-23 14:26:16 UTC
CI search and test grid data don't seem to indicate this is happening often enough to warrant our attention. If it recurs, we can open another bug.

Comment 5 errata-xmlrpc 2019-11-19 13:49:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3869