Bug 1755936 - [GCP] [flake] [Feature:Prometheus][Conformance] Prometheus when installed on the cluster should provide ingress metrics [Suite:openshift/conformance/parallel/minimal]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.3.0
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Duplicates: 1763679
Depends On:
Blocks: 1757807
Reported: 2019-09-26 13:17 UTC by David Eads
Modified: 2020-05-13 21:25 UTC
CC: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1757807
Environment:
Last Closed: 2020-05-13 21:25:27 UTC
Target Upstream Version:



Links
System ID Priority Status Summary Last Updated
Github openshift origin pull 23879 'None' closed Bug 1755936: e2e: fix ingress metrics parallelism flake 2020-02-03 13:02:02 UTC
Github openshift origin pull 23892 'None' closed Bug 1755936: e2e: stabilize ingress metrics tests 2020-02-03 13:02:02 UTC
Red Hat Product Errata RHBA-2020:0062 None None None 2020-05-13 21:25:30 UTC

Description David Eads 2019-09-26 13:17:38 UTC
The "[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should provide ingress metrics [Suite:openshift/conformance/parallel/minimal]" test is the most frequent flake on GCP tests.  It rarely fails twice in a run, but it has failed numerous times.

One example job here: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/404

Test grid here: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.2-informing#canary-openshift-ocp-installer-e2e-gcp-4.2

Based on the pattern, I wonder if we're seeing an ingress problem. Please reach out if you find strong evidence that cluster ingress is failing. The OAuth server failure looks that way.

Comment 1 Frederic Branczyk 2019-09-26 13:37:10 UTC
I checked the Prometheus dump and it seems these two metrics are the ones that are not found: https://github.com/openshift/origin/blob/4b9f648354a2dcb2832e3765caa571028f99ce00/test/extended/prometheus/prometheus.go#L286-L287

We didn't write these tests (and I'd personally prefer if they were in the component's test suite, not this one, as this example shows the ownership is unclear). Moving to routing component.

Comment 2 Dan Mace 2019-10-23 14:26:08 UTC
CI search and test grid data don't seem to indicate this is happening often enough to warrant our attention. If it recurs, we can open another bug.

Comment 4 Dan Mace 2019-10-24 19:44:54 UTC
*** Bug 1763679 has been marked as a duplicate of this bug. ***

Comment 5 Dan Mace 2019-10-29 13:24:26 UTC
I don't know what's going on with this bug, but we need the fix for CI so I'm moving it to verified myself.

Comment 7 errata-xmlrpc 2020-05-13 21:25:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

