1955229 – release-openshift-origin-installer-e2e-aws-calico-4.7 is permfailing

Summary: release-openshift-origin-installer-e2e-aws-calico-4.7 is permfailing

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	4.8.0
Assignee:	jamo luhrsen
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-04-29 17:50 UTC by Ben Parees
Modified:	2021-07-27 23:06 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:	job=release-openshift-origin-installer-e2e-aws-calico-4.7=all
Last Closed:	2021-07-27 23:05:13 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift release pull 18367	0	None	open	Bug 1955229: step-registry: add Calico workflow	2021-05-10 09:28:13 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 23:06:02 UTC

Description Ben Parees 2021-04-29 17:50:32 UTC

job:
release-openshift-origin-installer-e2e-aws-calico-4.7 

is always failing in CI, see testgrid results:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#release-openshift-origin-installer-e2e-aws-calico-4.7

sample job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-calico-4.7/1387173451675996160

Assigning this to the networking team since this appears to be a Calico-specific job (not sure exactly who owns testing with Calico).

If this job is no longer relevant/interesting, please disable it, as it is wasting CI resources and polluting the CI pass-rate signal.

Comment 1 Dan Williams 2021-05-03 15:12:28 UTC

[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should be able to connect to a service that is idled because a GET on the route will unidle it [Suite:openshift/conformance/parallel/minimal]

^^^^^ I'm pretty sure Calico doesn't support unidling, so this test should be skipped for anything !openshift-sdn && !ovn-kubernetes
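
A minimal sketch of the skip condition suggested above, assuming the test harness can see the cluster's network plugin as a plain string. The function name, the constant, and the plugin identifiers (taken from the wording of this comment) are hypothetical and are not the actual openshift/origin test code:

```python
# Hypothetical helper illustrating the proposed rule: only run the
# unidling test on plugins believed to support unidling.
# Plugin names are taken from the comment text and are an assumption.
UNIDLING_CAPABLE_PLUGINS = frozenset({"openshift-sdn", "ovn-kubernetes"})


def should_skip_unidling_test(network_plugin: str) -> bool:
    """Return True when the unidling test should be skipped.

    Equivalent to: plugin != openshift-sdn && plugin != ovn-kubernetes.
    """
    normalized = network_plugin.strip().lower()
    return normalized not in UNIDLING_CAPABLE_PLUGINS


if __name__ == "__main__":
    for plugin in ("openshift-sdn", "ovn-kubernetes", "calico"):
        action = "skip" if should_skip_unidling_test(plugin) else "run"
        print(f"{plugin}: {action}")
```

With this rule a Calico cluster would skip the test, while the two listed plugins would still run it.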

Comment 2 Dan Williams 2021-05-03 15:14:50 UTC

[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Suite:openshift/conformance/parallel]

^^^^^ looks legit and should get fixed in our kube-proxy handling

Apr 27 23:46:22.731: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"__name__\":\"ALERTS\",\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"kube-proxy\",\"namespace\":\"openshift-kube-proxy\",\"service\":\"openshift-kube-proxy\",\"severity\":\"warning\"},\"value\":[1619567182.717,\"1\"]}]}}"
Apr 27 23:46:22.731: INFO: promQL query: ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 had reported incorrect results:
[{"metric":{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"kube-proxy","namespace":"openshift-kube-proxy","service":"openshift-kube-proxy","severity":"warning"},"value":[1619567182.717,"1"]}]
STEP: perform prometheus metric query ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1

Apr 27 23:46:33.082: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"__name__\":\"ALERTS\",\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"kube-proxy\",\"namespace\":\"openshift-kube-proxy\",\"service\":\"openshift-kube-proxy\",\"severity\":\"warning\"},\"value\":[1619567193.068,\"1\"]}]}}"
Apr 27 23:46:33.082: INFO: promQL query: ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 had reported incorrect results:
[{"metric":{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"kube-proxy","namespace":"openshift-kube-proxy","service":"openshift-kube-proxy","severity":"warning"},"value":[1619567193.068,"1"]}]
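
For anyone triaging similar output, here is a small sketch that pulls the firing alerts out of a query response shaped like the ones logged above. The response body is copied from the first log line (unescaped); the function name is hypothetical, and this is only an illustration of reading the result, not what the e2e test itself does:

```python
import json

# Response body copied from the 23:46:22.731 log line above, unescaped.
RESPONSE = (
    '{"status":"success","data":{"resultType":"vector","result":['
    '{"metric":{"__name__":"ALERTS","alertname":"TargetDown",'
    '"alertstate":"firing","job":"kube-proxy",'
    '"namespace":"openshift-kube-proxy",'
    '"service":"openshift-kube-proxy","severity":"warning"},'
    '"value":[1619567182.717,"1"]}]}}'
)


def firing_alerts(body: str) -> list[tuple[str, str, str]]:
    """Return (alertname, job, namespace) for each sample in the response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    alerts = []
    for sample in payload["data"]["result"]:
        metric = sample["metric"]
        alerts.append(
            (metric["alertname"], metric.get("job", ""), metric.get("namespace", ""))
        )
    return alerts


alerts = firing_alerts(RESPONSE)
for name, job, namespace in alerts:
    print(f"{name} firing for job={job} in namespace={namespace}")
```

For the response above this reports a single alert, TargetDown, for the kube-proxy job in the openshift-kube-proxy namespace.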

Comment 3 jamo luhrsen 2021-05-04 17:19:24 UTC

I will take a look at this as soon as I get a few higher priority things taken care of.

Comment 5 jamo luhrsen 2021-05-24 21:33:55 UTC

This job was moved to use the step-registry for 4.8 and that job is not perma-failing. I think you could mark
this bug as verified now. The job has already run properly twice; both runs did fail, but each failed for a different reason.

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-e2e-aws-calico

Here is another PR that will move the 4.3-4.7 versions of this job to the step registry as well:
https://github.com/openshift/release/pull/18803

Comment 6 zhaozhanqi 2021-05-31 10:17:24 UTC

Moving this to verified per comment 5.

Comment 9 errata-xmlrpc 2021-07-27 23:05:13 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
