Bug 2053205

Summary: ci-openshift-cluster-network-operator-master-e2e-agnostic-upgrade is failing most of the time
Product: OpenShift Container Platform
Component: Networking
Networking sub component: openshift-sdn
Version: 4.11
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: medium
Priority: high
Type: Bug
Reporter: Jaime Caamaño Ruiz <jcaamano>
Assignee: jamo luhrsen <jluhrsen>
QA Contact: zhaozhanqi <zzhao>
CC: anbhat, ffernand
Last Closed: 2022-08-10 10:49:18 UTC

Description Jaime Caamaño Ruiz 2022-02-10 16:49:12 UTC
Description of problem:

ci-openshift-cluster-network-operator-master-e2e-agnostic-upgrade is failing most of the time

https://prow.ci.openshift.org/job-history/origin-ci-test/pr-logs/directory/pull-ci-openshift-cluster-network-operator-master-e2e-agnostic-upgrade

Additional info:

For CNO, e2e-agnostic-upgrade is failing most of the time. The same job is failing for other projects as well, for example console-operator:

https://prow.ci.openshift.org/job-history/origin-ci-test/pr-logs/directory/pull-ci-openshift-console-operator-master-e2e-agnostic-upgrade

Other projects seem to be luckier with it:
https://prow.ci.openshift.org/job-history/origin-ci-test/pr-logs/directory/pull-ci-openshift-cluster-monitoring-operator-master-e2e-agnostic-upgrade
https://prow.ci.openshift.org/job-history/origin-ci-test/pr-logs/directory/pull-ci-openshift-cluster-etcd-operator-master-e2e-agnostic-upgrade

Comment 2 jamo luhrsen 2022-03-14 23:36:11 UTC
This e2e-agnostic-upgrade job is really just the e2e-azure-upgrade job [0]. I'm not sure why
"agnostic" is used in the name. The periodic version of this job is also pretty
unhealthy [1]. I've pinged the TRT team about this job to see if they have any leads on
its health [2].

The first few jobs I looked at failed while bringing up the initial resources, before any
tests even ran. If the infrastructure is not very stable, I'd argue for changing these presubmit
jobs to use AWS instead of Azure to reduce the noise developers have to deal with on
their PRs. Here's a PR [3] to do just that. If that's reasonable, please comment on the PR
and add a /lgtm.

[0] https://github.com/openshift/release/blob/a3830da4426d5afb00765e809a1e8c8f6a48e422/ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-release-4.11.yaml#L66-L69
[1] https://sippy.ci.openshift.org/sippy-ng/jobs/4.10/analysis?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-ci-4.10-e2e-azure-upgrade%22%7D%5D%7D
[2] https://coreos.slack.com/archives/C01CQA76KMX/p1647298980599429
[3] https://github.com/openshift/release/pull/26977

Comment 4 jamo luhrsen 2022-03-15 17:43:30 UTC
Marking verified, as the fix for this was to move the agnostic job from Azure to AWS. The job
has also been renamed to e2e-aws-upgrade. The PR making this change was merged, and new PR checks
are using the new job, for example:

https://github.com/openshift/cluster-network-operator/pull/1339

Comment 6 errata-xmlrpc 2022-08-10 10:49:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069