Bug 2074544 - e2e-metal-ipi-ovn-ipv6 failing due to recent CEO changes
Summary: e2e-metal-ipi-ovn-ipv6 failing due to recent CEO changes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: jamo luhrsen
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-12 13:15 UTC by Casey Callendrello
Modified: 2022-08-10 11:06 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:06:30 UTC
Target Upstream Version:
Embargoed:
jluhrsen: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Github openshift release pull 28469 0 None open Bug 2074544: Revert "cno: disable ovn-kubernetes ipv6" 2022-05-10 21:25:40 UTC
Github openshift release pull 28613 0 None open Bug 2074544: cno, ovn: re-enable ovn-kubernetes ipv6 2022-05-16 09:49:28 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:06:50 UTC

Description Casey Callendrello 2022-04-12 13:15:47 UTC
There appears to be something wrong with etcd on this job. It has a 99% failure rate: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6

I have filed a PR to make it optional. Once we fix the issue, we can re-enable it.

Comment 1 Steven Hardy 2022-04-12 13:22:20 UTC
> I have filed a PR to make it optional. Once we fix the issue, we can re-enable it.

It's a real regression, so I'd prefer we fix the issue. I pushed https://github.com/openshift/cluster-etcd-operator/pull/785, but investigation and testing are ongoing to confirm whether that's the only issue.

Comment 2 Casey Callendrello 2022-04-12 13:39:22 UTC
Steven,

Agreed, this is a real regression. However, thanks to the interesting nature of prow and merge pools, it is essentially impossible to merge any PRs until this is fixed. Hence the blocker bug.

With less than two weeks to go before feature freeze, we can't afford to be stuck for another week.

Feel free to file a PR re-enabling the job when things are stable.

Comment 3 Steven Hardy 2022-04-12 16:12:21 UTC
It seems my fix isn't sufficient, so I'll remove my assignment and this can hopefully be triaged and investigated by the etcd team.

I triggered e2e-metal-ipi-ovn-ipv6 on https://github.com/openshift/cluster-etcd-operator/pull/785 so we can hopefully collect more details about the remaining issues.

Comment 4 Steven Hardy 2022-04-12 17:57:18 UTC
I spotted some similar issues with https://github.com/openshift/cluster-etcd-operator/pull/784, so I updated my PR with another fix and am re-testing.

Comment 5 Steven Hardy 2022-04-12 18:29:25 UTC
I haven't got the fixes working yet, so I'm trying a revert: https://github.com/openshift/cluster-etcd-operator/pull/786 (this did work locally for me, but let's confirm in CI).

Comment 7 Casey Callendrello 2022-04-25 10:26:30 UTC
According to testgrid [1], this job finally went green on 4/22. So, yes, I think we can set it as blocking for CNO if desired.

1: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6

Comment 8 melbeher 2022-04-25 17:45:33 UTC
@cdc Would you kindly change the component, if possible? I think it is no longer an etcd problem, is it?

cc @dwest

Comment 9 Casey Callendrello 2022-04-26 10:51:18 UTC
Agreed, we can kick this back to the SDN team. (It should have been a BLOCKS bug for the etcd team anyway.) Thanks for the prompt fix.

Comment 10 jamo luhrsen 2022-05-10 21:28:56 UTC
This job is no longer failing at such a high rate. sippy [0] is showing that it's passing more than 50% of
the time. This PR [1] will revert the initial change that made these jobs optional in CNO and OVNK. Also,
it's good to see that the job is also a payload blocker again [2], as it was also moved to informing/optional
by TRT when it was failing so often.


[0] https://sippy.dptools.openshift.org/sippy-ng/jobs/4.11/analysis?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6%22%7D%5D%7D
[1] https://github.com/openshift/release/pull/28469
[2] https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/#4.11.0-0.nightly

Comment 13 Dan Williams 2022-05-13 17:36:37 UTC
At least these two need to merge to move things along:

https://github.com/openshift/image-customization-controller/pull/49
https://github.com/openshift/installer/pull/5909

Comment 18 errata-xmlrpc 2022-08-10 11:06:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

