Bug 1883660

Summary: e2e-metal-ipi CI job consistently failing on 4.4
Product: OpenShift Container Platform
Component: Networking
Networking sub component: ovn-kubernetes
Reporter: Doug Hellmann <dhellmann>
Assignee: Dan Winship <danw>
QA Contact: Anurag saxena <anusaxen>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: achernet, bbennett, beth.white, danw, shardy, stbenjam, yboaron
Version: 4.4
Keywords: Triaged
Target Milestone: ---
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2021-02-24 15:21:17 UTC
Type: Bug
Bug Depends On: 1888352

Description Doug Hellmann 2020-09-29 19:37:11 UTC
Version:

CI

Platform: baremetal, IPI

What happened?

CI jobs are failing during bootstrap with errors like:

2020-09-29 18:51:34 level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/multus-admission-controller\" is not available (awaiting 3 nodes)"

For example: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/450/pull-ci-openshift-cluster-etcd-operator-release-4.4-e2e-metal-ipi/1310995092688867328
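When scanning many job logs for this kind of failure, the installer line quoted above can be split into its parts mechanically. The following is only a minimal sketch: `parse_operator_line` and the regular expression are hypothetical helpers written for this report, not part of the installer or of any CI tooling, and they assume the `Cluster operator <name> <condition> is <status> with <reason>: <message>` shape seen in the sample line.

```python
import re

# Hypothetical helper (not part of openshift-install): matches the
# 'msg="Cluster operator ..."' shape seen in the log line quoted above.
LINE_RE = re.compile(
    r'msg="Cluster operator (?P<operator>\S+) (?P<condition>\S+) is '
    r'(?P<status>\S+) with (?P<reason>[^:]+): (?P<message>.*)"\s*$'
)


def parse_operator_line(line):
    """Return the operator status fields as a dict, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Inner quotes appear escaped as \" in the log line; undo that.
    fields["message"] = fields["message"].replace('\\"', '"')
    return fields


# The log line from this report, reproduced as a Python string.
sample = (
    '2020-09-29 18:51:34 level=info msg="Cluster operator network Progressing '
    'is True with Deploying: DaemonSet \\"openshift-multus/multus-admission-controller\\" '
    'is not available (awaiting 3 nodes)"'
)

print(parse_operator_line(sample))
```

Filtering on the `operator` and `reason` fields would make it easier to tell whether other failed runs are stuck on the same DaemonSet.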

What did you expect to happen?

The CI job should pass.

How to reproduce it (as minimally and precisely as possible)?

Submit a patch against a 4.4 branch for an IPI component like the baremetal-operator.

Anything else we need to know?

This is blocking a feature patch, https://github.com/openshift/baremetal-operator/pull/63

Comment 4 Steven Hardy 2020-10-01 08:38:14 UTC
I tested locally and can confirm the bootstrap failure with 4.4.0-0.ci-2020-09-30-103924 when configured to use IPv6.

The same build works with IPv4, though, so we may have a regression specific to IPv6.

Comment 5 Yossi Boaron 2020-10-01 08:56:07 UTC
Yep, in my case it was also an IPv6 deployment.

Comment 6 Stephen Benjamin 2020-10-12 14:02:41 UTC
We didn't have install-gather working on baremetal in 4.4, so someone would need to collect the logs by hand and pinpoint the failure. Did anyone who ran a local install collect any more information?

Comment 8 Steven Hardy 2020-10-20 13:14:25 UTC
It seems this will require a backport of the same fix as https://bugzilla.redhat.com/show_bug.cgi?id=1888352, so I am moving the component to ovn-kubernetes.

Comment 9 Dan Winship 2020-10-20 16:46:11 UTC
Yeah, it's already filed, but the bot didn't link it here because the 4.5 bz is "invalid" (it's not verified yet).

Comment 11 Dan Winship 2020-11-11 12:06:50 UTC
This is being fixed by removing this job; we do not need to support IPv6 in 4.4.

Comment 16 errata-xmlrpc 2021-02-24 15:21:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633