Bug 1848048 - OpenShift installer fails when using ovn-kubernetes
Summary: OpenShift installer fails when using ovn-kubernetes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.6.0
Assignee: Ben Bennett
QA Contact: Anurag Saxena
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-17 14:54 UTC by ravig
Modified: 2020-10-27 16:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:07:36 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 1830 | 0 | None | closed | Revert "Start openvswitch and ovsdb-server when network is ovn/ovs" | 2021-01-13 10:58:11 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:07:56 UTC

Description ravig 2020-06-17 14:54:13 UTC
Description of problem:

The cluster fails to come up when using ovn-kubernetes as the network type, with the following errors:


level=error msg="Cluster operator network Degraded is True with RolloutHung: DaemonSet \"openshift-ovn-kubernetes/ovnkube-node\" rollout is not making progress - last change 2020-06-16T20:41:34Z"
level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/network-metrics-daemon\" is waiting for other operators to become ready\nDaemonSet \"openshift-multus/multus-admission-controller\" is waiting for other operators to become ready\nDaemonSet \"openshift-ovn-kubernetes/ovnkube-node\" is not available (awaiting 3 nodes)"
level=info msg="Cluster operator network Available is False with Startup: The network is starting up"
level=info msg="Pulling debug logs from the bootstrap machine"
level=info msg="Bootstrap gather logs captured here \"/tmp/installer/log-bundle-20200616210856.tar.gz\""
level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"
error: failed to execute wrapped command: exit status 1
2020/06/16 21:09:21 Container test in pod e2e-operator-ipi-install-install failed, exit code 1, reason Error
2020/06/16 21:09:22 Copied 7.24MB of artifacts from e2e-operator-ipi-install-install to /logs/artifacts/e2e-operator/ipi-install-install
2020/06/16 21:09:22 Executing "e2e-operator-gather-must-gather"
2020/06/16 21:09:25 Container cp-secret-wrapper in pod e2e-operator-gather-must-gather completed successfully
Running must-gather...
error: gather did not start for pod must-gather-5jf2q: timed out waiting for the condition
error: failed to execute wrapped command: exit status 1 
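
For triage, the cluster operator conditions can be pulled out of installer output like the above with a short script. This is only a minimal sketch: it assumes lines of the form level=<level> msg="<message>" and messages of the form "Cluster operator <name> <condition> is <status> with <reason>: <detail>", as seen in this log. The function name parse_operator_conditions and both regular expressions are hypothetical helpers written for this illustration, not part of the installer.

```python
import re

# Assumed line layout, taken from the log above: level=<level> msg="<message>"
LINE_RE = re.compile(r'^level=(?P<level>\w+) msg="(?P<msg>.*)"$')

# Assumed message layout, taken from the log above:
# Cluster operator <name> <condition> is <status> with <reason>: <detail>
COND_RE = re.compile(
    r'^Cluster operator (?P<operator>\S+) (?P<condition>\S+) is (?P<status>\S+) '
    r'with (?P<reason>\S+): (?P<detail>.*)$',
    re.DOTALL,  # the detail may span several lines once "\n" is unescaped
)


def parse_operator_conditions(lines):
    """Return one dict per 'Cluster operator ...' message found in lines."""
    results = []
    for line in lines:
        line_match = LINE_RE.match(line.strip())
        if line_match is None:
            continue  # not an installer log line (e.g. CI wrapper output)
        # Undo the escaping of quotes and newlines inside msg="...".
        msg = line_match.group("msg").replace('\\"', '"').replace("\\n", "\n")
        cond_match = COND_RE.match(msg)
        if cond_match is not None:
            results.append({"level": line_match.group("level"), **cond_match.groupdict()})
    return results
```

Run against the log above, this should report the network operator as Degraded=True (RolloutHung), Progressing=True (Deploying) and Available=False (Startup), which points at the ovnkube-node DaemonSet rollout.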


Version-Release number of selected component (if applicable):


How reproducible:

Every time in CI:

https://search.apps.build01.ci.devcluster.openshift.com/?search=Cluster+operator+network+Degraded+is+True+with+RolloutHung&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Steps to Reproduce:
1. Spin up an OpenShift cluster with ovn-kubernetes as the network type

Actual results:
OpenShift installation fails

Expected results:
OpenShift installation succeeds

Additional info:

Comment 1 ravig 2020-06-17 14:55:08 UTC
I0617 05:12:31.245747   45253 ovs.go:250] exec(126): stderr: "ovs-ofctl: br-int is not a bridge or a socket\n"
I0617 05:12:31.245754   45253 ovs.go:252] exec(126): err: exit status 1
F0617 05:12:31.245772   45253 ovnkube.go:129] timed out dumping br-int flow entries for node ip-10-0-139-86.us-east-2.compute.internal: timed out waiting for the condition

I can see the above error in the ovn-node log
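
The lines above appear to use the klog/glog header layout (severity letter, month and day, time, thread ID, file:line, then the message). Assuming that layout, a small script can find the first fatal entry in a longer ovn-node log. This is a rough sketch: parse_klog and first_fatal are hypothetical helpers written for this illustration and are not part of ovn-kubernetes.

```python
import re

# Assumed klog-style header: Lmmdd hh:mm:ss.uuuuuu <thread id> file:line] message
KLOG_RE = re.compile(
    r'^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread_id>\d+) '
    r'(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$'
)

SEVERITY = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}


def parse_klog(line):
    """Parse one klog-style line into a dict, or return None if it does not match."""
    match = KLOG_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["severity"] = SEVERITY[entry["severity"]]
    entry["line"] = int(entry["line"])
    return entry


def first_fatal(lines):
    """Return the first entry logged at fatal severity, or None if there is none."""
    for line in lines:
        entry = parse_klog(line)
        if entry is not None and entry["severity"] == "fatal":
            return entry
    return None
```

For the three lines in this comment, first_fatal should return the ovnkube.go:129 entry about timing out while dumping the br-int flow entries. The two info lines before it show the underlying ovs-ofctl failure.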

Comment 2 Ben Bennett 2020-06-18 13:04:26 UTC
We believe it is fixed by https://github.com/openshift/machine-config-operator/pull/1830

Comment 8 errata-xmlrpc 2020-10-27 16:07:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

