Bug 1999255

Summary: ovnkube-node always crashes out the first time it starts
Product: OpenShift Container Platform
Component: Networking
Sub component: ovn-kubernetes
Version: 4.9
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: low
Priority: low
Reporter: Dan Winship <danw>
Assignee: Christoph Stäbler <cstabler>
QA Contact: Dan Brahaney <dbrahane>
CC: cstabler, dbrahane, kkarampo, mapandey, suc, trozet, zzhao
Type: Bug
Last Closed: 2022-03-12 04:37:58 UTC

Description Dan Winship 2021-08-30 18:11:13 UTC
Even on successful e2e OVN runs, each of the ovnkube-node pods has a previous.log that ends with:

F0830 07:32:04.337095    2679 ovnkube.go:131] error waiting for node readiness: failed while waiting on patch port "patch-br-ex_ip-10-0-144-192.us-west-1.compute.internal-to-br-int" to be created by ovn-controller and while getting ofport. stderr: "ovs-vsctl: no row \"patch-br-ex_ip-10-0-144-192.us-west-1.compute.internal-to-br-int\" in table Interface\n", error: exit status 1

That's from https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-ovn/1432239942456578048, specifically https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-ovn/1432239942456578048/artifacts/e2e-aws-ovn/gather-extra/artifacts/pods/openshift-ovn-kubernetes_ovnkube-node-7dd8x_ovnkube-node_previous.log.

The corresponding ovn-controller log suggests ovnkube-node just didn't wait long enough. Actually, despite the claim in the error message, it looks like ovnkube-node is not actually *waiting* for ovn-controller but rather simply assuming ovn-controller will already be ready by the time it reaches that check, an assumption that is turning out to be false.
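
For illustration, a minimal sketch in Go of what an actual wait would look like: polling ovs-vsctl for the patch port's ofport until ovn-controller has created it, instead of querying once and crashing. This is not the ovn-kubernetes code; the function name, polling parameters, and port name are hypothetical, and it assumes the wait helpers from k8s.io/apimachinery.

package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForPatchPort polls until ovn-controller has created the patch port
// and OVS has assigned it a valid ofport, rather than checking only once.
// (Hypothetical helper; portName would be something like
// "patch-br-ex_<node-name>-to-br-int".)
func waitForPatchPort(portName string, timeout time.Duration) error {
	return wait.PollImmediate(500*time.Millisecond, timeout, func() (bool, error) {
		// --if-exists makes ovs-vsctl exit 0 with empty output when the
		// row is missing, so a not-yet-created port means "retry", not
		// the fatal 'no row ... in table Interface' error shown above.
		out, err := exec.Command("ovs-vsctl", "--if-exists", "get",
			"Interface", portName, "ofport").CombinedOutput()
		if err != nil {
			return false, nil // transient ovs-vsctl failure: keep polling
		}
		ofport := strings.TrimSpace(string(out))
		// ovs-vsctl prints [] for an unset ofport and -1 while OVS is
		// still assigning one; both mean the port is not ready yet.
		return ofport != "" && ofport != "[]" && ofport != "-1", nil
	})
}

func main() {
	if err := waitForPatchPort("patch-br-ex_node-to-br-int", 60*time.Second); err != nil {
		fmt.Println("error waiting for node readiness:", err)
	}
}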

Comment 5 zhaozhanqi 2022-01-25 03:39:28 UTC
I think this bug can be moved to VERIFIED on 4.10.0-0.nightly-2022-01-24-020644.

No restarts for the ovnkube-node pods:

$ oc get pod -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS      AGE
ovnkube-master-92jk8   6/6     Running   0             43m
ovnkube-master-fnrf6   6/6     Running   6 (42m ago)   43m
ovnkube-master-ncx24   6/6     Running   6 (41m ago)   43m
ovnkube-node-bq8dw     5/5     Running   0             43m
ovnkube-node-fkd2z     5/5     Running   0             26m
ovnkube-node-mqwqb     5/5     Running   0             26m
ovnkube-node-qrlb4     5/5     Running   0             43m
ovnkube-node-s6m6v     5/5     Running   0             26m
ovnkube-node-zbjd7     5/5     Running   0             43m

Comment 12 errata-xmlrpc 2022-03-12 04:37:58 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

Comment 13 Red Hat Bugzilla 2023-09-15 01:14:26 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.