Bug 1999255 - ovnkube-node always crashes out the first time it starts [NEEDINFO]
Summary: ovnkube-node always crashes out the first time it starts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.10.0
Assignee: Christoph Stäbler
QA Contact: Dan Brahaney
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-30 18:11 UTC by Dan Winship
Modified: 2022-03-12 04:38 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-12 04:37:58 UTC
Target Upstream Version:
Embargoed:
anusaxen: needinfo? (dbrahane)




Links
System ID                                    Status   Summary                                                   Last Updated
Github ovn-org/ovn-kubernetes pull 2523      Merged   Node wait for Controller before initializing Gateway     2021-10-29 13:48:06 UTC
Red Hat Product Errata RHSA-2022:0056        None     None                                                      2022-03-12 04:38:16 UTC

Description Dan Winship 2021-08-30 18:11:13 UTC
Even on successful e2e ovn runs, each of the ovnkube-node pods has a previous.log that ends with:

F0830 07:32:04.337095    2679 ovnkube.go:131] error waiting for node readiness: failed while waiting on patch port "patch-br-ex_ip-10-0-144-192.us-west-1.compute.internal-to-br-int" to be created by ovn-controller and while getting ofport. stderr: "ovs-vsctl: no row \"patch-br-ex_ip-10-0-144-192.us-west-1.compute.internal-to-br-int\" in table Interface\n", error: exit status 1

That's from https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-ovn/1432239942456578048, specifically https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-ovn/1432239942456578048/artifacts/e2e-aws-ovn/gather-extra/artifacts/pods/openshift-ovn-kubernetes_ovnkube-node-7dd8x_ovnkube-node_previous.log.
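For illustration only (a hypothetical helper, not the actual ovn-kubernetes source), the failing step amounts to a one-shot ovs-vsctl lookup of the patch port's ofport; if ovn-controller has not created the port yet, the Interface row is missing, ovs-vsctl exits non-zero, and ovnkube-node bails out:

// Illustrative sketch only; names are assumptions.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// getPatchPortOfport reads the ofport of the gateway patch port with a single
// ovs-vsctl call. A missing row produces the same "no row ... in table
// Interface" failure seen in the log above.
func getPatchPortOfport(patchPort string) (string, error) {
	out, err := exec.Command("ovs-vsctl", "--timeout=15", "get",
		"Interface", patchPort, "ofport").CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("failed while getting ofport of %q, stderr: %q, error: %v",
			patchPort, string(out), err)
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	// Hypothetical patch port name, following the patch-br-ex_<node>-to-br-int pattern.
	ofport, err := getPatchPortOfport("patch-br-ex_example-node-to-br-int")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ofport:", ofport)
}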

The corresponding ovn-controller log suggests ovnkube-node just didn't wait long enough. Despite the claim in the error message, ovnkube-node is not actually *waiting* for ovn-controller; it is simply assuming ovn-controller will already be ready by the time it reaches that check, which is turning out to be false.
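A minimal sketch of the retry approach that the linked PR 2523's title describes ("Node wait for Controller before initializing Gateway"). The function name and the use of wait.PollImmediate are assumptions, not the actual patch; the point is to poll until ovn-controller has created the patch port instead of checking once:

// Sketch of a poll-until-ready check; not the actual PR 2523 code.
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForPatchPort polls until ovn-controller has created the patch port and
// its ofport can be read, rather than assuming the port already exists.
func waitForPatchPort(patchPort string, timeout time.Duration) (string, error) {
	var ofport string
	err := wait.PollImmediate(500*time.Millisecond, timeout, func() (bool, error) {
		out, err := exec.Command("ovs-vsctl", "--timeout=15", "get",
			"Interface", patchPort, "ofport").CombinedOutput()
		if err != nil {
			// Port not created by ovn-controller yet; keep polling.
			return false, nil
		}
		ofport = strings.TrimSpace(string(out))
		return true, nil
	})
	if err != nil {
		return "", fmt.Errorf("timed out waiting for patch port %q: %v", patchPort, err)
	}
	return ofport, nil
}

func main() {
	// Hypothetical patch port name and timeout, for illustration only.
	ofport, err := waitForPatchPort("patch-br-ex_example-node-to-br-int", 60*time.Second)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("patch port ready, ofport:", ofport)
}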

Comment 5 zhaozhanqi 2022-01-25 03:39:28 UTC
I think this bug can be moved to VERIFIED on 4.10.0-0.nightly-2022-01-24-020644.

No restarts for the ovnkube-node pods:

$ oc get pod -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS      AGE
ovnkube-master-92jk8   6/6     Running   0             43m
ovnkube-master-fnrf6   6/6     Running   6 (42m ago)   43m
ovnkube-master-ncx24   6/6     Running   6 (41m ago)   43m
ovnkube-node-bq8dw     5/5     Running   0             43m
ovnkube-node-fkd2z     5/5     Running   0             26m
ovnkube-node-mqwqb     5/5     Running   0             26m
ovnkube-node-qrlb4     5/5     Running   0             43m
ovnkube-node-s6m6v     5/5     Running   0             26m
ovnkube-node-zbjd7     5/5     Running   0             43m

Comment 12 errata-xmlrpc 2022-03-12 04:37:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

