Bug 2060534 - openshift-apiserver pod in crashloop due to unable to reach kubernetes svc ip
Summary: openshift-apiserver pod in crashloop due to unable to reach kubernetes svc ip
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: urgent
Target Milestone: ---
Target Release: 4.11.0
Assignee: Tim Rozet
QA Contact: Anurag saxena
URL:
Whiteboard:
Duplicates: 2016115 2072122
Depends On:
Blocks: 2065780
Reported: 2022-03-03 17:36 UTC by Anurag saxena
Modified: 2022-11-02 16:01 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2065780
Environment:
Last Closed: 2022-08-10 10:52:11 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ovn-org ovn-kubernetes pull 2845 | 0 | None | open | Fix cleaning VF representor ports | 2022-03-03 20:15:04 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 10:52:41 UTC

Comment 1 Tim Rozet 2022-03-03 19:42:07 UTC
The problem is that ovnkube deleted the OVS side of the veth interface:
[trozet@fedora contrib]$ oc logs ovnkube-node-r6gn8 -c ovnkube-node | grep apiserver-64c59d7d44-9jxmp
I0303 15:10:41.861452    6959 cni.go:227] [openshift-apiserver/apiserver-64c59d7d44-9jxmp 8046af2ac9b9f4ce38b0dd16bf3be9200d0b74b073e5c9521fcce2e6c1551f64] ADD starting CNI request [openshift-apiserver/apiserver-64c59d7d44-9jxmp 8046af2ac9b9f4ce38b0dd16bf3be9200d0b74b073e5c9521fcce2e6c1551f64]
I0303 15:10:41.893578    6959 helper_linux.go:257] ConfigureOVS: namespace: openshift-apiserver, podName: apiserver-64c59d7d44-9jxmp
I0303 15:10:43.914813    6959 cni.go:248] [openshift-apiserver/apiserver-64c59d7d44-9jxmp 8046af2ac9b9f4ce38b0dd16bf3be9200d0b74b073e5c9521fcce2e6c1551f64] ADD finished CNI request [openshift-apiserver/apiserver-64c59d7d44-9jxmp 8046af2ac9b9f4ce38b0dd16bf3be9200d0b74b073e5c9521fcce2e6c1551f64], result "{\"interfaces\":[{\"name\":\"8046af2ac9b9f4c\",\"mac\":\"3e:4f:c3:c0:93:b8\"},{\"name\":\"eth0\",\"mac\":\"0a:58:0a:80:00:35\",\"sandbox\":\"/var/run/netns/b2376631-9d4e-47da-ab5b-9d79cdc1b5f4\"}],\"ips\":[{\"version\":\"4\",\"interface\":1,\"address\":\"10.128.0.53/23\",\"gateway\":\"10.128.0.1\"}],\"dns\":{}}", err <nil>

W0303 15:10:51.913934    6959 healthcheck.go:229] Found stale OVS Interface, deleting OVS Port with interface 8046af2ac9b9f4c


This comes from the VF representor cleanup code, which indiscriminately removes interfaces that are not VF representors:
https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/node/healthcheck.go#L169

We need to fix this TODO so this function only looks at VF representor ports:
https://github.com/openshift/ovn-kubernetes/blob/release-4.10/go-controller/pkg/node/healthcheck.go#L227

Comment 2 Tim Rozet 2022-03-03 19:44:20 UTC
This code is present in 4.9 as well. The bug exists there too.

Comment 3 Riccardo Ravaioli 2022-03-18 17:28:25 UTC
*** Bug 2016115 has been marked as a duplicate of this bug. ***

Comment 4 Riccardo Ravaioli 2022-03-21 16:22:07 UTC
This is fixed in downstream master by https://github.com/openshift/ovn-kubernetes/pull/987. Moving the BZ status to ON_QA.

Comment 5 Riccardo Ravaioli 2022-04-06 10:51:13 UTC
*** Bug 2072122 has been marked as a duplicate of this bug. ***

Comment 8 errata-xmlrpc 2022-08-10 10:52:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 9 Riccardo Ravaioli 2022-11-02 16:01:05 UTC
*** Bug 2016115 has been marked as a duplicate of this bug. ***

