1882737 – check-endpoints should gracefully handle missing podnetworkconnectivitycheck crd

Summary: check-endpoints should gracefully handle missing podnetworkconnectivitycheck crd

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-apiserver
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Luis Sanchez
QA Contact:	Ke Wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-09-25 13:44 UTC by Luis Sanchez
Modified:	2020-10-27 16:45 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-10-27 16:45:38 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-kube-apiserver-operator pull 955	0	None	closed	Bug 1882737: check-endpoints should gracefully handle missing podnetworkconnectivitycheck crd	2021-01-11 03:07:25 UTC
Red Hat Product Errata	RHBA-2020:4196	0	None	None	None	2020-10-27 16:45:51 UTC

Description Luis Sanchez 2020-09-25 13:44:40 UTC

Description of problem:

If the PodNetworkConnectivityCheck CRD is not available, the check-endpoints tool (currently found in the kube-apiserver and openshift-apiserver pods) should disable itself until the CRD becomes available.
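To illustrate the "stay disabled until the CRD exists" behavior requested above, here is a minimal shell sketch. It is not how check-endpoints itself is implemented. The function names `crd_present` and `wait_for_crd` and the `CRD_CHECK_CMD` override hook are hypothetical. The CRD name is inferred from the `post podnetworkconnectivitychecks.controlplane.operator.openshift.io` errors quoted in the comments below.

```shell
#!/usr/bin/env bash
# Minimal sketch (not the check-endpoints implementation): stay idle until a CRD is registered.
# CRD name inferred from the "post podnetworkconnectivitychecks.controlplane.operator.openshift.io" errors.
CRD_NAME="podnetworkconnectivitychecks.controlplane.operator.openshift.io"

# Return 0 if the CRD exists. CRD_CHECK_CMD is a hypothetical override hook
# (handy for testing); by default the API server is queried with `oc get crd`.
crd_present() {
  if [ -n "${CRD_CHECK_CMD:-}" ]; then
    eval "$CRD_CHECK_CMD"
  else
    oc get crd "$CRD_NAME" >/dev/null 2>&1
  fi
}

# Poll every $1 seconds (default 10). If $2 > 0, give up after $2 failed checks;
# with the default of 0, keep waiting indefinitely.
wait_for_crd() {
  local interval="${1:-10}" max_tries="${2:-0}" tries=0
  until crd_present; do
    tries=$((tries + 1))
    if [ "$max_tries" -gt 0 ] && [ "$tries" -ge "$max_tries" ]; then
      echo "CRD $CRD_NAME still missing after $tries checks; staying idle" >&2
      return 1
    fi
    sleep "$interval"
  done
  echo "CRD $CRD_NAME is available"
}
```

For example, `wait_for_crd 10 && start_checks` would run a hypothetical `start_checks` step only after the CRD appears. Polling keeps the sketch dependency-free. A long-running controller could instead watch for the CRD.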

Comment 3 Ke Wang 2020-10-10 10:57:09 UTC

Per https://bugzilla.redhat.com/show_bug.cgi?id=1876166#c3, enabling PodNetworkConnectivityCheck in release builds did not work as expected. The following message was logged continuously by kube-apiserver-operator for a long time (more than 30 minutes), and the PodNetworkConnectivityCheck CRD had still not been created. Verification could not continue, so I am assigning the bug back.

I1010 10:30:01.198964       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"870c8cd3-5a47-4eb6-a9a7-2aed44f9f948", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'EndpointDetectionFailure' PodNetworkConnectivityCheck.controlplane.operator.openshift.io/kube-apiserver-ip-10-0-155-67.us-east-2.compute.internal-to-load-balancer-api-internal -n openshift-kube-apiserver: the server could not find the requested resource (post podnetworkconnectivitychecks.controlplane.operator.openshift.io)

Comment 4 Luis Sanchez 2020-10-12 14:18:54 UTC

This BZ is just for the check-endpoints tool, not the connectivity check controllers themselves. You can verify that the check-endpoints containers are idle when the PodNetworkConnectivityCheck CRD does not exist (which is now the default).

Comment 5 Ke Wang 2020-10-13 04:28:47 UTC

Verified with OCP 4.6.0-0.nightly-2020-10-12-223649:

Updated the KubeAPIServer as follows:
$ oc edit kubeapiserver cluster
spec:
  unsupportedConfigOverrides:
    operator:
      enableConnectivityCheckController: "True"

After waiting a while, the kube-apiserver-operator log shows that the PodNetworkConnectivityCheck CRD is not available:
I1013 04:15:43.234129       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"981f0ab2-418a-4cdf-81b4-b2d0c1500f48", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'EndpointDetectionFailure' PodNetworkConnectivityCheck.controlplane.operator.openshift.io/kube-apiserver-ip-xx-xx-xx-xx.us-east-2.compute.internal-to-etcd-server-ip-xx-x-xx-xxx.us-east-2.compute.internal -n openshift-kube-apiserver: the server could not find the requested resource (post podnetworkconnectivitychecks.controlplane.operator.openshift.io)

$ oc get pods -n openshift-apiserver
NAME                         READY   STATUS             RESTARTS   AGE
apiserver-6c7864474f-5d7wp   1/2     CrashLoopBackOff   12         62m
apiserver-6c7864474f-j9njx   1/2     CrashLoopBackOff   12         66m
apiserver-6c7864474f-mflw7   1/2     CrashLoopBackOff   12         64m

$ oc get pods -n openshift-kube-apiserver | grep kube-apiserver
kube-apiserver-ip.xxx.us-east-2.compute.internal      4/5     CrashLoopBackOff   12         75m
kube-apiserver-ip.xx.us-east-2.compute.internal      4/5     CrashLoopBackOff   12         73m
kube-apiserver-ip.x.us-east-2.compute.internal      4/5     CrashLoopBackOff   12         69m

Check whether the openshift-apiserver-check-endpoints and kube-apiserver-check-endpoints containers are running on each master:
$ nodes=$(oc get no | grep master | awk '{print $1}')
$ for no in $nodes; do oc debug node/$no -- chroot /host crictl ps | grep 'check-endpoints';done

No running openshift-apiserver-check-endpoints or kube-apiserver-check-endpoints containers were found, which is as expected.
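The per-node check above can be turned into a pass/fail script. This sketch assumes the same `oc debug` and `crictl` access used in the verification. `node_containers`, `count_check_endpoints`, and `verify_idle` are hypothetical helper names. `crictl ps` lists only running containers unless `-a` is given, so a count of zero matches the expected result.

```shell
#!/usr/bin/env bash
# Sketch: report check-endpoints containers per master node; fail if any are running.
# Assumes cluster-admin access and the same oc debug / crictl approach as the manual check.

# Print the running containers on one node (crictl ps without -a shows running ones only).
node_containers() {
  oc debug "node/$1" -- chroot /host crictl ps 2>/dev/null
}

# Count lines mentioning check-endpoints in the container list read from stdin.
# grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps that from aborting.
count_check_endpoints() {
  local n
  n=$(grep -c -- 'check-endpoints') || true
  echo "${n:-0}"
}

# Check every node given as an argument; return 0 only if none runs a check-endpoints container.
verify_idle() {
  local total=0 node n
  for node in "$@"; do
    n=$(node_containers "$node" | count_check_endpoints)
    echo "$node: $n check-endpoints container(s) running"
    total=$((total + n))
  done
  [ "$total" -eq 0 ]
}
```

Usage, reusing the node list from the manual check: `nodes=$(oc get no | grep master | awk '{print $1}'); verify_idle $nodes`. This prints one line per master and exits non-zero if any check-endpoints container is running.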

Comment 7 errata-xmlrpc 2020-10-27 16:45:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196