Bug 1882737

Summary: check-endpoints should gracefully handle missing podnetworkconnectivitycheck crd
Product: OpenShift Container Platform Reporter: Luis Sanchez <sanchezl>
Component: kube-apiserver Assignee: Luis Sanchez <sanchezl>
Status: CLOSED ERRATA QA Contact: Ke Wang <kewang>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.6 CC: aos-bugs, mfojtik, xxia
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:45:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Luis Sanchez 2020-09-25 13:44:40 UTC
Description of problem:

If the podnetworkconnectivitycheck CRD is not available, the check-endpoints tool (currently found in the kube-apiserver and openshift-apiserver pods) should disable itself until the CRD becomes available.
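
The requested behaviour amounts to a gate in front of the checks. The shell sketch below only illustrates that idea; it is not the tool's actual implementation. The function names, the `OC` override, and the polling defaults are assumptions made for illustration. The CRD name is the resource name that appears in the operator log messages in this bug.

```shell
# Client binary; overridable so the sketch can be exercised without a cluster
# (an assumption for this example).
OC="${OC:-oc}"
CRD="podnetworkconnectivitychecks.controlplane.operator.openshift.io"

# crd_exists: exit 0 if the CRD is registered with the API server.
crd_exists() {
  "$OC" get crd "$CRD" >/dev/null 2>&1
}

# wait_for_crd INTERVAL MAX_TRIES: stay idle, polling every INTERVAL seconds,
# until the CRD appears. Returns 1 if it has not appeared after MAX_TRIES polls.
wait_for_crd() {
  interval="${1:-10}"
  max_tries="${2:-30}"
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    if crd_exists; then
      return 0
    fi
    tries=$((tries + 1))
    sleep "$interval"
  done
  return 1
}
```

For example, `wait_for_crd 10 30 && echo "CRD available"` would poll every 10 seconds for up to 30 attempts before giving up.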

Comment 3 Ke Wang 2020-10-10 10:57:09 UTC
Per https://bugzilla.redhat.com/show_bug.cgi?id=1876166#c3, enabling PodNetworkConnectivityCheck in release builds did not work as expected. The following message was output continuously in the kube-apiserver-operator logs for more than 30 minutes, and the PodNetworkConnectivityCheck CRD was still not created. Verification could not continue, so I am assigning the bug back.

I1010 10:30:01.198964       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"870c8cd3-5a47-4eb6-a9a7-2aed44f9f948", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'EndpointDetectionFailure' PodNetworkConnectivityCheck.controlplane.operator.openshift.io/kube-apiserver-ip-10-0-155-67.us-east-2.compute.internal-to-load-balancer-api-internal -n openshift-kube-apiserver: the server could not find the requested resource (post podnetworkconnectivitychecks.controlplane.operator.openshift.io)

Comment 4 Luis Sanchez 2020-10-12 14:18:54 UTC
This BZ is just for the check-endpoints tool, not the connectivitycheckcontrollers themselves. You can verify that the check-endpoints containers are idle when the PodNetworkConnectivityCheck CRD does not exist (which is now the default).

Comment 5 Ke Wang 2020-10-13 04:28:47 UTC
Verified with OCP 4.6.0-0.nightly-2020-10-12-223649.

Updated the KubeAPIServer as follows:
$ oc edit kubeapiserver cluster
spec:
  unsupportedConfigOverrides:
    operator:
      enableConnectivityCheckController: "True"
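
A non-interactive equivalent of the edit above might look like the following sketch. It assumes a JSON merge patch via `oc patch --type=merge` is acceptable for this resource; the helper names `build_patch` and `enable_connectivity_check` are illustrative and not part of any tool.

```shell
# build_patch: print the JSON merge patch that sets the override shown above.
build_patch() {
  printf '%s' '{"spec":{"unsupportedConfigOverrides":{"operator":{"enableConnectivityCheckController":"True"}}}}'
}

# enable_connectivity_check: apply the patch to the KubeAPIServer resource named "cluster".
enable_connectivity_check() {
  oc patch kubeapiserver cluster --type=merge -p "$(build_patch)"
}
```

Calling `enable_connectivity_check` against a test cluster should have the same effect as the `oc edit` session, without opening an editor.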

After waiting a while, checked the kube-apiserver-operator log; the podnetworkconnectivitycheck CRD is still not available:
I1013 04:15:43.234129       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"981f0ab2-418a-4cdf-81b4-b2d0c1500f48", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'EndpointDetectionFailure' PodNetworkConnectivityCheck.controlplane.operator.openshift.io/kube-apiserver-ip-xx-xx-xx-xx.us-east-2.compute.internal-to-etcd-server-ip-xx-x-xx-xxx.us-east-2.compute.internal -n openshift-kube-apiserver: the server could not find the requested resource (post podnetworkconnectivitychecks.controlplane.operator.openshift.io)

$ oc get pods -n openshift-apiserver
NAME                         READY   STATUS             RESTARTS   AGE
apiserver-6c7864474f-5d7wp   1/2     CrashLoopBackOff   12         62m
apiserver-6c7864474f-j9njx   1/2     CrashLoopBackOff   12         66m
apiserver-6c7864474f-mflw7   1/2     CrashLoopBackOff   12         64m

$ oc get pods -n openshift-kube-apiserver | grep kube-apiserver
kube-apiserver-ip.xxx.us-east-2.compute.internal      4/5     CrashLoopBackOff   12         75m
kube-apiserver-ip.xx.us-east-2.compute.internal      4/5     CrashLoopBackOff   12         73m
kube-apiserver-ip.x.us-east-2.compute.internal      4/5     CrashLoopBackOff   12         69m

Check whether the openshift-apiserver-check-endpoints and kube-apiserver-check-endpoints containers are running on all masters:
$ nodes=$(oc get no | grep master | awk '{print $1}')
$ for no in $nodes; do oc debug node/$no -- chroot /host crictl ps | grep 'check-endpoints';done

No running openshift-apiserver-check-endpoints or kube-apiserver-check-endpoints containers were found, which is as expected.
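
To make the "nothing found" result explicit rather than relying on empty grep output, the check could be wrapped as in the sketch below. The function names are illustrative, and selecting masters by the `node-role.kubernetes.io/master` label is an assumption about how the cluster labels its nodes.

```shell
# count_check_endpoints: read `crictl ps` output on stdin and print the number
# of lines that mention check-endpoints (prints 0 when there are none).
count_check_endpoints() {
  grep -c 'check-endpoints' || true
}

# report_masters: print a per-node count of running check-endpoints containers.
report_masters() {
  for node in $(oc get nodes -l node-role.kubernetes.io/master -o name); do
    n=$(oc debug "$node" -- chroot /host crictl ps 2>/dev/null | count_check_endpoints)
    echo "$node: $n running check-endpoints container(s)"
  done
}
```

With this, a node that has no such containers is reported with a count of 0 instead of producing no output at all.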

Comment 7 errata-xmlrpc 2020-10-27 16:45:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196