Bug 1882737 - check-endpoints should gracefully handle missing podnetworkconnectivitycheck crd
Summary: check-endpoints should gracefully handle missing podnetworkconnectivitycheck crd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Luis Sanchez
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-25 13:44 UTC by Luis Sanchez
Modified: 2020-10-27 16:45 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:45:38 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-kube-apiserver-operator pull 955 0 None closed Bug 1882737: check-endpoints should gracefully handle missing podnetworkconnectivitycheck crd 2021-01-11 03:07:25 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:45:51 UTC

Description Luis Sanchez 2020-09-25 13:44:40 UTC
Description of problem:

If the podnetworkconnectivitycheck CRD is not available, the check-endpoints tool (currently found in the kube-apiserver and openshift-apiserver pods) should disable itself until the CRD becomes available.
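
The requested behavior can be pictured as a gate that polls for the CRD and lets the checks start only once it exists. The following is a minimal shell sketch of that idea, not the actual fix from the linked pull request. The function names `crd_exists` and `wait_for_crd` and the polling parameters are hypothetical; the CRD name is the one that appears in the operator log messages in this report.

```shell
#!/bin/sh
# Hypothetical sketch: stay idle until the PodNetworkConnectivityCheck CRD exists.
CRD="podnetworkconnectivitychecks.controlplane.operator.openshift.io"

# Returns 0 when the CRD is registered with the API server.
crd_exists() {
    oc get crd "$CRD" >/dev/null 2>&1
}

# Polls crd_exists up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the CRD is found, 1 if it never appears.
wait_for_crd() {
    attempts="$1"
    interval="$2"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if crd_exists; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}
```

For example, `wait_for_crd 60 10 && echo "CRD available"` would poll for up to roughly ten minutes before giving up.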

Comment 3 Ke Wang 2020-10-10 10:57:09 UTC
Per https://bugzilla.redhat.com/show_bug.cgi?id=1876166#c3, enabling PodNetworkConnectivityCheck in release builds did not work as expected. The following message was logged continuously in the kube-apiserver-operator logs for more than 30 minutes, and the PodNetworkConnectivityCheck CRD was still not created. I am unable to continue verification, so I am assigning the bug back.

I1010 10:30:01.198964       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"870c8cd3-5a47-4eb6-a9a7-2aed44f9f948", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'EndpointDetectionFailure' PodNetworkConnectivityCheck.controlplane.operator.openshift.io/kube-apiserver-ip-10-0-155-67.us-east-2.compute.internal-to-load-balancer-api-internal -n openshift-kube-apiserver: the server could not find the requested resource (post podnetworkconnectivitychecks.controlplane.operator.openshift.io)
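
To gauge how often this warning repeats, the operator log can be filtered for the event reason. The helper below is a small sketch; the function name `count_endpoint_failures` is hypothetical, and it only counts lines on standard input that carry the `EndpointDetectionFailure` reason shown in the message above.

```shell
#!/bin/sh
# Hypothetical helper: count 'EndpointDetectionFailure' events in a log stream on stdin.
# grep -c exits non-zero when the count is 0, so '|| true' keeps the exit status clean.
count_endpoint_failures() {
    grep -c "reason: 'EndpointDetectionFailure'" || true
}
```

Assuming cluster access, it could be fed with `oc logs -n openshift-kube-apiserver-operator deployment/kube-apiserver-operator | count_endpoint_failures`, using the namespace and deployment name from the log line.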

Comment 4 Luis Sanchez 2020-10-12 14:18:54 UTC
This BZ covers only the check-endpoints tool, not the connectivitycheckcontrollers themselves. You can verify that the check-endpoints containers are idle when the PodNetworkConnectivityCheck CRD does not exist (which is now the default).

Comment 5 Ke Wang 2020-10-13 04:28:47 UTC
Verified with OCP 4.6.0-0.nightly-2020-10-12-223649,

Updated the KubeAPIServer as follows:
$ oc edit kubeapiserver cluster
spec:
  unsupportedConfigOverrides:
    operator:
      enableConnectivityCheckController: "True"
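
The same override could presumably be applied non-interactively with a JSON merge patch instead of `oc edit`. This is a sketch under that assumption; the function name `connectivity_check_patch` is hypothetical, and the field names are copied from the YAML above.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON merge patch equivalent to the YAML edit above.
connectivity_check_patch() {
    printf '%s\n' '{"spec":{"unsupportedConfigOverrides":{"operator":{"enableConnectivityCheckController":"True"}}}}'
}

# Usage (requires cluster access; not executed here):
#   oc patch kubeapiserver cluster --type=merge -p "$(connectivity_check_patch)"
```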

After waiting a while, the kube-apiserver-operator log shows that the podnetworkconnectivitycheck CRD is not available:
I1013 04:15:43.234129       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"981f0ab2-418a-4cdf-81b4-b2d0c1500f48", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'EndpointDetectionFailure' PodNetworkConnectivityCheck.controlplane.operator.openshift.io/kube-apiserver-ip-xx-xx-xx-xx.us-east-2.compute.internal-to-etcd-server-ip-xx-x-xx-xxx.us-east-2.compute.internal -n openshift-kube-apiserver: the server could not find the requested resource (post podnetworkconnectivitychecks.controlplane.operator.openshift.io)

$ oc get pods -n openshift-apiserver
NAME                         READY   STATUS             RESTARTS   AGE
apiserver-6c7864474f-5d7wp   1/2     CrashLoopBackOff   12         62m
apiserver-6c7864474f-j9njx   1/2     CrashLoopBackOff   12         66m
apiserver-6c7864474f-mflw7   1/2     CrashLoopBackOff   12         64m

$ oc get pods -n openshift-kube-apiserver | grep kube-apiserver
kube-apiserver-ip.xxx.us-east-2.compute.internal      4/5     CrashLoopBackOff   12         75m
kube-apiserver-ip.xx.us-east-2.compute.internal      4/5     CrashLoopBackOff   12         73m
kube-apiserver-ip.x.us-east-2.compute.internal      4/5     CrashLoopBackOff   12         69m

Check whether the openshift-apiserver-check-endpoints and kube-apiserver-check-endpoints containers are running on all masters:
$ nodes=$(oc get no | grep master | awk '{print $1}')
$ for no in $nodes; do oc debug node/$no -- chroot /host crictl ps | grep 'check-endpoints';done

No running openshift-apiserver-check-endpoints or kube-apiserver-check-endpoints containers were found, which is as expected.
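
The per-node check above can be wrapped so that it reports a count for each master rather than relying on empty grep output. This is a sketch only; the function names `list_containers`, `count_check_endpoints`, and `report_check_endpoints` are hypothetical, and `list_containers` simply reuses the `oc debug ... crictl ps` command from this comment.

```shell
#!/bin/sh
# Lists running containers on the node named in $1 (same command as used above).
list_containers() {
    oc debug "node/$1" -- chroot /host crictl ps 2>/dev/null
}

# Counts stdin lines that mention a check-endpoints container.
# grep -c exits non-zero on a zero count, so '|| true' keeps the status clean.
count_check_endpoints() {
    grep -c 'check-endpoints' || true
}

# Prints "<node> <count>" for every node passed as an argument.
# Returns 1 if any node still has a running check-endpoints container.
report_check_endpoints() {
    rc=0
    for node in "$@"; do
        count=$(list_containers "$node" | count_check_endpoints)
        echo "$node $count"
        [ "$count" -eq 0 ] || rc=1
    done
    return "$rc"
}
```

With the `$nodes` variable set as above, `report_check_endpoints $nodes` would print a count of 0 for each master when the containers are not running.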

Comment 7 errata-xmlrpc 2020-10-27 16:45:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

