Bug 1889677

Summary: Upgrade is stuck with kube-apiserver in CrashLoopBackOff on the check-endpoints container (openshift-apiserver pods are observed in the same CrashLoopBackOff)
Product: OpenShift Container Platform
Reporter: MinLi <minmli>
Component: kube-apiserver
Assignee: Luis Sanchez <sanchezl>
Status: CLOSED DUPLICATE
QA Contact: Ke Wang <kewang>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.7
CC: aos-bugs, mfojtik, xxia
Target Milestone: ---
Target Release: 4.7.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-21 13:16:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description MinLi 2020-10-20 11:14:43 UTC
Description of problem:
The kube-apiserver pods ran into CrashLoopBackOff due to PodNetworkConnectivityCheck failures when the cluster was upgraded from 4.6.0-rc.4 to a 4.7 nightly build.

Version-Release number of selected component (if applicable):
4.6.0-rc.4

How reproducible:


Steps to Reproduce:
1. Install a cluster with version 4.6.0-rc.4
2. Upgrade the cluster to 4.7.0-0.nightly-2020-10-17-034503

Actual results:
2. The upgrade fails, the openshift-apiserver cluster operator is degraded, and pods in the namespace run into CrashLoopBackOff due to connectivity-check failures.


Expected results:
2. The upgrade succeeds.

Additional info:
$ oc get po -n openshift-kube-apiserver 
NAME                                                           READY   STATUS             RESTARTS   AGE
installer-5-ip-10-0-176-199.us-east-2.compute.internal         0/1     Completed          0          108m
kube-apiserver-ip-10-0-147-197.us-east-2.compute.internal      4/5     CrashLoopBackOff   25         3h14m
kube-apiserver-ip-10-0-176-199.us-east-2.compute.internal      4/5     CrashLoopBackOff   25         108m
kube-apiserver-ip-10-0-214-165.us-east-2.compute.internal      4/5     CrashLoopBackOff   25         3h18m
revision-pruner-4-ip-10-0-147-197.us-east-2.compute.internal   0/1     Completed          0          3h3m
revision-pruner-4-ip-10-0-176-199.us-east-2.compute.internal   0/1     Completed          0          3h1m
revision-pruner-4-ip-10-0-214-165.us-east-2.compute.internal   0/1     Completed          0          3h6m


$ oc get pod kube-apiserver-ip-10-0-147-197.us-east-2.compute.internal -n openshift-kube-apiserver -o yaml
...
  - containerID: cri-o://05d81129a3fb841f1bcb564ac5a49300a05fd91624a71b3a5e35ef099d963902
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4691dc29704c9cb06d2345894f1a8f074b58a0d208318c5218241388b0916e1b
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4691dc29704c9cb06d2345894f1a8f074b58a0d208318c5218241388b0916e1b
    lastState:
      terminated:
        containerID: cri-o://05d81129a3fb841f1bcb564ac5a49300a05fd91624a71b3a5e35ef099d963902
        exitCode: 0
        finishedAt: "2020-10-20T11:09:50Z"
        reason: Completed
        startedAt: "2020-10-20T11:09:44Z"
    name: kube-apiserver-check-endpoints
    ready: false
    restartCount: 31
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=kube-apiserver-check-endpoints
          pod=kube-apiserver-ip-10-0-147-197.us-east-2.compute.internal_openshift-kube-apiserver(bb2901174f0fad53f457a459ecaaeb2e)
        reason: CrashLoopBackOff



$ oc logs -n openshift-kube-apiserver kube-apiserver-ip-10-0-147-197.us-east-2.compute.internal -c kube-apiserver-check-endpoints 
...
E1020 10:43:43.154124       1 reflector.go:127] k8s.io/client-go.0/tools/cache/reflector.go:156: Failed to watch *v1alpha1.PodNetworkConnectivityCheck: failed to list *v1alpha1.PodNetworkConnectivityCheck: the server could not find the requested resource (get podnetworkconnectivitychecks.controlplane.operator.openshift.io)
I1020 10:43:43.345413       1 start_stop_controllers.go:70] The server doesn't have a resource type "podnetworkconnectivitychecks.controlplane.operator.openshift.io".
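The log above shows the check-endpoints container failing because the PodNetworkConnectivityCheck resource type is not registered on the cluster. As a hypothetical diagnostic (not part of the original report), one could confirm whether the CRD the container is trying to watch actually exists:

```shell
# Check whether the CRD that check-endpoints watches is registered.
# An error ("NotFound") here matches the reflector failure in the log above.
oc get crd podnetworkconnectivitychecks.controlplane.operator.openshift.io

# Equivalent check via API discovery: list resources in the operator API group.
oc api-resources --api-group=controlplane.operator.openshift.io
```

If the CRD is absent mid-upgrade, every watch attempt from the check-endpoints container will fail the same way until the resource type is installed.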

Comment 1 MinLi 2020-10-20 11:18:08 UTC
There is a similar bug in 4.7: https://bugzilla.redhat.com/show_bug.cgi?id=1876166

Comment 3 W. Trevor King 2021-04-05 17:46:13 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475