Bug 1889677 - Upgrade is stuck in kube-apiserver CrashLoopBackOff on the check-endpoints container (openshift-apiserver pods show the same CrashLoopBackOff)
Status: CLOSED DUPLICATE of bug 1887718
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.7
Hardware: x86_64
OS: Linux
Target Milestone: ---
: 4.7.0
Assignee: Luis Sanchez
QA Contact: Ke Wang
Depends On:
Reported: 2020-10-20 11:14 UTC by MinLi
Modified: 2020-10-21 13:16 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-10-21 13:16:27 UTC
Target Upstream Version:


Description MinLi 2020-10-20 11:14:43 UTC
Description of problem:
The kube-apiserver pods run into CrashLoopBackOff because the PodNetworkConnectivityCheck fails when the cluster is upgraded from 4.6.0-rc.4 to a 4.7 nightly build.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install a cluster with version 4.6.0-rc.4
2. Upgrade the cluster to 4.7.0-0.nightly-2020-10-17-034503

Actual results:
2. The upgrade fails: the openshift-apiserver cluster operator is degraded, and the namespace's pods go into CrashLoopBackOff status because the connectivity check fails.

Expected results:
2. The upgrade succeeds.

Additional info:
$ oc get po -n openshift-kube-apiserver 
NAME                                                           READY   STATUS             RESTARTS   AGE
installer-5-ip-10-0-176-199.us-east-2.compute.internal         0/1     Completed          0          108m
kube-apiserver-ip-10-0-147-197.us-east-2.compute.internal      4/5     CrashLoopBackOff   25         3h14m
kube-apiserver-ip-10-0-176-199.us-east-2.compute.internal      4/5     CrashLoopBackOff   25         108m
kube-apiserver-ip-10-0-214-165.us-east-2.compute.internal      4/5     CrashLoopBackOff   25         3h18m
revision-pruner-4-ip-10-0-147-197.us-east-2.compute.internal   0/1     Completed          0          3h3m
revision-pruner-4-ip-10-0-176-199.us-east-2.compute.internal   0/1     Completed          0          3h1m
revision-pruner-4-ip-10-0-214-165.us-east-2.compute.internal   0/1     Completed          0          3h6m

$ oc get pod kube-apiserver-ip-10-0-147-197.us-east-2.compute.internal -n openshift-kube-apiserver -o yaml
  - containerID: cri-o://05d81129a3fb841f1bcb564ac5a49300a05fd91624a71b3a5e35ef099d963902
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4691dc29704c9cb06d2345894f1a8f074b58a0d208318c5218241388b0916e1b
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4691dc29704c9cb06d2345894f1a8f074b58a0d208318c5218241388b0916e1b
        containerID: cri-o://05d81129a3fb841f1bcb564ac5a49300a05fd91624a71b3a5e35ef099d963902
        exitCode: 0
        finishedAt: "2020-10-20T11:09:50Z"
        reason: Completed
        startedAt: "2020-10-20T11:09:44Z"
    name: kube-apiserver-check-endpoints
    ready: false
    restartCount: 31
    started: false
        message: back-off 5m0s restarting failed container=kube-apiserver-check-endpoints
        reason: CrashLoopBackOff

$ oc logs -n openshift-kube-apiserver kube-apiserver-ip-10-0-147-197.us-east-2.compute.internal -c kube-apiserver-check-endpoints 
E1020 10:43:43.154124       1 reflector.go:127] k8s.io/client-go@v0.19.0/tools/cache/reflector.go:156: Failed to watch *v1alpha1.PodNetworkConnectivityCheck: failed to list *v1alpha1.PodNetworkConnectivityCheck: the server could not find the requested resource (get podnetworkconnectivitychecks.controlplane.operator.openshift.io)
I1020 10:43:43.345413       1 start_stop_controllers.go:70] The server doesn't have a resource type "podnetworkconnectivitychecks.controlplane.operator.openshift.io".
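
The errors above indicate the check-endpoints container is failing because the PodNetworkConnectivityCheck resource type is not registered with the API server. A minimal diagnostic sketch to confirm that from a workstation (assumes an authenticated `oc` session against the affected cluster; the resource and group names are taken from the error message above):

```shell
# Does the API server serve any resources in the operator control-plane group?
# An empty result matches the "server doesn't have a resource type" log line.
oc api-resources --api-group=controlplane.operator.openshift.io

# Is the CustomResourceDefinition itself present on the cluster?
# "NotFound" here would explain the failed list/watch in the reflector error.
oc get crd podnetworkconnectivitychecks.controlplane.operator.openshift.io
```

If the CRD is absent, the check-endpoints container cannot list or watch PodNetworkConnectivityCheck objects and will keep exiting, which produces the CrashLoopBackOff seen in the pod status.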

Comment 1 MinLi 2020-10-20 11:18:08 UTC
There is a similar bug in 4.7: https://bugzilla.redhat.com/show_bug.cgi?id=1876166
