Verification steps:

1. Add foo: ["bar"] as an additional argument for kube-apiserver via unsupportedConfigOverrides

$ oc edit kubeapiserver cluster
spec:
  .....
  unsupportedConfigOverrides:
    apiServerArguments:
      foo:
      - bar

In another terminal, run the script test.sh:

#!/usr/bin/env bash
while true
do
  oc get co/kube-apiserver;oc get kubeapiserver -oyaml | grep -A15 'latestAvailableRevision';sleep 30
done

Wait for a few minutes; the following will be displayed:

NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.9.0-0.nightly-2021-09-10-170926   True        True          False      6h5m    NodeInstallerProgressing: 1 nodes are at revision 5; 0 nodes have achieved new revision 6

    latestAvailableRevision: 6
    latestAvailableRevisionReason: ""
    nodeStatuses:
    - currentRevision: 5
      nodeName: kewang-13sno1-gjpzm-master-0.c.openshift-qe.internal
      targetRevision: 6
    readyReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
...

NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.9.0-0.nightly-2021-09-10-170926   True        True          False      6h16m   NodeInstallerProgressing: 1 nodes are at revision 5; 0 nodes have achieved new revision 6

    latestAvailableRevision: 6
    latestAvailableRevisionReason: ""
    nodeStatuses:
    - currentRevision: 5
      lastFailedReason: OperandFailedFallback
      lastFailedRevision: 6
      lastFailedRevisionErrors:
      - 'fallback to last-known-good revision 5 took place after: waiting for kube-apiserver static pod to listen on port 6443: Get "https://localhost:6443/healthz/etcd": dial tcp [::1]:6443: connect: connection refused (NetworkError)'
      lastFailedTime: "2021-09-13T08:22:23Z"
      lastFallbackCount: 1
      nodeName: kewang-13sno1-gjpzm-master-0.c.openshift-qe.internal
      targetRevision: 6
    readyReplicas: 0
kind: List
metadata:
...

NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.9.0-0.nightly-2021-09-10-170926   True        True          True       6h18m   StaticPodFallbackRevisionDegraded: a static pod kube-apiserver-kewang-13sno1-gjpzm-master-0.c.openshift-qe.internal was rolled back to revision 6 due to waiting for kube-apiserver static pod to listen on port 6443: Get "https://localhost:6443/healthz/etcd": dial tcp [::1]:6443: connect: connection refused

    latestAvailableRevision: 6
    latestAvailableRevisionReason: ""
    nodeStatuses:
    - currentRevision: 5
      lastFailedReason: OperandFailedFallback
      lastFailedRevision: 6
      lastFailedRevisionErrors:
      - 'fallback to last-known-good revision 5 took place after: waiting for kube-apiserver static pod to listen on port 6443: Get "https://localhost:6443/healthz/etcd": dial tcp [::1]:6443: connect: connection refused (NetworkError)'
      lastFailedTime: "2021-09-13T08:22:23Z"
      lastFallbackCount: 1
      nodeName: kewang-13sno1-gjpzm-master-0.c.openshift-qe.internal
      targetRevision: 6
    readyReplicas: 0
kind: List
metadata:
  resourceVersion: ""

2. Wait a few more minutes until the lastFallbackCount value is 1 and the above message is shown, then remove the additional argument from unsupportedConfigOverrides (for example, via the patch sketch below). This triggers a new revision and the kube-apiserver comes back.
...
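For reference, the add and remove of the override argument can also be done non-interactively instead of with oc edit. A minimal sketch, assuming the cluster has no other unsupportedConfigOverrides set (the second command clears the whole stanza):

# add the bogus argument to trigger a broken revision
$ oc patch kubeapiserver cluster --type=merge -p '{"spec":{"unsupportedConfigOverrides":{"apiServerArguments":{"foo":["bar"]}}}}'

# later, remove the override again to trigger the next (good) revision
$ oc patch kubeapiserver cluster --type=merge -p '{"spec":{"unsupportedConfigOverrides":null}}'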
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.9.0-0.nightly-2021-09-10-170926   True        False         False      6h26m

    latestAvailableRevision: 7
    latestAvailableRevisionReason: ""
    nodeStatuses:
    - currentRevision: 7
      lastFailedReason: OperandFailedFallback
      lastFailedRevision: 6
      lastFailedRevisionErrors:
      - 'fallback to last-known-good revision 5 took place after: waiting for kube-apiserver static pod to listen on port 6443: Get "https://localhost:6443/healthz/etcd": dial tcp [::1]:6443: connect: connection refused (NetworkError)'
      lastFailedTime: "2021-09-13T08:22:23Z"
      lastFallbackCount: 1
      nodeName: kewang-13sno1-gjpzm-master-0.c.openshift-qe.internal
    readyReplicas: 0
kind: List
metadata:
  resourceVersion: ""

NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.9.0-0.nightly-2021-09-10-170926   True        False         False      6h27m

Log in to the master node; the current kube-apiserver-last-known-good link is pointed at kube-apiserver-pod-7 by the startup-monitor:

sh-4.4# ls kube-apiserver-last-known-good -l
lrwxrwxrwx. 1 root root 81 Sep 13 08:34 kube-apiserver-last-known-good -> /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/kube-apiserver-pod.yaml

We can see that the startup-monitor was created by the installer for each revision of kube-apiserver, and then removed once the operand became ready:

sh-4.4# find . -name 'kube-apiserver-startup-monitor-pod.yaml'
./kube-apiserver-pod-2/configmaps/kube-apiserver-pod/kube-apiserver-startup-monitor-pod.yaml
./kube-apiserver-pod-2/kube-apiserver-startup-monitor-pod.yaml
./kube-apiserver-pod-3/configmaps/kube-apiserver-pod/kube-apiserver-startup-monitor-pod.yaml
./kube-apiserver-pod-3/kube-apiserver-startup-monitor-pod.yaml
./kube-apiserver-pod-4/configmaps/kube-apiserver-pod/kube-apiserver-startup-monitor-pod.yaml
./kube-apiserver-pod-4/kube-apiserver-startup-monitor-pod.yaml
./kube-apiserver-pod-5/configmaps/kube-apiserver-pod/kube-apiserver-startup-monitor-pod.yaml
./kube-apiserver-pod-5/kube-apiserver-startup-monitor-pod.yaml
./kube-apiserver-pod-6/configmaps/kube-apiserver-pod/kube-apiserver-startup-monitor-pod.yaml
./kube-apiserver-pod-6/kube-apiserver-startup-monitor-pod.yaml
./kube-apiserver-pod-7/configmaps/kube-apiserver-pod/kube-apiserver-startup-monitor-pod.yaml
./kube-apiserver-pod-7/kube-apiserver-startup-monitor-pod.yaml

sh-4.4# journalctl -b -u crio | grep -E '(Creating container|Removed container).*kube-apiserver-startup-monitor'
...
Sep 13 08:33:54 kewang-13sno1-gjpzm-master-0.c.openshift-qe.internal crio[1621]: time="2021-09-13 08:33:54.866229951Z" level=info msg="Creating container: openshift-kube-apiserver/kube-apiserver-startup-monitor-kewang-13sno1-gjpzm-master-0.c.openshift-qe.internal/startup-monitor" id=9483384c-922e-4352-a2e4-a70f40323c1d name=/runtime.v1alpha2.RuntimeService/CreateContainer
Sep 13 08:39:57 kewang-13sno1-gjpzm-master-0.c.openshift-qe.internal crio[1621]: time="2021-09-13 08:39:57.883444823Z" level=info msg="Removed container 066f86863e5aa322da1de33381e1c8e79f368a99a885f4d1f0c7125d0bb1632b: openshift-kube-apiserver/kube-apiserver-startup-monitor-kewang-13sno1-gjpzm-master-0.c.openshift-qe.internal/startup-monitor" id=a796ddad-272e-4ac3-ad29-3f94351ddacc name=/runtime.v1alpha2.RuntimeService/RemoveContainer

From the above, we can see that the startup-monitor watched the operand pod for readiness and the fallback mechanism works as expected, so moving the bug to VERIFIED.
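The same fallback fields can also be read directly with jsonpath instead of grepping the full YAML. A minimal sketch, using only the field names visible in the status output above:

$ oc get kubeapiserver cluster -o jsonpath='{range .status.nodeStatuses[*]}{.nodeName}{" currentRevision="}{.currentRevision}{" lastFailedReason="}{.lastFailedReason}{" lastFallbackCount="}{.lastFallbackCount}{"\n"}{end}'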
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759