+++ This bug was initially created as a clone of Bug #2003946 +++

+++ This bug was initially created as a clone of Bug #1998031 +++

From https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade/1430418602502131712:

```
[bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Degraded
1 unexpected clusteroperator state transitions during e2e test run

Aug 25 08:06:51.762 - 808s  E clusteroperator/openshift-apiserver condition/Degraded status/True reason/APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
```

From loki (https://grafana-loki.ci.openshift.org/explore?orgId=1&left=%5B%221629842400000%22,%221629928799000%22,%22Grafana%20Cloud%22,%7B%22expr%22:%22%7Binvoker%3D%5C%22openshift-internal-ci%2Fperiodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade%2F1430418602502131712%5C%22%7D%20%7C%20unpack%20%7C%20namespace%3D%5C%22openshift-authentication-operator%5C%22%20%7C%20container_name%3D%5C%22authentication-operator%5C%22%22%7D%5D):

```
2021-08-25 10:06:51 I0825 08:06:51.740397 1 status_controller.go:211] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2021-08-25T08:06:51Z","message":"APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()\nOAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()","reason":"APIServerDeployment_UnavailablePod::OAuthServerDeployment_UnavailablePod","status":"True","type":"Degraded"},{"lastTransitionTime":"2021-08-25T07:45:00Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-08-25T07:57:01Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-08-25T06:49:31Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
2021-08-25 10:06:24 I0825 08:06:24.738959 1 request.go:668] Waited for 1.015853827s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/configmaps/audit
```

Checking metrics (`kube_deployment_status_replicas_available{namespace="openshift-authentication"}`), the deployment had 1 replica missing at 08:06:51.740397.

There were 3 RSs (from oldest to youngest):
- oauth-openshift-7fb8f95bd8
- oauth-openshift-6f975f59c
- oauth-openshift-65d76d9bcd - 08:04:43 -> 08:20:55 less than 3 replicas available

Checking events.json:

```
"message": "Scaled up replica set oauth-openshift-65d76d9bcd to 1",
"message": "Scaled up replica set oauth-openshift-65d76d9bcd to 2",
"message": "Scaled up replica set oauth-openshift-65d76d9bcd to 3",
```

No events about scaling down oauth-openshift-65d76d9bcd.
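For reference, the scale events can be pulled out of the gathered events.json with a short jq filter (a minimal sketch, assuming the standard v1 EventList layout of the CI artifacts; the file path is illustrative):

```
# List every event mentioning the ReplicaSet, with timestamp and reason,
# to confirm there are scale-up events but no scale-down events.
jq -r '.items[]
  | select(.message != null and (.message | contains("oauth-openshift-65d76d9bcd")))
  | "\(.firstTimestamp) \(.reason) \(.message)"' events.json
```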
Checking kubelet:

```
ovirt14-vhnb7-master-1:18830:Aug 25 07:43:47.183806 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 07:43:47.183768 1846 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-mlj5s]
ovirt14-vhnb7-master-1:22979:Aug 25 08:09:43.697971 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 08:09:43.697943 1846 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-mlj5s]
ovirt14-vhnb7-master-1:24774:Aug 25 08:10:11.266126 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 08:10:11.266097 1846 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-mlj5s]
ovirt14-vhnb7-master-1:32270:Aug 25 08:14:43.473159 ovirt14-vhnb7-master-1 hyperkube[1971]: I0825 08:14:43.471314 1971 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-c8mv4]
ovirt14-vhnb7-master-2:16925:Aug 25 07:42:44.615329 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 07:42:44.615293 1850 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-k4g7p]
ovirt14-vhnb7-master-2:23385:Aug 25 08:14:50.475225 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:14:50.475018 1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-k4g7p]
ovirt14-vhnb7-master-2:26008:Aug 25 08:15:24.629063 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:15:24.629032 1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-k4g7p]
ovirt14-vhnb7-master-2:33998:Aug 25 08:19:59.335853 ovirt14-vhnb7-master-2 hyperkube[1965]: I0825 08:19:59.335828 1965 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-ptwm7]
ovirt14-vhnb7-master-0:21275:Aug 25 07:44:30.963045 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 07:44:30.962513 1851 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-snm9m]
ovirt14-vhnb7-master-0:24990:Aug 25 08:04:44.819607 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 08:04:44.819553 1851 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-snm9m]
ovirt14-vhnb7-master-0:26129:Aug 25 08:05:15.971355 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 08:05:15.971321 1851 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-snm9m]
ovirt14-vhnb7-master-0:34407:Aug 25 08:09:40.277931 ovirt14-vhnb7-master-0 hyperkube[1991]: I0825 08:09:40.277906 1991 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-mnvdm]
```

- oauth-openshift-65d76d9bcd-snm9m: <07:44:30.962513; 08:04:44.819553> -> oauth-openshift-65d76d9bcd-mnvdm: <08:09:40.277906; ???>
- oauth-openshift-65d76d9bcd-mlj5s: <07:43:47.183768; 08:09:43.697943> -> oauth-openshift-65d76d9bcd-c8mv4: <08:14:43.471314; ???>
- oauth-openshift-65d76d9bcd-k4g7p: <07:42:44.615293; 08:14:50.475018> -> oauth-openshift-65d76d9bcd-ptwm7: <08:19:59.335828; ???>

Checking `kube_replicaset_spec_replicas{namespace="openshift-authentication",replicaset=~"oauth-openshift-65d76d9bcd"}`, the value was set to 3 the whole time, meaning something simply deleted the pods one by one. The same pattern can be observed with the openshift-apiserver instances (see `kube_deployment_status_replicas_available{namespace="openshift-apiserver"}`).
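A hedged reconstruction of the search behind the listing above (the per-node journal file names are an assumption about how the kubelet logs were gathered, not from the artifacts themselves):

```
# Grep each master's kubelet journal for the ReplicaSet's pod lifecycle
# events, keeping line numbers so entries can be correlated across nodes.
for node in ovirt14-vhnb7-master-0 ovirt14-vhnb7-master-1 ovirt14-vhnb7-master-2; do
  grep -nE '"SyncLoop (ADD|DELETE)".*oauth-openshift-65d76d9bcd' "${node}.log" \
    | sed "s/^/${node}:/"
done
```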
```
Aug 25 07:33:31.075468 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 07:33:31.075433 1851 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-5mlx4]
Aug 25 08:04:44.754920 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 08:04:44.754791 1851 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-5mlx4]
Aug 25 08:05:57.242004 ovirt14-vhnb7-master-0 hyperkube[1851]: I0825 08:05:57.241938 1851 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-5mlx4]
Aug 25 08:09:40.255344 ovirt14-vhnb7-master-0 hyperkube[1991]: I0825 08:09:40.255312 1991 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-g6nt9]
Aug 25 07:31:59.161083 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 07:31:59.161061 1846 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-wktv4]
Aug 25 08:09:44.274744 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 08:09:44.274715 1846 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-wktv4]
Aug 25 08:10:59.127089 ovirt14-vhnb7-master-1 hyperkube[1846]: I0825 08:10:59.127057 1846 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-wktv4]
Aug 25 08:14:43.407859 ovirt14-vhnb7-master-1 hyperkube[1971]: I0825 08:14:43.407823 1971 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-kdvqw]
Aug 25 08:14:50.437017 ovirt14-vhnb7-master-1 hyperkube[1971]: I0825 08:14:50.436981 1971 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver-operator/openshift-apiserver-operator-7954c899b6-hq7xx]
Aug 25 07:30:31.271077 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 07:30:31.271018 1850 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-krq8w]
Aug 25 08:14:50.386938 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:14:50.386718 1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver-operator/openshift-apiserver-operator-7954c899b6-p2wbp]
Aug 25 08:14:50.493640 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:14:50.492425 1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-krq8w]
Aug 25 08:15:04.621295 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:15:04.621266 1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver-operator/openshift-apiserver-operator-7954c899b6-p2wbp]
Aug 25 08:16:14.597465 ovirt14-vhnb7-master-2 hyperkube[1850]: I0825 08:16:14.597413 1850 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-krq8w]
Aug 25 08:19:59.338981 ovirt14-vhnb7-master-2 hyperkube[1965]: I0825 08:19:59.338960 1965 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-gslsh]
```

- apiserver-7478d5665f-5mlx4: <07:33:31.075433; 08:04:44.754791> -> apiserver-7478d5665f-g6nt9: <08:09:40.255312; ???>
- apiserver-7478d5665f-wktv4: <07:31:59.161061; 08:09:44.274715> -> apiserver-7478d5665f-kdvqw: <08:14:43.407823; ???>
- apiserver-7478d5665f-krq8w: <07:30:31.271018; 08:14:50.386718> -> apiserver-7478d5665f-gslsh: <08:19:59.338960; ???>

Putting both alongside (master0 -> master1 -> master2):

- apiserver-7478d5665f-5mlx4: <07:33:31.075433; 08:04:44.754791> -> apiserver-7478d5665f-g6nt9: <08:09:40.255312; ???>
- oauth-openshift-65d76d9bcd-snm9m: <07:44:30.962513; 08:04:44.819553> -> oauth-openshift-65d76d9bcd-mnvdm: <08:09:40.277906; ???>
- apiserver-7478d5665f-wktv4: <07:31:59.161061; 08:09:44.274715> -> apiserver-7478d5665f-kdvqw: <08:14:43.407823; ???>
- oauth-openshift-65d76d9bcd-mlj5s: <07:43:47.183768; 08:09:43.697943> -> oauth-openshift-65d76d9bcd-c8mv4: <08:14:43.471314; ???>
- apiserver-7478d5665f-krq8w: <07:30:31.271018; 08:14:50.386718> -> apiserver-7478d5665f-gslsh: <08:19:59.338960; ???>
- oauth-openshift-65d76d9bcd-k4g7p: <07:42:44.615293; 08:14:50.475018> -> oauth-openshift-65d76d9bcd-ptwm7: <08:19:59.335828; ???>

Checking kubelets for cordoning/drain/update:

```
Aug 25 08:04:42.977292 ovirt14-vhnb7-master-0 root[392852]: machine-config-daemon[382060]: Starting update from rendered-master-9faf006a78c47c3fb7ca6a9223f67108 to rendered-master-7242100dcf595a1cee714eaa5fee9736: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Aug 25 08:04:43.016573 ovirt14-vhnb7-master-0 root[392853]: machine-config-daemon[382060]: Node has been successfully cordoned
Aug 25 08:04:43.020530 ovirt14-vhnb7-master-0 root[392854]: machine-config-daemon[382060]: Update prepared; beginning drain
Aug 25 08:05:57.764428 ovirt14-vhnb7-master-0 root[400279]: machine-config-daemon[382060]: drain complete
Aug 25 08:06:40.588616 ovirt14-vhnb7-master-0 root[403514]: machine-config-daemon[382060]: initiating reboot: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736
Aug 25 08:06:40.600122 ovirt14-vhnb7-master-0 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:06:40.706811 ovirt14-vhnb7-master-0 systemd[1]: Stopping machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736...
Aug 25 08:06:40.827596 ovirt14-vhnb7-master-0 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:09:40.255344 ovirt14-vhnb7-master-0 hyperkube[1991]: I0825 08:09:40.255312 1991 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-g6nt9]
Aug 25 08:09:40.277931 ovirt14-vhnb7-master-0 hyperkube[1991]: I0825 08:09:40.277906 1991 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-mnvdm]
Aug 25 08:09:41.927616 ovirt14-vhnb7-master-1 root[392706]: machine-config-daemon[357804]: Starting update from rendered-master-9faf006a78c47c3fb7ca6a9223f67108 to rendered-master-7242100dcf595a1cee714eaa5fee9736: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Aug 25 08:09:41.968324 ovirt14-vhnb7-master-1 root[392714]: machine-config-daemon[357804]: Node has been successfully cordoned
Aug 25 08:09:41.971817 ovirt14-vhnb7-master-1 root[392715]: machine-config-daemon[357804]: Update prepared; beginning drain
Aug 25 08:10:59.273200 ovirt14-vhnb7-master-1 root[402449]: machine-config-daemon[357804]: drain complete
Aug 25 08:11:45.707314 ovirt14-vhnb7-master-1 root[405834]: machine-config-daemon[357804]: initiating reboot: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736
Aug 25 08:11:45.719771 ovirt14-vhnb7-master-1 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:11:45.767006 ovirt14-vhnb7-master-1 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:14:43.473159 ovirt14-vhnb7-master-1 hyperkube[1971]: I0825 08:14:43.471314 1971 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-c8mv4]
Aug 25 08:14:50.437017 ovirt14-vhnb7-master-1 hyperkube[1971]: I0825 08:14:50.436981 1971 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver-operator/openshift-apiserver-operator-7954c899b6-hq7xx]
Aug 25 08:14:48.609668 ovirt14-vhnb7-master-2 root[427626]: machine-config-daemon[354533]: Starting update from rendered-master-9faf006a78c47c3fb7ca6a9223f67108 to rendered-master-7242100dcf595a1cee714eaa5fee9736: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Aug 25 08:14:48.646779 ovirt14-vhnb7-master-2 root[427627]: machine-config-daemon[354533]: Node has been successfully cordoned
Aug 25 08:14:48.652760 ovirt14-vhnb7-master-2 root[427628]: machine-config-daemon[354533]: Update prepared; beginning drain
Aug 25 08:16:15.397745 ovirt14-vhnb7-master-2 root[439943]: machine-config-daemon[354533]: drain complete
Aug 25 08:17:13.039866 ovirt14-vhnb7-master-2 root[443741]: machine-config-daemon[354533]: initiating reboot: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736
Aug 25 08:17:13.052337 ovirt14-vhnb7-master-2 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:17:13.159769 ovirt14-vhnb7-master-2 systemd[1]: Stopping machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736...
Aug 25 08:17:13.283820 ovirt14-vhnb7-master-2 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-7242100dcf595a1cee714eaa5fee9736.
Aug 25 08:19:59.335853 ovirt14-vhnb7-master-2 hyperkube[1965]: I0825 08:19:59.335828 1965 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-authentication/oauth-openshift-65d76d9bcd-ptwm7]
Aug 25 08:19:59.338981 ovirt14-vhnb7-master-2 hyperkube[1965]: I0825 08:19:59.338960 1965 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-apiserver/apiserver-7478d5665f-gslsh]
```

From the openshift-authentication/oauth-openshift deployment:

```
{
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        ...
        "name": "oauth-openshift",
        "namespace": "openshift-authentication",
        "resourceVersion": "28554",
        "uid": "ca7e25cc-c3d8-400f-89ef-49776b1e6094"
    },
    "spec": {
        ...
        "replicas": 3,
        "strategy": {
            "rollingUpdate": {
                "maxSurge": 0,
                "maxUnavailable": 1
            },
            "type": "RollingUpdate"
        },
        "template": {
            ...
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "labelSelector": {
                                    "matchLabels": {
                                        "app": "oauth-openshift",
                                        "oauth-openshift-anti-affinity": "true"
                                    }
                                },
                                "topologyKey": "kubernetes.io/hostname"
                            }
                        ]
                    }
                },
```

From the openshift-apiserver/apiserver deployment:

```
{
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        ...
        "name": "apiserver",
        "namespace": "openshift-apiserver"
    },
    "spec": {
        ...
        "replicas": 3,
        "strategy": {
            "rollingUpdate": {
                "maxSurge": 0,
                "maxUnavailable": 1
            },
            "type": "RollingUpdate"
        },
        "template": {
            ...
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "labelSelector": {
                                    "matchLabels": {
                                        "apiserver": "true",
                                        "app": "openshift-apiserver-a",
                                        "openshift-apiserver-anti-affinity": "true"
                                    }
                                },
                                "topologyKey": "kubernetes.io/hostname"
                            }
                        ]
                    }
                },
```

Conclusion:
1. Master nodes are drained one by one, approximately 5 minutes apart.
2. Neither the apiserver nor the oauth-openshift pods have enough time to get back running/available between drains.
3. There is no PDB that could prevent draining two apiserver/oauth-openshift instances at the same time, so the number of available instances dropped from 3 to 1 at some point.

Current behavior:
- Both the apiserver and oauth-openshift operators change condition/Degraded to true when at least one replica is missing.

Expected behavior:
- The operators allow 1 replica to be unavailable and change condition/Degraded to true only when at least 2 replicas are unavailable (since both deployments set maxUnavailable to 1). A PDB sketch follows below.
- A PDB is defined to avoid 2 replicas being unavailable at the same time.
https://github.com/openshift/library-go/pull/1056/commits/8efd1883d406cc389eb25e2a257c4451dfbd668c needs to be ported to library-go in 4.7 as well. Let's revisit the 4.7 backport once there's a customer need.
The bug was reproduced in our 4.6 -> 4.7 upgrade, so this needs to be backported to library-go in 4.7 as well. Otherwise, customers will hit the bug.
During an upgrade from 4.6 -> 4.7, kube-apiserver goes Degraded:

```
$ oc get clusteroperators
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
...
dns              4.6.55    True        False         False      4h3m
...
kube-apiserver   4.7.43    True        True          True       4h1m
...
machine-config   4.6.55    True        False         False      4h3m
```

Checked the must-gather log; kube-apiserver goes Degraded at last because of the unready node newugd-5971-pl9kh-control-plane-1.

```
2022-02-22T09:28:57.052569491Z I0222 09:28:57.039818 1 status_controller.go:213] clusteroperator/kube-apiserver diff {"status":{"conditions":[{"lastTransitionTime":"2022-02-22T09:20:37Z","message":"NodeInstallerDegraded: 1 nodes are failing on revision 7:\nNodeInstallerDegraded: no detailed termination message, see `oc get -n \"openshift-kube-apiserver\" pods/\"installer-7-newugd-5971-pl9kh-control-plane-2\" -oyaml`","reason":"NodeInstaller_InstallerPodFailed","status":"True","type":"Degraded"},{"lastTransitionTime":"2022-02-22T09:18:17Z","message":"NodeInstallerProgressing: 3 nodes are at revision 6; 0 nodes have achieved new revision 7","reason":"NodeInstaller","status":"True","type":"Progressing"},{"lastTransitionTime":"2022-02-22T07:44:15Z","message":"StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 6; 0 nodes have achieved new revision 7","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2022-02-22T07:41:58Z","message":"KubeletMinorVersionUpgradeable: Kubelet minor versions on 6 nodes are behind the expected API server version; nevertheless, they will continue to be supported in the next OpenShift minor version upgrade.","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
2022-02-22T09:28:57.072422040Z I0222 09:28:57.072352 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"c99ea6b0-5aae-4cdb-9f89-a63525f9aa26", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/kube-apiserver changed: Degraded message changed from "NodeControllerDegraded: The master nodes not ready: node \"newugd-5971-pl9kh-control-plane-1\" not ready since 2022-02-22 09:28:46 +0000 UTC because KubeletNotReady ([container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful])\nNodeInstallerDegraded: 1 nodes are failing on revision 7:\nNodeInstallerDegraded: no detailed termination message, see `oc get -n \"openshift-kube-apiserver\" pods/\"installer-7-newugd-5971-pl9kh-control-plane-2\" -o yaml`" to "NodeInstallerDegraded: 1 nodes are failing on revision 7:\nNodeInstallerDegraded: no detailed termination message, see `oc get -n \"openshift-kube-apiserver\" pods/\"installer-7-newugd-5971-pl9kh-control-plane-2\" -oyaml`"
```

Tracking the node status during the upgrade via machine config, in log namespaces/openshift-machine-config-operator/pods/machine-config-daemon-99jhw/machine-config-daemon/machine-config-daemon/logs/current.log, the machine config for node newugd-5971-pl9kh-control-plane-1:

```
2022-02-22T09:18:08.851003218Z I0222 09:18:08.850955 99197 daemon.go:383] Node newugd-5971-pl9kh-control-plane-1 is part of the control plane
...
2022-02-22T09:25:30.912795104Z I0222 09:25:30.912751 99197 update.go:1714] Update prepared; beginning drain
...
2022-02-22T09:26:46.204696209Z I0222 09:26:46.204678 99197 update.go:1714] drain complete
...
2022-02-22T09:26:46.326050179Z I0222 09:26:46.325997 99197 update.go:1714] initiating reboot: Node will reboot into config rendered-master-c43d10a537590420b7654f576eb5668a
...
2022-02-22T09:28:57.312231275Z I0222 09:28:57.312169 2771 update.go:1714] completed update for config rendered-master-c43d10a537590420b7654f576eb5668a
```

The machine config update finished on node newugd-5971-pl9kh-control-plane-1 at 09:28:57.312169.

Checked the machine-config-controller log, namespaces/openshift-machine-config-operator/pods/machine-config-controller-7799944b89-8pq4j/machine-config-controller/machine-config-controller/logs/current.log:

```
...
2022-02-22T09:28:01.367703684Z I0222 09:28:01.367628 1 node_controller.go:419] Pool master: node newugd-5971-pl9kh-control-plane-1: Reporting unready: node newugd-5971-pl9kh-control-plane-1 is reporting OutOfDisk=Unknown
2022-02-22T09:28:46.839548728Z I0222 09:28:46.839434 1 node_controller.go:419] Pool master: node newugd-5971-pl9kh-control-plane-1: Reporting unready: node newugd-5971-pl9kh-control-plane-1 is reporting NotReady=False
2022-02-22T09:28:57.004252577Z I0222 09:28:57.004209 1 node_controller.go:419] Pool master: node newugd-5971-pl9kh-control-plane-1: Reporting unready: node newugd-5971-pl9kh-control-plane-1 is reporting Unschedulable
2022-02-22T09:28:57.255035779Z I0222 09:28:57.251994 1 node_controller.go:419] Pool master: node newugd-5971-pl9kh-control-plane-1: Completed update to rendered-master-c43d10a537590420b7654f576eb5668a
2022-02-22T09:28:57.347001835Z I0222 09:28:57.346582 1 node_controller.go:419] Pool master: node newugd-5971-pl9kh-control-plane-1: Reporting ready
```

Node newugd-5971-pl9kh-control-plane-1 was ready at 09:28:57.346582, yet at 09:28:57.072352 the status of clusteroperator/kube-apiserver had still changed, with a Degraded message of "NodeControllerDegraded: The master nodes not ready: node \"newugd-5971-pl9kh-control-plane-1\" not ready since 2022-02-22 09:28:46".

We can see that the kube-apiserver pods do not have enough time to get back running/available when the master nodes are drained. The kube-apiserver operator should allow 1 replica to be unavailable before it changes condition/Degraded to true.
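As a side note, the condition flapping above can be followed live during an upgrade (a hedged convenience command, not taken from the must-gather; run it in a loop or under watch to see the transitions):

```
# Print the Degraded condition of clusteroperator/kube-apiserver:
# status, reason, and the full message on one line.
oc get clusteroperator kube-apiserver \
  -o jsonpath='{range .status.conditions[?(@.type=="Degraded")]}{.status} {.reason}: {.message}{"\n"}{end}'
```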
Ke Wang, this issue is specifically about the openshift-apiserver operator. In order to prevent kube-apiserver from going Degraded, there is more to be backported. Please see https://issues.redhat.com/browse/WRKLDS-293 for more detailed information. The improvements are targeted for 4.10. Given that the issue is reported against 4.7, it is unlikely we would backport all of the functionality there without a customer escalation.