Sometime in the last few weeks we (or AWS?) have regressed the behavior of draining requests from the apiserver when it begins graceful shutdown. This occurs with no other disruption present: starting a cluster, triggering graceful shutdown of an apiserver pod via TERM on PID 1, and polling API calls from outside the cluster is sufficient to reproduce the issue on any 4.8 AWS cluster. From the nightly upgrade runs this seems to have gotten substantially worse around 3/15. Unfortunately, that is also when the crio regression around shutdown landed, so no obvious start date is visible in our runs: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade. We definitely had a fully green period in 4.7, so either we are now taking longer to shut down (in a way that triggers some unusual behavior) or the AWS load balancer is keeping us in rotation when we don't expect it to. Will gather and report more data on this shortly. In the meantime, the test will be marked flaky while we investigate.
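For reference, a minimal sketch of the repro described above, run on a master node. The crictl container-name filter and the presence of a kill binary inside the kube-apiserver image are assumptions, not something verified in this report:

# on a master node: send TERM to PID 1 of the kube-apiserver container
CID=$(crictl ps --name kube-apiserver -q | head -1)
crictl exec "$CID" kill -TERM 1
# then poll the API from outside the cluster, e.g. with the curl loop in the next comment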
Clayton is right, AWS is broken. I did the following manual test, which demonstrates that new connections are still being forwarded to an API server that is undergoing termination.

1. Installed a 4.8 cluster (4.8.0-0.ci-2021-03-28-220420).
2. Took one API server out of the pool (mv kube-apiserver-pod.yaml ../) to increase my chances of hitting the server under test.
3. Ran the following script (which creates a new connection to the server on each run):
   while true; do curl -H "User-Agent: james" -H "Authorization: Bearer REDACTED" https://api.polmonkmkimp.group-b.devcluster.openshift.com:6443/api/v1/namespaces -k -m 2 -o /dev/null -sw "$(date +%T) %{http_code} %{time_total}s\n"; sleep 1; done
4. Sent a SIGTERM to one server: kill -s SIGTERM $PID

Here's what I found: requests were still being received at around ~T+70s (--shutdown-delay-duration), which is VERY late and indicates that AWS didn't take the API server out of rotation.

E0: T+0s: SIGTERM sent to 10-0-131-183 (date; kill -s SIGTERM 143136 -> Mon Mar 29 12:16:54 UTC 2021)

E1: T+0s: 10-0-131-183 receives a TERM signal
(oc get events -n openshift-kube-apiserver --sort-by={.metadata.creationTimestamp} -ojson | jq '.items' | jq '.[] | select(.reason=="TerminationStart")')
{ "apiVersion": "v1", "eventTime": null, "firstTimestamp": null, "involvedObject": { "apiVersion": "v1", "kind": "Pod", "name": "kube-apiserver-ip-10-0-131-183.ec2.internal", "namespace": "openshift-kube-apiserver" }, "kind": "Event", "lastTimestamp": null, "message": "Received signal to terminate, becoming unready, but keeping serving", "metadata": { "creationTimestamp": "2021-03-29T12:16:54Z", "name": "kube-apiserver-ip-10-0-131-183.ec2.internal.1670cf5163faf315", "namespace": "openshift-kube-apiserver", "resourceVersion": "88811", "uid": "ea1efadc-3cdf-4275-a83f-cfd8ccc7472d" }, "reason": "TerminationStart", "reportingComponent": "", "reportingInstance": "", "source": { "component": "apiserver", "host": "ip-10-0-131-183" }, "type": "Normal" }

E2: T+0s onwards: logs from 10-0-131-183 filtered by the user agent
(oc logs -n openshift-kube-apiserver kube-apiserver-ip-10-0-131-183.ec2.internal -f | grep "james")
I0329 12:17:00.565819 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="232.707358ms" userAgent="james" srcIP="10.0.51.233:58801" resp=200
I0329 12:17:02.404500 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="232.509186ms" userAgent="james" srcIP="10.0.51.233:21409" resp=200
I0329 12:17:08.536342 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="226.236527ms" userAgent="james" srcIP="10.0.51.233:23306" resp=200
I0329 12:17:10.368416 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="231.572429ms" userAgent="james" srcIP="10.0.51.233:40110" resp=200
I0329 12:17:19.529428 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="223.988728ms" userAgent="james" srcIP="10.0.51.233:18542" resp=200
I0329 12:17:23.198021 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="232.905483ms" userAgent="james" srcIP="10.0.51.233:46005" resp=200
I0329 12:17:25.039396 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="233.728848ms" userAgent="james" srcIP="10.0.51.233:10436" resp=200
I0329 12:17:26.857096 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="227.374887ms" userAgent="james" srcIP="10.0.51.233:43878" resp=200
I0329 12:17:28.701398 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="223.498453ms" userAgent="james" srcIP="10.0.51.233:64204" resp=200
I0329 12:17:30.530245 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="229.98221ms" userAgent="james" srcIP="10.0.51.233:54945" resp=200
I0329 12:17:32.333707 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="224.751331ms" userAgent="james" srcIP="10.0.51.233:50888" resp=200
I0329 12:17:35.986866 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="228.489524ms" userAgent="james" srcIP="10.0.51.233:5798" resp=200
I0329 12:17:37.805527 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="229.850606ms" userAgent="james" srcIP="10.0.51.233:63211" resp=200
I0329 12:17:41.469994 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="233.488308ms" userAgent="james" srcIP="10.0.51.233:51799" resp=200
I0329 12:17:43.304633 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="231.870691ms" userAgent="james" srcIP="10.0.51.233:31088" resp=200
I0329 12:17:46.961367 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="228.730156ms" userAgent="james" srcIP="10.0.51.233:44535" resp=200
I0329 12:17:50.612538 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="242.212359ms" userAgent="james" srcIP="10.0.51.233:29328" resp=200
W0329 12:17:52.200501 18 patch_genericapiserver.go:84] Request to "/api/v1/namespaces" (source IP 10.0.51.233:37342, user agent "james") through a connection created very late in the graceful termination process (more than 80% has passed), possibly a sign for a broken load balancer setup.
I0329 12:17:52.432302 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="231.829746ms" userAgent="james" srcIP="10.0.51.233:37342" resp=200
W0329 12:17:54.026533 18 patch_genericapiserver.go:84] Request to "/api/v1/namespaces" (source IP 10.0.51.233:27541, user agent "james") through a connection created very late in the graceful termination process (more than 80% has passed), possibly a sign for a broken load balancer setup.
I0329 12:17:54.255197 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="228.686402ms" userAgent="james" srcIP="10.0.51.233:27541" resp=200
W0329 12:17:55.839230 18 patch_genericapiserver.go:84] Request to "/api/v1/namespaces" (source IP 10.0.51.233:50792, user agent "james") through a connection created very late in the graceful termination process (more than 80% has passed), possibly a sign for a broken load balancer setup.
I0329 12:17:56.065143 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="225.954251ms" userAgent="james" srcIP="10.0.51.233:50792" resp=200
W0329 12:17:59.508938 18 patch_genericapiserver.go:84] Request to "/api/v1/namespaces" (source IP 10.0.51.233:10138, user agent "james") through a connection created very late in the graceful termination process (more than 80% has passed), possibly a sign for a broken load balancer setup.
I0329 12:17:59.738239 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="229.355668ms" userAgent="james" srcIP="10.0.51.233:10138" resp=200
W0329 12:18:01.342323 18 patch_genericapiserver.go:84] Request to "/api/v1/namespaces" (source IP 10.0.51.233:54300, user agent "james") through a connection created very late in the graceful termination process (more than 80% has passed), possibly a sign for a broken load balancer setup.
I0329 12:18:01.575489 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="233.194447ms" userAgent="james" srcIP="10.0.51.233:54300" resp=200
W0329 12:18:03.170068 18 patch_genericapiserver.go:84] Request to "/api/v1/namespaces" (source IP 10.0.51.233:28110, user agent "james") through a connection created very late in the graceful termination process (more than 80% has passed), possibly a sign for a broken load balancer setup.
I0329 12:18:03.396975 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="226.930343ms" userAgent="james" srcIP="10.0.51.233:28110" resp=200
error: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""

E3: T+70s: TerminationMinimalShutdownDurationFinished received at 12:18:04
(oc get events -n openshift-kube-apiserver --sort-by={.metadata.creationTimestamp} -ojson | jq '.items' | jq '.[] | select(.reason=="TerminationMinimalShutdownDurationFinished")')
{ "apiVersion": "v1", "eventTime": null, "firstTimestamp": null, "involvedObject": { "apiVersion": "v1", "kind": "Pod", "name": "kube-apiserver-ip-10-0-131-183.ec2.internal", "namespace": "openshift-kube-apiserver" }, "kind": "Event", "lastTimestamp": null, "message": "The minimal shutdown duration of 1m10s finished", "metadata": { "creationTimestamp": "2021-03-29T12:18:04Z", "name": "kube-apiserver-ip-10-0-131-183.ec2.internal.1670cf61b0f72d2d", "namespace": "openshift-kube-apiserver", "resourceVersion": "89139", "uid": "4eab8165-cb40-4a7a-b7e7-5401938b82f9" }, "reason": "TerminationMinimalShutdownDurationFinished", "reportingComponent": "", "reportingInstance": "", "source": { "component": "apiserver", "host": "ip-10-0-131-183" }, "type": "Normal" }

E4: T+70s: TerminationStoppedServing received at 12:18:04
(oc get events -n openshift-kube-apiserver --sort-by={.metadata.creationTimestamp} -ojson | jq '.items' | jq '.[] | select(.reason=="TerminationStoppedServing")')
{ "apiVersion": "v1", "eventTime": null, "firstTimestamp": null, "involvedObject": { "apiVersion": "v1", "kind": "Pod", "name": "kube-apiserver-ip-10-0-131-183.ec2.internal", "namespace": "openshift-kube-apiserver" }, "kind": "Event", "lastTimestamp": null, "message": "Server has stopped listening", "metadata": { "creationTimestamp": "2021-03-29T12:18:04Z", "name": "kube-apiserver-ip-10-0-131-183.ec2.internal.1670cf61b1990110", "namespace": "openshift-kube-apiserver", "resourceVersion": "89140", "uid": "84d567bd-f653-4fbd-b542-82fff0c72d81" }, "reason": "TerminationStoppedServing", "reportingComponent": "", "reportingInstance": "", "source": { "component": "apiserver", "host": "ip-10-0-131-183" }, "type": "Normal" }

E5: T+130s: TerminationGracefulTerminationFinished received at 12:19:04
(oc get events -n openshift-kube-apiserver --sort-by={.metadata.creationTimestamp} -ojson | jq '.items' | jq '.[] | select(.reason=="TerminationGracefulTerminationFinished")')
{ "apiVersion": "v1", "eventTime": null, "firstTimestamp": null, "involvedObject": { "apiVersion": "v1", "kind": "Pod", "name": "kube-apiserver-ip-10-0-131-183.ec2.internal", "namespace": "openshift-kube-apiserver" }, "kind": "Event", "lastTimestamp": null, "message": "All pending requests processed", "metadata": { "creationTimestamp": "2021-03-29T12:19:04Z", "name": "kube-apiserver-ip-10-0-131-183.ec2.internal.1670cf6fa9f8ef82", "namespace": "openshift-kube-apiserver", "resourceVersion": "89452", "uid": "c6bddd39-5611-4e19-a7fe-3fcbf204ef02" }, "reason": "TerminationGracefulTerminationFinished", "reportingComponent": "", "reportingInstance": "", "source": { "component": "apiserver", "host": "ip-10-0-131-183" }, "type": "Normal" }
Did a few more tests; in the last run the server received a request at T0 + 211s.

[root@ip-10-0-131-183 ~]# date; kill -s SIGTERM 278562
Mon Mar 29 16:25:23 UTC 2021

❯ oc logs -n openshift-kube-apiserver kube-apiserver-ip-10-0-131-183.ec2.internal -f | grep "james"
I0329 16:25:52.937653 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="222.902219ms" userAgent="james" srcIP="10.0.91.230:54834" resp=200
I0329 16:25:54.747723 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="223.178316ms" userAgent="james" srcIP="10.0.37.169:63291" resp=200
I0329 16:25:56.586519 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="236.856737ms" userAgent="james" srcIP="10.0.78.223:20031" resp=200
I0329 16:26:00.841269 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="455.263823ms" userAgent="james" srcIP="10.0.51.233:62462" resp=200
I0329 16:26:03.026039 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="462.656103ms" userAgent="james" srcIP="10.0.25.113:57850" resp=200
I0329 16:26:05.218424 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="456.940913ms" userAgent="james" srcIP="10.0.25.113:11272" resp=200
I0329 16:26:07.373366 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="442.235736ms" userAgent="james" srcIP="10.0.25.113:46960" resp=200
I0329 16:26:09.554608 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="462.029928ms" userAgent="james" srcIP="10.0.25.113:16254" resp=200
I0329 16:26:13.770448 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="460.215337ms" userAgent="james" srcIP="10.0.25.113:6017" resp=200
I0329 16:26:15.948896 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="458.875829ms" userAgent="james" srcIP="10.0.25.113:22118" resp=200
I0329 16:26:18.110111 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="447.566662ms" userAgent="james" srcIP="10.0.25.113:14208" resp=200
I0329 16:26:24.324797 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="446.271584ms" userAgent="james" srcIP="10.0.25.113:61408" resp=200
I0329 16:26:26.470754 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="444.751335ms" userAgent="james" srcIP="10.0.25.113:53187" resp=200
I0329 16:26:32.744479 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="455.035465ms" userAgent="james" srcIP="10.0.25.113:24460" resp=200
I0329 16:26:41.314015 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="448.624921ms" userAgent="james" srcIP="10.0.25.113:50744" resp=200
I0329 16:26:43.463580 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="446.781681ms" userAgent="james" srcIP="10.0.25.113:13908" resp=200
I0329 16:26:49.771985 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="439.832331ms" userAgent="james" srcIP="10.0.25.113:60474" resp=200
I0329 16:26:56.056219 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="461.870335ms" userAgent="james" srcIP="10.0.25.113:28137" resp=200
I0329 16:26:58.227994 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="444.899661ms" userAgent="james" srcIP="10.0.25.113:40385" resp=200
I0329 16:27:00.399969 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="455.621468ms" userAgent="james" srcIP="10.0.25.113:48095" resp=200
I0329 16:27:04.588044 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="438.775726ms" userAgent="james" srcIP="10.0.25.113:61575" resp=200
I0329 16:27:08.826198 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="494.197819ms" userAgent="james" srcIP="10.0.25.113:16214" resp=200
I0329 16:27:10.960414 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="447.225468ms" userAgent="james" srcIP="10.0.25.113:5935" resp=200
I0329 16:27:17.268094 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="464.328062ms" userAgent="james" srcIP="10.0.25.113:51641" resp=200
I0329 16:27:21.469696 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="455.002004ms" userAgent="james" srcIP="10.0.25.113:18644" resp=200
I0329 16:27:23.650813 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="457.71601ms" userAgent="james" srcIP="10.0.25.113:13542" resp=200
I0329 16:27:27.865922 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="446.575823ms" userAgent="james" srcIP="10.0.25.113:44279" resp=200
I0329 16:27:30.018900 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="451.042525ms" userAgent="james" srcIP="10.0.25.113:38352" resp=200
I0329 16:27:36.330192 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="492.799839ms" userAgent="james" srcIP="10.0.25.113:45080" resp=200
I0329 16:27:38.470673 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="440.340293ms" userAgent="james" srcIP="10.0.25.113:22339" resp=200
I0329 16:27:46.864711 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="440.568683ms" userAgent="james" srcIP="10.0.25.113:17841" resp=200
I0329 16:27:49.003416 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="440.534977ms" userAgent="james" srcIP="10.0.25.113:47367" resp=200
I0329 16:27:51.176509 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="463.568847ms" userAgent="james" srcIP="10.0.25.113:18046" resp=200
I0329 16:27:53.334598 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="447.761414ms" userAgent="james" srcIP="10.0.25.113:61739" resp=200
I0329 16:28:08.160325 18 httplog.go:94] "HTTP" verb="GET" URI="/api/v1/namespaces" latency="228.933349ms" userAgent="james" srcIP="10.0.6.16:30366" resp=200
I'm going to open a support ticket with AWS.
Created attachment 1767396 [details] cloudwatch metrics for the latest run
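Alongside the CloudWatch metrics and the AWS ticket, a hedged sketch of how to inspect the external api NLB target group from the AWS CLI; the "ext" name filter and taking the first target group are assumptions, substitute the cluster's actual load balancer and target group ARNs:

$ LB_ARN=$(aws elbv2 describe-load-balancers --query 'LoadBalancers[?contains(LoadBalancerName, `ext`)].LoadBalancerArn' --output text)
$ TG_ARN=$(aws elbv2 describe-target-groups --load-balancer-arn "$LB_ARN" --query 'TargetGroups[0].TargetGroupArn' --output text)
$ aws elbv2 describe-target-health --target-group-arn "$TG_ARN"
$ aws elbv2 describe-target-group-attributes --target-group-arn "$TG_ARN"    # e.g. deregistration_delay.timeout_seconds

The interesting question is whether the terminating master stays "healthy" in the target group for the whole --shutdown-delay-duration window.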
Since PR 1079 increases the termination timeouts for AWS, we need to verify on both AWS and non-AWS clusters.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-21-233425   True        False         24h     Cluster version is 4.8.0-0.nightly-2021-05-21-233425

$ oc get infrastructures.config.openshift.io -o yaml | grep -A1 " platformSpec"
    platformSpec:
      type: GCP

$ oc -n openshift-kube-apiserver get cm config -o json | jq -r '.data."config.yaml"' | jq '.apiServerArguments."shutdown-delay-duration"'
[
  "70s"
]

$ oc get pod -n openshift-kube-apiserver kube-apiserver-ci-ln-d6ygqnb-f76d1-d6kbg-master-1 -o yaml | grep 135s
      exec watch-termination --termination-touch-file=/var/log/kube-apiserver/.terminating --termination-log-file=/var/log/kube-apiserver/termination.log --graceful-termination-duration=135s --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig -- hyperkube kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --advertise-address=${HOST_IP} -v=2 --permit-address-sharing

- Kube-apiserver rollout:
$ oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "roll-'"$( date --rfc-3339=ns )"'"} ]'

Checked the kube-apiserver operator logs; the rollout started at 03:44:35 and ended at 03:55:27, see below:
I0525 03:44:35.707678 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"3748cc54-2961-4e73-98c8-2f4e24a89b9d", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/kube-apiserver-pod -n openshift-kube-apiserver: cause by changes in data.forceRedeploymentReason
...
I0525 03:55:27.483033 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"3748cc54-2961-4e73-98c8-2f4e24a89b9d", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/kube-apiserver changed: Progressing changed from True to False ("NodeInstallerProgressing: 3 nodes are at revision 9"),Available message changed from "StaticPodsAvailable: 3 nodes are active; 1 nodes are at revision 8; 2 nodes are at revision 9" to "StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 9"

- Etcd encryption:
$ oc patch apiserver/cluster -p '{"spec":{"encryption":{"type":"aescbc"}}}' --type merge
apiserver.config.openshift.io/cluster patched

Checked the kube-apiserver operator logs; the encryption started at 04:10:25:
...
I0525 04:10:25.175851 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"3748cc54-2961-4e73-98c8-2f4e24a89b9d", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'EncryptionKeyCreated' Secret "encryption-key-openshift-kube-apiserver-1" successfully created: ["key-does-not-exist"]
...

$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.lastTransitionTime}{"\n"}{.reason}{"\n"}{.message}{"\n"}'
2021-05-25T04:34:59Z
EncryptionCompleted
All resources encrypted: secrets, configmaps

The encryption ended at 04:34:59, so it took about 24 minutes.
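For reference, the two timeout checks above can be run in one go on either platform; assuming the kube-apiserver pods carry the apiserver=true label (an assumption here, otherwise name a pod explicitly as above), this should print 70s/135s on non-AWS clusters and 210s/275s on AWS:

$ oc -n openshift-kube-apiserver get cm config -o json | jq -r '.data."config.yaml"' | jq '.apiServerArguments."shutdown-delay-duration"'
$ oc -n openshift-kube-apiserver get pods -l apiserver=true -o yaml | grep -o 'graceful-termination-duration=[0-9]*s' | sort -u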
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ oc get infrastructures.config.openshift.io -o yaml | grep -A1 " platformSpec"
    platformSpec:
      aws: {}

$ oc -n openshift-kube-apiserver get cm config -o json | jq -r '.data."config.yaml"' | jq '.apiServerArguments."shutdown-delay-duration"'
[
  "210s"
]

$ oc get pod -n openshift-kube-apiserver kube-apiserver-ip-10-0-49-171.us-east-2.compute.internal -o yaml | grep 275s
      exec watch-termination --termination-touch-file=/var/log/kube-apiserver/.terminating --termination-log-file=/var/log/kube-apiserver/termination.log --graceful-termination-duration=275s --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig -- hyperkube kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --advertise-address=${HOST_IP} -v=2 --permit-address-sharing

- Kube-apiserver rollout:
$ oc exec -n openshift-kube-apiserver-operator kube-apiserver-operator-6fc449958c-sl7p8 -- date; oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "roll-'"$( date --rfc-3339=ns )"'"} ]'
Tue May 25 05:29:06 UTC 2021
kubeapiserver.operator.openshift.io/cluster patched

Checked the kube-apiserver operator logs; the rollout started at 05:29:09 and ended at 05:48:50, about 19m41s:
I0525 05:29:09.535071 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"8beb5658-6257-4e15-89cc-d7e17af29b3f", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/kube-apiserver-pod -n openshift-kube-apiserver: cause by changes in data.forceRedeploymentReason
...
I0525 05:48:50.210507 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"8beb5658-6257-4e15-89cc-d7e17af29b3f", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/kube-apiserver changed: Progressing changed from True to False ("NodeInstallerProgressing: 3 nodes are at revision 18"),Available message changed from "StaticPodsAvailable: 3 nodes are active; 1 nodes are at revision 17; 2 nodes are at revision 18" to "StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 18"

- Etcd encryption:
$ oc exec -n openshift-kube-apiserver-operator kube-apiserver-operator-6fc449958c-sl7p8 -- date; oc patch apiserver/cluster -p '{"spec":{"encryption":{"type":"aescbc"}}}' --type merge
Tue May 25 06:38:20 UTC 2021
apiserver.config.openshift.io/cluster patched

Checked the kube-apiserver operator logs; the encryption started at 06:38:21:
...
I0525 06:38:21.862745 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"8beb5658-6257-4e15-89cc-d7e17af29b3f", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'EncryptionKeyCreated' Secret "encryption-key-openshift-kube-apiserver-1" successfully created: ["key-does-not-exist"]
...

$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.lastTransitionTime}{"\n"}{.reason}{"\n"}{.message}{"\n"}'
2021-05-25T07:21:02Z
EncryptionCompleted
All resources encrypted: secrets, configmaps

The encryption ended at 07:21:02, so it took about 43 minutes.
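As a hedged convenience (not part of the original verification steps), the end of the rollout window can also be read from the clusteroperator Progressing condition instead of grepping operator logs; the start time still comes from when Progressing flipped to True:

$ oc get clusteroperator kube-apiserver -o jsonpath='{range .status.conditions[?(@.type=="Progressing")]}{.status}{" since "}{.lastTransitionTime}{"\n"}{end}'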
From the above results, the kube-apiserver rollout and etcd encryption on AWS took considerably longer than on GCP, which is expected given the increased termination timeouts, so moving the bug to VERIFIED.
*** Bug 1845412 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438