Description of problem:

The kubelet could not start after certificate rotation. It throws the error below when `systemctl restart kubelet` is executed:

```
Oct 31 12:29:32 ip-10-0-134-182 hyperkube[10938]: W1031 12:29:32.491224 10938 feature_gate.go:223] unrecognized feature gate: LegacyNodeRoleBehavior
Oct 31 12:29:32 ip-10-0-134-182 hyperkube[10938]: W1031 12:29:32.491229 10938 feature_gate.go:223] unrecognized feature gate: NodeDisruptionExclusion
Oct 31 12:29:32 ip-10-0-134-182 hyperkube[10938]: I1031 12:29:32.491234 10938 feature_gate.go:246] feature gates: &{map[APIPriorityAndFairness:true DownwardAPIHugePages:true PodSecurity:true RotateKubeletServerCertificate:true SupportPodPidsLimit:true]}
Oct 31 12:29:32 ip-10-0-134-182 hyperkube[10938]: W1031 12:29:32.491365 10938 plugins.go:132] WARNING: aws built-in cloud provider is now deprecated. The AWS provider is deprecated and will be removed in a future release. Please use https://github.com/kubernetes/cloud-provider-aws
Oct 31 12:29:32 ip-10-0-134-182 hyperkube[10938]: I1031 12:29:32.491643 10938 aws.go:1270] Building AWS cloudprovider
Oct 31 12:29:32 ip-10-0-134-182 hyperkube[10938]: I1031 12:29:32.491707 10938 aws.go:1230] Zone not specified in configuration file; querying AWS metadata service
Oct 31 12:29:32 ip-10-0-134-182 systemd[1]: run-r8684c30c702f435386db156069eafc49.scope: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit run-r8684c30c702f435386db156069eafc49.scope has successfully entered the 'dead' state.
Oct 31 12:29:32 ip-10-0-134-182 systemd[1]: run-r8684c30c702f435386db156069eafc49.scope: Consumed 614us CPU time
-- Subject: Resources consumed by unit runtime
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit run-r8684c30c702f435386db156069eafc49.scope completed and consumed the indicated resources.
Oct 31 12:29:32 ip-10-0-134-182 hyperkube[10938]: E1031 12:29:32.510000 10938 server.go:294] "Failed to run kubelet" err="failed to run Kubelet: could not init cloud provider \"aws\": error finding instance i-018e00e6eba4d5fba: \"error listing AWS instances: \\\"AuthFailure: AWS was not able to validate the provided access credentials\\\\n\\\\tstatus code: 401, request id: a9b4f9ce-65ed-4db3-8442-60e753d11713\\\"\""
Oct 31 12:29:32 ip-10-0-134-182 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Oct 31 12:29:32 ip-10-0-134-182 systemd[1]: kubelet.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit kubelet.service has entered the 'failed' state with result 'exit-code'.
Oct 31 12:29:32 ip-10-0-134-182 systemd[1]: Failed to start Kubernetes Kubelet.
-- Subject: Unit kubelet.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit kubelet.service has failed.
--
-- The result is failed.
Oct 31 12:29:32 ip-10-0-134-182 systemd[1]: kubelet.service: Consumed 89ms CPU time
-- Subject: Resources consumed by unit runtime
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit kubelet.service completed and consumed the indicated resources.
```
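The AuthFailure in the last hyperkube line points at the AWS cloud-provider initialization rather than at certificate rotation itself: EC2 API requests are signed with a timestamp, and requests from a clock that is far off from real time are typically rejected, which would surface as exactly this 401. A minimal diagnostic sketch (not part of the original report; the endpoint used for the time comparison is an arbitrary choice) to check the skew from the affected node:

```
# Compare the node's clock with the Date header returned by an AWS endpoint.
# A skew of more than a few minutes is likely to break request signing and
# show up as "AuthFailure ... status code: 401" like the kubelet error above.
echo "node clock: $(date -u)"
echo "aws clock : $(curl -sI https://ec2.amazonaws.com/ | grep -i '^date:' | cut -d' ' -f2-)"
```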
How reproducible:

Steps to Reproduce:
1. Spin up a fresh cluster with OCP 4.10.
2. Set up an NTP server with `local stratum 1` on a dedicated node, using the chrony.conf below, and run `systemctl restart chronyd`:
```
driftfile /var/lib/chrony/drift
makestep 1.0 3
allow 10.0.0.0/12
local stratum 1
logdir /var/log/chrony
manual
```
3. Apply the MachineConfigs below to connect the cluster nodes to the NTP server (a hypothetical Butane source for these files is sketched under Additional info).

master:
```
# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-master-chrony
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:,pool%20infra-0.anowak4rdu.lab.upshift.rdu2.redhat.com%20iburst%20%0Adriftfile%20%2Fvar%2Flib%2Fchrony%2Fdrift%0Amakestep%201.0%203%0Artcsync%0Alogdir%20%2Fvar%2Flog%2Fchrony%0A
          mode: 420
          overwrite: true
          path: /etc/chrony.conf
```

worker:
```
# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-chrony
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:,pool%20infra-0.anowak4rdu.lab.upshift.rdu2.redhat.com%20iburst%20%0Adriftfile%20%2Fvar%2Flib%2Fchrony%2Fdrift%0Amakestep%201.0%203%0Artcsync%0Alogdir%20%2Fvar%2Flog%2Fchrony%0A
          mode: 420
          overwrite: true
          path: /etc/chrony.conf
```

4. Set the date on the NTP server to 20 months in the future and restart chronyd on the NTP server and on all OCP nodes:
```
newdate=$(date "+%Y-%m-%d %H:%M:%S" -d '10 months')
sudo timedatectl set-time "$newdate"
sleep 2
sudo systemctl restart chronyd
sleep 10
for i in {0..2}
do
  for j in master worker
  do
    ssh -i /home/quicklab/.ssh/quicklab.key -o 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -l quicklab $j-$i.anowakrdu2a.lab.upshift.rdu2.redhat.com sudo systemctl restart chronyd
  done
done
```
5. Wait a minute and restart the kubelet on all master nodes.
6. Approve all pending certificates to get the master nodes Ready:
`oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve`
7. Reboot one of the master nodes.

Actual results:
The kubelet fails to start and the OpenShift console cannot be logged into.

Expected results:
The kubelet starts successfully and the OpenShift console login works.

Additional info:
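For reference, the MachineConfigs in step 3 are marked "Generated by Butane". A hypothetical Butane source that would render an equivalent worker config could look like the sketch below; the file name, the Butane spec version 4.10.0, and the butane invocation are assumptions, not taken from this report, and the master variant would differ only in the name and role label:

```
# Hypothetical Butane source; renders roughly the 99-worker-chrony MachineConfig above.
cat > 99-worker-chrony.bu <<'EOF'
variant: openshift
version: 4.10.0
metadata:
  name: 99-worker-chrony
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
    - path: /etc/chrony.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          pool infra-0.anowak4rdu.lab.upshift.rdu2.redhat.com iburst
          driftfile /var/lib/chrony/drift
          makestep 1.0 3
          rtcsync
          logdir /var/log/chrony
EOF
butane 99-worker-chrony.bu -o 99-worker-chrony.yaml
```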
Hi Liquan,

As @harpatil mentioned, this issue is not related to the kubelet. The error occurs during initialization of the cloud provider, where listing the AWS instances fails because AWS rejects the provided access credentials:

```
could not init cloud provider \"aws\": error finding instance i-018e00e6eba4d5fba: \"error listing AWS instances: \\\"AuthFailure: AWS was not able to validate the provided access credentials\\\\n\\\\tstatus code: 401, request id: a9b4f9ce-65ed-4db3-8442-60e753d11713\\\"\"
```

Thanks,
Ramesh
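One way to confirm this outside the kubelet would be to repeat the cloud provider's instance lookup by hand from the affected node. This is only a sketch: it assumes the aws CLI is available on (or copied to) the node, that the node uses instance-profile credentials, and that IMDSv1 is reachable (with IMDSv2 enforced the metadata calls additionally need a session token).

```
# Diagnostic sketch, not from the original report.
# 1) Confirm the node has instance-profile credentials at all.
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# 2) Repeat the cloud provider's "find instance" call by hand; while the clock
#    is still skewed this should fail with the same AuthFailure / 401.
region=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
aws ec2 describe-instances --instance-ids i-018e00e6eba4d5fba --region "$region"
```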