Bug 1919069

Summary: [sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Node
Assignee: Harshal Patil <harpatil>
Node sub component: Kubelet
QA Contact: Ke Wang <kewang>
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: high
CC: aos-bugs, ccoleman, deads, ercohen, harpatil, jingzhan, jluhrsen, jokerman, lszaszki, mfojtik, sttts, tsweeney, walters, wking, xxia
Version: 4.6
Keywords: Reopened, UpcomingSprint
Target Milestone: ---
Target Release: 4.6.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-17 19:25:10 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1882750, 1908378, 1927102, 1929674, 1929695
Bug Blocks:

Comment 3 Ke Wang 2021-02-07 08:35:53 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-02-05-192240   True        False         6h51m   Cluster version is 4.6.0-0.nightly-2021-02-05-192240

Connect to one of the master nodes:
oc debug node/<master>

sh-4.4# vi kube-apiserver-pod.yaml  # Made a change to the manifest and saved it.

sh-4.4# date; ps -ef |grep ' kube-apiserver '  | grep -v grep | awk '{print $2}'
Sun Feb  7 08:16:12 UTC 2021
3083
3401

After waiting a while, the kube-apiserver process IDs had changed:
sh-4.4# date; ps -ef |grep ' kube-apiserver '  | grep -v grep | awk '{print $2}'
Sun Feb  7 08:25:24 UTC 2021
322029
322065
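
The wait-and-recheck step above can be scripted. This is a minimal POSIX sh sketch, assuming the same ps/grep/awk pipeline used in the session; the function names apiserver_pids, all_replaced and wait_for_replacement are hypothetical, and the 10-second poll interval is an arbitrary choice. Comparing PIDs is only a rough signal that the processes were replaced, since the kernel can reuse PIDs.

```shell
# Hypothetical helper: list the PIDs whose `ps -ef` line contains
# " kube-apiserver ", using the same pipeline as in the session above.
apiserver_pids() {
  ps -ef | grep ' kube-apiserver ' | grep -v grep | awk '{print $2}'
}

# Hypothetical helper: succeed only if the new PID list ($2) is non-empty
# and shares no PID with the old list ($1), i.e. every old process is gone
# and replacements are running.
all_replaced() {
  for new_pid in $2; do
    for old_pid in $1; do
      [ "$new_pid" = "$old_pid" ] && return 1   # an old process is still alive
    done
  done
  [ -n "$2" ]   # an empty list means nothing has started yet
}

# Hypothetical helper: given the PIDs captured before the manifest edit,
# poll every 10s until all of them have been replaced, then print the time
# and the new PIDs (same output shape as the manual check).
wait_for_replacement() {
  old_pids=$1
  while :; do
    new_pids=$(apiserver_pids)
    if all_replaced "$old_pids" "$new_pids"; then
      date
      echo "$new_pids"
      return 0
    fi
    sleep 10
  done
}
```

Capture `before=$(apiserver_pids)` before editing the manifest, then run `wait_for_replacement "$before"`. The loop has no timeout, so interrupt it with Ctrl-C if the PIDs never change.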

Then checked termination.log:
...
I0207 08:18:35.442101      18 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0207 08:18:37.260965      18 genericapiserver.go:667] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ip-xxxx-86.us-east-2.compute.internal", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationGracefulTerminationFinished' All pending requests processed
I0207 08:18:37.294318       1 main.go:198] Termination finished with exit code 0
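
The termination.log check can be expressed as a small function. This is a sketch that assumes the log lines look like the excerpt above; graceful_termination_ok is a hypothetical name, and the log path is passed in as an argument because its location on the node is not shown here. If the file holds more than one termination, only the last "Termination finished" line is checked.

```shell
# Hypothetical helper: succeed only if the log file given as $1 contains
# the TerminationGracefulTerminationFinished event AND its last
# "Termination finished" line reports exit code 0.
graceful_termination_ok() {
  grep -q 'TerminationGracefulTerminationFinished' "$1" || return 1
  last=$(grep 'Termination finished with exit code' "$1" | tail -n 1)
  case "$last" in
    *"exit code 0") return 0 ;;   # last termination exited cleanly
    *) return 1 ;;                # non-zero exit code, or no such line
  esac
}
```

For example: `graceful_termination_ok <path-to-termination.log> && echo graceful`.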


In another terminal, check whether the kube-apiserver pod has been restarted:
$ oc get pods -n openshift-kube-apiserver --show-labels -l apiserver
NAME                                                        READY   STATUS    RESTARTS   AGE    LABELS
kube-apiserver-ip-xx-xx-xxx-86.us-east-2.compute.internal   5/5     Running   18         5h6m   apiserver=true,app=openshift-kube-apiserver,revision=5
...
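
Readiness in this output can also be checked mechanically. A sketch, assuming the default `oc get pods` column order (NAME, READY, STATUS, ...); all_pods_ready is a hypothetical name. It deliberately ignores RESTARTS, since that is a cumulative counter (18 here) rather than a measure of this one restart.

```shell
# Hypothetical helper: read `oc get pods` output on stdin (header line
# first) and succeed only if at least one pod is listed and every pod has
# all containers ready (READY n/n) with STATUS Running.
all_pods_ready() {
  awk 'NR > 1 {
         n++
         split($2, r, "/")                       # "5/5" -> r[1]=5, r[2]=5
         if (r[1] != r[2] || $3 != "Running") bad = 1
       }
       END { if (n > 0 && !bad) exit 0; exit 1 }'
}
```

For example: `oc get pods -n openshift-kube-apiserver --show-labels -l apiserver | all_pods_ready && echo ready`. The extra LABELS column added by `--show-labels` does not affect columns 2 and 3.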

Checked the pod with oc describe:
...
Events:
  Type    Reason          Age                  From     Message
  ----    ------          ----                 ----     -------
  Normal  Killing         15m                  kubelet  Stopping container kube-apiserver-insecure-readyz
  Normal  Killing         15m                  kubelet  Stopping container kube-apiserver
  Normal  Killing         15m                  kubelet  Stopping container kube-apiserver-cert-regeneration-controller
  Normal  Killing         15m                  kubelet  Stopping container kube-apiserver-check-endpoints
  Normal  Killing         15m                  kubelet  Stopping container kube-apiserver-cert-syncer
  Normal  SandboxChanged  13m                  kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal  Started         13m (x2 over 6h39m)  kubelet  Started container setup
  Normal  Created         13m (x2 over 6h39m)  kubelet  Created container setup
  Normal  Pulled          13m (x2 over 6h39m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:39ac816c8207ee242ba05606e4e85b7d2328176a4bd518fde05cdf481a60fe52" already present on machine
  Normal  Pulled          12m (x2 over 6h39m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:39ac816c8207ee242ba05606e4e85b7d2328176a4bd518fde05cdf481a60fe52" already present on machine
  Normal  Pulled          11m (x2 over 6h39m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7450e4a112aabfa72b58c1733900a0ff9cfb4004601228ad8d5799284bb8b754" already present on machine
  Normal  Started         11m (x2 over 6h39m)  kubelet  Started container kube-apiserver-cert-regeneration-controller
  Normal  Started         11m (x2 over 6h39m)  kubelet  Started container kube-apiserver-cert-syncer
  Normal  Created         11m (x2 over 6h39m)  kubelet  Created container kube-apiserver-insecure-readyz
  Normal  Started         11m (x2 over 6h39m)  kubelet  Started container kube-apiserver-insecure-readyz
  Normal  Pulled          11m (x2 over 6h39m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7450e4a112aabfa72b58c1733900a0ff9cfb4004601228ad8d5799284bb8b754" already present on machine
  Normal  Created         11m (x2 over 6h39m)  kubelet  Created container kube-apiserver-check-endpoints
  Normal  Created         11m (x2 over 6h39m)  kubelet  Created container kube-apiserver-cert-regeneration-controller
  Normal  Created         11m (x2 over 6h39m)  kubelet  Created container kube-apiserver-cert-syncer
  Normal  Pulled          11m (x2 over 6h39m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7450e4a112aabfa72b58c1733900a0ff9cfb4004601228ad8d5799284bb8b754" already present on machine
  Normal  Started         11m (x2 over 6h39m)  kubelet  Started container kube-apiserver
  Normal  Created         11m (x2 over 6h39m)  kubelet  Created container kube-apiserver
  Normal  Pulled          11m (x2 over 6h39m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7450e4a112aabfa72b58c1733900a0ff9cfb4004601228ad8d5799284bb8b754" already present on machine
  Normal  Started         11m (x2 over 6h39m)  kubelet  Started container kube-apiserver-check-endpoints


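After 135s (terminationGracePeriodSeconds), the new kube-apiserver was started. The pod events show the containers being stopped at 15m, the pod sandbox being killed and re-created at 13m, and the kube-apiserver containers starting again at 11m. The process is as expected, so moving the bug to VERIFIED.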

Comment 5 errata-xmlrpc 2021-02-17 19:25:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.17 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0424