Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1828606

Summary: Graceful termination broken, was: Managed cluster should have no crashlooping pods in core namespaces over two minutes: kube-apiserver pod is failing to start
Product: OpenShift Container Platform
Component: Node
Version: 4.5
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Reporter: Abhinav Dahiya <adahiya>
Assignee: Harshal Patil <harpatil>
QA Contact: Sunil Choudhary <schoudha>
Docs Contact:
CC: aos-bugs, estroczy, jokerman, kgarriso, mfojtik, sttts, zyu
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-22 15:28:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Abhinav Dahiya 2020-04-27 21:50:47 UTC
Description of problem:


[sig-arch] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]   2m9s
fail [github.com/openshift/origin/test/extended/operators/cluster.go:114]: Expected
    <[]string | len:2, cap:2>: [
        "Pod openshift-kube-apiserver/kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1 is not healthy: back-off 40s restarting failed container=kube-apiserver pod=kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_openshift-kube-apiserver(402659332c36cc00dc64ab1a77d7714e)",
        "Pod openshift-kube-apiserver/kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-0 is not healthy: back-off 20s restarting failed container=kube-apiserver pod=kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-0_openshift-kube-apiserver(402659332c36cc00dc64ab1a77d7714e)",
    ]
to be empty

Version-Release number of selected component (if applicable):


How reproducible:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_release/8582/rehearse-8582-pull-ci-openshift-installer-master-e2e-vsphere/1
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_telemeter/328/pull-ci-openshift-telemeter-master-e2e-aws/846
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_console-operator/419/pull-ci-openshift-console-operator-master-e2e-gcp/460
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/322/pull-ci-openshift-cluster-etcd-operator-master-e2e-aws/1386
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.4/1481
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.4/1545

https://search.svc.ci.openshift.org/?search=back-off+40s+restarting+failed+container%3Dkube-apiserver&maxAge=48h&context=1&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Stefan Schimanski 2020-05-19 08:47:50 UTC
Looked into the logs of the first 4 linked runs and always found this:

  Error: failed to create listener: failed to listen on 0.0.0.0:6443: listen tcp 0.0.0.0:6443: bind: address already in use

The kube-apiserver pods are equally named per node. So either the kernel does not release the port, or graceful termination logic is broken in the kubelet, or cri-o again hides terminating pods such that the kubelet starts the new pod early.
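
For reference, a minimal Go sketch (not OpenShift code; port 6443 just mirrors the log) of why the second bind fails as long as the old process still holds the listener:

// Minimal sketch: a second bind on the same address fails while the first
// listener is still open, which is exactly what the restarted
// kube-apiserver hits.
package main

import (
	"fmt"
	"net"
)

func main() {
	// First listener succeeds and holds 0.0.0.0:6443 (stand-in for the old pod).
	old, err := net.Listen("tcp", "0.0.0.0:6443")
	if err != nil {
		fmt.Println("first bind failed:", err)
		return
	}
	defer old.Close()

	// The second bind fails until the first listener closes:
	//   listen tcp 0.0.0.0:6443: bind: address already in use
	if _, err := net.Listen("tcp", "0.0.0.0:6443"); err != nil {
		fmt.Println("second bind failed:", err)
	}
}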

Comment 2 Stefan Schimanski 2020-05-19 09:20:15 UTC
Investigated https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_release/8582/rehearse-8582-pull-ci-openshift-installer-master-e2e-vsphere/1 further:

The kube-apiserver on master-1 fails to bind to the port:

  I0427 19:45:02.128271       1 server.go:194] Version: v1.18.0-rc.1
  Error: failed to create listener: failed to listen on 0.0.0.0:6443: listen tcp 0.0.0.0:6443: bind: address already in use

Compare
https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/8582/rehearse-8582-pull-ci-openshift-installer-master-e2e-vsphere/1/artifacts/e2e-vsphere/gather-extra/pods/openshift-kube-apiserver_kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_kube-apiserver_previous.log

From events.json we see this:

Normal	apiserver	TerminationStart	Received signal to terminate, becoming unready, but keeping serving	2020-04-27T19:44:02Z	2020-04-27T19:44:02Z	kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1	1
Normal	apiserver	TerminationPreShutdownHooksFinished	All pre-shutdown hooks have been finished	2020-04-27T19:44:02Z	2020-04-27T19:44:02Z	kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1	1
Normal	apiserver	TerminationMinimalShutdownDurationFinished	The minimal shutdown duration of 1m10s finished	2020-04-27T19:45:12Z	2020-04-27T19:45:12Z	kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1	1
Normal	apiserver	TerminationStoppedServing	Server has stopped listening	2020-04-27T19:45:12Z	2020-04-27T19:45:12Z	kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1	1
Normal	apiserver	TerminationGracefulTerminationFinished	All pending requests processed	2020-04-27T19:45:12Z	2020-04-27T19:45:12Z	kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1	1

I.e. the old kube-apiserver is asked to shut down at 19:44:02Z. It has a graceful termination period of 135s, so the kubelet is supposed to wait up to 135s. Yet the kubelet starts launching the new pod exactly 60s later (19:45:02). The old kube-apiserver needs at least 70s to shut down, and according to the events it does. The kubelet does not wait for that.
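
To make the timeline concrete, here is a simplified sketch of the shutdown sequence the events describe (a plain net/http server stands in for the apiserver; this is an illustration, not the actual apiserver code). The point is that the port stays bound for the full minimal shutdown duration (70s here) after the signal:

// Sketch of the graceful-termination sequence from the events above.
package main

import (
	"context"
	"log"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// Bind the secure port, like the real server holds 0.0.0.0:6443.
	ln, err := net.Listen("tcp", "0.0.0.0:6443")
	if err != nil {
		log.Fatal(err)
	}
	srv := &http.Server{Handler: http.NotFoundHandler()}
	go srv.Serve(ln)

	// TerminationStart: signal received, become unready, but keep serving.
	term := make(chan os.Signal, 1)
	signal.Notify(term, syscall.SIGTERM)
	<-term
	log.Println("TerminationStart: received signal, still serving")

	// TerminationMinimalShutdownDurationFinished: keep serving for the
	// minimal shutdown duration (1m10s in the events above).
	time.Sleep(70 * time.Second)

	// TerminationStoppedServing / TerminationGracefulTerminationFinished:
	// only now are listeners closed and in-flight requests drained, so
	// 0.0.0.0:6443 stays occupied for at least 70s after the signal.
	if err := srv.Shutdown(context.Background()); err != nil {
		log.Println("shutdown:", err)
	}
}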

Moving to the Node team for deeper investigation into why graceful termination is broken.

Comment 3 Stefan Schimanski 2020-05-19 09:39:40 UTC
*** Bug 1832511 has been marked as a duplicate of this bug. ***

Comment 4 Stefan Schimanski 2020-05-19 09:43:18 UTC
*** Bug 1835375 has been marked as a duplicate of this bug. ***

Comment 6 Harshal Patil 2020-05-22 11:08:45 UTC
Looking at kubelet_service.log in the must-gather logs from https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/8582/rehearse-8582-pull-ci-openshift-installer-master-e2e-vsphere/1/artifacts/e2e-vsphere/gather-must-gather/must-gather.tar:


The static pod for kube-apiserver is removed first.

Apr 27 19:44:02.206818 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.206717    1468 file.go:201] Reading config file "/etc/kubernetes/manifests/kube-apiserver-pod.yaml"
Apr 27 19:44:02.208928 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.208847    1468 kubelet.go:1924] SyncLoop (REMOVE, "file"): "kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_openshift-kube-apiserver(098ca1cabf32e4a852a68b61bffcb235)"


The static pod for kube-apiserver is added back immediately.

Apr 27 19:44:02.211239 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.211189    1468 kubelet.go:1914] SyncLoop (ADD, "file"): "kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_openshift-kube-apiserver(402659332c36cc00dc64ab1a77d7714e)"


The existing kube-apiserver pod is getting killed. 

Apr 27 19:44:02.212544 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.212470    1468 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1", UID:"098ca1cabf32e4a852a68b61bffcb235", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{kube-apiserver}"}): type: 'Normal' reason: 'Killing' Stopping container kube-apiserver
Apr 27 19:44:02.212544 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.212496    1468 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1", UID:"098ca1cabf32e4a852a68b61bffcb235", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{kube-apiserver-insecure-readyz}"}): type: 'Normal' reason: 'Killing' Stopping container kube-apiserver-insecure-readyz
Apr 27 19:44:02.212544 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.212513    1468 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1", UID:"098ca1cabf32e4a852a68b61bffcb235", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{kube-apiserver-cert-regeneration-controller}"}): type: 'Normal' reason: 'Killing' Stopping container kube-apiserver-cert-regeneration-controller
Apr 27 19:44:02.212544 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.212527    1468 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1", UID:"098ca1cabf32e4a852a68b61bffcb235", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{kube-apiserver-cert-syncer}"}): type: 'Normal' reason: 'Killing' Stopping container kube-apiserver-cert-syncer
Apr 27 19:44:02.230507 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.229150    1468 kubelet_pods.go:1355] Generating status for "kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_openshift-kube-apiserver(402659332c36cc00dc64ab1a77d7714e)"

</snip>

Apr 27 19:44:02.254413 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.253599    1468 manager.go:1007] Destroyed container: "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod098ca1cabf32e4a852a68b61bffcb235.slice/crio-2e19679098aa828fa4e81c8ca58135c191feb048f4126cd9e40c12a37cfa7166.scope" (aliases: [k8s_kube-apiserver-insecure-readyz_kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_openshift-kube-apiserver_098ca1cabf32e4a852a68b61bffcb235_0 2e19679098aa828fa4e81c8ca58135c191feb048f4126cd9e40c12a37cfa7166], namespace: "crio")
Apr 27 19:44:02.255408 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.255368    1468 manager.go:1007] Destroyed container: "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod098ca1cabf32e4a852a68b61bffcb235.slice/crio-conmon-2e19679098aa828fa4e81c8ca58135c191feb048f4126cd9e40c12a37cfa7166.scope" (aliases: [], namespace: "")
Apr 27 19:44:02.273151 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.265756    1468 kubelet.go:1653] Trying to delete pod kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_openshift-kube-apiserver da77b5ea-3ab8-4c22-bfbb-be120a277564



Looking at the timestamps, you can see that the mirror pod is deleted immediately too.

Apr 27 19:44:02.273151 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.265785    1468 mirror_client.go:125] Deleting a mirror pod "kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_openshift-kube-apiserver" (uid (*types.UID)(0xc001c2bc60))
Apr 27 19:44:02.273151 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.273104    1468 manager.go:1007] Destroyed container: "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod098ca1cabf32e4a852a68b61bffcb235.slice/crio-conmon-04033c95f5fdd8354de951e77a87068d783362c1d7fbcb649f7614bad3d247a9.scope" (aliases: [], namespace: "")
Apr 27 19:44:02.284601 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.283827    1468 manager.go:1007] Destroyed container: "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod098ca1cabf32e4a852a68b61bffcb235.slice/crio-04033c95f5fdd8354de951e77a87068d783362c1d7fbcb649f7614bad3d247a9.scope" (aliases: [k8s_kube-apiserver-cert-syncer_kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_openshift-kube-apiserver_098ca1cabf32e4a852a68b61bffcb235_0 04033c95f5fdd8354de951e77a87068d783362c1d7fbcb649f7614bad3d247a9], namespace: "crio")




Without waiting for the terminationGracePeriodSeconds (135s), a new kube-apiserver pod is created.

Apr 27 19:44:02.599048 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.599016    1468 kubelet.go:1914] SyncLoop (ADD, "api"): "kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_openshift-kube-apiserver(83ac4da1-4907-4354-b08d-15049e074a0e)"

Apr 27 19:44:02.741383 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.741351    1468 manager.go:950] Added container: "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod402659332c36cc00dc64ab1a77d7714e.slice/crio-f4fd1828ce54142bedf7749417e1464a96be8f531c77b6fd8b400e0be43ccdab.scope" (aliases: [k8s_POD_kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_openshift-kube-apiserver_402659332c36cc00dc64ab1a77d7714e_0 f4fd1828ce54142bedf7749417e1464a96be8f531c77b6fd8b400e0be43ccdab], namespace: "crio")


The old kube-apiserver pod has not yet exited gracefully within its configured terminationGracePeriodSeconds.

Apr 27 19:44:02.815018 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.814978    1468 kubelet_pods.go:1731] Orphaned pod "098ca1cabf32e4a852a68b61bffcb235" found, but volumes not yet removed.  Reducing cpu to minimum
Apr 27 19:44:02.847649 ci-op-cyr77ydz-3b021-lctgx-master-1 hyperkube[1468]: I0427 19:44:02.847609    1468 status_manager.go:563] Pod "kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_openshift-kube-apiserver(83ac4da1-4907-4354-b08d-15049e074a0e)" was deleted and then recreated, skipping status update; old UID "098ca1cabf32e4a852a68b61bffcb235", new UID "402659332c36cc00dc64ab1a77d7714e"

As a result, we have the old kube-apiserver pod still listening on 0.0.0.0:6443 while the new pod is trying to bind to the same address. In https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/8582/rehearse-8582-pull-ci-openshift-installer-master-e2e-vsphere/1/artifacts/e2e-vsphere/gather-extra/pods/openshift-kube-apiserver_kube-apiserver-ci-op-cyr77ydz-3b021-lctgx-master-1_kube-apiserver_previous.log you can see the following lines (a sketch of the missing wait follows the excerpt):

I0427 19:45:02.128271       1 server.go:194] Version: v1.18.0-rc.1
Error: failed to create listener: failed to listen on 0.0.0.0:6443: listen tcp 0.0.0.0:6443: bind: address already in use
Usage:
  kube-apiserver [flags]
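
For illustration only (hypothetical helper, not a kubelet API): the sequencing that appears to be missing is roughly "wait, bounded by terminationGracePeriodSeconds, for the old static pod to actually go away before starting the replacement":

// Illustrative sketch of waiting out the old pod before starting the new one.
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForOldPodGone polls a (hypothetical) "is the old pod still running?"
// check until it reports gone, or until the grace period expires.
func waitForOldPodGone(ctx context.Context, stillRunning func() bool, grace time.Duration) error {
	deadline := time.After(grace)
	tick := time.NewTicker(time.Second)
	defer tick.Stop()
	for stillRunning() {
		select {
		case <-deadline:
			return fmt.Errorf("old pod still running after %s grace period", grace)
		case <-ctx.Done():
			return ctx.Err()
		case <-tick.C:
		}
	}
	return nil
}

func main() {
	// Stand-in for "the old kube-apiserver still holds 0.0.0.0:6443":
	// pretend it takes 3s to exit instead of the ~70s seen in the logs.
	gone := time.Now().Add(3 * time.Second)
	stillRunning := func() bool { return time.Now().Before(gone) }

	// In the failing runs the new pod is created within the same second;
	// waiting like this (up to 135s) would avoid the bind failure.
	if err := waitForOldPodGone(context.Background(), stillRunning, 135*time.Second); err != nil {
		fmt.Println("giving up:", err)
		return
	}
	fmt.Println("old pod exited; safe to start the replacement")
}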



The events in the above logs strongly correlate with https://bugzilla.redhat.com/show_bug.cgi?id=1700279, specifically with:
1. the description Stefan provided, https://bugzilla.redhat.com/show_bug.cgi?id=1700279#c0
2. Seth's comment, https://bugzilla.redhat.com/show_bug.cgi?id=1700279#c2
3. Ted's comment, https://bugzilla.redhat.com/show_bug.cgi?id=1700279#c8

Comment 7 Harshal Patil 2020-05-22 15:28:23 UTC
My bad. Instead of simply respecting the terminationGracePeriodSeconds, as Ted puts it in this comment (https://github.com/kubernetes/kubernetes/issues/90999#issuecomment-632742253), what we need is a conditional, cancellable grace period.
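
Roughly, a conditional, cancellable grace period could look like the following sketch (illustrative names, not Kubernetes code): wait up to the full grace period, but stop waiting the moment the old container actually exits.

// Sketch of a cancellable grace-period wait.
package main

import (
	"context"
	"fmt"
	"time"
)

// awaitTermination returns as soon as exited is closed, or once the grace
// period runs out, whichever comes first.
func awaitTermination(parent context.Context, exited <-chan struct{}, grace time.Duration) error {
	ctx, cancel := context.WithTimeout(parent, grace)
	defer cancel()
	select {
	case <-exited:
		return nil // container exited early: the remaining grace period is cancelled
	case <-ctx.Done():
		return ctx.Err() // grace period exhausted (or the parent context was cancelled)
	}
}

func main() {
	exited := make(chan struct{})
	go func() {
		time.Sleep(2 * time.Second) // stand-in for the old container shutting down
		close(exited)
	}()
	if err := awaitTermination(context.Background(), exited, 135*time.Second); err != nil {
		fmt.Println("proceeding after:", err)
		return
	}
	fmt.Println("container exited; remaining grace period cancelled")
}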

*** This bug has been marked as a duplicate of bug 1700279 ***