Bug 1882750 - [sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully
Summary: [sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserve...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Harshal Patil
QA Contact: Ke Wang
URL:
Whiteboard:
Duplicates: 1882111 1883512 1886563 1886920 1892536 1916880 1917133 1918930
Depends On:
Blocks: 1919069 1929695
 
Reported: 2020-09-25 14:54 UTC by Russell Teague
Modified: 2021-02-24 15:21 UTC
CC: 23 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1929695
Environment:
[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully
Last Closed: 2021-02-24 15:21:16 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift kubernetes pull 379 0 None closed Bug 1882750: UPSTREAM: <carry>: watch-termination: avoid false positives of NonGracefulTermination events 2021-02-22 00:08:33 UTC
Github openshift kubernetes pull 531 0 None closed Bug 1882750: UPSTREAM: 98103: kubelet: Delete static pod gracefully and fix mirrorPodTerminationMap leak 2021-02-22 00:08:33 UTC
Github openshift origin pull 25568 0 None closed Bug 1882750: extended/apiserver: add test looking out for GracefulTerminationTimeout events 2021-02-22 00:08:33 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:21:56 UTC

Description Russell Teague 2020-09-25 14:54:58 UTC
test:
[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully 

is failing frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-api-machinery%5C%5D%5C%5BFeature%3AAPIServer%5C%5D%5C%5BLate%5C%5D+kubelet+terminates+kube-apiserver+gracefully


https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.6/1308308963082637312

I0922 08:37:17.056692    3165 test_context.go:429] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
Sep 22 08:37:17.113: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
Sep 22 08:37:17.247: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Sep 22 08:37:17.321: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Sep 22 08:37:17.321: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Sep 22 08:37:17.321: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Sep 22 08:37:17.346: INFO: e2e test version: v0.0.0-master+$Format:%h$
Sep 22 08:37:17.361: INFO: kube-apiserver version: v1.19.0+f5121a6
Sep 22 08:37:17.390: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1425
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1425
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:59
[BeforeEach] [sig-api-machinery][Feature:APIServer][Late]
  github.com/openshift/origin/test/extended/util/client.go:135
STEP: Creating a kubernetes client
[BeforeEach] [sig-api-machinery][Feature:APIServer][Late]
  github.com/openshift/origin/test/extended/util/client.go:111
Sep 22 08:37:17.709: INFO: configPath is now "/tmp/configfile135850088"
Sep 22 08:37:17.709: INFO: The user is now "e2e-test-terminating-kube-apiserver-q24v9-user"
Sep 22 08:37:17.709: INFO: Creating project "e2e-test-terminating-kube-apiserver-q24v9"
Sep 22 08:37:17.888: INFO: Waiting on permissions in project "e2e-test-terminating-kube-apiserver-q24v9" ...
Sep 22 08:37:17.906: INFO: Waiting for ServiceAccount "default" to be provisioned...
Sep 22 08:37:18.030: INFO: Waiting for ServiceAccount "deployer" to be provisioned...
Sep 22 08:37:18.149: INFO: Waiting for ServiceAccount "builder" to be provisioned...
Sep 22 08:37:18.273: INFO: Waiting for RoleBinding "system:image-pullers" to be provisioned...
Sep 22 08:37:18.307: INFO: Waiting for RoleBinding "system:image-builders" to be provisioned...
Sep 22 08:37:18.340: INFO: Waiting for RoleBinding "system:deployers" to be provisioned...
Sep 22 08:37:33.539: INFO: Project "e2e-test-terminating-kube-apiserver-q24v9" has been fully provisioned.
[It] kubelet terminates kube-apiserver gracefully [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/apiserver/graceful_termination.go:20
[AfterEach] [sig-api-machinery][Feature:APIServer][Late]
  github.com/openshift/origin/test/extended/util/client.go:133
STEP: Collecting events from namespace "e2e-test-terminating-kube-apiserver-q24v9".
STEP: Found 0 events.
Sep 22 08:37:33.656: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Sep 22 08:37:33.656: INFO: 
Sep 22 08:37:33.689: INFO: skipping dumping cluster info - cluster too large
Sep 22 08:37:33.736: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-terminating-kube-apiserver-q24v9-user}, err: <nil>
Sep 22 08:37:33.772: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-terminating-kube-apiserver-q24v9}, err: <nil>
Sep 22 08:37:33.820: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  WsIj9qyeQACGSu1dMPeMcgAAAAAAAAAA}, err: <nil>
[AfterEach] [sig-api-machinery][Feature:APIServer][Late]
  github.com/openshift/origin/test/extended/util/client.go:134
Sep 22 08:37:33.820: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-test-terminating-kube-apiserver-q24v9" for this suite.
Sep 22 08:37:33.871: INFO: Running AfterSuite actions on all nodes
Sep 22 08:37:33.871: INFO: Running AfterSuite actions on node 1
fail [github.com/onsi/ginkgo.0-origin.1+incompatible/internal/leafnodes/runner.go:64]: kube-apiserver reports a non-graceful termination: v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-ip-10-0-210-234.ec2.internal.16370d2ec3a8d999", GenerateName:"", Namespace:"openshift-kube-apiserver", SelfLink:"/api/v1/namespaces/openshift-kube-apiserver/events/kube-apiserver-ip-10-0-210-234.ec2.internal.16370d2ec3a8d999", UID:"5ea1e09a-83d8-4769-b84b-1b8dd1ae9deb", ResourceVersion:"64686", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63736359490, loc:(*time.Location)(0x9af3ae0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"watch-termination", Operation:"Update", APIVersion:"v1", Time:(*v1.Time)(0xc0020baba0), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc0020babc0)}}}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ip-10-0-210-234.ec2.internal", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"NonGracefulTermination", Message:"Previous pod kube-apiserver-ip-10-0-210-234.ec2.internal started at 2020-09-22 08:04:53.782347207 +0000 UTC did not terminate gracefully", Source:v1.EventSource{Component:"apiserver", Host:"ip-10-0-210-234"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63736359487, loc:(*time.Location)(0x9af3ae0)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63736359487, loc:(*time.Location)(0x9af3ae0)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}. Probably kubelet or CRI-O is not giving the time to cleanly shut down. This can lead to connection refused and network I/O timeout errors in other components.

Comment 1 Seth Jennings 2020-09-28 16:30:22 UTC
Trying to break down how this test works

https://github.com/openshift/kubernetes/blob/master/cmd/watch-termination/main.go

copied from o/o

carried in https://github.com/openshift/kubernetes/pull/332

reworked in rebase to 1.19 https://github.com/openshift/kubernetes/pull/325

https://github.com/openshift/origin/blob/master/test/extended/apiserver/graceful_termination.go#L20-L40

The test just checks for the existence of an event with reason NonGracefulTermination.
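As a rough illustration of that check, here is a minimal client-go sketch of the same idea (my own sketch assuming a kubeconfig at the default path, not the actual code in graceful_termination.go):

package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// List events in openshift-kube-apiserver and fail if any of them has reason
// NonGracefulTermination, which is what the origin test asserts on.
func main() {
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	events, err := client.CoreV1().Events("openshift-kube-apiserver").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, ev := range events.Items {
		if ev.Reason == "NonGracefulTermination" {
			fmt.Printf("FAIL: kube-apiserver reports a non-graceful termination: %s\n", ev.Message)
			os.Exit(1)
		}
	}
	fmt.Println("PASS: no NonGracefulTermination events found")
}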

watch-termination is exec'ed by the kube-apiserver pod and actually wraps the execution of `hyperkube kube-apiserver` and emits the event

https://github.com/openshift/cluster-kube-apiserver-operator/blob/0a69c78cea511a8c8e70b3910c637ffaf2a1dc36/bindata/v4.1.0/kube-apiserver/pod.yaml#L57

It seems like this test condition could be triggered either by the kubelet not honoring the terminationGracePeriodSeconds (though we have an upstream test for this) _or_ by the kube-apiserver not terminating within its grace period and being ungracefully killed, which wouldn't be the kubelet's fault.

Do we know which it is?

Comment 2 Seth Jennings 2020-09-28 16:31:41 UTC
This also seems like a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1882111

Comment 3 Seth Jennings 2020-09-29 13:52:01 UTC
*** Bug 1882111 has been marked as a duplicate of this bug. ***

Comment 4 Seth Jennings 2020-09-29 13:52:13 UTC
*** Bug 1883512 has been marked as a duplicate of this bug. ***

Comment 5 Seth Jennings 2020-09-29 13:53:12 UTC
The dup'ed bug 1883512 indicates a higher rate of occurrence on metal-upi.

Comment 6 David Eads 2020-09-29 14:08:20 UTC
Increasing the severity and bringing this back into 4.6 to reflect the frequency of the failure on metal-upi: a 30% failure rate, versus the normal 93+% pass rate on aws, gcp, azure, openstack, and metal-ipi.

Comment 7 Stefan Schimanski 2020-09-30 15:30:08 UTC
We are looking into this. We see kube-apiserver not terminating and receiving requests far too late in the termination process. The former might be a regression (Golang 1.15?); the latter is related to load balancers.

Comment 8 Stefan Schimanski 2020-10-02 11:50:37 UTC
Moving back to NEW and assigning back to the Node team. @Lukasz, can you attach the test failures that are real?

Comment 9 Stefan Schimanski 2020-10-02 12:08:37 UTC
@sjenning we have only found the one case where kube-apiserver kept running and then got killed. Those cases will no longer fail this test, but a different one that watches for GracefulTerminationTimeout events. The other test runs we looked at with NonGracefulTermination were real.

Some of them showed that the process was started and killed 2s later. The pod API spec does not allow this behaviour, even if the pod is not ready.

My assessment for priority and target release is that this should not block the release. But we have to keep digging, and reassess if we notice a critical root cause.

Comment 10 Lukasz Szaszkiewicz 2020-10-02 12:17:37 UTC
For example https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-node-tuning-operator/158/pull-ci-openshift-cluster-node-tuning-operator-master-e2e-aws/1311886485250641920

Failed on "Previous pod kube-apiserver-ip-10-0-159-220.us-west-2.compute.internal started at 2020-10-02 04:56:47.852086204 +0000 UTC did not terminate gracefully"

In the termination logs of that node we found

"
I1002 04:56:46.586764       1 main.go:130] Deleting termination lock file "/var/log/kube-apiserver/.terminating"
---
I1002 04:56:48.551676       1 main.go:155] Received signal terminated. Forwarding to sub-process "hyperkube".
...
I1002 04:56:58.569124      18 httplog.go:94] "HTTP" verb="GET" URI="/readyz" latency="7.006882ms" userAgent="kube-apiserver/v1.19.0 (linux/amd64) kubernetes/1fc699e" srcIP="[::1]:47626" resp=0
W1002 04:56:58.582588      18 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://10.0.175.202:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.0.175.202:2379: connect: connection refused". Reconnecting...
Previous pod kube-apiserver-ip-10-0-159-220.us-west-2.compute.internal started at 2020-10-02 04:56:47.852086204 +0000 UTC did not terminate gracefully
---
"

which suggests that the new pod received the termination signal 2s after it started. In the kubelet logs we found:

"
Oct 02 04:56:48.448044 ip-10-0-159-220 hyperkube[1500]: I1002 04:56:48.448025    1500 config.go:278] Setting pods for source file
Oct 02 04:56:48.448330 ip-10-0-159-220 hyperkube[1500]: I1002 04:56:48.448310    1500 config.go:383] Receiving a new pod "kube-apiserver-ip-10-0-159-220.us-west-2.compute.internal_openshift-kube-apiserver(cbaf1e3447758b0fd5a64e27b90d07df)"
Oct 02 04:56:48.448614 ip-10-0-159-220 hyperkube[1500]: I1002 04:56:48.448600    1500 kubelet.go:1844] SyncLoop (REMOVE, "file"): "kube-apiserver-ip-10-0-159-220.us-west-2.compute.internal_openshift-kube-apiserver(bfad365b105ae5d6b142f99cc022b13c)"
Oct 02 04:56:48.448701 ip-10-0-159-220 hyperkube[1500]: I1002 04:56:48.448629    1500 kubelet_pods.go:1212] marking pod pending termination "bfad365b105ae5d6b142f99cc022b13c"
Oct 02 04:56:48.448701 ip-10-0-159-220 hyperkube[1500]: I1002 04:56:48.448669    1500 kubelet.go:1834] SyncLoop (ADD, "file"): "kube-apiserver-ip-10-0-159-220.us-west-2.compute.internal_openshift-kube-apiserver(cbaf1e3447758b0fd5a64e27b90d07df)"
Oct 02 04:56:48.448773 ip-10-0-159-220 hyperkube[1500]: I1002 04:56:48.448724    1500 topology_manager.go:233] [topologymanager] Topology Admit Handler
Oct 02 04:56:48.448914 ip-10-0-159-220 hyperkube[1500]: I1002 04:56:48.448870    1500 kubelet_pods.go:1247] Killing unwanted pod "kube-apiserver-ip-10-0-159-220.us-west-2.compute.internal"
...
Oct 02 04:56:48.470144 ip-10-0-159-220 hyperkube[1500]: I1002 04:56:48.470130    1500 kubelet.go:1566] Trying to delete pod kube-apiserver-ip-10-0-159-220.us-west-2.compute.internal_openshift-kube-apiserver 17c5add8-4dba-4b33-8589-ee9c562f274e
Oct 02 04:56:48.470144 ip-10-0-159-220 hyperkube[1500]: I1002 04:56:48.470146    1500 mirror_client.go:125] Deleting a mirror pod "kube-apiserver-ip-10-0-159-220.us-west-2.compute.internal_openshift-kube-apiserver" (uid (*types.UID)(0xc002304460))
"

We would like to know why the kubelet killed the pod 2s after it started and why it did not wait for terminationGracePeriodSeconds.

Comment 11 Lukasz Szaszkiewicz 2020-10-02 12:37:26 UTC
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.6-e2e-vsphere-upi/1311384650140094464

Failed on "Previous pod kube-apiserver-control-plane-0 started at 2020-09-30 19:32:23.948560597 +0000 UTC did not terminate gracefully"

In the termination logs of that node we found:

"
---
---
Previous pod kube-apiserver-control-plane-0 started at 2020-09-30 19:32:23.948560597 +0000 UTC did not terminate gracefully
---
"

In kubelet logs we found:

"
Sep 30 19:32:03.052045 control-plane-0 hyperkube[1554]: I0930 19:32:03.052015    1554 config.go:383] Receiving a new pod "kube-apiserver-control-plane-0_openshift-kube-apiserver(d11650bf506825db2daf49214b55233f)"
...
Sep 30 19:32:39.112803 control-plane-0 hyperkube[1554]: I0930 19:32:39.112783    1554 config.go:383] Receiving a new pod "kube-apiserver-control-plane-0_openshift-kube-apiserver(935ad700ba6660dbb6b16e08d69161d2)"
Sep 30 19:32:39.113374 control-plane-0 hyperkube[1554]: I0930 19:32:39.113356    1554 config.go:278] Setting pods for source file
Sep 30 19:32:39.113447 control-plane-0 hyperkube[1554]: I0930 19:32:39.113382    1554 kubelet.go:1844] SyncLoop (REMOVE, "file"): "kube-apiserver-control-plane-0_openshift-kube-apiserver(d11650bf506825db2daf49214b55233f)"
Sep 30 19:32:39.113447 control-plane-0 hyperkube[1554]: I0930 19:32:39.113412    1554 kubelet_pods.go:1212] marking pod pending termination "d11650bf506825db2daf49214b55233f"
Sep 30 19:32:39.115834 control-plane-0 hyperkube[1554]: I0930 19:32:39.115806    1554 kubelet.go:1834] SyncLoop (ADD, "file"): "kube-apiserver-control-plane-0_openshift-kube-apiserver(935ad700ba6660dbb6b16e08d69161d2)"
Sep 30 19:32:39.115967 control-plane-0 hyperkube[1554]: I0930 19:32:39.115900    1554 topology_manager.go:233] [topologymanager] Topology Admit Handler
Sep 30 19:32:39.116340 control-plane-0 hyperkube[1554]: I0930 19:32:39.116310    1554 kubelet_pods.go:1247] Killing unwanted pod "kube-apiserver-control-plane-0"
"


We would like to know why the kubelet killed the pod and whether it waited for terminationGracePeriodSeconds before starting a new one.

Comment 12 Seth Jennings 2020-10-02 14:45:37 UTC
Per https://bugzilla.redhat.com/show_bug.cgi?id=1882750#c9, deferring to 4.7 and setting priority to high.

Comment 13 David Eads 2020-10-08 19:24:00 UTC
Since comment #9, we have seen a dramatic shift in frequency.  Some of the top movers are

release-openshift-ocp-installer-e2e-azure-4.6			88.00% (0.00%)(25 runs)		100.00% (0.00%)(39 runs)
release-openshift-origin-installer-e2e-remote-libvirt-s390x-4.6	90.00% (0.00%)(10 runs)		100.00% (0.00%)(6 runs)
release-openshift-ocp-installer-e2e-aws-4.6			92.00% (0.00%)(50 runs)		97.06% (0.00%)(68 runs)
periodic-ci-openshift-release-master-ocp-4.6-e2e-vsphere	93.94% (0.00%)(33 runs)		100.00% (0.00%)(35 runs)

@sjenning: this suddenly got a lot more severe and would explain the 10% increase in upgrade availability downtime that happened at the same time.

Given the rapid shift in frequency and a direct impact on azure 4.6-to-4.6 upgrade API availability, I think this deserves more investigation.

Comment 14 David Eads 2020-10-08 19:24:12 UTC
*** Bug 1886563 has been marked as a duplicate of this bug. ***

Comment 15 David Eads 2020-10-08 19:39:18 UTC
Cross-linking to https://bugzilla.redhat.com/show_bug.cgi?id=1886499, which is marked by a significant drop in kube-apiserver availability on azure upgrades: 4.6 to 4.6 fails a lot more often, and 4.5 to 4.6 fails a little more often.

Comment 16 Stefan Schimanski 2020-10-09 11:55:58 UTC
Triage of https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4250/pull-ci-openshift-installer-master-e2e-aws/1314345355613048832:

Master node 10.0.248.159 termination.log:

  I1008 23:52:27.380785       1 main.go:176] Received signal terminated. Forwarding to sub-process "hyperkube".
  ...
  I1008 23:52:37.330384      17 httplog.go:94] "HTTP" verb="GET" URI="/readyz" latency="2.792032ms" userAgent="kube-apiserver/v1.19.0 (linux/amd64) kubernetes/75c8d48" srcIP="[::1]:55198" resp=0
  Previous pod kube-apiserver-ip-10-0-248-159.us-east-2.compute.internal started at 2020-10-08 23:52:27.031713422 +0000 UTC did not terminate gracefully

Conclusion: this looks like a real non-graceful termination after exactly 10s.

Comment 17 Stefan Schimanski 2020-10-09 11:58:20 UTC
Addendum to comment #16: the signal was received before the pod became ready, but during the following 10s it became ready and received traffic.

Comment 20 Ryan Phillips 2020-11-06 19:59:01 UTC
*** Bug 1892536 has been marked as a duplicate of this bug. ***

Comment 21 W. Trevor King 2020-11-10 00:45:55 UTC
I dunno why the subcomponent wasn't set.  Taking a guess now.  Pretty common in recent CI:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&search=kube-apiserver%20reports%20a%20non-graceful%20termination' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ocp-4.7-e2e-aws-workers-rhel7 - 3 runs, 67% failed, 50% of failures match
pull-ci-openshift-cloud-credential-operator-master-e2e-aws - 6 runs, 100% failed, 17% of failures match
...
pull-ci-operator-framework-operator-marketplace-master-e2e-aws-upgrade - 6 runs, 17% failed, 100% of failures match
release-openshift-ocp-installer-e2e-aws-4.7 - 6 runs, 50% failed, 67% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.7 - 7 runs, 71% failed, 20% of failures match
release-openshift-okd-installer-e2e-aws-4.6 - 10 runs, 60% failed, 33% of failures match
release-openshift-origin-installer-e2e-aws-ovn-upgrade-4.5-stable-to-4.6-ci - 10 runs, 20% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-sdn-multitenant-4.8 - 1 runs, 100% failed, 100% of failures match
release-openshift-origin-installer-e2e-aws-upgrade - 32 runs, 19% failed, 67% of failures match
release-openshift-origin-installer-e2e-aws-upgrade-4.6-stable-to-4.7-ci - 4 runs, 50% failed, 150% of failures match
release-openshift-origin-installer-e2e-azure-upgrade-4.6-stable-to-4.7-ci - 4 runs, 25% failed, 100% of failures match

Not sure if we expect aws-4.7 to still be showing these with bug 1888015 landed almost a month ago.

Comment 22 Ryan Phillips 2020-11-11 15:41:14 UTC
I am not seeing the UID change when the static pod is laid down again. Another issue for static pods is that StopPodSandbox only allows 10 seconds for the sandbox container shutdowns.

https://github.com/cri-o/cri-o/blob/928edf243bf18b3451d858f3132a6b76a64a6a03/server/sandbox_stop_linux.go#L75
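To make the described behaviour concrete, the pattern is roughly the following (a self-contained toy sketch under my own assumptions, not CRI-O's actual code; the container names and shutdown times are made up): each container is stopped in parallel, but the whole operation is capped at 10 seconds, so a container that needs its full grace period gets cut off.

package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// stopContainer is a hypothetical stand-in for a CRI stop call: it returns
// when the container exits on its own or when the shared 10s cap is hit.
func stopContainer(ctx context.Context, name string, shutdownTime time.Duration) error {
	select {
	case <-time.After(shutdownTime): // container exited on its own
		return nil
	case <-ctx.Done(): // cap hit: this is where a SIGKILL fallback would happen
		return fmt.Errorf("container %s cut off after %v", name, 10*time.Second)
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Made-up shutdown times: the main container needs its full grace period,
	// the sidecar exits quickly.
	containers := map[string]time.Duration{
		"kube-apiserver":             70 * time.Second,
		"kube-apiserver-cert-syncer": 1 * time.Second,
	}

	var wg sync.WaitGroup
	for name, shutdownTime := range containers {
		wg.Add(1)
		go func(name string, shutdownTime time.Duration) {
			defer wg.Done()
			if err := stopContainer(ctx, name, shutdownTime); err != nil {
				fmt.Println(err)
			} else {
				fmt.Printf("container %s stopped cleanly\n", name)
			}
		}(name, shutdownTime)
	}
	wg.Wait()
}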

Comment 23 David Eads 2020-11-11 15:56:08 UTC
Filling in a few details from a recent 4.7 cluster.

Different UIDs are generated for each revision on disk

sh-4.4# cat /etc/kubernetes/static-pod-resources/kube-apiserver-pod-10/kube-apiserver-pod.yaml | jq
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-apiserver",
    "namespace": "openshift-kube-apiserver",
    "uid": "ecef213c-0269-414a-85dd-a84e3728f3b6",
sh-4.4# cat /etc/kubernetes/static-pod-resources/kube-apiserver-pod-9/kube-apiserver-pod.yaml | jq
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-apiserver",
    "namespace": "openshift-kube-apiserver",
    "uid": "0313bdc1-5cbb-4bf0-8120-3a9b35d5bff0",



But the kubelet appears to be ignoring the UID specified on disk and creating its own. See below.


From the API (which the kubelet sets):
  annotations:
    kubectl.kubernetes.io/default-logs-container: kube-apiserver
    kubernetes.io/config.hash: e184e042-37e1-426c-89fb-46e7bc404b96
    kubernetes.io/config.mirror: e184e042-37e1-426c-89fb-46e7bc404b96
  ...
  resourceVersion: "548425"
  selfLink: /api/v1/namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-0-150-183.us-west-1.compute.internal
  uid: f7baa88e-9d03-4c13-80a7-0131a1c9efa8

From disk (which the operator sets in the manifest):
sh-4.4# cat /etc/kubernetes/manifests/kube-apiserver-pod.yaml | jq
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-apiserver",
    "namespace": "openshift-kube-apiserver",
    "uid": "e184e042-37e1-426c-89fb-46e7bc404b96",



So I think it's being specified, but the kubelet may be creating duplicates.
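For reference, this relationship can be inspected from the API side with a few lines of client-go (a sketch of my own, not existing tooling; the pod name is the one from this cluster and would need adjusting): the mirror pod's metadata.uid is the kubelet-generated one, while the kubernetes.io/config.hash and kubernetes.io/config.mirror annotations in the dump above match the UID from the on-disk manifest.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Mirror pod name from the cluster above; adjust for the node being inspected.
	podName := "kube-apiserver-ip-10-0-150-183.us-west-1.compute.internal"
	pod, err := client.CoreV1().Pods("openshift-kube-apiserver").Get(context.TODO(), podName, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// metadata.uid is set by the kubelet for the mirror pod; the config.hash and
	// config.mirror annotations matched the on-disk manifest UID in the dump above.
	fmt.Println("mirror pod uid (API):     ", pod.UID)
	fmt.Println("config.hash annotation:   ", pod.Annotations["kubernetes.io/config.hash"])
	fmt.Println("config.mirror annotation: ", pod.Annotations["kubernetes.io/config.mirror"])
}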

Comment 24 Ryan Phillips 2020-11-11 17:48:54 UTC
Talked to Mrunal and debugged this further. When the static pod is updated, the kubelet sees a stop and a start event. The kubelet sends down a StopPodSandbox and CRI-O only waits 10 seconds (in parallel) for each container. This is by design. We added support earlier in the year for graceful termination on static pod deletion, but updates are proving to be a different scenario.

Comment 25 Ryan Phillips 2020-11-12 14:06:20 UTC
When a static pod is created, a corresponding mirror pod is created in the API. These two objects will have different UIDs. The kubelet maintains a mapping from static pod to mirror pod, and vice versa.
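A toy model of that bookkeeping, just to illustrate the idea (my own sketch, not the kubelet's real pod manager): the mapping has to be maintained in both directions and cleaned up on both sides, otherwise entries leak when a static pod is replaced (compare the mirrorPodTerminationMap leak fixed in openshift/kubernetes pull 531 above).

package main

import "fmt"

// podUIDMap is a toy bidirectional mapping between static pod UIDs and the
// UIDs of their mirror pods, so lookups can be translated in either direction.
type podUIDMap struct {
	staticToMirror map[string]string
	mirrorToStatic map[string]string
}

func newPodUIDMap() *podUIDMap {
	return &podUIDMap{
		staticToMirror: map[string]string{},
		mirrorToStatic: map[string]string{},
	}
}

// add records the pairing when a mirror pod is created for a static pod.
func (m *podUIDMap) add(staticUID, mirrorUID string) {
	m.staticToMirror[staticUID] = mirrorUID
	m.mirrorToStatic[mirrorUID] = staticUID
}

// remove must clean up both directions; forgetting one side is exactly the
// kind of leak that shows up when static pods are frequently replaced.
func (m *podUIDMap) remove(staticUID string) {
	if mirrorUID, ok := m.staticToMirror[staticUID]; ok {
		delete(m.mirrorToStatic, mirrorUID)
		delete(m.staticToMirror, staticUID)
	}
}

func main() {
	m := newPodUIDMap()
	// UIDs taken from comment 23 above.
	m.add("e184e042-37e1-426c-89fb-46e7bc404b96", "f7baa88e-9d03-4c13-80a7-0131a1c9efa8")
	fmt.Println(m.staticToMirror, m.mirrorToStatic)
	m.remove("e184e042-37e1-426c-89fb-46e7bc404b96")
	fmt.Println(m.staticToMirror, m.mirrorToStatic)
}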

Comment 26 jamo luhrsen 2020-11-17 19:33:54 UTC
This bug is also affecting at least https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-blocking#release-openshift-origin-installer-e2e-azure-4.7, which does block payload
promotion. It has been failing that job approximately 10% of the time recently.

example:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-4.7/1328700621242503168

Comment 27 Ryan Phillips 2020-11-17 19:42:07 UTC
The Master team added a UID to the static pod manifests recently. I am seeing the static pod being destroyed and recreated on an update. Do you see an improvement in CI?

Comment 28 jamo luhrsen 2020-11-17 21:10:27 UTC
(In reply to Ryan Phillips from comment #27)
> Master team added a UID to the static pod manifests recently. I am seeing
> the static pod be destroyed and recreated on an update. Do you see an
> improvement within CI?

No, actually the job I was starting to look at (release-openshift-origin-installer-e2e-azure-4.7) seems to be seeing this test
case fail more often recently than before. It failed in approximately 7 of 75 jobs over the past week, but had zero failures the week before that.

Comment 30 Neelesh Agrawal 2020-11-24 16:24:03 UTC
(In reply to Russell Teague from comment #29)
> I'm still seeing this fail frequently in the last 24 hours:

I took a sample (14) of these failures after removing cases where there were too many other failures. I have noticed a few things:
1. The graceful termination failure does not happen in isolation.
2. The following failures occur very frequently alongside the graceful termination failure:
  ": [sig-network] pods should successfully create sandboxes by getting pod" 11 out of 14
  ": [sig-network] pods should successfully create sandboxes by reading container" 9 out of 14
  ": [sig-network] pods should successfully create sandboxes by other" 8 out of 14
3. I did not see any graceful termination failure on its own, i.e. it is always accompanied by other failures.

Comment 31 Ryan Phillips 2020-11-30 21:37:23 UTC
I tested this locally and am seeing the following log with a clean exit, then a partial log and an exit code of 1. The test flags any non-zero exit code.

I1121 02:36:31.517184     152 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I1121 02:36:38.407664     152 client.go:360] parsed scheme: "passthrough"
I1121 02:36:38.407714     152 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{https://10.0.2.70:2379  <nil> 0 <nil>}] <nil> <nil>}
I1121 02:36:38.407722     152 clientconn.go:948] ClientConn switching balancer to "pick_first"
I1121 02:36:38.407983     152 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02e03c380, {CONNECTING <nil>}
I1121 02:36:38.414470     152 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02e03c380, {READY <nil>}
I1121 02:36:38.415207     152 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I1121 02:36:41.385931     152 client.go:360] parsed scheme: "passthrough"
I1121 02:36:41.386102     152 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{https://10.0.2.71:2379  <nil> 0 <nil>}] <nil> <nil>}
I1121 02:36:41.386120     152 clientconn.go:948] ClientConn switching balancer to "pick_first"
I1121 02:36:41.386226     152 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02e0f4ef0, {CONNECTING <nil>}
I1121 02:36:41.392751     152 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02e0f4ef0, {READY <nil>}
I1121 02:36:41.393398     152 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I1121 02:36:46.453117     152 genericapiserver.go:667] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-master-0.ocp.b1b.in", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationGracefulTerminationFinished' All pending requests processed
W1121 02:36:46.453335     152 genericapiserver.go:670] failed to create event openshift-kube-apiserver/kube-apiserver-master-0.ocp.b1b.in.164965604f209a93: Event "kube-apiserver-master-0.ocp.b1b.in.164965604f209a93" is invalid: [eventTime: Required value, source: Invalid value: "": needs to be unset]
I1121 02:36:46.478210       1 main.go:198] Termination finished with exit code 0
---
I1121 02:41:56.954001       1 main.go:176] Received signal terminated. Forwarding to sub-process "hyperkube".
I1121 02:41:56.954798      18 genericapiserver.go:667] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-master-0.ocp.b1b.in", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationStart' Received signal to terminate, becoming unready, but keeping serving
W1121 02:41:56.954935      18 genericapiserver.go:670] failed to create event openshift-kube-apiserver/kube-apiserver-master-0.ocp.b1b.in.164965a89a77e485: Event "kube-apiserver-master-0.ocp.b1b.in.164965a89a77e485" is invalid: [eventTime: Required value, source: Invalid value: "": needs to be unset]
I1121 02:41:56.954965      18 controller.go:181] Shutting down kubernetes service endpoint reconciler
I1121 02:41:57.026911       1 main.go:176] Received signal terminated. Forwarding to sub-process "hyperkube".
I1121 02:41:57.097255       1 main.go:198] Termination finished with exit code 1
I1121 02:41:57.097381       1 main.go:151] Deleting termination lock file "/var/log/kube-apiserver/.terminating"
---
Previous pod kube-apiserver-master-0.ocp.b1b.in started at 2020-11-21 02:43:11.186755311 +0000 UTC did not terminate gracefully
---

Additionally, the next log shows an exit code of 0 and deletion of the lock file, yet it still restarts with a "did not terminate gracefully" message:

I1121 01:32:27.079341      19 genericapiserver.go:667] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-master-0.ocp.b1b.in", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationGracefulTerminationFinished' All pending requests processed
W1121 01:32:27.079554      19 genericapiserver.go:670] failed to create event openshift-kube-apiserver/kube-apiserver-master-0.ocp.b1b.in.164961ddba863f43: Event "kube-apiserver-master-0.ocp.b1b.in.164961ddba863f43" is invalid: [eventTime: Required value, source: Invalid value: "": needs to be unset]
I1121 01:32:27.109742       1 main.go:198] Termination finished with exit code 0
I1121 01:32:27.109876       1 main.go:151] Deleting termination lock file "/var/log/kube-apiserver/.terminating"
---
Previous pod kube-apiserver-master-0.ocp.b1b.in started at 2020-11-21 01:33:33.919585476 +0000 UTC did not terminate gracefully
---
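My reading of these logs (an interpretation, not something confirmed in the code) is roughly: the wrapper keeps a marker file on disk while kube-apiserver runs, removes it once it has shut the child down itself, and on the next start a leftover marker means the previous instance was killed before it could clean up, hence the "did not terminate gracefully" line. A heavily simplified toy sketch of that idea, with only the file name taken from the logs (switched to /tmp so it can be run anywhere):

package main

import (
	"fmt"
	"os"
	"time"
)

// The real path in the logs is /var/log/kube-apiserver/.terminating; /tmp is
// used here so the sketch can run anywhere.
const lockFile = "/tmp/.terminating"

func main() {
	// Startup: a leftover lock file means the previous run never got to clean up.
	if data, err := os.ReadFile(lockFile); err == nil {
		fmt.Printf("Previous pod started at %s did not terminate gracefully\n", string(data))
		// The real wrapper would emit a NonGracefulTermination event here.
	}

	// Record our own start time for the next instance to find.
	if err := os.WriteFile(lockFile, []byte(time.Now().UTC().Format(time.RFC3339)), 0o644); err != nil {
		panic(err)
	}

	runWrappedProcess() // stand-in for forwarding signals to the wrapped kube-apiserver

	// Clean shutdown path: remove the marker so the next instance stays quiet.
	// If this process is SIGKILLed before getting here, the marker survives.
	fmt.Printf("Deleting termination lock file %q\n", lockFile)
	_ = os.Remove(lockFile)
}

func runWrappedProcess() {
	time.Sleep(100 * time.Millisecond) // placeholder for the wrapped process lifetime
}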

Comment 35 jamo luhrsen 2020-12-11 17:36:34 UTC
We are still seeing this failure. Search results for 4.7:
https://search.ci.openshift.org/?search=failed%3A.*%5C%5Bsig-api-machinery%5C%5D%5C%5BFeature%3AAPIServer%5C%5D%5C%5BLate%5C%5D+kubelet+terminates+kube-apiserver+gracefully&maxAge=24h&context=1&type=build-log&name=4.7&maxMatches=5&maxBytes=20971520&groupBy=job

Is this closed as NOTABUG because of comment #30, which seems to say this failure is just fall-out from other troubles/failures
in the job? If so, do we have other BZs for those things?

Comment 37 Stefan Schimanski 2020-12-14 18:40:50 UTC
From comment #32:

> 4. Our pod uses a resource on the host that cannot be shared (hostPort in our case). Since we have specified terminationGracePeriodSeconds (60) seconds in our pod yaml, our existing static pod won't be gone before that time, this also causes the new replacement static pod to fail to start,

This is wrong. Pods in graceful termination must keep existing, and no other pod of the same name is allowed to launch. That is the invariant pods have and how normal pods behave, and the same should be true for static pods. If the new pod is started before the termination period is over, that is a bug.

Comment 38 Stefan Schimanski 2020-12-14 18:54:19 UTC
If we don't have that invariant, health check probes will be broken. Imagine the gracefully terminating pod keeps the /health port open. The new one starts and blocks because it cannot get the port, but the old one happily answers /health on its behalf, so the kubelet will set the wrong status for the pod.

We have a number of static pods, and all of them have undefined startup behaviour because they expect the pod name to be an exclusive resource. The kubelet breaks that assumption, leading to undefined behaviour and hard-to-debug bugs. We have to get this right; OpenShift depends on it.
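To make the invariant concrete, the expected serialization looks roughly like this (my own illustration of the behaviour being asked for, not kubelet code): a replacement pod with the same name is not admitted until the previous one has fully terminated, however long its grace period takes.

package main

import (
	"fmt"
	"sync"
	"time"
)

// podGate is a toy illustration of the invariant described above: a pod name
// is an exclusive resource, so a replacement must wait for the old pod.
type podGate struct {
	mu          sync.Mutex
	terminating map[string]chan struct{} // pod name -> closed when the old pod is gone
}

func newPodGate() *podGate {
	return &podGate{terminating: map[string]chan struct{}{}}
}

// beginTermination marks the named pod as shutting down and returns a channel
// that should be closed once it has fully terminated.
func (g *podGate) beginTermination(name string) chan struct{} {
	g.mu.Lock()
	defer g.mu.Unlock()
	done := make(chan struct{})
	g.terminating[name] = done
	return done
}

// admit blocks until no pod with this name is still terminating.
func (g *podGate) admit(name string) {
	g.mu.Lock()
	done := g.terminating[name]
	g.mu.Unlock()
	if done != nil {
		<-done // wait out the old pod's grace period instead of starting early
	}
	fmt.Printf("admitting replacement pod %q\n", name)
}

func main() {
	g := newPodGate()
	done := g.beginTermination("kube-apiserver")
	go func() {
		time.Sleep(2 * time.Second) // stand-in for the old pod's graceful shutdown
		fmt.Println("old kube-apiserver pod fully terminated")
		close(done)
	}()
	g.admit("kube-apiserver") // blocks until the old pod is gone
}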

Comment 42 Ryan Phillips 2021-01-08 16:06:05 UTC
*** Bug 1886920 has been marked as a duplicate of this bug. ***

Comment 46 Harshal Patil 2021-01-12 10:18:44 UTC
*** Bug 1913532 has been marked as a duplicate of this bug. ***

Comment 48 Harshal Patil 2021-01-20 09:57:58 UTC
*** Bug 1917133 has been marked as a duplicate of this bug. ***

Comment 50 Ryan Phillips 2021-01-22 19:34:24 UTC
*** Bug 1916880 has been marked as a duplicate of this bug. ***

Comment 51 Ke Wang 2021-01-25 09:49:27 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-22-134922   True        False         5h      Cluster version is 4.7.0-0.nightly-2021-01-22-134922

Connect to one master node:
oc debug node/ip-xx-xx-xxx-182.us-east-2.compute.internal

- For kube-apiserver, change one container's requested memory size in kube-apiserver-pod.yaml. Before the change, check the current process IDs of kube-apiserver.

# ps -ef |grep ' kube-apiserver '  | grep -v grep | awk '{print $2}'
26614
26651

After waiting a while, the process IDs of kube-apiserver have changed.
# ps -ef |grep ' kube-apiserver '  | grep -v grep | awk '{print $2}'
183910
183955

Checking the termination.log
...
I0125 08:16:57.961330      18 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0125 08:17:11.107619      18 genericapiserver.go:693] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ip-xx-xx-xxx-182.us-east-2.compute.internal", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationGracefulTerminationFinished' All pending requests processed
I0125 08:17:11.155293       1 main.go:198] Termination finished with exit code 0

In another terminal console, check if the kube-apiserver has been restarted.
$ oc get pods -n openshift-kube-apiserver --show-labels -l apiserver
NAME                                                        READY   STATUS    RESTARTS   AGE    LABELS
kube-apiserver-ip-xx-xx-xxx-182.us-east-2.compute.internal   5/5     Running   18         5h6m   apiserver=true,app=openshift-kube-apiserver,revision=5
...

Checked the pod describe output:
...
  Normal   Killing    3m57s (x2 over 83m)   kubelet  Stopping container kube-apiserver-insecure-readyz
  Normal   Killing    3m57s (x2 over 83m)   kubelet  Stopping container kube-apiserver-cert-regeneration-controller
  Normal   Killing    3m57s (x2 over 83m)   kubelet  Stopping container kube-apiserver-cert-syncer
  Normal   Killing    3m57s (x2 over 83m)   kubelet  Stopping container kube-apiserver-check-endpoints
  Warning  Unhealthy  3m57s                 kubelet  Readiness probe failed: Get "https://xx.xx.xxx.182:17697/healthz": dial tcp xx.xx.xxx.182:17697: connect: connection refused
  Normal   Killing    3m57s (x2 over 83m)   kubelet  Stopping container kube-apiserver
  Normal   Pulled     3m57s (x4 over 5h6m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bdd57b27d1cbd43709ebf411a6519e251a9b6d63867abe5997b3417c12adc4bf" already present on machine
  Normal   Killing    3m57s                 kubelet  Container kube-apiserver-cert-syncer definition changed, will be restarted
  Normal   Started    3m56s (x4 over 5h6m)  kubelet  Started container kube-apiserver-cert-syncer
  Normal   Created    3m56s (x4 over 5h6m)  kubelet  Created container kube-apiserver-cert-syncer
  Normal   Pulled     3m53s (x4 over 5h6m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bdd57b27d1cbd43709ebf411a6519e251a9b6d63867abe5997b3417c12adc4bf" already present on machine
  Normal   Created    3m52s (x4 over 5h6m)  kubelet  Created container kube-apiserver-cert-regeneration-controller
  Normal   Created    3m52s (x4 over 5h6m)  kubelet  Created container kube-apiserver-insecure-readyz
  Normal   Pulled     3m52s (x4 over 5h6m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bdd57b27d1cbd43709ebf411a6519e251a9b6d63867abe5997b3417c12adc4bf" already present on machine
  Normal   Started    3m52s (x4 over 5h6m)  kubelet  Started container kube-apiserver-cert-regeneration-controller
  Warning  Unhealthy  2m47s (x7 over 82m)   kubelet  Readiness probe failed: Get "https://xx.xx.xxx.182:6443/healthz": dial tcp xx.xx.xxx.182:6443: connect: connection refused
Mon Jan 25 17:38:45 [kewang@kewang-fedora env471]$ oc get pods -n openshift-kube-apiserver --show-labels -l apiserver

Based on the above results, the static pod was updated while respecting terminationGracePeriodSeconds, so moving the bug to VERIFIED.

Comment 53 Ke Wang 2021-01-26 08:18:35 UTC
I will try again with the latest payloads; not sure why this problem still arises.

Comment 54 Ke Wang 2021-01-26 09:31:52 UTC
Repeated the steps from https://bugzilla.redhat.com/show_bug.cgi?id=1882750#c51:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-26-024127   True        False         46m     Cluster version is 4.7.0-0.nightly-2021-01-26-024127

sh-4.4# vi kube-apiserver-pod.yaml  #Changed some and saved.

sh-4.4# date
Tue Jan 26 09:05:32 UTC 2021

sh-4.4# ps -ef |grep ' kube-apiserver '  | grep -v grep | awk '{print $2}'
42598
42657

sh-4.4# date; ps -ef |grep ' kube-apiserver '  | grep -v grep | awk '{print $2}'
Tue Jan 26 09:07:49 UTC 2021
85534
85574

After 135s (terminationGracePeriodSeconds), the new kube-apiserver was started. Checking the kube-apiserver events below, the old containers were killed 7m14s ago and the new ones started 5m1s ago, so the process is as expected.
...
Events:
  Type    Reason          Age                 From     Message
  ----    ------          ----                ----     -------
  Normal  Pulled          33m                 kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:032b991ac3321b32478efc76c159ac5d918e751a63840f0cb52ffe6ec344341b" already present on machine
  Normal  Created         33m                 kubelet  Created container kube-apiserver-cert-syncer
  Normal  Started         33m                 kubelet  Started container kube-apiserver-cert-syncer
  Normal  Pulled          33m                 kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:032b991ac3321b32478efc76c159ac5d918e751a63840f0cb52ffe6ec344341b" already present on machine
  Normal  Pulled          33m                 kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:032b991ac3321b32478efc76c159ac5d918e751a63840f0cb52ffe6ec344341b" already present on machine
  Normal  Started         33m                 kubelet  Started container kube-apiserver-cert-regeneration-controller
  Normal  Created         33m                 kubelet  Created container kube-apiserver-cert-regeneration-controller
  Normal  Pulled          33m                 kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:032b991ac3321b32478efc76c159ac5d918e751a63840f0cb52ffe6ec344341b" already present on machine
  Normal  Started         33m                 kubelet  Started container kube-apiserver-insecure-readyz
  Normal  Created         33m                 kubelet  Created container kube-apiserver-insecure-readyz
  Normal  Created         33m                 kubelet  Created container kube-apiserver-check-endpoints
  Normal  Started         33m                 kubelet  Started container kube-apiserver-check-endpoints
  Normal  Killing         7m14s               kubelet  Stopping container kube-apiserver
  Normal  Killing         7m14s               kubelet  Stopping container kube-apiserver-check-endpoints
  Normal  Killing         7m14s               kubelet  Stopping container kube-apiserver-cert-regeneration-controller
  Normal  Killing         7m14s               kubelet  Stopping container kube-apiserver-insecure-readyz
  Normal  Killing         7m14s               kubelet  Stopping container kube-apiserver-cert-syncer
  Normal  SandboxChanged  5m4s                kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal  Pulled          5m3s (x2 over 33m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:68bc81cc71914f43dac3f6915497904aec847ee1ab8fee2ee156832efd5cd247" already present on machine
  Normal  Pulled          5m2s (x2 over 33m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:68bc81cc71914f43dac3f6915497904aec847ee1ab8fee2ee156832efd5cd247" already present on machine
  Normal  Started         5m2s (x2 over 33m)  kubelet  Started container setup
  Normal  Created         5m2s (x2 over 33m)  kubelet  Created container setup
  Normal  Started         5m1s (x2 over 33m)  kubelet  Started container kube-apiserver
  Normal  Created         5m1s (x2 over 33m)  kubelet  Created container kube-apiserver

For the e2e case https://github.com/openshift/origin/pull/25568/files run in https://bugzilla.redhat.com/show_bug.cgi?id=1882750#c52 above, I guess it is just an events check for the keyword 'GracefulTerminationTimeout', but we do not know the root cause of what happened.

Comment 55 Clayton Coleman 2021-02-03 15:41:30 UTC
This is still happening, even with the different test.  The underlying issue may not have been fully resolved, or the fix may have moved the issue to a different area of code.

Comment 60 Fabian von Feilitzsch 2021-02-10 20:07:05 UTC
We're still seeing this in aws-serial jobs, after https://bugzilla.redhat.com/show_bug.cgi?id=1925262 was verified. There may be a second unsolved issue here. Reopening.

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.7/1359556162759954432

Comment 61 Ryan Phillips 2021-02-10 20:53:26 UTC
Still waiting on other patches... Marking as verified.

Comment 62 Fabian von Feilitzsch 2021-02-10 21:02:40 UTC
Proper bug for tracking this is https://bugzilla.redhat.com/show_bug.cgi?id=1927500

Comment 64 ravig 2021-02-15 18:54:33 UTC
*** Bug 1918930 has been marked as a duplicate of this bug. ***

Comment 67 errata-xmlrpc 2021-02-24 15:21:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

