Bug 1886563 - [sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully
Summary: [sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully
Keywords:
Status: CLOSED DUPLICATE of bug 1882750
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Stefan Schimanski
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-08 18:43 UTC by Benjamin Gilbert
Modified: 2020-10-08 19:24 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully
Last Closed: 2020-10-08 19:24:12 UTC
Target Upstream Version:
Embargoed:



Description Benjamin Gilbert 2020-10-08 18:43:46 UTC
test:
[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully 

is failing frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-api-machinery%5C%5D%5C%5BFeature%3AAPIServer%5C%5D%5C%5BLate%5C%5D+kubelet+terminates+kube-apiserver+gracefully

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-compact-4.6/1314250018076495872

fail [github.com/onsi/ginkgo.0-origin.1+incompatible/internal/leafnodes/runner.go:64]: kube-apiserver reports a non-graceful termination: v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-master-1.ci-op-jwq0wjcr-35904.origin-ci-int-aws.dev.rhcloud.com.163c1542b8f9d61a", GenerateName:"", Namespace:"openshift-kube-apiserver", SelfLink:"/api/v1/namespaces/openshift-kube-apiserver/events/kube-apiserver-master-1.ci-op-jwq0wjcr-35904.origin-ci-int-aws.dev.rhcloud.com.163c1542b8f9d61a", UID:"a55424d2-daf0-4ce1-8de1-1d414b5233da", ResourceVersion:"21297", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63737775743, loc:(*time.Location)(0x9003460)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"watch-termination", Operation:"Update", APIVersion:"v1", Time:(*v1.Time)(0xc001ebea00), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc001ebea20)}}}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-master-1.ci-op-jwq0wjcr-35904.origin-ci-int-aws.dev.rhcloud.com", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"NonGracefulTermination", Message:"Previous pod kube-apiserver-master-1.ci-op-jwq0wjcr-35904.origin-ci-int-aws.dev.rhcloud.com started at 2020-10-08 17:33:31.267794933 +0000 UTC did not terminate gracefully", Source:v1.EventSource{Component:"apiserver", Host:"master-1.ci-op-jwq0wjcr-35904.origin-ci-int-aws.dev.rhcloud.com"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63737775743, loc:(*time.Location)(0x9003460)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63737775743, loc:(*time.Location)(0x9003460)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}. Probably kubelet or CRI-O is not giving the time to cleanly shut down. This can lead to connection refused and network I/O timeout errors in other components.
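
For reference, a minimal client-go sketch (an illustration only, not the test code itself) that lists the NonGracefulTermination events which the watch-termination wrapper records in the openshift-kube-apiserver namespace; it assumes a kubeconfig at the default location:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes ~/.kube/config points at the cluster under test.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// The failing test asserts that no such events exist; any hit below is an
	// apiserver instance that was killed before it could shut down cleanly.
	events, err := client.CoreV1().Events("openshift-kube-apiserver").List(context.TODO(),
		metav1.ListOptions{FieldSelector: "reason=NonGracefulTermination"})
	if err != nil {
		panic(err)
	}
	for _, ev := range events.Items {
		fmt.Printf("%s %s: %s\n", ev.LastTimestamp, ev.InvolvedObject.Name, ev.Message)
	}
}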

Comment 1 David Eads 2020-10-08 19:20:58 UTC
What's noteworthy is less the failure itself and more the sudden shift in failure frequency for some jobs:

release-openshift-ocp-installer-e2e-azure-4.6			88.00% (0.00%)(25 runs)		100.00% (0.00%)(39 runs)
release-openshift-origin-installer-e2e-remote-libvirt-s390x-4.6	90.00% (0.00%)(10 runs)		100.00% (0.00%)(6 runs)
release-openshift-ocp-installer-e2e-aws-4.6			92.00% (0.00%)(50 runs)		97.06% (0.00%)(68 runs)
periodic-ci-openshift-release-master-ocp-4.6-e2e-vsphere	93.94% (0.00%)(33 runs)		100.00% (0.00%)(35 runs)

@sjenning: this suddenly got a lot more severe and would explain the 10% increase in availability downtime we see during upgrades.

Comment 2 David Eads 2020-10-08 19:24:12 UTC

*** This bug has been marked as a duplicate of bug 1882750 ***

