Bug 1955669 - release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 is permfailing
Summary: release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 is permfailing
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On: 1955610
Blocks:
 
Reported: 2021-04-30 16:01 UTC by Ryan Phillips
Modified: 2021-05-12 03:56 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1955610
Environment:
job=release-openshift-origin-installer-old-rhcos-e2e-aws-4.7=all
Last Closed:
Target Upstream Version:




Links
Github openshift/origin pull 26121 (open): Bug 1955669: flake testKubeletToAPIServerGracefulTermination (last updated 2021-04-30 16:03:13 UTC)

Description Ryan Phillips 2021-04-30 16:01:59 UTC
+++ This bug was initially created as a clone of Bug #1955610 +++

job:
release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 

is always failing in CI; see the TestGrid results:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#release-openshift-origin-installer-old-rhcos-e2e-aws-4.7


Note: this job attempts to run the current 4.7 codebase on top of the previous RHCOS image (so an older CRI-O/kubelet).

The main concern is that the kube-apiserver is being terminated non-gracefully, which can cause failures in other tests (they either can't reach the apiserver or lose their connection to it).

see:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-old-rhcos-e2e-aws-4.7/1387145769860993024

fail [github.com/onsi/ginkgo@v4.5.0-origin.1+incompatible/internal/leafnodes/runner.go:64]: kube-apiserver reports a non-graceful termination: v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-ip-10-0-194-195.ec2.internal.1679d369bf517c63", GenerateName:"", Namespace:"openshift-kube-apiserver", SelfLink:"/api/v1/namespaces/openshift-kube-apiserver/events/kube-apiserver-ip-10-0-194-195.ec2.internal.1679d369bf517c63", UID:"589e86a5-28cb-4ff4-991a-9977b27b0e73", ResourceVersion:"22443", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"watch-termination", Operation:"Update", APIVersion:"v1", Time:(*v1.Time)(0xc0004a46c0), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc0004a46e0)}}}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ip-10-0-194-195.ec2.internal", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"NonGracefulTermination", Message:"Previous pod kube-apiserver-ip-10-0-194-195.ec2.internal started at 2021-04-27 21:11:52.712098115 +0000 UTC did not terminate gracefully", Source:v1.EventSource{Component:"apiserver", Host:"ip-10-0-194-195"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}. Probably kubelet or CRI-O is not giving the time to cleanly shut down. This can lead to connection refused and network I/O timeout errors in other components.
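The failure above comes from a check that looks for NonGracefulTermination events in the openshift-kube-apiserver namespace. The snippet below is only a minimal sketch of that idea, not the actual origin test code; it assumes a standard client-go clientset, and checkGracefulTermination is an illustrative name.

package e2esketch

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// checkGracefulTermination lists events in the openshift-kube-apiserver
// namespace and returns an error for the first NonGracefulTermination
// warning it finds, mirroring the failure message quoted above.
func checkGracefulTermination(ctx context.Context, c kubernetes.Interface) error {
    events, err := c.CoreV1().Events("openshift-kube-apiserver").List(ctx, metav1.ListOptions{})
    if err != nil {
        return err
    }
    for _, ev := range events.Items {
        if ev.Reason == "NonGracefulTermination" {
            return fmt.Errorf("kube-apiserver reports a non-graceful termination: %s", ev.Message)
        }
    }
    return nil
}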


So I am starting this with the kubelet.

--- Additional comment from Ryan Phillips on 2021-04-30 16:01:30 UTC ---

Self-verifying... the patch to mark the test as a flake has merged into 4.8. Going to backport it to 4.7.
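To spell out what "flake the test" means here: the check still runs and its failure is still recorded, but it no longer fails the whole job. A rough illustration of that change, reusing the imports and checkGracefulTermination from the earlier sketch (reportFlake is a hypothetical stand-in, not origin's actual flake-reporting API):

// Illustrative only; not the merged PR's code.
func testKubeletToAPIServerGracefulTermination(ctx context.Context, c kubernetes.Interface) {
    if err := checkGracefulTermination(ctx, c); err != nil {
        // Before the patch this was a hard failure; now it is reported as a
        // flake so the rest of the suite keeps running.
        reportFlake("kube-apiserver graceful termination: %v", err)
    }
}

// reportFlake is a hypothetical helper that records a flaky failure without
// marking the job as failed.
func reportFlake(format string, args ...interface{}) {
    fmt.Printf("[FLAKE] "+format+"\n", args...)
}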

Comment 3 Sunil Choudhary 2021-05-05 04:39:32 UTC
The patch to mark the test as a flake has merged into 4.7.

Comment 4 Siddharth Sharma 2021-05-10 17:59:54 UTC
This bug will be shipped as part of the next z-stream release, 4.7.11, on May 19th, as 4.7.10 was dropped due to a blocker https://bugzilla.redhat.com/show_bug.cgi?id=1958518.

