Bug 1955610

Summary: release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 is permfailing
Product: OpenShift Container Platform
Component: Node
Sub component: Kubelet
Version: 4.7
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Ben Parees <bparees>
Assignee: Ryan Phillips <rphillips>
QA Contact: Sunil Choudhary <schoudha>
CC: aos-bugs
Doc Type: No Doc Update
Environment: job=release-openshift-origin-installer-old-rhcos-e2e-aws-4.7=all
Bug Blocks: 1955669
Last Closed: 2021-07-27 23:05:17 UTC
Type: Bug

Description Ben Parees 2021-04-30 14:16:04 UTC
job:
release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 

is always failing in CI, see testgrid results:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#release-openshift-origin-installer-old-rhcos-e2e-aws-4.7


Note: this job runs the current 4.7 codebase on top of the previous RHCOS image (i.e., an older CRI-O and kubelet).

The main error of concern is that the apiserver is being terminated non-gracefully, which can cause failures in other tests (since they can no longer reach the apiserver, or lose their connection to it).

see:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-old-rhcos-e2e-aws-4.7/1387145769860993024

fail [github.com/onsi/ginkgo.0-origin.1+incompatible/internal/leafnodes/runner.go:64]: kube-apiserver reports a non-graceful termination: v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-ip-10-0-194-195.ec2.internal.1679d369bf517c63", GenerateName:"", Namespace:"openshift-kube-apiserver", SelfLink:"/api/v1/namespaces/openshift-kube-apiserver/events/kube-apiserver-ip-10-0-194-195.ec2.internal.1679d369bf517c63", UID:"589e86a5-28cb-4ff4-991a-9977b27b0e73", ResourceVersion:"22443", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"watch-termination", Operation:"Update", APIVersion:"v1", Time:(*v1.Time)(0xc0004a46c0), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc0004a46e0)}}}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ip-10-0-194-195.ec2.internal", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"NonGracefulTermination", Message:"Previous pod kube-apiserver-ip-10-0-194-195.ec2.internal started at 2021-04-27 21:11:52.712098115 +0000 UTC did not terminate gracefully", Source:v1.EventSource{Component:"apiserver", Host:"ip-10-0-194-195"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}. Probably kubelet or CRI-O is not giving the time to cleanly shut down. This can lead to connection refused and network I/O timeout errors in other components.
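
For reference, the failing check is essentially scanning events in the openshift-kube-apiserver namespace for reason NonGracefulTermination (written by the watch-termination component, per the event's ManagedFields above). A minimal client-go sketch of that query, assuming a local kubeconfig and stock client-go rather than the actual origin test code, would look like:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (in CI this comes from the cluster profile).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// List NonGracefulTermination events in the kube-apiserver namespace,
	// the same condition the failing e2e check asserts on.
	events, err := client.CoreV1().Events("openshift-kube-apiserver").List(context.TODO(), metav1.ListOptions{
		FieldSelector: "reason=NonGracefulTermination",
	})
	if err != nil {
		panic(err)
	}
	for _, ev := range events.Items {
		fmt.Printf("%s: %s\n", ev.InvolvedObject.Name, ev.Message)
	}
}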


So I am starting this with the kubelet component.

Comment 1 Ryan Phillips 2021-04-30 16:01:30 UTC
Self-verifying... a patch has merged into 4.8 to mark this check as a flake rather than a hard failure. Going to backport it to 4.7.
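
For context, "flaking the test" means the check now records the event as a flake instead of failing the suite outright, so the job stays green while the condition is still surfaced. A hypothetical sketch of that pattern (the function and reportFlake callback below are illustrative, not the actual openshift/origin change):

package gracefulcheck

import (
	corev1 "k8s.io/api/core/v1"
)

// ReportNonGracefulTerminations illustrates the "flake instead of fail" idea:
// rather than failing the suite when a NonGracefulTermination event is found,
// each occurrence is handed to a flake reporter and counted.
func ReportNonGracefulTerminations(events []corev1.Event, reportFlake func(format string, args ...interface{})) int {
	flakes := 0
	for _, ev := range events {
		if ev.Reason == "NonGracefulTermination" {
			flakes++
			reportFlake("kube-apiserver reports a non-graceful termination: %s", ev.Message)
		}
	}
	return flakes
}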

Comment 4 errata-xmlrpc 2021-07-27 23:05:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438