Bug 1955669 - release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 is permfailing
Summary: release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 is permfailing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On: 1955610
Blocks:
Reported: 2021-04-30 16:01 UTC by Ryan Phillips
Modified: 2021-05-19 15:17 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1955610
Environment:
job=release-openshift-origin-installer-old-rhcos-e2e-aws-4.7=all
Last Closed: 2021-05-19 15:17:01 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 26121 0 None open Bug 1955669: flake testKubeletToAPIServerGracefulTermination 2021-04-30 16:03:13 UTC
Red Hat Product Errata RHBA-2021:1550 0 None None None 2021-05-19 15:17:16 UTC

Description Ryan Phillips 2021-04-30 16:01:59 UTC
+++ This bug was initially created as a clone of Bug #1955610 +++

The job release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 is always failing in CI; see the testgrid results:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#release-openshift-origin-installer-old-rhcos-e2e-aws-4.7


Note: this job attempts to run the current 4.7 codebase on top of the previous RHCOS image (and therefore an older CRI-O and kubelet).

The most concerning error is that the apiserver is being terminated non-gracefully, which can cause failures in other tests (they either cannot reach the apiserver or lose their connection to it).

see:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-old-rhcos-e2e-aws-4.7/1387145769860993024

fail [github.com/onsi/ginkgo.0-origin.1+incompatible/internal/leafnodes/runner.go:64]: kube-apiserver reports a non-graceful termination: v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-ip-10-0-194-195.ec2.internal.1679d369bf517c63", GenerateName:"", Namespace:"openshift-kube-apiserver", SelfLink:"/api/v1/namespaces/openshift-kube-apiserver/events/kube-apiserver-ip-10-0-194-195.ec2.internal.1679d369bf517c63", UID:"589e86a5-28cb-4ff4-991a-9977b27b0e73", ResourceVersion:"22443", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"watch-termination", Operation:"Update", APIVersion:"v1", Time:(*v1.Time)(0xc0004a46c0), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc0004a46e0)}}}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ip-10-0-194-195.ec2.internal", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"NonGracefulTermination", Message:"Previous pod kube-apiserver-ip-10-0-194-195.ec2.internal started at 2021-04-27 21:11:52.712098115 +0000 UTC did not terminate gracefully", Source:v1.EventSource{Component:"apiserver", Host:"ip-10-0-194-195"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}. 
Probably kubelet or CRI-O is not giving the time to cleanly shut down. This can lead to connection refused and network I/O timeout errors in other components.
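
For anyone triaging a similar run, the check behind this failure can be approximated as follows. This is only a rough sketch, not the actual origin test code: it assumes the events have already been loaded as plain dictionaries in the Kubernetes JSON shape (for example the `items` list from `oc get events -n openshift-kube-apiserver -o json`), and the function name `non_graceful_terminations` is hypothetical. The reason string and namespace are taken from the event quoted above.

```python
from typing import Any, Dict, List

# Reason string copied from the event in the failure output above.
NON_GRACEFUL_REASON = "NonGracefulTermination"


def non_graceful_terminations(
    events: List[Dict[str, Any]],
    namespace: str = "openshift-kube-apiserver",
) -> List[str]:
    """Return the messages of events reporting a non-graceful termination.

    Hypothetical helper: `events` is assumed to be a list of Event objects
    in their JSON form, with `reason`, `message` and `involvedObject` keys.
    """
    messages = []
    for event in events:
        # Skip events with any other reason (normal starts, pulls, etc.).
        if event.get("reason") != NON_GRACEFUL_REASON:
            continue
        # Only consider events about objects in the requested namespace.
        involved = event.get("involvedObject", {})
        if involved.get("namespace") != namespace:
            continue
        messages.append(event.get("message", ""))
    return messages


if __name__ == "__main__":
    # Made-up sample data for illustration; not taken from a real run.
    sample = [
        {
            "reason": "NonGracefulTermination",
            "type": "Warning",
            "message": "Previous pod did not terminate gracefully",
            "involvedObject": {"kind": "Pod", "namespace": "openshift-kube-apiserver"},
        },
        {
            "reason": "Started",
            "type": "Normal",
            "message": "Started container",
            "involvedObject": {"kind": "Pod", "namespace": "openshift-kube-apiserver"},
        },
    ]
    for msg in non_graceful_terminations(sample):
        print(msg)
```

An empty result means no such event was recorded in that namespace; any returned message points at the pod whose previous instance was killed before it could shut down cleanly.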


So I am starting the investigation with the kubelet.

--- Additional comment from Ryan Phillips on 2021-04-30 16:01:30 UTC ---

Self verifying... patch has merged into 4.8 to flake the test. Going to backport it to 4.7.

Comment 3 Sunil Choudhary 2021-05-05 04:39:32 UTC
The patch has merged into 4.7 to flake the test.

Comment 4 Siddharth Sharma 2021-05-10 17:59:54 UTC
This bug will be shipped as part of next z-stream release 4.7.11 on May 19th, as 4.7.10 was dropped due to a blocker https://bugzilla.redhat.com/show_bug.cgi?id=1958518.

Comment 8 errata-xmlrpc 2021-05-19 15:17:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.11 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1550

