Bug 1955610 - release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 is permfailing
Summary: release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 is permfailing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks: 1955669
Reported: 2021-04-30 14:16 UTC by Ben Parees
Modified: 2021-07-27 23:05 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1955669
Environment:
job=release-openshift-origin-installer-old-rhcos-e2e-aws-4.7=all
Last Closed: 2021-07-27 23:05:17 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:05:39 UTC

Description Ben Parees 2021-04-30 14:16:04 UTC
job:
release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 

is always failing in CI; see the testgrid results:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#release-openshift-origin-installer-old-rhcos-e2e-aws-4.7


Note: this job attempts to run the current 4.7 codebase on top of the previous RHCOS image (so an older CRI-O/kubelet).

The most concerning error is that the apiserver is being terminated non-gracefully, which can cause failures in other tests (they either cannot reach the apiserver or lose their connection to it).

see:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-old-rhcos-e2e-aws-4.7/1387145769860993024

fail [github.com/onsi/ginkgo@v4.5.0-origin.1+incompatible/internal/leafnodes/runner.go:64]: kube-apiserver reports a non-graceful termination: v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-ip-10-0-194-195.ec2.internal.1679d369bf517c63", GenerateName:"", Namespace:"openshift-kube-apiserver", SelfLink:"/api/v1/namespaces/openshift-kube-apiserver/events/kube-apiserver-ip-10-0-194-195.ec2.internal.1679d369bf517c63", UID:"589e86a5-28cb-4ff4-991a-9977b27b0e73", ResourceVersion:"22443", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"watch-termination", Operation:"Update", APIVersion:"v1", Time:(*v1.Time)(0xc0004a46c0), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc0004a46e0)}}}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-ip-10-0-194-195.ec2.internal", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"NonGracefulTermination", Message:"Previous pod kube-apiserver-ip-10-0-194-195.ec2.internal started at 2021-04-27 21:11:52.712098115 +0000 UTC did not terminate gracefully", Source:v1.EventSource{Component:"apiserver", Host:"ip-10-0-194-195"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63755154792, loc:(*time.Location)(0x9068880)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}. 
Probably kubelet or CRI-O is not giving the time to cleanly shut down. This can lead to connection refused and network I/O timeout errors in other components.


So I am starting this with the kubelet.
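
To illustrate the kind of check that is failing here, the following is a minimal Python sketch (not the actual origin test code) of scanning cluster events for the NonGracefulTermination reason. The Event class, the non_graceful_terminations helper, and the second sample event (reason "Started") are hypothetical; only the namespace, pod name, reason, and message of the first event are taken from the failure output above.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical, trimmed-down stand-in for a Kubernetes v1.Event."""
    namespace: str
    involved_pod: str
    reason: str
    message: str


def non_graceful_terminations(events, namespace="openshift-kube-apiserver"):
    """Return the events in `namespace` whose reason is NonGracefulTermination."""
    return [
        e for e in events
        if e.namespace == namespace and e.reason == "NonGracefulTermination"
    ]


POD = "kube-apiserver-ip-10-0-194-195.ec2.internal"

events = [
    # Values copied from the event in the failure output above.
    Event(
        namespace="openshift-kube-apiserver",
        involved_pod=POD,
        reason="NonGracefulTermination",
        message=(
            "Previous pod " + POD + " started at "
            "2021-04-27 21:11:52.712098115 +0000 UTC did not terminate gracefully"
        ),
    ),
    # Hypothetical unrelated event, included only to show that it is filtered out.
    Event(
        namespace="openshift-kube-apiserver",
        involved_pod=POD,
        reason="Started",
        message="Started container kube-apiserver",
    ),
]

flagged = non_graceful_terminations(events)
for e in flagged:
    print(f"{e.involved_pod}: {e.message}")
```

A test built along these lines fails whenever the flagged list is non-empty, which is why a single warning event from the apiserver is enough to fail the job.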

Comment 1 Ryan Phillips 2021-04-30 16:01:30 UTC
Self-verifying: a patch that makes the test flake instead of fail has merged into 4.8. I am going to backport it to 4.7.

Comment 4 errata-xmlrpc 2021-07-27 23:05:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

