Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1776402

Summary: Test Failure: release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3 #49 : Run template e2e-aws - e2e-aws-ovn-kubernetes container setup
Product: OpenShift Container Platform
Component: Etcd
Version: 4.3.0
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED INSUFFICIENT_DATA
Severity: unspecified
Priority: unspecified
Reporter: Lokesh Mandvekar <lsm5>
Assignee: Sam Batschelet <sbatsche>
QA Contact: ge liu <geliu>
CC: aos-bugs, gblomqui, jokerman, lszaszki, mfojtik, nagrawal
Type: Bug
Last Closed: 2020-02-25 10:30:54 UTC
Bug Depends On: 1775878

Description Lokesh Mandvekar 2019-11-25 15:54:13 UTC
Description of problem:

See: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49
First and only occurrence at 06:31:27

Run template e2e-aws - e2e-aws-ovn-kubernetes container setup (48m28s)
ping to complete..."
level=info msg="Destroying the bootstrap resources..."
level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-5n3t4y6h-bb4ea.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
level=info msg="Cluster operator authentication Progressing is True with ProgressingWellKnownNotReady: Progressing: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.128.138:6443/.well-known/oauth-authorization-server endpoint data"
level=info msg="Cluster operator authentication Available is False with Available: "
level=info msg="Cluster operator console Progressing is True with SyncLoopRefreshProgressingInProgress: SyncLoopRefreshProgressing: Working toward version 4.3.0-0.nightly-2019-11-25-022933"
level=info msg="Cluster operator console Available is False with DeploymentAvailableInsufficientReplicas: DeploymentAvailable: 0 pods available for console deployment"
level=info msg="Cluster operator insights Disabled is False with : "
level=error msg="Cluster operator kube-apiserver Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 3:\nNodeInstallerDegraded: "
level=info msg="Cluster operator kube-apiserver Progressing is True with Progressing: Progressing: 1 nodes are at revision 0; 1 nodes are at revision 2; 1 nodes are at revision 3; 0 nodes have achieved new revision 5"
level=error msg="Cluster operator kube-controller-manager Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 5:\nNodeInstallerDegraded: "
level=info msg="Cluster operator kube-controller-manager Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 6"
level=info msg="Cluster operator kube-scheduler Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 5"
level=fatal msg="failed to initialize the cluster: Working towards 4.3.0-0.nightly-2019-11-25-022933: 100% complete"

Comment 1 Abhinav Dahiya 2019-11-25 21:35:32 UTC
> level=error msg="Cluster operator kube-apiserver Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 3:\nNodeInstallerDegraded: "
> level=info msg="Cluster operator kube-apiserver Progressing is True with Progressing: Progressing: 1 nodes are at revision 0; 1 nodes are at revision 2; 1 nodes are at revision 3; 0 nodes have achieved new revision 5"
> level=error msg="Cluster operator kube-controller-manager Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 5:\nNodeInstallerDegraded: "
> level=info msg="Cluster operator kube-controller-manager Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 6"
> level=info msg="Cluster operator kube-scheduler Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 5"

Seems like a problem with one of the nodes.
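
As a quick triage aid, the operators reporting Degraded=True can be pulled out of the installer output with standard text tools. A minimal sketch, using a scratch file seeded with log lines quoted above (the file path and truncated messages are illustrative):

```shell
# Write a few of the installer log lines quoted in this bug to a scratch file.
cat <<'EOF' > /tmp/install.log
level=error msg="Cluster operator kube-apiserver Degraded is True with NodeInstallerDegradedInstallerPodFailed"
level=error msg="Cluster operator kube-controller-manager Degraded is True with NodeInstallerDegradedInstallerPodFailed"
level=info msg="Cluster operator kube-scheduler Progressing is True with Progressing"
EOF

# List only the operators reporting Degraded=True.
grep 'Degraded is True' /tmp/install.log \
  | sed 's/.*Cluster operator \([a-z-]*\) .*/\1/'
```

Here that yields kube-apiserver and kube-controller-manager, matching comment 1's reading that the NodeInstaller pods are the common failure.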

Comment 2 Ryan Phillips 2019-12-02 18:06:42 UTC
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49/artifacts/e2e-aws/pods/openshift-etcd_etcd-member-ip-10-0-128-138.ec2.internal_etcd-member.log

etcd started throwing errors about requests taking too long at around 11:48:00. The install likely failed to converge for this reason.

Moving this ticket to the etcd team.
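
For reference, the slow-request warnings described above can be bucketed by minute to see when etcd degraded. A sketch against sample lines in the etcd "took too long" warning style (the sample log text below is illustrative, not copied from the linked artifact):

```shell
# Sample etcd member log lines (illustrative timestamps and durations).
cat <<'EOF' > /tmp/etcd-member.log
2019-11-25 11:48:02.000000 W | etcdserver: read-only range request took too long (210ms) to execute
2019-11-25 11:48:09.000000 W | etcdserver: read-only range request took too long (350ms) to execute
2019-11-25 11:49:01.000000 W | etcdserver: read-only range request took too long (120ms) to execute
EOF

# Count "took too long" warnings per minute (date + hour:minute prefix).
grep 'took too long' /tmp/etcd-member.log \
  | cut -d: -f1-2 \
  | sort | uniq -c
```

A spike in one minute bucket (here, two warnings at 11:48) is the kind of signal comment 2 is pointing at.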

Comment 4 Greg Blomquist 2019-12-11 19:21:20 UTC
Adding dependency on bug #1775878

The timeout issues noted in the logs appear to overlap. Not positive, but I wanted to draw the link.

Comment 6 Lukasz Szaszkiewicz 2020-02-21 09:09:29 UTC
I wanted to see the logs from the other components to check whether they were also reporting network errors, but the link appears to be broken - https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49/
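
When the gcsweb viewer is unavailable, the same artifacts are sometimes still reachable at the direct storage.googleapis.com path used in comment 2. A sketch of that URL rewrite, assuming the bucket layout matches the comment 2 link (this only rewrites the string; it does not verify the object still exists):

```shell
# Rewrite the gcsweb link above into the direct storage.googleapis.com
# form seen in comment 2.
u='https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49/'
echo "$u" | sed 's|^https://gcsweb-ci.svc.ci.openshift.org/gcs/|https://storage.googleapis.com/|'
```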