Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1776402

Summary: Test Failure: release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3 #49 : Run template e2e-aws - e2e-aws-ovn-kubernetes container setup
Product: OpenShift Container Platform
Component: Etcd
Version: 4.3.0
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED INSUFFICIENT_DATA
Severity: unspecified
Priority: unspecified
Reporter: Lokesh Mandvekar <lsm5>
Assignee: Sam Batschelet <sbatsche>
QA Contact: ge liu <geliu>
CC: aos-bugs, gblomqui, jokerman, lszaszki, mfojtik, nagrawal
Type: Bug
Last Closed: 2020-02-25 10:30:54 UTC
Bug Depends On: 1775878

Description Lokesh Mandvekar 2019-11-25 15:54:13 UTC
Description of problem:

See: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49
First and only occurrence at 06:31:27

Run template e2e-aws - e2e-aws-ovn-kubernetes container setup (48m28s)
ping to complete..."
level=info msg="Destroying the bootstrap resources..."
level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-5n3t4y6h-bb4ea.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
level=info msg="Cluster operator authentication Progressing is True with ProgressingWellKnownNotReady: Progressing: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.128.138:6443/.well-known/oauth-authorization-server endpoint data"
level=info msg="Cluster operator authentication Available is False with Available: "
level=info msg="Cluster operator console Progressing is True with SyncLoopRefreshProgressingInProgress: SyncLoopRefreshProgressing: Working toward version 4.3.0-0.nightly-2019-11-25-022933"
level=info msg="Cluster operator console Available is False with DeploymentAvailableInsufficientReplicas: DeploymentAvailable: 0 pods available for console deployment"
level=info msg="Cluster operator insights Disabled is False with : "
level=error msg="Cluster operator kube-apiserver Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 3:\nNodeInstallerDegraded: "
level=info msg="Cluster operator kube-apiserver Progressing is True with Progressing: Progressing: 1 nodes are at revision 0; 1 nodes are at revision 2; 1 nodes are at revision 3; 0 nodes have achieved new revision 5"
level=error msg="Cluster operator kube-controller-manager Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 5:\nNodeInstallerDegraded: "
level=info msg="Cluster operator kube-controller-manager Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 6"
level=info msg="Cluster operator kube-scheduler Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 5"
level=fatal msg="failed to initialize the cluster: Working towards 4.3.0-0.nightly-2019-11-25-022933: 100% complete"

Comment 1 Abhinav Dahiya 2019-11-25 21:35:32 UTC
> level=error msg="Cluster operator kube-apiserver Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 3:\nNodeInstallerDegraded: "
> level=info msg="Cluster operator kube-apiserver Progressing is True with Progressing: Progressing: 1 nodes are at revision 0; 1 nodes are at revision 2; 1 nodes are at revision 3; 0 nodes have achieved new revision 5"
> level=error msg="Cluster operator kube-controller-manager Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 5:\nNodeInstallerDegraded: "
> level=info msg="Cluster operator kube-controller-manager Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 6"
> level=info msg="Cluster operator kube-scheduler Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 5"

Seems like a problem with one of the nodes.
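
As a quick triage aid, the operators reporting Degraded=True can be pulled out of the installer output with standard text tools. A minimal sketch, using a scratch file seeded with log lines quoted above (the file path and truncated messages are illustrative):

```shell
# Write a few of the installer log lines quoted in this bug to a scratch file.
cat <<'EOF' > /tmp/install.log
level=error msg="Cluster operator kube-apiserver Degraded is True with NodeInstallerDegradedInstallerPodFailed"
level=error msg="Cluster operator kube-controller-manager Degraded is True with NodeInstallerDegradedInstallerPodFailed"
level=info msg="Cluster operator kube-scheduler Progressing is True with Progressing"
EOF

# List only the operators reporting Degraded=True.
grep 'Degraded is True' /tmp/install.log \
  | sed 's/.*Cluster operator \([a-z-]*\) .*/\1/'
```

Here that yields kube-apiserver and kube-controller-manager, matching comment 1's reading that the NodeInstaller pods are the common failure.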

Comment 2 Ryan Phillips 2019-12-02 18:06:42 UTC
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49/artifacts/e2e-aws/pods/openshift-etcd_etcd-member-ip-10-0-128-138.ec2.internal_etcd-member.log

etcd started throwing errors about requests taking too long at around 11:48:00. The install likely failed to converge for this reason.

Moving this ticket to the etcd team.
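
For reference, the slow-request warnings described above can be bucketed by minute to see when etcd degraded. A sketch against sample lines in the etcd "took too long" warning style (the sample log text below is illustrative, not copied from the linked artifact):

```shell
# Sample etcd member log lines (illustrative timestamps and durations).
cat <<'EOF' > /tmp/etcd-member.log
2019-11-25 11:48:02.000000 W | etcdserver: read-only range request took too long (210ms) to execute
2019-11-25 11:48:09.000000 W | etcdserver: read-only range request took too long (350ms) to execute
2019-11-25 11:49:01.000000 W | etcdserver: read-only range request took too long (120ms) to execute
EOF

# Count "took too long" warnings per minute (date + hour:minute prefix).
grep 'took too long' /tmp/etcd-member.log \
  | cut -d: -f1-2 \
  | sort | uniq -c
```

A spike in one minute bucket (here, two warnings at 11:48) is the kind of signal comment 2 is pointing at.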

Comment 4 Greg Blomquist 2019-12-11 19:21:20 UTC
Adding dependency on bug #1775878

The timeout issues noted in the logs appear to overlap. Not positive, but I wanted to draw the link.

Comment 6 Lukasz Szaszkiewicz 2020-02-21 09:09:29 UTC
I wanted to see the logs from the other components to check whether they were also reporting network errors, but the link appears to be broken - https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49/
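
When the gcsweb viewer is unavailable, the same artifacts are sometimes still reachable at the direct storage.googleapis.com path used in comment 2. A sketch of that URL rewrite, assuming the bucket layout matches the comment 2 link (this only rewrites the string; it does not verify the object still exists):

```shell
# Rewrite the gcsweb link above into the direct storage.googleapis.com
# form seen in comment 2.
u='https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49/'
echo "$u" | sed 's|^https://gcsweb-ci.svc.ci.openshift.org/gcs/|https://storage.googleapis.com/|'
```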