Bug 1776402 - Test Failure: release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3 #49 : Run template e2e-aws - e2e-aws-ovn-kubernetes container setup
Summary: Test Failure: release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3 #49 : Run template e2e-aws - e2e-aws-ovn-kubernetes container setup
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.4.0
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On: 1775878
Blocks:
 
Reported: 2019-11-25 15:54 UTC by Lokesh Mandvekar
Modified: 2020-02-25 10:30 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-25 10:30:54 UTC
Target Upstream Version:
Embargoed:



Description Lokesh Mandvekar 2019-11-25 15:54:13 UTC
Description of problem:

See: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49
First and only occurrence at 06:31:27

Run template e2e-aws - e2e-aws-ovn-kubernetes container setup    48m28s
ping to complete..."
level=info msg="Destroying the bootstrap resources..."
level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-5n3t4y6h-bb4ea.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
level=info msg="Cluster operator authentication Progressing is True with ProgressingWellKnownNotReady: Progressing: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.128.138:6443/.well-known/oauth-authorization-server endpoint data"
level=info msg="Cluster operator authentication Available is False with Available: "
level=info msg="Cluster operator console Progressing is True with SyncLoopRefreshProgressingInProgress: SyncLoopRefreshProgressing: Working toward version 4.3.0-0.nightly-2019-11-25-022933"
level=info msg="Cluster operator console Available is False with DeploymentAvailableInsufficientReplicas: DeploymentAvailable: 0 pods available for console deployment"
level=info msg="Cluster operator insights Disabled is False with : "
level=error msg="Cluster operator kube-apiserver Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 3:\nNodeInstallerDegraded: "
level=info msg="Cluster operator kube-apiserver Progressing is True with Progressing: Progressing: 1 nodes are at revision 0; 1 nodes are at revision 2; 1 nodes are at revision 3; 0 nodes have achieved new revision 5"
level=error msg="Cluster operator kube-controller-manager Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 5:\nNodeInstallerDegraded: "
level=info msg="Cluster operator kube-controller-manager Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 6"
level=info msg="Cluster operator kube-scheduler Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 5"
level=fatal msg="failed to initialize the cluster: Working towards 4.3.0-0.nightly-2019-11-25-022933: 100% complete"
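
A minimal sketch, assuming the `kubernetes` Python client and a kubeconfig for the affected cluster, of how the Degraded/Available conditions quoted above could be pulled straight from the ClusterOperator resources rather than scraped from the installer log:

# Sketch: list ClusterOperator conditions to find degraded or unavailable operators.
# Assumes the `kubernetes` Python client and a kubeconfig pointing at the cluster.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# ClusterOperator is a cluster-scoped custom resource in config.openshift.io/v1.
operators = api.list_cluster_custom_object("config.openshift.io", "v1", "clusteroperators")

for co in operators.get("items", []):
    name = co["metadata"]["name"]
    for cond in co.get("status", {}).get("conditions", []):
        degraded = cond["type"] == "Degraded" and cond["status"] == "True"
        unavailable = cond["type"] == "Available" and cond["status"] == "False"
        if degraded or unavailable:
            print(f"{name}: {cond['type']}={cond['status']} ({cond.get('message', '')})")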

Comment 1 Abhinav Dahiya 2019-11-25 21:35:32 UTC
> level=error msg="Cluster operator kube-apiserver Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 3:\nNodeInstallerDegraded: "
> level=info msg="Cluster operator kube-apiserver Progressing is True with Progressing: Progressing: 1 nodes are at revision 0; 1 nodes are at revision 2; 1 nodes are at revision 3; 0 nodes have achieved new revision 5"
> level=error msg="Cluster operator kube-controller-manager Degraded is True with NodeInstallerDegradedInstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 5:\nNodeInstallerDegraded: "
> level=info msg="Cluster operator kube-controller-manager Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 6"
> level=info msg="Cluster operator kube-scheduler Progressing is True with Progressing: Progressing: 2 nodes are at revision 4; 1 nodes are at revision 5"

Seems like a problem with one of the nodes.
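
A minimal sketch, again assuming the `kubernetes` Python client and access to the cluster, of how one might locate the failed installer pod behind the NodeInstallerDegraded condition. Static-pod installer pods for the kube-apiserver are named installer-<revision>-<node> in the openshift-kube-apiserver namespace; the same check applies to openshift-kube-controller-manager:

# Sketch: find installer pods that did not reach Succeeded for the kube-apiserver.
# Assumes the `kubernetes` Python client and a kubeconfig for the cluster.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod("openshift-kube-apiserver")
for pod in pods.items:
    # Installer pods are named installer-<revision>-<node-name>.
    if pod.metadata.name.startswith("installer-") and pod.status.phase != "Succeeded":
        print(pod.metadata.name, pod.status.phase, pod.spec.node_name)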

Comment 2 Ryan Phillips 2019-12-02 18:06:42 UTC
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49/artifacts/e2e-aws/pods/openshift-etcd_etcd-member-ip-10-0-128-138.ec2.internal_etcd-member.log

etcd started throwing "took too long" warnings for requests at around 11:48:00. The install likely failed to converge for this reason.
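
A minimal sketch, assuming the artifact URL above is still publicly reachable, of how the "took too long" warnings in that etcd member log could be counted per minute to confirm when the slowdown starts:

# Sketch: bucket etcd "took too long" warnings by minute from the linked member log.
# Assumes the artifact URL quoted above is still publicly reachable.
import re
import urllib.request
from collections import Counter

LOG_URL = (
    "https://storage.googleapis.com/origin-ci-test/logs/"
    "release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49/"
    "artifacts/e2e-aws/pods/"
    "openshift-etcd_etcd-member-ip-10-0-128-138.ec2.internal_etcd-member.log"
)

with urllib.request.urlopen(LOG_URL) as resp:
    text = resp.read().decode("utf-8", errors="replace")

per_minute = Counter()
for line in text.splitlines():
    if "took too long" in line:
        # etcd log lines start with a timestamp; keep hour:minute as the bucket key.
        match = re.search(r"\b(\d{2}:\d{2}):\d{2}", line)
        if match:
            per_minute[match.group(1)] += 1

for minute, count in sorted(per_minute.items()):
    print(minute, count)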

Moving this ticket to the etcd team.

Comment 4 Greg Blomquist 2019-12-11 19:21:20 UTC
Adding dependency on bug #1775878

The timeout issues noted in the logs appear to overlap. Not positive, but I wanted to draw the link.

Comment 6 Lukasz Szaszkiewicz 2020-02-21 09:09:29 UTC
I wanted to look at the logs from the other components to see whether they were reporting network errors, but the link seems to be broken - https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49/
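
A minimal sketch, assuming the origin-ci-test bucket is still public, of how the run's artifacts could be listed straight from the GCS JSON API when the gcsweb front end is unreachable:

# Sketch: list job artifacts directly from the public GCS bucket, bypassing gcsweb.
# Assumes the origin-ci-test bucket and this prefix are still publicly readable.
import json
import urllib.parse
import urllib.request

BUCKET = "origin-ci-test"
PREFIX = "logs/release-openshift-origin-installer-e2e-aws-ovn-kubernetes-4.3/49/"

url = (
    f"https://storage.googleapis.com/storage/v1/b/{BUCKET}/o?"
    + urllib.parse.urlencode({"prefix": PREFIX, "delimiter": "/"})
)
with urllib.request.urlopen(url) as resp:
    listing = json.load(resp)

# "prefixes" are the top-level directories under the run, "items" are objects.
for sub in listing.get("prefixes", []):
    print("dir:", sub)
for obj in listing.get("items", []):
    print("obj:", obj["name"])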

