Bug 1729525

Summary: Installer timing out while waiting for load balancer to become available
Product: OpenShift Container Platform Reporter: Kevin Rizza <krizza>
Component: InstallerAssignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer QA Contact: Johnny Liu <jialiu>
Status: CLOSED DUPLICATE Docs Contact:
Severity: medium    
Priority: unspecified CC: bparees, kgarriso
Version: 4.2.0   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-07-12 20:53:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Full CI build log none

Description Kevin Rizza 2019-07-12 13:44:26 UTC
Created attachment 1589923 [details]
Full CI build log

Description of problem:
During infrastructure creation, the installer is sometimes timing out waiting for all resources to become available:

Container setup exited with code 1, reason Error
---
Installing from release registry.svc.ci.openshift.org/ocp/release:4.2.0-0.ci-2019-07-12-112847
level=warning msg="Found override for ReleaseImage. Please be warned, this is not advised"
level=info msg="Consuming \"Install Config\" from target directory"
level=info msg="Creating infrastructure resources..."
level=error
level=error msg="Error: timeout while waiting for state to become 'active' (last state: 'provisioning', timeout: 10m0s)"
level=error
level=error msg="  on ../tmp/openshift-install-953301218/vpc/master-elb.tf line 19, in resource \"aws_lb\" \"api_external\":"
level=error msg="  19: resource \"aws_lb\" \"api_external\" {"
level=error
level=error
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"
---
Container test exited with code 1, reason Error


According to Abhinav, this seems to mean that either AWS didn't create the load balancer or rate limiting prevented the installer from discovering the load balancer before the timeout.

How reproducible:

Steps to Reproduce:
1. Run installer
2. See above error log

Comment 1 Kirsten Garrison 2019-07-12 16:26:07 UTC
This is a duplicate of : https://bugzilla.redhat.com/show_bug.cgi?id=1717604

Comment 2 Ben Parees 2019-07-12 20:53:56 UTC

*** This bug has been marked as a duplicate of bug 1717604 ***