Bug 1571381 - Install fails on FAILED - RETRYING: Wait for API to become available
Summary: Install fails on FAILED - RETRYING: Wait for API to become available
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.9.z
Assignee: Scott Dodson
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-04-24 16:10 UTC by Steven Walter
Modified: 2019-09-19 06:59 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-05 14:06:30 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Steven Walter 2018-04-24 16:10:00 UTC
Description of problem:
TASK [openshift_master : Wait for API to become available]

*FAILED - RETRYING: Wait for API to become available (xx retries left).*

However, re-running the playbook allows it to continue.

Version-Release number of the following components:
openshift-ansible-3.7.42-1.git.2.9ee4e71.el7.noarch
Tried with:

ansible 2.5.0-2, 2.4.1.0, and 2.3.1.0


How reproducible:
Unconfirmed


Actual results:


TASK [openshift_master : Wait for API to become available]
************************************************************
************************************************************
*********************

*FAILED - RETRYING: Wait for API to become available (120 retries left).*
. . .
*FAILED - RETRYING: Wait for API to become available (1 retries left).*

* [WARNING]: Consider using get_url or uri module rather than running curl*

fatal: [ec2-54-242-126-28.compute-1.amazonaws.com]: FAILED! => {
    "attempts": 120,
    "changed": false,
    "cmd": [
        "curl",
        "--silent",
        "--tlsv1.2",
        "--cacert",
        "/etc/origin/master/ca-bundle.crt",
        "https://console.sandbox.example.com/healthz/ready"
    ],
    "delta": "0:00:00.104755",
    "end": "2018-03-29 22:11:12.608324",
    "rc": 35,
    "start": "2018-03-29 22:11:12.503569"
}
MSG:
non-zero return code



Expected results:
Install completes

Additional info:
I know this is a somewhat common error, however the common explanations I've found for the issue do not apply. And, it is suspicious given that the second runthrough passes this step normally and install completes.

I will upload logs etc

Comment 3 Scott Dodson 2018-04-24 20:02:46 UTC
How long does the load balancer take to insert hosts into the pool? I assume

Comment 4 Steven Walter 2018-04-25 21:46:48 UTC
I don't understand the question -- was "I assume" cut off?

Comment 5 Scott Dodson 2018-04-26 12:51:13 UTC
Yes, sorry about that.

I'm assuming that clusterHostname = 'console.sandbox.example.com' ? And that it's some sort of load balancer. Can you verify that the load balancer is routing traffic as expected to the hosts at the time of the failure? The fact that simply re-running things fixes it leads me to believe that there's a delay in the load balancer routing traffic to the API servers.

Comment 6 Madhusudan Upadhyay 2018-05-14 10:24:14 UTC
Hi,
A little input from the customers end:

I tried to hit the loadbalancer host name. I'm getting nothing back at the current stage. 

[ec2-user@ip-xx-xx-10-xxx ~]$ curl -k http://console.x.sandbox.x.com/
curl: (7) Failed connect to console.x.sandbox.x.com:80; Connection refused


Please let me know if more information is required for the same.

Comment 7 Scott Dodson 2018-05-14 12:35:56 UTC
(In reply to Madhusudan Upadhyay from comment #6)
> Hi,
> A little input from the customers end:
> 
> I tried to hit the loadbalancer host name. I'm getting nothing back at the
> current stage. 
> 
> [ec2-user@ip-xx-xx-10-xxx ~]$ curl -k http://console.x.sandbox.x.com/
> curl: (7) Failed connect to console.x.sandbox.x.com:80; Connection refused
> 
> 
> Please let me know if more information is required for the same.

Then this sounds as if we need to debug why their load balancer isn't routing traffic to the api servers. Can you please work on that with the customer? BTW, that curl is against http not https, where as the ansible scripts are checking https so that may be the reason. Anyway, we need to understand where this is failing. Can they curl the api servers directly?


Note You need to log in before you can comment on or make changes to this bug.