Description of problem: - When curl command is executed to check services health with "retries: 120 and delay: 1", it looks like it takes 120 sec at a maximum until it gives up. However, in case that peer does not reply and makes timeout due to some issue, curl takes around 1 min (=connection timeout) to fail, so 120(sec) * 1(min) = 2 hours in total. It takes too long until users wait for the end of ansible. Version-Release number of the following components: ansible-2.4.2.0-2.el7.noarch Tue Feb 27 19:56:35 2018 openshift-ansible-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:44 2018 openshift-ansible-callback-plugins-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:41 2018 openshift-ansible-docs-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:41 2018 openshift-ansible-filter-plugins-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:41 2018 openshift-ansible-lookup-plugins-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:44 2018 openshift-ansible-playbooks-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:44 2018 openshift-ansible-roles-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:43 2018 Steps to Reproduce: 1. It depends on how it fails. For example, when curl to wrong peer due to wrong DNS configuration and peer did not reply quickly, it happens. Actual results: - TASK [openshift_master : Wait for API to become available] takes more than 2 hours. Expected results: - Stop verifying curl API to verify in several minutes. Additional info: - Proposal patch https://github.com/openshift/openshift-ansible/pull/7339
Verified this bug with openshift-ansible-3.7.46-1.git.0.37f607e.el7.noarch, and PASS. Installation log: TASK [openshift_master : Wait for API to become available] ********************* Tuesday 08 May 2018 04:20:26 -0400 (0:00:00.055) 0:06:28.763 *********** FAILED - RETRYING: Wait for API to become available (120 retries left). FAILED - RETRYING: Wait for API to become available (119 retries left). [WARNING]: Consider using get_url or uri module rather than running curl ok: [qe-jialiu37-master-etcd-1.0508-g15.qe.rhcloud.com] => {"attempts": 3, "changed": false, "cmd": ["curl", "--silent", "--tlsv1.2", "--max-time", "2", "--cacert", "/etc/origin/master/ca-bundle.crt", "https://qe-jialiu37-master-etcd-1:8443/healthz/ready"], "delta": "0:00:00.122739", "end": "2018-05-08 04:22:20.578836", "failed": false, "rc": 0, "start": "2018-05-08 04:22:20.456097", "stderr": "", "stderr_lines": [], "stdout": "ok", "stdout_lines": ["ok"]} The "--max-time 2" is shown in curl command line.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1576