Bug 1556679 - ansible installer does not stop for long time when it waits for timeout of curl verifying
Summary: ansible installer does not stop for long time when it waits for timeout of cu...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.z
Assignee: Scott Dodson
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-15 03:28 UTC by Kenjiro Nakayama
Modified: 2018-05-18 03:55 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When verifying various API endpoints, no timeout was set. As a result, each verification attempt would wait 120 seconds for the connection to fail and then wait the prescribed delay before retrying. This could leave certain tasks waiting up to two hours rather than moving forward with the installation. This has been corrected, and a connection timeout of two seconds is now set.
Clone Of:
Environment:
Last Closed: 2018-05-18 03:54:45 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1576 0 None None None 2018-05-18 03:55:23 UTC

Description Kenjiro Nakayama 2018-03-15 03:28:05 UTC
Description of problem:

- When the curl command is run to check service health with "retries: 120" and "delay: 1", it looks as if it will give up after at most 120 seconds. However, if the peer does not reply and the connection times out because of some issue, each curl call takes around 1 minute (the connection timeout) to fail, so the total is 120 (retries) * 1 (min) = 2 hours. Users have to wait far too long for ansible to finish.
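
The arithmetic above can be sketched as follows. This is only an illustration: it assumes every attempt fails, that each failed attempt costs the per-attempt timeout plus the delay, and that "around 1 min" means 60 seconds. The function name `worst_case_wait` is hypothetical and does not come from the installer.

```python
def worst_case_wait(retries: int, attempt_timeout_s: float, delay_s: float) -> float:
    """Upper bound in seconds on the total wait when every attempt times out."""
    return retries * (attempt_timeout_s + delay_s)


# No curl timeout set: roughly 60 s per attempt (assumed from "around 1 min").
no_timeout = worst_case_wait(retries=120, attempt_timeout_s=60, delay_s=1)

# With a 2 s limit per attempt, as in "--max-time 2".
with_timeout = worst_case_wait(retries=120, attempt_timeout_s=2, delay_s=1)

print(f"no timeout:   {no_timeout / 3600:.2f} h")    # no timeout:   2.03 h
print(f"with timeout: {with_timeout / 60:.1f} min")  # with timeout: 6.0 min
```

Under these assumptions, a 2 second limit per attempt brings the worst case down from about two hours to a few minutes, which matches the expected results below.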

Version-Release number of the following components:

  ansible-2.4.2.0-2.el7.noarch                                Tue Feb 27 19:56:35 2018
  openshift-ansible-3.6.173.0.96-1.git.0.2954b4a.el7.noarch   Tue Feb 27 19:56:44 2018
  openshift-ansible-callback-plugins-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:41 2018
  openshift-ansible-docs-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:41 2018
  openshift-ansible-filter-plugins-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:41 2018
  openshift-ansible-lookup-plugins-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:44 2018
  openshift-ansible-playbooks-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:44 2018
  openshift-ansible-roles-3.6.173.0.96-1.git.0.2954b4a.el7.noarch Tue Feb 27 19:56:43 2018

Steps to Reproduce:
1. Whether it reproduces depends on how the check fails. For example, it happens when curl connects to the wrong peer because of a wrong DNS configuration and that peer does not reply quickly.

Actual results:
- TASK [openshift_master : Wait for API to become available] takes more than 2 hours.

Expected results:
- The curl verification of the API gives up within several minutes.

Additional info:
- Proposed patch: https://github.com/openshift/openshift-ansible/pull/7339

Comment 6 Johnny Liu 2018-05-08 10:18:54 UTC
Verified this bug with openshift-ansible-3.7.46-1.git.0.37f607e.el7.noarch; it passes.

Installation log:
TASK [openshift_master : Wait for API to become available] *********************
Tuesday 08 May 2018  04:20:26 -0400 (0:00:00.055)       0:06:28.763 *********** 
FAILED - RETRYING: Wait for API to become available (120 retries left).

FAILED - RETRYING: Wait for API to become available (119 retries left).

 [WARNING]: Consider using get_url or uri module rather than running curl

ok: [qe-jialiu37-master-etcd-1.0508-g15.qe.rhcloud.com] => {"attempts": 3, "changed": false, "cmd": ["curl", "--silent", "--tlsv1.2", "--max-time", "2", "--cacert", "/etc/origin/master/ca-bundle.crt", "https://qe-jialiu37-master-etcd-1:8443/healthz/ready"], "delta": "0:00:00.122739", "end": "2018-05-08 04:22:20.578836", "failed": false, "rc": 0, "start": "2018-05-08 04:22:20.456097", "stderr": "", "stderr_lines": [], "stdout": "ok", "stdout_lines": ["ok"]}

The "--max-time 2" option now appears in the curl command line.
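
For readers who want to reproduce the bounded wait outside Ansible, here is a rough Python sketch of the same pattern: a probe with a hard per-attempt time limit, wrapped in a retry loop. The curl arguments are copied from the log above. The function names `curl_healthz` and `wait_for_api`, and the rule that success means exit code 0 with "ok" on stdout, are assumptions made for illustration and are not the installer's actual task definition.

```python
import subprocess
import time
from typing import Callable


def curl_healthz(url: str, cacert: str, max_time_s: int = 2) -> bool:
    """Run one curl probe; True only if curl exits 0 and prints 'ok' (assumed rule)."""
    cmd = ["curl", "--silent", "--tlsv1.2", "--max-time", str(max_time_s),
           "--cacert", cacert, url]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0 and result.stdout.strip() == "ok"


def wait_for_api(probe: Callable[[], bool], retries: int = 120,
                 delay_s: float = 1.0) -> int:
    """Call probe until it succeeds; return the number of attempts used."""
    for attempt in range(1, retries + 1):
        if probe():
            return attempt
        if attempt < retries:
            time.sleep(delay_s)  # pause between attempts, not after the last one
    raise TimeoutError(f"API not available after {retries} attempts")
```

A call such as `wait_for_api(lambda: curl_healthz("https://qe-jialiu37-master-etcd-1:8443/healthz/ready", "/etc/origin/master/ca-bundle.crt"))` would then spend at most a few seconds per attempt, because `--max-time` caps the whole curl transfer rather than leaving it to the default connection timeout.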

Comment 9 errata-xmlrpc 2018-05-18 03:54:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1576

