Bug 1410250 - RHV: Ansible timeout trying to configure engine
Summary: RHV: Ansible timeout trying to configure engine
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - RHEV
Version: 1.1
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 1.1
Assignee: jkim
QA Contact: Tasos Papaioannou
Dan Macpherson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-04 22:10 UTC by Chandler Wilkerson
Modified: 2017-02-28 01:43 UTC
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-28 01:43:00 UTC
Target Upstream Version:



Links
System: Red Hat Product Errata | ID: RHEA-2017:0335 | Private: 0 | Priority: normal | Status: SHIPPED_LIVE | Summary: Red Hat Quickstart Installer 1.1 | Last Updated: 2017-02-28 06:36:13 UTC

Description Chandler Wilkerson 2017-01-04 22:10:52 UTC
Description of problem:

RHV installation fails at 80% with an Ansible timeout while trying to reach the engine during the Actions::Fusor::Deployment::Rhev::TriggerAnsibleRun step.

The step is resumable, and the installation completes successfully if the step is resumed once the engine is available.

Version-Release number of selected component (if applicable):
QCI-1.1-RHEL-7-20161215.t.0 ISO

How reproducible:
Reproduced twice in my tests, using physical Dell blades.

Additional info:

Comment 2 Chandler Wilkerson 2017-01-05 16:27:28 UTC
Confirmed same bug in Jan 04 compose as well.

Comment 3 John Matthews 2017-01-06 21:17:40 UTC
There's a chance this issue relates to lazy sync and packages not being available when we need them.

When working on this, also look at BZ 1410140; the retry mentioned there is likely to fix this if the problem is a lazy sync issue.

Comment 4 Chandler Wilkerson 2017-01-06 21:51:53 UTC
The error I see relates to not being able to SSH into the RHV engine host to kick off the Ansible playbook. Is it possible that, with a slower bare-iron boot environment, the timeouts you have in devel are not long enough? (Can these be extended without a negative impact?)

It isn't rare or intermittent in my environment; I get this every time I deploy RHV (also confirmed in the 20170105 ISO).
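A fixed timeout can be too short on slow bare-metal boots. One alternative, shown here only as a sketch, is to poll the engine's SSH port until it accepts TCP connections or an overall deadline passes. The helper `wait_for_ssh`, its parameters, and the default port 22 are assumptions for illustration; nothing here is taken from the fusor code base, and an open port does not by itself guarantee that sshd is ready to authenticate.

```ruby
require "socket"

# Hypothetical helper (not fusor code): poll host:port until a TCP
# connection succeeds or `deadline` seconds have elapsed.
# Returns true once the port accepts a connection, false on timeout.
def wait_for_ssh(host, port: 22, deadline: 600, interval: 10, connect_timeout: 5)
  stop_at = Process.clock_gettime(Process::CLOCK_MONOTONIC) + deadline
  loop do
    begin
      # Socket.tcp closes the socket after the block; returning from the
      # block returns from the method.
      Socket.tcp(host, port, connect_timeout: connect_timeout) { return true }
    rescue SystemCallError, SocketError, IOError
      # Refused, unreachable, or connect timeout: engine not up yet.
    end
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= stop_at
    sleep interval
  end
end
```

Polling against an overall deadline lets a fast environment proceed immediately while a slow one simply waits longer, instead of tuning a single timeout for the slowest hardware.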

Comment 5 jkim 2017-01-10 21:07:28 UTC
https://github.com/fusor/fusor/pull/1329

Added a retry around the call to trigger_ansible_run().
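The PR itself is not reproduced here. As a rough illustration of the idea, a generic retry wrapper in Ruby might look like the following; `with_retries`, its parameters, and the default set of rescued exceptions are assumptions for illustration, not the code merged in the PR.

```ruby
# Hypothetical helper (not fusor code): run the block, retrying on the
# listed exception classes up to `attempts` times in total and sleeping
# `delay` seconds between tries. Re-raises the last error if all fail.
def with_retries(attempts: 3, delay: 5, on: [StandardError])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on => e
    raise e if tries >= attempts
    warn "Attempt #{tries} of #{attempts} failed (#{e.class}: #{e.message}); retrying"
    sleep delay
    retry # re-runs the begin block
  end
end

# Possible usage (argument list of trigger_ansible_run is assumed):
#   with_retries(attempts: 5, delay: 30) { trigger_ansible_run }
```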

Comment 7 Chandler Wilkerson 2017-01-12 03:49:54 UTC
RHV is able to successfully install off the 20170111-7 ISO. Thanks!

Comment 8 jkim 2017-01-13 22:45:25 UTC
https://github.com/fusor/fusor/pull/1343

After recreating the setup, I found that the issue was in the distribute_key_to_host method, which did not catch the Errno::ETIMEDOUT exception. The PR was successfully tested on the host that consistently produced the bug.
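The real distribute_key_to_host method is not shown in this report. The sketch below only illustrates the failure mode described: if Errno::ETIMEDOUT is missing from the list of connection errors treated as "host not ready yet", it escapes the retry loop and aborts the step. The `copy_key` callable, the `RETRYABLE_SSH_ERRORS` constant, and the retry counts are hypothetical.

```ruby
# Hypothetical sketch, not fusor's implementation. `copy_key` stands in
# for whatever actually copies the SSH key to the host.
RETRYABLE_SSH_ERRORS = [
  Errno::ECONNREFUSED,
  Errno::EHOSTUNREACH,
  Errno::ETIMEDOUT # without this entry, a timeout propagates uncaught
].freeze

def distribute_key_to_host(host, copy_key, attempts: 10, delay: 10)
  1.upto(attempts) do |attempt|
    begin
      copy_key.call(host)
      return true
    rescue *RETRYABLE_SSH_ERRORS => e
      warn "Key distribution to #{host} failed (#{e.class}), attempt #{attempt} of #{attempts}"
      sleep delay if attempt < attempts
    end
  end
  false # every attempt failed with a retryable error
end
```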

Comment 11 Tasos Papaioannou 2017-01-25 15:20:32 UTC
Verified on QCI-1.1-RHEL-7-20170123.t.0.

To test, I waited until the RHV engine was running ansible tasks, then rebooted it. The deployment log shows ansible-playbook retrying and completing successfully:

****
E, [2017-01-24T17:44:36.020347 #22988] ERROR -- : Error running command: ansible-playbook /usr/share/ansible-ovirt/engine_and_hypervisor.yml 
[...]
TASK [subscription : disable all] **********************************************
fatal: [mac525400c24eaa.example.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to mac525400c24eaa.example.com closed.\r\n", "unreachable": true}
[...]
W, [2017-01-24T17:44:36.021442 #22988]  WARN -- : Attempt [1 of 30] of the above command FAILED!... Retrying...
I, [2017-01-24T18:26:47.238696 #22988]  INFO -- : Command: ansible-playbook /usr/share/ansible-ovirt/engine_and_hypervisor.yml 
[...]
I, [2017-01-24T18:26:47.239112 #22988]  INFO -- : Status code: 0
****
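The log above shows the deployment re-running ansible-playbook after a failed attempt and logging `Attempt [1 of 30] ... Retrying...`. A minimal Ruby sketch of such a command retry loop, using Open3 and Logger from the standard library, follows; the method name `run_with_retries` and the delay are assumptions, and the messages only mimic the log lines rather than reproducing fusor's code.

```ruby
require "open3"
require "logger"

# Hypothetical sketch: run an external command, retrying on a non-zero
# exit status up to `max_attempts` times. Returns the last Process::Status.
def run_with_retries(cmd, max_attempts: 30, delay: 60, logger: Logger.new($stdout))
  status = nil
  1.upto(max_attempts) do |attempt|
    output, status = Open3.capture2e(*cmd) # stdout and stderr combined
    if status.success?
      logger.info("Command: #{cmd.join(' ')}")
      logger.info("Status code: #{status.exitstatus}")
      return status
    end
    logger.error("Error running command: #{cmd.join(' ')}\n#{output}")
    break if attempt == max_attempts # no retry message after the last try
    logger.warn("Attempt [#{attempt} of #{max_attempts}] of the above command FAILED!... Retrying...")
    sleep delay
  end
  status
end
```

Because the loop reruns the whole playbook, this approach relies on the playbook being safe to apply more than once, which is what allowed the rebooted engine in this test to recover.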

Comment 13 errata-xmlrpc 2017-02-28 01:43:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0335
