Bug 1410250
Summary: | RHV: Ansible timeout trying to configure engine | ||
---|---|---|---|
Product: | Red Hat Quickstart Cloud Installer | Reporter: | Chandler Wilkerson <cwilkers> |
Component: | Installation - RHEV | Assignee: | jkim |
Status: | CLOSED ERRATA | QA Contact: | Tasos Papaioannou <tpapaioa> |
Severity: | medium | Docs Contact: | Dan Macpherson <dmacpher> |
Priority: | unspecified | ||
Version: | 1.1 | CC: | bthurber, jkim, qci-bugzillas, tpapaioa |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | 1.1 | ||
Hardware: | Unspecified | OS: | Unspecified |
Last Closed: | 2017-02-28 01:43:00 UTC | Type: | Bug |
Description
Chandler Wilkerson
2017-01-04 22:10:52 UTC
Confirmed same bug in Jan 04 compose as well.

There's a chance this issue relates to lazy sync and packages not being available when we need them. When working on this, also look at BZ 1410140; the retry mentioned there is likely to fix this if the problem is a lazy sync issue.

The error I see relates to not being able to ssh into the RHV engine host to kick off the Ansible playbook. Is it possible that, with a slower bare-iron boot environment, the timeouts you have in devel are not long enough? (Can these be extended without a negative impact?) It isn't rare or intermittent in my environment; I get this every time I deploy RHV. (Also confirmed in the 20170105 ISO.)

https://github.com/fusor/fusor/pull/1329
Added a retry in calling the trigger_ansible_run().

RHV is able to successfully install off the 20170111-7 ISO. Thanks!

https://github.com/fusor/fusor/pull/1343
After recreating the setup, the issue was in the distribute_key_to_host method not catching the Errno::ETIMEDOUT exception. The PR was successfully tested on the host which consistently produced the bug.

Verified on QCI-1.1-RHEL-7-20170123.t.0. To test, I waited until the RHV engine was running ansible tasks, then rebooted it. The deployment log shows ansible-playbook retrying and completing successfully:

****
E, [2017-01-24T17:44:36.020347 #22988] ERROR -- : Error running command: ansible-playbook /usr/share/ansible-ovirt/engine_and_hypervisor.yml
[...]
TASK [subscription : disable all] **********************************************
fatal: [mac525400c24eaa.example.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to mac525400c24eaa.example.com closed.\r\n", "unreachable": true}
[...]
W, [2017-01-24T17:44:36.021442 #22988] WARN -- : Attempt [1 of 30] of the above command FAILED!... Retrying...
I, [2017-01-24T18:26:47.238696 #22988] INFO -- : Command: ansible-playbook /usr/share/ansible-ovirt/engine_and_hypervisor.yml
[...]
I, [2017-01-24T18:26:47.239112 #22988] INFO -- : Status code: 0
****

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0335
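
For anyone reading this later, the two fixes discussed in this thread (retrying the Ansible run, and rescuing Errno::ETIMEDOUT during key distribution) follow the same general pattern. Below is a minimal Ruby sketch of that pattern, not the code from the fusor PRs. The helper name `with_retries`, the `RETRYABLE_ERRORS` list, and the `delay` and `log` parameters are all hypothetical. The default of 30 attempts and the message wording are only modeled on the "Attempt [1 of 30]" line in the deployment log above.

```ruby
# Hypothetical helper, NOT fusor's actual implementation: retry a block when
# the connection to the host times out, is refused, or the host is unreachable.
RETRYABLE_ERRORS = [Errno::ETIMEDOUT, Errno::ECONNREFUSED, Errno::EHOSTUNREACH].freeze

def with_retries(max_attempts: 30, delay: 0, log: [])
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *RETRYABLE_ERRORS => e
    # Give up and re-raise the original error once the attempts are exhausted.
    raise if attempt >= max_attempts
    log << "Attempt [#{attempt} of #{max_attempts}] FAILED (#{e.class})... Retrying..."
    sleep(delay)
    retry
  end
end

# Simulated usage: the "host" times out twice (e.g. it is still booting),
# then the third attempt succeeds. No real ssh connection is made here.
messages = []
outcome = with_retries(max_attempts: 5, log: messages) do |attempt|
  raise Errno::ETIMEDOUT if attempt < 3
  :key_distributed
end
```

In a slow bare-metal environment the `delay` between attempts probably matters more than the attempt count, since the host needs time to finish booting before ssh will answer.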