Bug 1455063
| Summary: | Embedded Ansible role cannot be enabled | |||
|---|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Dmitry Misharov <dmisharo> | |
| Component: | Appliance | Assignee: | Nick Carboni <ncarboni> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Dmitry Misharov <dmisharo> | |
| Severity: | high | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 5.8.0 | CC: | abellott, adahms, cpelland, jfrey, jhardy, lcouzens, obarenbo, simaishi | |
| Target Milestone: | GA | Keywords: | TestOnly | |
| Target Release: | 5.9.0 | |||
| Hardware: | All | |||
| OS: | All | |||
| Whiteboard: | ansible | |||
| Fixed In Version: | 5.9.0.1 | Doc Type: | Known Issue | |
| Doc Text: |
Currently, a race condition can occur when the Embedded Ansible role is enabled for the first time. When the worker starts for the first time, Ansible must be set up and configured on the appliance, and the Ansible services are restarted as part of this process. There is a small chance that these services are still restarting when the initial setup and configuration completes. When this happens, the worker encounters failures while communicating with Embedded Ansible, which causes the worker to exit and restart. The worker may go through several iterations of this before it starts properly and comes online. As a result, it can take up to 30 minutes for the Embedded Ansible services to be fully online. During that time the Embedded Ansible role is active and the worker is in the started state, but the services are not available.
As a workaround, wait for Embedded Ansible to come online, which can take up to approximately 30 minutes. Alternatively, restart EVM on the affected appliance.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1455618 | Environment: | ||
| Last Closed: | 2018-03-06 15:13:26 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | CFME Core | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1455618 | |||
|
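The "wait" workaround in the Doc Text above amounts to polling with a deadline. A minimal Python sketch of that idea follows. The `is_online` probe is hypothetical and must be supplied by the caller (this bug does not specify how to check the services), and the 30-minute default only mirrors the approximate upper bound quoted in the Doc Text.

```python
import time
from typing import Callable


def wait_until_online(
    is_online: Callable[[], bool],      # hypothetical probe supplied by the caller
    timeout_s: float = 30 * 60,         # assumption: mirrors the ~30 minute bound in the Doc Text
    interval_s: float = 30.0,           # assumption: arbitrary polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_online() until it returns True or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        if is_online():
            return True
        if clock() >= deadline:
            # Gave up waiting; the other documented option is to restart EVM.
            return False
        sleep(interval_s)
```

If the function returns False, the remaining documented workaround is to restart EVM on the affected appliance. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.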
Description
Dmitry Misharov
2017-05-24 08:04:22 UTC
From: bug 1451650

It was determined that this was being caused by the setup script exiting while a restart for the services making up tower was still pending. This caused cfme to start issuing requests to a server in an intermediate state which happened to be valid enough to process some of those requests. When the supervisor restart took effect the services came down and no more requests were possible through the previously valid endpoint.

To fix this we can run the setup playbook once for configuring the installation, but then subsequent restarts should be done by starting or stopping the services directly. This will fix the issue by eliminating the chance that the setup playbook will "queue" a supervisord restart when we think the services should be running.

This leaves open the possibility that the worker *could* fail the first time through (when it runs the playbook to configure everything), but would not fail the second time as everything would be configured and it would start the services normally (using systemd).

*** Bug 1451650 has been marked as a duplicate of this bug. ***

Resetting the component as this was an issue with enabling the role, not with the provider.

New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/060a0c4998c32ac4a85cb5b593a4f278bd6cfc85

commit 060a0c4998c32ac4a85cb5b593a4f278bd6cfc85
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed May 24 18:17:34 2017 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Thu May 25 09:32:52 2017 -0400

    Only run the setup playbook the first time we start

    It needs to run once to put files in place and such, but it restarts services in a way that could cause us to operate on a running stack only to have it restart from under us. This change makes us only run the setup playbook once so the chance of hitting this kind of issue should be much smaller.

    This effectively combines the .configure and .start methods so that we detect when we are in the first configuration state vs just starting up the services.

    https://bugzilla.redhat.com/show_bug.cgi?id=1455063

 lib/embedded_ansible.rb           | 29 ++++++-------
 spec/lib/embedded_ansible_spec.rb | 85 +++++++++++++++------------------------
 2 files changed, 46 insertions(+), 68 deletions(-)

New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/0ec1271fa9e3556e068a553accf7bf071422b030

commit 0ec1271fa9e3556e068a553accf7bf071422b030
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed May 24 18:21:49 2017 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Thu May 25 09:34:24 2017 -0400

    Don't call the .configure method as it was removed

    https://bugzilla.redhat.com/show_bug.cgi?id=1455063

 app/models/embedded_ansible_worker/runner.rb       |  3 ---
 spec/models/embedded_ansible_worker/runner_spec.rb | 20 --------------------
 2 files changed, 23 deletions(-)

Moving the 'requires_doc_text' flag to '-' for now based on a discussion with Chris Pelland.

Verified in 5.9.0.2.20171010190026_0413a06. Embedded ansible role starts successfully.
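The approach described in the commits above (run the setup playbook only on the first start, then start the services directly on later starts) can be illustrated with a small Python sketch. This is not the Ruby code in lib/embedded_ansible.rb. The class name, the two callbacks, and the `configured` flag are all hypothetical, and how the real code detects the "already configured" state is not stated in this bug.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EmbeddedAnsibleSketch:
    """Hypothetical stand-in for the combined configure/start logic."""

    run_setup_playbook: Callable[[], None]  # assumed: configures the install and may queue a service restart
    start_services: Callable[[], None]      # assumed: starts the services directly (e.g. via systemd)
    configured: bool = False                # assumed marker for "setup has already run"

    def start(self) -> str:
        if not self.configured:
            # First start: run the playbook once to put files in place.
            # This path can still race with the restart the playbook queues.
            self.run_setup_playbook()
            self.configured = True
            return "configured"
        # Later starts: skip the playbook so no restart is queued behind us.
        self.start_services()
        return "started"
```

As the description notes, the first pass may still fail while the playbook's restart is pending. The point of the design is that every later start avoids the playbook entirely.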