Bug 1455063 - Embedded Ansible role cannot be enabled
Summary: Embedded Ansible role cannot be enabled
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.8.0
Hardware: All
OS: All
Priority: urgent
Severity: high
Target Milestone: GA
Target Release: 5.9.0
Assignee: Nick Carboni
QA Contact: Dmitry Misharov
URL:
Whiteboard: ansible
Duplicates: 1451650
Depends On:
Blocks: 1455618
Reported: 2017-05-24 08:04 UTC by Dmitry Misharov
Modified: 2018-03-06 15:13 UTC

Fixed In Version: 5.9.0.1
Doc Type: Known Issue
Doc Text:
Currently, a race condition can occur when the Embedded Ansible role is enabled for the first time. When the worker first starts, Ansible must be set up and configured on the appliance, and as part of this process the Ansible services are restarted. There is a small chance that these services are still restarting when the initial setup and configuration completes. When this happens, the worker encounters failures while communicating with Embedded Ansible, causing the worker to exit and restart. It may go through several iterations of this before the worker starts properly and comes online. As a result, it can take up to approximately 30 minutes for the Embedded Ansible services to be fully online; during that time the Embedded Ansible role shows as active and the worker shows as started, but the services are not available. As a workaround, either wait for Embedded Ansible to come online or restart EVM on the affected appliance.
Clone Of:
Clones: 1455618
Environment:
Last Closed: 2018-03-06 15:13:26 UTC
Category: ---
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:


Description Dmitry Misharov 2017-05-24 08:04:22 UTC
Description of problem:
The Embedded Ansible role fails to enable on a freshly provisioned appliance. After a reboot, it works fine.

Version-Release number of selected component (if applicable):
5.8.0.16.20170522163900_28fa952 

How reproducible:
Always

Steps to Reproduce:
1. Navigate to Configuration.
2. Enable Embedded Ansible Role.

Actual results:
Several notifications report that the Embedded Ansible role has been enabled, but in fact it does not work. There are also errors in the logs.

Expected results:
Embedded Ansible role should be enabled successfully.

Additional info:
[----] E, [2017-05-24T03:57:27.894042 #11931:5017a20] ERROR -- : [AnsibleTowerClient::ClientError]: <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.10.2</center>
</body>
</html>
  Method:[rescue in do_before_work_loop]
[----] E, [2017-05-24T03:57:27.894181 #11931:5017a20] ERROR -- : /opt/rh/cfme-gemset/gems/ansible_tower_client-0.12.2/lib/ansible_tower_client/middleware/raise_tower_error.rb:23:in `on_complete'
/opt/rh/cfme-gemset/gems/faraday-0.9.2/lib/faraday/response.rb:9:in `block in call'
/opt/rh/cfme-gemset/gems/faraday-0.9.2/lib/faraday/response.rb:57:in `on_complete'
/opt/rh/cfme-gemset/gems/faraday-0.9.2/lib/faraday/response.rb:8:in `call'
/opt/rh/cfme-gemset/gems/faraday-0.9.2/lib/faraday/request/url_encoded.rb:15:in `call'
/opt/rh/cfme-gemset/gems/faraday_middleware-0.10.1/lib/faraday_middleware/response/follow_redirects.rb:76:in `perform_with_redirection'
/opt/rh/cfme-gemset/gems/faraday_middleware-0.10.1/lib/faraday_middleware/response/follow_redirects.rb:82:in `block in perform_with_redirection'
/opt/rh/cfme-gemset/gems/faraday-0.9.2/lib/faraday/response.rb:57:in `on_complete'
/opt/rh/cfme-gemset/gems/faraday_middleware-0.10.1/lib/faraday_middleware/response/follow_redirects.rb:78:in `perform_with_redirection'
/opt/rh/cfme-gemset/gems/faraday_middleware-0.10.1/lib/faraday_middleware/response/follow_redirects.rb:64:in `call'
/opt/rh/cfme-gemset/gems/faraday_middleware-0.10.1/lib/faraday_middleware/request/encode_json.rb:23:in `call'
/opt/rh/cfme-gemset/gems/faraday-0.9.2/lib/faraday/rack_builder.rb:139:in `build_response'
/opt/rh/cfme-gemset/gems/faraday-0.9.2/lib/faraday/connection.rb:377:in `run_request'
/opt/rh/cfme-gemset/gems/faraday-0.9.2/lib/faraday/connection.rb:140:in `get'
/opt/rh/cfme-gemset/gems/ansible_tower_client-0.12.2/lib/ansible_tower_client/api.rb:91:in `method_missing'
/opt/rh/cfme-gemset/gems/ansible_tower_client-0.12.2/lib/ansible_tower_client/collection.rb:50:in `fetch_more_results'
/opt/rh/cfme-gemset/gems/ansible_tower_client-0.12.2/lib/ansible_tower_client/collection.rb:22:in `block (2 levels) in find_all_by_url'
/opt/rh/cfme-gemset/gems/ansible_tower_client-0.12.2/lib/ansible_tower_client/collection.rb:21:in `loop'
/opt/rh/cfme-gemset/gems/ansible_tower_client-0.12.2/lib/ansible_tower_client/collection.rb:21:in `block in find_all_by_url'
/var/www/miq/vmdb/app/models/embedded_ansible_worker/object_management.rb:16:in `each'
/var/www/miq/vmdb/app/models/embedded_ansible_worker/object_management.rb:16:in `each'
/var/www/miq/vmdb/app/models/embedded_ansible_worker/object_management.rb:16:in `remove_demo_data'
/var/www/miq/vmdb/app/models/embedded_ansible_worker/runner.rb:55:in `update_embedded_ansible_provider'
/var/www/miq/vmdb/app/models/embedded_ansible_worker/runner.rb:14:in `do_before_work_loop'
/var/www/miq/vmdb/app/models/embedded_ansible_worker/runner.rb:7:in `prepare'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:129:in `start'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker'
/var/www/miq/vmdb/app/models/embedded_ansible_worker.rb:10:in `block in start_runner'
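
A 502 from nginx means the service it proxies to did not answer properly, so the trace above is consistent with the worker querying the API while the services behind nginx were not yet (or no longer) up. One possible client-side guard, shown only as a minimal sketch, is to poll a readiness probe before doing any real work. The helper name wait_until_ready, its parameters, and the stubbed probe are all hypothetical; this is not code from ManageIQ or ansible_tower_client.

```ruby
# Hypothetical helper (not part of ManageIQ): polls a probe until it
# reports a 2xx status or the attempts run out.
#
# probe:   callable returning an HTTP-like status code (Integer)
# sleeper: callable used to wait between attempts (injectable for tests)
#
# Returns the number of attempts used, or nil if the probe never succeeded.
def wait_until_ready(probe, attempts: 10, delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempts.times do |i|
    status = probe.call
    return i + 1 if status.between?(200, 299)

    # Do not wait after the final failed attempt.
    sleeper.call(delay) unless i == attempts - 1
  end
  nil
end

# Usage with a stubbed probe that answers 502 twice before succeeding.
responses = [502, 502, 200]
slept     = []
tries = wait_until_ready(
  -> { responses.shift },
  attempts: 5,
  delay:    0.5,
  sleeper:  ->(s) { slept << s }
)
puts "ready after #{tries} attempts"
```

In a real worker the probe would wrap an actual request to the service, and the attempt count and delay would need tuning. A probe like this cannot fully close the race if the service answers successfully and then restarts afterwards.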

Comment 3 Chris Pelland 2017-05-25 13:08:20 UTC
https://github.com/ManageIQ/manageiq/pull/15225

Comment 4 Nick Carboni 2017-05-25 13:31:51 UTC
From: bug 1451650

It was determined that this was caused by the setup script exiting while a restart of the services that make up Tower was still pending.

This caused CFME to start issuing requests to a server in an intermediate state that happened to be valid enough to process some of those requests.

When the supervisor restart took effect, the services came down and no more requests were possible through the previously valid endpoint.

To fix this, we can run the setup playbook once to configure the installation; subsequent restarts should then be done by starting or stopping the services directly.

This fixes the issue by eliminating the chance that the setup playbook will "queue" a supervisord restart when we think the services should be running.

This leaves open the possibility that the worker *could* fail the first time through (when it runs the playbook to configure everything), but it would not fail the second time, as everything would already be configured and it would start the services normally (using systemd).
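
The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual lib/embedded_ansible.rb: the class name EmbeddedAnsibleStarter and the three injected callables are hypothetical stand-ins for "is the installation already configured?", "run the setup playbook", and "start the services directly".

```ruby
# Hypothetical sketch of "run the setup playbook only on first start".
class EmbeddedAnsibleStarter
  # configured:     callable returning true once one-time setup has completed
  # run_playbook:   callable that runs the setup playbook (which may queue
  #                 a service restart of its own)
  # start_services: callable that starts the services directly
  def initialize(configured:, run_playbook:, start_services:)
    @configured     = configured
    @run_playbook   = run_playbook
    @start_services = start_services
  end

  # Returns a symbol naming the path taken, so the caller can log it.
  def start
    if @configured.call
      @start_services.call
      :started_services
    else
      @run_playbook.call
      :ran_setup_playbook
    end
  end
end

# Usage: the first start runs the playbook, later starts skip it.
state = { configured: false }
calls = []
starter = EmbeddedAnsibleStarter.new(
  configured:     -> { state[:configured] },
  run_playbook:   -> { calls << :playbook; state[:configured] = true },
  start_services: -> { calls << :services }
)
first  = starter.start
second = starter.start
puts [first, second].inspect
```

The point of the design is that only the first start can leave a restart pending; every later start goes through the direct service path, which is why a failed first attempt would be expected to succeed on retry.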

Comment 5 Nick Carboni 2017-05-25 13:33:01 UTC
*** Bug 1451650 has been marked as a duplicate of this bug. ***

Comment 6 Nick Carboni 2017-05-25 13:35:32 UTC
Resetting the component as this was an issue with enabling the role, not with the provider.

Comment 7 CFME Bot 2017-05-25 14:51:57 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/060a0c4998c32ac4a85cb5b593a4f278bd6cfc85

commit 060a0c4998c32ac4a85cb5b593a4f278bd6cfc85
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed May 24 18:17:34 2017 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Thu May 25 09:32:52 2017 -0400

    Only run the setup playbook the first time we start
    
    It needs to run once to put files in place and such, but
    it restarts services in a way that could cause us to operate on
    a running stack only to have it restart from under us.
    
    This change makes us only run the setup playbook once so the
    chance of hitting this kind of issue should be much smaller.
    
    This effectively combines the .configure and .start methods
    so that we detect when we are in the first configuration state
    vs just starting up the services.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1455063

 lib/embedded_ansible.rb           | 29 ++++++-------
 spec/lib/embedded_ansible_spec.rb | 85 +++++++++++++++------------------------
 2 files changed, 46 insertions(+), 68 deletions(-)

Comment 8 CFME Bot 2017-05-25 14:52:03 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/0ec1271fa9e3556e068a553accf7bf071422b030

commit 0ec1271fa9e3556e068a553accf7bf071422b030
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed May 24 18:21:49 2017 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Thu May 25 09:34:24 2017 -0400

    Don't call the .configure method as it was removed
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1455063

 app/models/embedded_ansible_worker/runner.rb       |  3 ---
 spec/models/embedded_ansible_worker/runner_spec.rb | 20 --------------------
 2 files changed, 23 deletions(-)

Comment 10 Andrew Dahms 2017-05-25 22:46:37 UTC
Moving the 'requires_doc_text' flag to '-' for now based on a discussion with Chris Pelland.

Comment 11 Dmitry Misharov 2017-10-16 09:03:32 UTC
Verified in 5.9.0.2.20171010190026_0413a06. The Embedded Ansible role starts successfully.

