Bug 1451650 - Embedded ansible takes an extended amount of time to enable after upgrading from 5.6 to 5.8
Keywords:
Status: CLOSED DUPLICATE of bug 1455063
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.8.1
Assignee: Nick Carboni
QA Contact: luke couzens
URL:
Whiteboard: ansible_embed:black:migration
Depends On:
Blocks:
 
Reported: 2017-05-17 08:45 UTC by luke couzens
Modified: 2017-05-25 13:33 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-25 13:33:01 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:



Description luke couzens 2017-05-17 08:45:30 UTC
Description of problem: Embedded Ansible takes an extended amount of time to enable after upgrading from 5.6 to 5.8.


Version-Release number of selected component (if applicable): 5.8.0.14


How reproducible: 100%


Steps to Reproduce:
1. Provision a 5.6 appliance.
2. Add the 5.8 repos.
3. Upgrade the appliance [0].
4. Enable Embedded Ansible.

Actual results: Embedded Ansible takes around 30 minutes to finally become active. It also floods the notifications every 2 minutes with 'The role Embedded Ansible has been activated on server EVM' until it is successfully up and running.


Expected results: Embedded Ansible is enabled, with a single notification reporting the active status.


Additional info:
It seems the installation runs multiple times, hence the repeated notifications. This continues until we get a 502 error; the services are then shut down and the installation re-runs, at which point everything starts up correctly.

[0] https://docs.google.com/document/d/1MKHS9y9Bxdx7I3kOJGX_etVYWvuHItoBBNK0VhIfZiQ/edit?ts=58cf7d25#
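The repeated-notification loop described above can be modeled with a small sketch. The per-attempt status codes and the one-notification-per-attempt behaviour are assumptions for illustration only, not taken from the CFME code:

```python
# Toy model of the observed activation loop: each installation attempt
# emits an "activated" notification, then the worker hits the Tower API.
# A 502 tears the services down and the installation re-runs; any
# healthy response ends the loop.
ACTIVATED_MSG = "The role Embedded Ansible has been activated on server EVM"

def simulate_activation(responses_per_attempt):
    """Return the notifications emitted, one per installation attempt,
    given the HTTP status the half-configured server returns each time."""
    notifications = []
    for status in responses_per_attempt:
        notifications.append(ACTIVATED_MSG)
        if status == 502:
            continue  # services shut down, installation re-runs
        break  # server finally healthy, role stays active
    return notifications
```

With three 502s before a healthy response, the model emits four identical notifications, matching the "every 2 mins" flood reported above.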

Comment 2 Nick Carboni 2017-05-24 20:26:43 UTC
It was determined that this was being caused by the setup script exiting while a restart of the services that make up Tower was still pending.

This caused CFME to start issuing requests to a server in an intermediate state, which happened to be valid enough to process some of those requests.

When the supervisor restart took effect the services came down and no more requests were possible through the previously valid endpoint.

To fix this, we can run the setup playbook once to configure the installation; subsequent restarts should then be done by starting or stopping the services directly.

This will fix the issue by eliminating the chance that the setup playbook will "queue" a supervisord restart when we think the services should be running.

This leaves open the possibility that the worker *could* fail the first time through (when it runs the playbook to configure everything), but would not fail the second time as everything would be configured and it would start the services normally (using systemd).
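A minimal sketch of the proposed fix, assuming a hypothetical configured-state flag, a hypothetical setup command name, and illustrative service names (the actual worker tracks all of these differently):

```python
# Illustrative list of the services that make up the embedded Tower
# install; the real set of unit names is an assumption here.
TOWER_SERVICES = ["rabbitmq-server", "nginx", "supervisord"]

def activation_commands(already_configured):
    """Commands to run when the Embedded Ansible role is activated.
    The setup playbook runs only on the very first activation; every
    later activation starts the services directly via systemd, so the
    setup playbook can never queue a supervisord restart after we
    believe the services are already up."""
    if not already_configured:
        # First activation: let the setup playbook configure everything.
        return [["ansible-tower-setup"]]  # hypothetical command name
    # Subsequent activations: manage the services directly.
    return [["systemctl", "start", svc] for svc in TOWER_SERVICES]
```

Splitting "configure once" from "start/stop directly" is the key design choice: it removes the window in which a pending supervisord restart can invalidate an endpoint CFME is already talking to.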

Comment 4 Nick Carboni 2017-05-25 13:33:01 UTC
Marking this as a duplicate of bug 1455063 as that's the one people started to attach doc text to.

*** This bug has been marked as a duplicate of bug 1455063 ***

