1571278 – Ansible job failed after restoring the database

Summary: Ansible job failed after restoring the database

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Appliance
Sub Component:
Version:	5.9.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	GA
Target Release:	5.9.3
Assignee:	Joe Rafaniello
QA Contact:	Dave Johnson
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1561041

Reported:	2018-04-24 12:38 UTC by Neha Chugh
Modified:	2021-09-09 13:50 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-18 15:32:58 UTC
Category:	---
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Description Neha Chugh 2018-04-24 12:38:45 UTC

Description of problem:
After restoring the database, ordering a service catalog item based on an Ansible playbook fails with "Job launching failed".

Version-Release number of selected component (if applicable):
5.8.1.5

How reproducible:
Reproducible in the customer environment.

Steps to Reproduce:

1. After restoring the CFME appliance, the Ansible playbook-based service catalog item failed at the "check_completed" stage with the exception below:

[----] I, [2018-04-20T06:14:32.204298 #2923:53cdd4]  INFO -- : Q-task_id([service_template_provision_task_1000000000064]) <AEMethod check_completed> Starting check_completed
[----] E, [2018-04-20T06:14:32.348547 #2923:53cdd4] ERROR -- : Q-task_id([service_template_provision_task_1000000000064]) <AEMethod check_completed> Error in check completed: Job launching failed
[----] I, [2018-04-20T06:14:32.374230 #2923:53cdd4]  INFO -- : Q-task_id([service_template_provision_task_1000000000064]) <AEMethod check_completed> Ending check_completed
[----] I, [2018-04-20T06:14:32.392113 #2923:54b140]  INFO -- : Q-task_id([service_template_provision_task_1000000000064]) <AEMethod [/ManageIQ/Service/Generic/StateMachines/GenericLifecycle/check_completed]> Ending

We have checked the Git repository paths, credentials, repositories, and playbooks. All appear to be in sync with the original environment, but the catalog item still fails after the service is ordered.



Actual results:

Getting a "Job launching failed" exception after ordering the service.

Expected results:
Should execute without any exception.

Additional info:
After restoring the environment, the "Embedded Ansible" role was not enabled by default; we enabled it manually. Not sure whether this is the intended behavior.

Comment 3 Joe Rafaniello 2018-04-24 16:25:42 UTC

Hi Neha,

Embedded Ansible uses the host setting in config/database.yml for the awx database. If you're backing up an appliance that also hosts the database, you need to back up and then restore the awx database along with the vmdb_production database.
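For a logical (pg_dump-based) backup, that means one dump per database rather than vmdb_production alone. The sketch below only builds the pg_dump/pg_restore command lines for both databases; the host, database user, and backup directory are hypothetical placeholders, and pg_restore with -d assumes each target database already exists on the restored appliance.

```python
import shlex
import subprocess
from typing import List

# Both databases must be captured together (names from the comment above).
DATABASES = ("vmdb_production", "awx")


def dump_commands(backup_dir: str, host: str, user: str) -> List[List[str]]:
    """One pg_dump per database, in custom format (-Fc) so pg_restore can read it."""
    return [
        ["pg_dump", "-h", host, "-U", user, "-Fc",
         "-f", f"{backup_dir}/{db}.dump", db]
        for db in DATABASES
    ]


def restore_commands(backup_dir: str, host: str, user: str) -> List[List[str]]:
    """One pg_restore per database; assumes each target database already exists."""
    return [
        ["pg_restore", "-h", host, "-U", user, "-d", db,
         f"{backup_dir}/{db}.dump"]
        for db in DATABASES
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values: substitute your appliance's host, DB user, and path.
    for cmd in dump_commands("/tmp/cfme_backup", host="localhost", user="dbuser"):
        print(shlex.join(cmd))
```

Keeping both databases in one list makes it harder to back up vmdb_production and forget awx, which is the failure described in this bug.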

Also, if you want to replace the appliance but keep the original appliance's identity (and its assigned roles, such as Embedded Ansible), you need to back up and restore the GUID file in /var/www/miq/vmdb, as described in section 1.2, "Backing Up Current Appliances", of https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.6/html/migrating_to_red_hat_cloudforms_4.6/index#backup_45-46

Comment 4 Joe Rafaniello 2018-04-25 18:20:26 UTC

Neha, I just realized I provided the 4.6 documentation. Here is the 4.5 documentation showing how to do a binary backup, which will also back up the awx database:

https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.5/html/general_configuration/configuration#binary-backup-and-restore-database

In 4.6, we recommend using pg_basebackup if you're using Embedded Ansible, since it will "Preserve data at the file system level by performing a binary backup. This includes all databases, users and roles, and other objects."
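Because pg_basebackup copies the whole PostgreSQL cluster at the file level, awx comes along with vmdb_production without being listed separately. The sketch below is a hypothetical helper that builds one such invocation; the host, user, and target directory are placeholders, and the flags (plain format, -X stream to include the WAL needed for a consistent copy, -P for progress) are an illustrative choice, not the appliance's own backup procedure. pg_basebackup also needs a user with replication privileges and a matching pg_hba.conf entry.

```python
import os
from typing import List


def basebackup_command(target_dir: str, host: str, user: str) -> List[str]:
    """Build a pg_basebackup call that copies the whole cluster at file level.

    A binary backup covers every database in the cluster, so awx is included
    alongside vmdb_production. Plain format (the default) with -X stream also
    copies the WAL needed for a consistent restore; -P reports progress.
    """
    return ["pg_basebackup", "-h", host, "-U", user,
            "-D", target_dir, "-X", "stream", "-P"]


def check_target(target_dir: str) -> None:
    """pg_basebackup needs the target directory to be missing or empty."""
    if os.path.isdir(target_dir) and os.listdir(target_dir):
        raise ValueError(f"target directory is not empty: {target_dir}")
```

Checking the target directory up front gives a clearer error than letting pg_basebackup fail after connecting.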

In addition, you should back up and restore the appliance identity (GUID and REGION), as described here; otherwise you won't inherit the existing server roles:

https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.5/html/migrating_to_red_hat_cloudforms_4.5/index#backup_42-45
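A minimal sketch of saving and restoring the identity files, assuming REGION sits alongside GUID in /var/www/miq/vmdb (the comment above only gives that directory for GUID). The same function serves both directions: copy from the appliance to a backup location, then from the backup location to the new appliance. It refuses to copy a partial identity.

```python
import shutil
from pathlib import Path
from typing import List

# Identity files named in the comments above. The report places GUID in
# /var/www/miq/vmdb; REGION is assumed to live in the same directory.
IDENTITY_FILES = ("GUID", "REGION")
VMDB_DIR = "/var/www/miq/vmdb"


def copy_identity(src_dir: str, dest_dir: str) -> List[str]:
    """Copy GUID and REGION from src_dir to dest_dir; fail if either is missing."""
    src, dest = Path(src_dir), Path(dest_dir)
    missing = [n for n in IDENTITY_FILES if not (src / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing identity files in {src}: {missing}")
    dest.mkdir(parents=True, exist_ok=True)
    for name in IDENTITY_FILES:
        shutil.copy2(src / name, dest / name)  # copy2 also preserves mode and mtime
    return list(IDENTITY_FILES)
```

Usage, with a hypothetical backup path: `copy_identity(VMDB_DIR, "/root/identity_backup")` on the original appliance, then `copy_identity("/root/identity_backup", VMDB_DIR)` on the replacement.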

In 4.5, our internal backup/restore still uses pg_dump/pg_restore, so you will also need to manually back up and restore the awx database, or use pg_basebackup to do everything.

Note: our internal backup/restore was moved to pg_basebackup in 4.6; see:
https://bugzilla.redhat.com/show_bug.cgi?id=1495192

Comment 5 Neha Chugh 2018-04-26 07:10:05 UTC

Thanks, Joe, for the update. I have updated the case with all the required BZ information, and we are currently waiting for the customer's response after they implement the changes.

I will update the BZ once I get confirmation from the customer, and will move the BZ to closure if everything goes well.

Regards,
Neha Chugh

Comment 6 Joe Rafaniello 2018-05-03 18:18:04 UTC

Hi Neha,

Do you have an update on this case? Were they able to test restoring a CFME database appliance?

Comment 7 Neha Chugh 2018-05-04 03:59:02 UTC

Hello Joe,

I got an update from the customer: they are still checking the configuration on their end. I am still waiting for their confirmation.

I guess we can wait a day or so for their confirmation; otherwise we can close this BZ and reopen it if required.

Thank you once again for all your inputs.

Regards,
Neha Chugh

Comment 8 Joe Rafaniello 2018-05-09 21:49:41 UTC

I'll close this bug since we've not received a response.  If it's still an issue, we can reopen.

Comment 9 Neha Chugh 2018-05-18 08:08:21 UTC

Hello Joe,

I got an update from the customer. They tried both logical and binary backups; the logical backup didn't work for them but the binary one did, so they are fine with that.

The customer has encountered one more issue:

~~~

 If you go to Services -> My Services in CloudForms in the restored environment, select one of the active services, and go to the Provisioning tab, I don't see the standard output produced by an Ansible execution that happened before the backup, although this output is present in the original CloudForms instance where the backup was taken. It shows the error message: "Standard Output stdout capture is missing". Any idea why that is happening?


~~~

Do we have a solution for the "stdout capture is missing" error? I found a solution for Ansible Tower, but I am not sure whether it applies to Embedded Ansible.

Can you please suggest a solution?

Thanks,
Neha Chugh

Comment 10 Nick Carboni 2018-05-18 15:31:01 UTC

Standard out for ansible is stored on the filesystem, not in the database. So a database backup and restore is not expected to also include the stdout for previously run jobs.
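Since the stdout of earlier jobs lives on disk, carrying it over means archiving those files alongside the database backup. This report does not say where Embedded Ansible keeps them, so the directory is a parameter you must supply; the sketch below is a hypothetical helper that archives whatever directory it is given into a gzip-compressed tarball and returns the member names so the contents can be checked.

```python
import tarfile
from pathlib import Path
from typing import List


def archive_directory(src_dir: str, archive_path: str) -> List[str]:
    """Write src_dir into a gzip-compressed tar archive and return its member names."""
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"not a directory: {src}")
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the directory's own name, e.g. "stdout/job_1.txt".
        tar.add(str(src), arcname=src.name)
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Write the archive outside the directory being archived so the tarball does not try to include itself.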

Can you post the solution you found for Ansible Tower? It's likely that the same will apply here.

Comment 11 Nick Carboni 2018-05-18 15:32:58 UTC

Also, this is a very different issue than the original problem. I'm going to close this again, as the original issue was addressed. If the stdout issue is something we want to address in CF, please open another bug.