Description of problem:
Job launching fails when a service catalog item based on an Ansible playbook is ordered after the database has been restored.

Version-Release number of selected component (if applicable):
5.8.1.5

How reproducible:
In the customer environment.

Steps to Reproduce:
1. After the CFME appliance was restored, the service catalog item based on an Ansible playbook failed at the "check_completed" stage with the following exception:

[----] I, [2018-04-20T06:14:32.204298 #2923:53cdd4] INFO -- : Q-task_id([service_template_provision_task_1000000000064]) <AEMethod check_completed> Starting check_completed
[----] E, [2018-04-20T06:14:32.348547 #2923:53cdd4] ERROR -- : Q-task_id([service_template_provision_task_1000000000064]) <AEMethod check_completed> Error in check completed: Job launching failed
[----] I, [2018-04-20T06:14:32.374230 #2923:53cdd4] INFO -- : Q-task_id([service_template_provision_task_1000000000064]) <AEMethod check_completed> Ending check_completed
[----] I, [2018-04-20T06:14:32.392113 #2923:54b140] INFO -- : Q-task_id([service_template_provision_task_1000000000064]) <AEMethod [/ManageIQ/Service/Generic/StateMachines/GenericLifecycle/check_completed]> Ending

We have checked the git repository paths, credentials, repositories, and playbooks. All of them seem to be in sync with the original environment, but the catalog item still fails after the service is ordered.

Actual results:
A "Job launching failed" exception after ordering the service.

Expected results:
The service should execute without any exception.

Additional info:
After restoring the environment, the "Embedded Ansible" role was not enabled by default, so we enabled it manually. We are not sure whether this is the intended behavior.
Hi Neha, Embedded Ansible uses the host setting in config/database.yml for the awx database. If the appliance you are backing up is also the database server, you need to back up and then restore the awx database along with the vmdb_production database. Also, if you want to replace the appliance but keep the original appliance's identity (and its assigned roles, such as Embedded Ansible), you need to back up and restore the GUID file in /var/www/miq/vmdb, as described in section 1.2, "Backing Up Current Appliances", of https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.6/html/migrating_to_red_hat_cloudforms_4.6/index#backup_45-46
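A logical (pg_dump-based) backup therefore has to cover two databases plus the identity file. The Python sketch below only builds and prints the commands; it does not run them. It assumes the Embedded Ansible database is named "awx", that it lives on the same PostgreSQL host as vmdb_production, and that the database user is "root"; verify all three against config/database.yml on the appliance. The backup location is hypothetical.

```python
import shlex
from pathlib import Path

# Assumptions (not confirmed in this thread): the Embedded Ansible database is
# named "awx", it lives on the same PostgreSQL host as vmdb_production (per
# config/database.yml), and the database user is "root".
DATABASES = ("vmdb_production", "awx")

# Appliance identity file mentioned above; restoring it keeps the server roles.
VMDB_DIR = Path("/var/www/miq/vmdb")
IDENTITY_FILES = ("GUID",)


def pg_dump_commands(backup_dir: Path, host: str = "localhost",
                     user: str = "root") -> list[list[str]]:
    """One pg_dump command per database, in custom format for pg_restore."""
    return [
        ["pg_dump", "--host", host, "--username", user,
         "--format", "custom", "--file", str(backup_dir / f"{db}.dump"), db]
        for db in DATABASES
    ]


def pg_restore_commands(backup_dir: Path, host: str = "localhost",
                        user: str = "root") -> list[list[str]]:
    """One pg_restore command per dump; assumes each target database already exists."""
    return [
        ["pg_restore", "--host", host, "--username", user,
         "--dbname", db, str(backup_dir / f"{db}.dump")]
        for db in DATABASES
    ]


def identity_copy_plan(backup_dir: Path) -> list[tuple[Path, Path]]:
    """(source, destination) pairs for the appliance identity files."""
    return [(VMDB_DIR / name, backup_dir / name) for name in IDENTITY_FILES]


if __name__ == "__main__":
    target = Path("/tmp/cfme_backup")  # hypothetical backup location
    for cmd in pg_dump_commands(target) + pg_restore_commands(target):
        print(shlex.join(cmd))
    for src, dst in identity_copy_plan(target):
        print(shlex.join(["cp", str(src), str(dst)]))
```

The point of the sketch is that the awx database is dumped and restored as a separate step; restoring only vmdb_production leaves Embedded Ansible without its data.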
Neha, I just realized I provided the 4.6 documentation. Here is the 4.5 doc showing how to do a binary backup, which will also back up the awx database: https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.5/html/general_configuration/configuration#binary-backup-and-restore-database In 4.6, we recommend using pg_basebackup if you're using Embedded Ansible, since it will "Preserve data at the file system level by performing a binary backup. This includes all databases, users and roles, and other objects." In addition, you should back up and restore the appliance identity (the GUID and REGION), as described here; otherwise you won't inherit the existing server roles: https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.5/html/migrating_to_red_hat_cloudforms_4.5/index#backup_42-45 In 4.5, our internal backup/restore still uses pg_dump/pg_restore, so you will also need to manually back up and restore the awx database, or use pg_basebackup to do everything. Note that our internal backup/restore was moved to pg_basebackup in 4.6; see https://bugzilla.redhat.com/show_bug.cgi?id=1495192
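The binary alternative can be sketched the same way. This Python sketch only builds the pg_basebackup command line and lists the identity files; it assumes the database user is "root" and that REGION sits next to GUID in /var/www/miq/vmdb. The "-X fetch" WAL method is chosen here because, as far as we know, streaming WAL in tar format is only supported from PostgreSQL 10 onward; treat that choice as an assumption and check the pg_basebackup documentation for the installed version.

```python
import shlex
from pathlib import Path

# Identity files named in this thread; assumed to live in /var/www/miq/vmdb.
VMDB_DIR = Path("/var/www/miq/vmdb")
IDENTITY_FILES = ("GUID", "REGION")


def pg_basebackup_command(backup_dir: Path, host: str = "localhost",
                          user: str = "root") -> list[str]:
    """Binary, file-system level backup of the whole PostgreSQL cluster.

    Unlike pg_dump, which dumps a single database, this copies every database
    in the cluster (vmdb_production and the awx database) plus users and roles.
    """
    return [
        "pg_basebackup",
        "-h", host,
        "-U", user,                      # assumed database user
        "-D", str(backup_dir / "base"),  # output directory
        "-F", "t",                       # tar format
        "-z",                            # gzip the tar output
        "-X", "fetch",                   # include the WAL needed for a consistent copy
    ]


def identity_files() -> list[Path]:
    """Files to copy alongside the database so the appliance keeps its roles."""
    return [VMDB_DIR / name for name in IDENTITY_FILES]


if __name__ == "__main__":
    print(shlex.join(pg_basebackup_command(Path("/tmp/cfme_basebackup"))))
    for path in identity_files():
        print(path)
```

Because the whole cluster is copied, there is no separate awx step. Restoring generally means unpacking the tar output into an empty data directory while the database service is stopped; follow the linked documentation for the exact procedure.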
Thanks for the update, Joe. I have updated the case with all the relevant BZ information, and we are now waiting for the customer's response after they implement it. I will update the BZ once I get confirmation from the customer, and will move to close the BZ if everything goes well. Regards, Neha Chugh
Hi Neha, Do you have an update on this case? Were they able to test restoring a CFME database appliance?
Hello Joe, I got an update from the customer, and they are still checking the configuration on their end. I am still waiting for their confirmation. I think we can wait a day or so for it; otherwise we can close this BZ and reopen it if required. Thank you once again for all your input. Regards, Neha Chugh
I'll close this bug since we've not received a response. If it's still an issue, we can reopen.
Hello Joe, I got an update from the customer. They tried both logical and binary backups; the logical backup didn't work for them, but the binary backup did, so they are fine with that. The customer has encountered one more issue:
~~~
If you go to Services -> My Services in CloudForms in the restored environment, select one of the active services, and go to the Provisioning tab, I don't see the standard output produced by an Ansible execution that happened before the backup, although this output is present in the original CloudForms instance where the backup was taken. It shows the error message: "Standard Output stdout capture is missing". Any idea why that is happening?
~~~
Do we have a solution for the "stdout capture is missing" error? I found a solution for Ansible Tower, but I am not sure whether it applies to Embedded Ansible. Can you please suggest a solution? Thanks, Neha Chugh
Standard out for ansible is stored on the filesystem, not in the database. So a database backup and restore is not expected to also include the stdout for previously run jobs. Can you post the solution you found for Ansible Tower? It's likely that the same will apply here.
Also, this is a very different issue than the original problem. I'm going to close this again, as the original issue was addressed. If the stdout issue is something we want to address in CF, then please open another bug.