Bug 1809502

Summary: Embedded Ansible playbooks do not run (regression)
Product: Red Hat CloudForms Management Engine
Reporter: Peter McGowan <pmcgowan>
Component: Embedded Ansible
Assignee: Nick LaMuro <nlamuro>
Status: CLOSED NOTABUG
QA Contact: Sudhir Mallamprabhakara <smallamp>
Severity: high
Docs Contact: Red Hat CloudForms Documentation <cloudforms-docs>
Priority: high
Version: 5.11.2
CC: dmetzger, gmccullo, lufu, mkanoor, obarenbo, tfitzger
Target Milestone: GA
Target Release: 5.11.5
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-04 14:13:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:

Description Peter McGowan 2020-03-03 09:32:18 UTC
Description of problem:
Embedded Ansible playbook services and methods do not run. Instead, they time out with an error such as the following:

MIQ(ManageIQ::Providers::AnsiblePlaybookWorkflow#process_abort) job aborting, job timed out after 319.623096573 seconds of inactivity.  Inactivity threshold [300 seconds]

This is a regression from CFME 5.11.1, where playbook services and methods ran successfully.

Version-Release number of selected component (if applicable):
5.11.2.2

How reproducible:
Every time


Steps to Reproduce:
1. Enable embedded Ansible on a CFME 5.11.2.2 appliance. Add a repository containing a simple playbook (such as https://github.com/pemcg/ansible_playbooks/blob/master/listvars.yml)
2. Create an embedded Ansible playbook service to run the simple playbook. Use the CFME Default Credential and run the service on localhost

Actual results:
The service template provision task is created but does not run. From evm.log it looks like the job is launched as intended:

[----] I, [2020-03-03T08:54:22.659452 #7662:2b0adc2d45bc]  INFO -- : Q-task_id([r134_service_template_provision_task_161]) MIQ(ServiceAnsiblePlaybook#launch_ansible_job) Launching Ansible job with options:
[----] I, [2020-03-03T08:54:22.660097 #7662:2b0adc2d45bc]  INFO -- : Q-task_id([r134_service_template_provision_task_161])
---
become_enabled: false
execution_ttl: '5'
extra_vars:
  manageiq:
    service: services/74
    action: Provision
    api_url: https://10.19.2.80
    api_token: [FILTERED]
    user: users/1
    group: groups/2
    X_MIQ_Group: EvmGroup-super_administrator
    request_task: requests/134/request_tasks/161
    request: requests/134
  manageiq_connection:
    url: https://10.19.2.80
    token: [FILTERED]
    X_MIQ_Group: EvmGroup-super_administrator
verbosity: '1'
credential: 7
hosts:
- localhost

However, the playbook is not executed and the job times out.

Expected results:
The embedded Ansible playbook should run successfully.

Additional info:

Comment 2 Lucy Fu 2020-03-03 15:47:10 UTC
Recreated the issue on 10.8.99.115.
The Ansible runner workflow did not run the playbook or return a result.
Forwarding this BZ to the Embedded Ansible team for debugging.

Comment 3 Lucy Fu 2020-03-03 16:06:42 UTC
The playbook service works with 5.11.2.1 but fails with 5.11.2.2.

Comment 6 Lucy Fu 2020-03-03 23:37:48 UTC
There is a file called cmdline which ansible runner creates with the following content:
  --become --ask-become-pass --user root --become-method sudo --ask-pass

It seems that the --ask-become-pass option hangs the playbook execution.
After removing this option from the cmdline file, the playbook ran from the command line.

On Billy's appliance, where playbook services work, the cmdline file contains only:
  --become-method sudo

I am not sure why ansible runner creates different content for the cmdline file. Both appliances are running 5.11.2.2.
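
For anyone comparing cmdline files between appliances, the check described above could be sketched roughly as follows. This is only an illustration, not part of CFME or ansible runner: the function name find_prompting_flags is hypothetical, and the flag list covers the ansible-playbook options that prompt for a password (--ask-pass/-k and --ask-become-pass/-K). Which of these actually stalls a run depends on whether a matching password is supplied; this comment only identifies --ask-become-pass as the culprit. Combined short flags such as -kK are not handled.

```python
import shlex

# ansible-playbook options that prompt for a password instead of reading it
# from the command line. Whether a prompt hangs depends on whether something
# answers it, so treat a match as "worth checking", not as proof of a hang.
PROMPTING_FLAGS = {"--ask-pass", "-k", "--ask-become-pass", "-K"}


def find_prompting_flags(cmdline_text):
    """Return the password-prompting flags found in a cmdline file's text.

    Hypothetical helper for illustration only.
    """
    tokens = shlex.split(cmdline_text)
    return [tok for tok in tokens if tok in PROMPTING_FLAGS]


# The two cmdline contents quoted in this comment.
failing = "--become --ask-become-pass --user root --become-method sudo --ask-pass"
working = "--become-method sudo"

print(find_prompting_flags(failing))
print(find_prompting_flags(working))
```

Run against the two contents quoted above, the first call reports both prompting flags and the second reports none.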

Comment 8 Nick LaMuro 2020-03-04 00:12:33 UTC
Hi all,


After debugging on the reported appliance and another reproducer, the error seems to be caused by the combination of the following:

1.  The use of "Escalate Privilege" in the playbook catalog item
2.  Using a credential that doesn't have a "Privilege Escalation Password" field


After creating a new machine credential for the secondary appliance, with the appliance password entered in both the "Password" and "Privilege Escalation Password" fields, the affected playbook ran without issue.
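
The failing combination could be expressed as a simple pre-flight check, sketched below. This is an assumption-laden illustration rather than anything in the CFME code base: the function name and the dictionary keys "password" and "become_password" are hypothetical stand-ins for the "Password" and "Privilege Escalation Password" fields of a credential.

```python
def escalation_password_missing(become_enabled, credential):
    """Return True when privilege escalation is requested but the credential
    carries no escalation password.

    Hypothetical helper: `credential` is a plain dict whose keys are assumed
    names for the credential fields, not a real CFME object.
    """
    return bool(become_enabled) and not credential.get("become_password")


# Hypothetical credentials: one without and one with an escalation password.
cred_without = {"userid": "root", "password": "secret"}
cred_with = {"userid": "root", "password": "secret", "become_password": "secret"}

print(escalation_password_missing(True, cred_without))   # True: the risky combination
print(escalation_password_missing(True, cred_with))      # False
print(escalation_password_missing(False, cred_without))  # False: no escalation requested
```

A check of this kind would let the UI or backend warn before launch instead of leaving the job to time out.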



I would argue that this is not a bug, but possibly a lack of documentation or some UX that could be improved to provide better insight for failing playbook runs (which might require some backend changes to support).



-Nick

Comment 11 dmetzger 2020-03-04 14:13:34 UTC
Based on findings thus far, closing this ticket. If the problem is encountered again, please re-open or create a new ticket.