Bug 1513616 - Cloning repositories in Embedded Ansible within CFME without trusted SSL certificates leads to silent failure of project
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.8.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: GA
Target Release: 5.10.0
Assignee: Tomas Coufal
QA Contact: Dmitry Misharov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-15 17:01 UTC by Robb Manes
Modified: 2019-06-25 14:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-07 23:00:39 UTC
Category: ---
Cloudforms Team: Ansible
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3242311 None None None 2017-11-15 17:12:17 UTC
Red Hat Product Errata RHSA-2019:0212 None None None 2019-02-07 23:00:46 UTC

Description Robb Manes 2017-11-15 17:01:41 UTC
Description of problem:

When a repository from a source with an untrusted SSL certificate is added on a CFME appliance with the Embedded Ansible role enabled, the job is simply put in the "failed" state. There is no option to ignore SSL verification and no error message describing why cloning the repository failed.

Ideally there would be an option to disable SSL verification for a specific repository, or at least some way to surface the error, although the Embedded Ansible API endpoint itself seems to return very little information about why the clone failed in the first place.

Note, however, that I see no way to disable SSL verification via the Tower/AWX API either.

Version-Release number of selected component (if applicable):
Tested on 5.8.2.3 and upstream/latest fine-4 build

How reproducible:
Every time

Steps to Reproduce:
- Verify that as the AWX user you are unable to clone the repository without adding the certificate as trusted (or manually disabling SSL validation):

# su - awx

$ git clone https://insecure-repo/rmanes/manageiq-ansible-playbook-demo.git 
Cloning into 'manageiq-ansible-playbook-demo'...
fatal: unable to access 'https://insecure-repo/rmanes/manageiq-ansible-playbook-demo.git/': Peer's certificate issuer has been marked as not trusted by the user.

- Disable SSL validation for the AWX user for git:

$ git config --global http.sslVerify "false"

$ git config -l
http.sslverify=false

$ git clone https://insecure-repo/rmanes/manageiq-ansible-playbook-demo.git 
Cloning into 'manageiq-ansible-playbook-demo'...
remote: Counting objects: 253, done.
remote: Compressing objects: 100% (148/148), done.
remote: Total 253 (delta 98), reused 253 (delta 98)
Receiving objects: 100% (253/253), 30.83 KiB | 0 bytes/s, done.
Resolving deltas: 100% (98/98), done.

- Attempt to add a Git repository via Embedded Ansible for which the appliance has no trusted SSL certificate, and watch it enter the "Failed" state in the CFME Ops UI.

- Use the Embedded Ansible API to inspect the current projects and confirm that the project is in the failed state:

irb(main):001:0> ManageIQ::Providers::EmbeddedAnsible::Provider.first.authentications.first.password
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
=> "i39hKxy6RJOAkQyFSAeiRnHe"

# curl -k -u admin:i39hKxy6RJOAkQyFSAeiRnHe https://localhost/ansibleapi/v1/projects/ | python -mjson.tool
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
	{
	    "created": "2017-11-15T16:22:15.048Z",
	    "credential": null,
	    "description": "",
	    "id": 6,
	    "last_job_failed": true,
	    "last_job_run": "2017-11-15T16:22:23.216858Z",
	    "last_update_failed": true,
	    "last_updated": "2017-11-15T16:22:23.216858Z",
	    "local_path": "_6__test_ssl_repo",
	    "modified": "2017-11-15T16:22:21.076Z",
	    "name": "test-ssl-repo",
	    "next_job_run": null,
	    "organization": null,
	    "related": {
	        "access_list": "/api/v1/projects/6/access_list/",
	        "activity_stream": "/api/v1/projects/6/activity_stream/",
	        "created_by": "/api/v1/users/1/",
	        "last_job": "/api/v1/project_updates/2/",
	        "last_update": "/api/v1/project_updates/2/",
	        "notification_templates_any": "/api/v1/projects/6/notification_templates_any/",
	        "notification_templates_error": "/api/v1/projects/6/notification_templates_error/",
	        "notification_templates_success": "/api/v1/projects/6/notification_templates_success/",
	        "object_roles": "/api/v1/projects/6/object_roles/",
	        "playbooks": "/api/v1/projects/6/playbooks/",
	        "project_updates": "/api/v1/projects/6/project_updates/",
	        "schedules": "/api/v1/projects/6/schedules/",
	        "teams": "/api/v1/projects/6/teams/",
	        "update": "/api/v1/projects/6/update/"
	    },
	    "scm_branch": "",
	    "scm_clean": false,
	    "scm_delete_on_next_update": false,
	    "scm_delete_on_update": false,
	    "scm_revision": "",
	    "scm_type": "git",
	    "scm_update_cache_timeout": 0,
	    "scm_update_on_launch": false,
	    "scm_url": "https://insecure-repo/rmanes/manageiq-ansible-playbook-demo.git",
	    "status": "failed",
	    "summary_fields": {
	        "created_by": {
	            "first_name": "",
	            "id": 1,
	            "last_name": "",
	            "username": "admin"
	        },
	        "last_job": {
	            "description": "",
	            "failed": true,
	            "finished": "2017-11-15T16:22:23.216Z",
	            "id": 2,
	            "name": "test-ssl-repo",
	            "status": "failed"
	        },
	        "last_update": {
	            "description": "",
	            "failed": true,
	            "id": 2,
	            "name": "test-ssl-repo",
	            "status": "failed"
	        },
	        "object_roles": {
	            "admin_role": {
	                "description": "Can manage all aspects of the project",
	                "id": 35,
	                "name": "Admin"
	            },
	            "read_role": {
	                "description": "May view settings for the project",
	                "id": 36,
	                "name": "Read"
	            },
	            "update_role": {
	                "description": "May update project or inventory or group using the configured source update system",
	                "id": 38,
	                "name": "Update"
	            },
	            "use_role": {
	                "description": "Can use the project in a job template",
	                "id": 37,
	                "name": "Use"
	            }
	        },
	        "user_capabilities": {
	            "delete": true,
	            "edit": true,
	            "schedule": true,
	            "start": true
	        }
	    },
	    "timeout": 0,
	    "type": "project",
	    "url": "/api/v1/projects/6/"
	}
    ]
}

# curl -k -u admin:i39hKxy6RJOAkQyFSAeiRnHe https://localhost/ansibleapi/v1/job_templates/ | python -mjson.tool
{
    "count": 0,
    "next": null,
    "previous": null,
    "results": []
}
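
The project listing above already identifies which update job to look at. A minimal Python sketch of that lookup, assuming the JSON from /ansibleapi/v1/projects/ has been saved or fetched already; the function name is hypothetical and the sample is a trimmed copy of the response shown above:

```python
import json


def failed_projects(projects_response):
    """Return (name, last_update_url) for every project whose last update failed."""
    failed = []
    for project in projects_response.get("results", []):
        if project.get("status") == "failed" or project.get("last_update_failed"):
            # "related.last_update" points at the project_updates record
            last_update = project.get("related", {}).get("last_update")
            failed.append((project["name"], last_update))
    return failed


# Trimmed copy of the /projects/ response from this report
sample = json.loads("""
{
  "count": 1,
  "results": [
    {
      "id": 6,
      "name": "test-ssl-repo",
      "status": "failed",
      "last_update_failed": true,
      "related": {"last_update": "/api/v1/project_updates/2/"}
    }
  ]
}
""")

for name, url in failed_projects(sample):
    print(name, url)
```

The returned path is the record to query next for details about the failed update.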

Actual results:
When SSL verification fails, the repository enters the "failed" state with no further information.

Expected results:
The repository should report why it failed (SSL verification failure), offer an option to disable SSL verification, or both.

Additional info:
Since CFME is a consumer of the Tower API in this case, and I can't find any Tower API call for creating a project while disabling SSL verification when cloning the repo (http://docs.ansible.com/ansible-tower/latest/html/towerapi/projects.html), I suspect this eventually ends up being a Tower issue first unless there is some suitable workaround or I've missed something.

This situation is avoided by ensuring that the correct SSL certificates for each repository are stored on the CFME appliance with the embedded ansible role enabled.
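
One way to check that a given CA bundle is the missing piece, without turning verification off, is to point a single git invocation at it through git's http.sslCAInfo setting. The Python sketch below only builds the command line. The function name, bundle path and destination are hypothetical placeholders, and how certificates are actually installed on the appliance is outside what this report covers.

```python
def clone_command(repo_url, destination, ca_bundle):
    """Build a git clone command that trusts one CA bundle for this invocation only."""
    return [
        "git",
        "-c", "http.sslCAInfo=" + ca_bundle,  # per-command config, nothing is written to ~/.gitconfig
        "clone",
        repo_url,
        destination,
    ]


# Hypothetical paths, for illustration only
command = clone_command(
    "https://insecure-repo/rmanes/manageiq-ansible-playbook-demo.git",
    "/tmp/manageiq-ansible-playbook-demo",
    "/tmp/insecure-repo-ca.pem",
)
print(" ".join(command))
```

If the clone succeeds with the bundle supplied this way, the certificate chain rather than the repository is what the appliance is missing.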

Comment 1 Bronagh Sorota 2017-11-15 18:48:00 UTC
Sending your way for assignment.

Bronagh

Comment 2 Adam Grare 2017-11-15 19:58:51 UTC
James can you see if we can get any more information about this error?

Comment 3 James Wong 2017-11-15 22:48:55 UTC
Currently, Tower doesn't provide a concise error message about what fails the repo clone/sync.

So we'll need either:
1) Tower to return, e.g., a one-line message about the failure, or
2) to retrieve the detailed Tower log for that update and let the user read the log to find out what caused the failure.


The following is a sample of a failed repo clone/sync.


===============================
Using /etc/ansible/ansible.cfg as config file

PLAY [all] *********************************************************************

TASK [delete project directory before update] **********************************
skipping: [localhost] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

TASK [update project using git and accept hostkey] *****************************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "failed to add bar hostkey: getaddrinfo bar: Name or service not known\r\n"}

NO MORE HOSTS LEFT *************************************************************
	to retry, use: --limit @project_update.retry

PLAY RECAP *********************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=1

Comment 4 Robb Manes 2017-11-16 15:04:14 UTC
Am I correct in thinking there is no setting in Tower to disable SSL verification upon cloning, that can be set in standalone Tower or Embedded Ansible?  I am trying to determine if there is any workaround we can set via any means to disable SSL verification.

Comment 8 Robb Manes 2017-11-20 18:35:34 UTC
Note that in /ansibleapi via Tower you can actually retrieve some information from project_updates/#{project} that confirms this is the issue:

# curl -k -H "Content-Type: application/json" -u admin:Dp7C3cH60RkPhF9ijQeQrueH https://localhost/ansibleapi/v1/project_updates/3/ | python -mjson.tool
- - - - - - - - 8< - - - - - - - - 
    "id": 3,
- - - - - - - - 8< - - - - - - - - 
    "name": "test-ansible-repo",
    "project": 9,
    "related": {
        "cancel": "/api/v1/project_updates/3/cancel/",
        "created_by": "/api/v1/users/1/",
        "notifications": "/api/v1/project_updates/3/notifications/",
        "project": "/api/v1/projects/9/",
        "stdout": "/api/v1/project_updates/3/stdout/",
        "unified_job_template": "/api/v1/projects/9/"
    },
    "result_stdout": "Using /etc/ansible/ansible.cfg as config file\r\n\r\nPLAY [all] *********************************************************************\r\n\r\nTASK [delete project directory before update] **********************************\r\nskipping: [localhost]\r\n\r\nTASK [update project using git and accept hostkey] *****************************\r\nskipping: [localhost]\r\n\r\nTASK [Set the git repository version] ******************************************\r\nskipping: [localhost]\r\n\r\nTASK [update project using git] ************************************************\r\nfatal: [localhost]: FAILED! => {\"changed\": false, \"cmd\": \"/usr/bin/git clone --origin origin https://example.com/rmanes/manageiq-ansible-playbook-demo.git /var/lib/awx/projects/_9__test_ansible_repo\", \"failed\": true, \"msg\": \"fatal: unable to access 'https://example.com/rmanes/manageiq-ansible-playbook-demo.git/': Peer's certificate issuer has been marked as not trusted by the user.\", \"rc\": 128, \"stderr\": \"fatal: unable to access 'https://example.com/rmanes/manageiq-ansible-playbook-demo.git/': Peer's certificate issuer has been marked as not trusted by the user.\\n\", \"stderr_lines\": [\"fatal: unable to access 'https://example.com/rmanes/manageiq-ansible-playbook-demo.git/': Peer's certificate issuer has been marked as not trusted by the user.\"], \"stdout\": \"Cloning into '/var/lib/awx/projects/_9__test_ansible_repo'...\\n\", \"stdout_lines\": [\"Cloning into '/var/lib/awx/projects/_9__test_ansible_repo'...\"]}\r\n\r\nPLAY RECAP *********************************************************************\r\nlocalhost                  : ok=0    changed=0    unreachable=0    failed=1   \r\n\r\n",
- - - - - - - - 8< - - - - - - - - 
    "url": "/api/v1/project_updates/3/"
}
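
The failure reason sits inside the JSON payload of the "fatal:" line in result_stdout. A rough Python sketch of pulling it out, assuming the payload is printed on a single line as in the output above; the function name is hypothetical and the sample is shortened from the response shown, so other Ansible output formats may need different handling:

```python
import json
import re

# Matches lines such as: fatal: [localhost]: FAILED! => {...}
FATAL_LINE = re.compile(r"^fatal: \[(?P<host>[^\]]+)\]: FAILED! => (?P<payload>\{.*\})\s*$")


def failure_messages(result_stdout):
    """Return the "msg" value of every failed task found in the captured stdout."""
    messages = []
    for line in result_stdout.splitlines():
        match = FATAL_LINE.match(line)
        if not match:
            continue
        try:
            payload = json.loads(match.group("payload"))
        except ValueError:
            continue  # payload was not valid JSON; skip rather than guess
        msg = payload.get("msg")
        if msg:
            messages.append(msg.strip())
    return messages


# Shortened version of the result_stdout value from this comment
sample_stdout = (
    "TASK [update project using git] ****\r\n"
    'fatal: [localhost]: FAILED! => {"changed": false, "failed": true, '
    '"msg": "fatal: unable to access \'https://example.com/repo.git/\': '
    'Peer\'s certificate issuer has been marked as not trusted by the user.", '
    '"rc": 128}\r\n'
    "\r\n"
    "PLAY RECAP ****\r\n"
)

for message in failure_messages(sample_stdout):
    print(message)
```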

Comment 21 Graham Mainwaring 2018-03-20 18:58:14 UTC
One possibility would be to add

>    "AWX_TASK_ENV": {
>        "GIT_SSL_NO_VERIFY": "True"
>    }

to /api/v1/settings/jobs/.  This will set this environment variable in all job runs, including SCM updates.
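
In practice this means merging one key into whatever AWX_TASK_ENV already contains. A minimal Python sketch of that merge, assuming the current job settings have already been fetched as a dictionary; the function name and the sample HOME entry are hypothetical, and the HTTP request that would submit the result is left out because this thread does not show it:

```python
import json


def with_git_ssl_no_verify(job_settings):
    """Return a copy of the job settings with GIT_SSL_NO_VERIFY added to AWX_TASK_ENV."""
    updated = dict(job_settings)
    task_env = dict(updated.get("AWX_TASK_ENV") or {})  # keep any existing variables
    task_env["GIT_SSL_NO_VERIFY"] = "True"
    updated["AWX_TASK_ENV"] = task_env
    return updated


# Hypothetical existing settings, for illustration only
current = {"AWX_TASK_ENV": {"HOME": "/var/lib/awx"}}
print(json.dumps(with_git_ssl_no_verify(current), indent=4, sort_keys=True))
```

Because the variable lives in the shared task environment, it would apply to every job run and SCM update rather than to a single repository.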

Comment 22 Štěpán Tomsa 2018-03-21 07:25:56 UTC
Graham: I personally think that globally bypassing security would be a bad thing to do. Even an option to bypass SSL verification for a single repo should be used with caution.

Comment 23 Štěpán Tomsa 2018-03-28 11:27:03 UTC
We might have problems getting the stdout of the project update. Because of a bug in Tower/AWX, the response sometimes contains only "stdout capture is missing".

This issue (https://github.com/ansible/awx/issues/200) should be related, but it is not fixed yet.

Comment 24 Ryan Petrello 2018-04-03 13:20:17 UTC
Štěpán,

Could you confirm which version of Tower you're encountering https://github.com/ansible/awx/issues/200 on?

When older versions of Tower ran jobs, they wrote the stdout to a file, and requests to e.g., GET /api/v1/project_updates/#{id}/stdout/ would read that file from the filesystem.  The "stdout capture is missing" message you're seeing means the file isn't there, which can be caused by a number of bugs that Tower has had over various releases.
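
Given that explanation, code that consumes the stdout endpoint may want to treat the placeholder as "no output available" rather than as real output. A small Python sketch under that assumption; the function name is hypothetical, and the exact wording and casing of the placeholder are taken from this thread rather than from Tower's source:

```python
MISSING_MARKER = "stdout capture is missing"


def usable_stdout(text):
    """Return the captured stdout, or None when only the placeholder (or nothing) came back."""
    if not text or MISSING_MARKER in text.lower():
        return None
    return text


print(usable_stdout("stdout capture is missing"))
print(usable_stdout("PLAY [all] ****"))
```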

Comment 25 Ryan Petrello 2018-04-03 13:27:25 UTC
Štěpán,

Also, if it's easier, I'm open to chatting about this on RH Slack or IRC to speed up troubleshooting.  Let me know - I'm on RH Slack and am ryanpetrello on freenode.

Comment 26 Štěpán Tomsa 2018-04-03 15:17:42 UTC
Ryan,

We encounter this bug with the rather old Tower 3.0.2. I’ll try to reproduce it with a more recent instance (3.2.2) to see whether the issue is present there as well.

However, it looks like we can avoid encountering this bug at all. We are triggering a project update right after a project is created, even though a newly created project updates itself automatically. This manual update job does not have its stdout captured, but the automatic one does. I guess that if we got rid of this manual update, it would solve this issue and get around the Tower/AWX bug even for older releases.

Comment 27 Ryan Petrello 2018-04-03 17:10:13 UTC
Štěpán,

Good to know - let me know if you find yourself stuck and need some additional help!

Comment 28 Štěpán Tomsa 2018-04-04 06:59:38 UTC
I can confirm that the stdout capture bug does not happen in the newer Tower 3.2.2. It would still be good to get rid of the unnecessary update triggered from our side. That would also get around the stdout issue for older Tower instances. I’d do that as a separate task, not mixing it with this BZ. Thanks for the help, Ryan!

Comment 33 Štěpán Tomsa 2018-04-12 14:11:43 UTC
As CFME Bot already mentioned, I put together some PRs that solve the issue as it stands for now:

* https://github.com/ManageIQ/manageiq/pull/17290
* https://github.com/ManageIQ/manageiq-schema/pull/187
* https://github.com/ansible/ansible_tower_client_ruby/pull/102
* https://github.com/ManageIQ/manageiq-providers-ansible_tower/pull/72
* https://github.com/ManageIQ/manageiq-ui-classic/pull/3762

For improvements like compatibility with older Tower versions and create/refresh refactoring, I’ll create new BZs.

Comment 35 CFME Bot 2018-04-26 16:58:40 UTC
New commit detected on ansible/ansible_tower_client_ruby/master:

https://github.com/ManageIQ/ansible_tower_client/commit/7aadb7ebe918ebad117eb03e8bff18a1a364e642
commit 7aadb7ebe918ebad117eb03e8bff18a1a364e642
Author:     Štěpán Tomsa <stomsa@redhat.com>
AuthorDate: Tue Apr  3 06:41:57 2018 -0400
Commit:     Štěpán Tomsa <stomsa@redhat.com>
CommitDate: Tue Apr  3 06:41:57 2018 -0400

    Add Project#last_update

    Add a project model method that allows to fetch its last update. This is
    useful to find some more information about why the update failed.
    ManageIQ Ansible Tower provider is going to fetch the stdout capture
    from this resource.

    https://bugzilla.redhat.com/show_bug.cgi?id=1513616

 CHANGELOG.md | 3 +
 lib/ansible_tower_client/base_models/project.rb | 5 +
 spec/factories/responses.rb | 9 +-
 spec/project_spec.rb | 24 +
 4 files changed, 40 insertions(+), 1 deletion(-)

Comment 36 Štěpán Tomsa 2018-05-02 06:14:23 UTC
Status update:

These PRs are waiting to be merged. Those marked as ready were reviewed and I pushed the requested changes. If these are accepted by the reviewers, the PRs are really ready to be merged.

* https://github.com/ManageIQ/manageiq-schema/pull/187 schema update is ready to be merged
* https://github.com/ManageIQ/manageiq/pull/17290 model update is waiting for the aforementioned schema to be merged, otherwise ready
* https://github.com/ManageIQ/manageiq-providers-ansible_tower/pull/73 is required for the following PR and is ready to be merged
* https://github.com/ManageIQ/manageiq-providers-ansible_tower/pull/72 is waiting for the aforementioned PR and for the aforementioned model update to be merged, otherwise ready
* https://github.com/ManageIQ/manageiq-ui-classic/pull/3762 is waiting for the aforementioned model update to be merged, otherwise ready

These PRs have been merged:

* https://github.com/ansible/ansible_tower_client_ruby/pull/102 Tower API client method
* https://github.com/ansible/ansible_tower_client_ruby/pull/103 Version release.

Comment 37 Štěpán Tomsa 2018-05-02 06:15:18 UTC
This issue is tracking the merging progress: https://github.com/ManageIQ/manageiq/issues/17307

Comment 39 Štěpán Tomsa 2018-05-24 07:13:22 UTC
I’d like to ask Brad: Do we still support Tower v2? Unfortunately, I found out during testing that my PR would not work in a specific case when using this legacy version of Tower. Thanks!

Comment 40 bascar 2018-05-24 12:18:49 UTC
I would say we should be guided by the Tower lifecycle[1]; the last Tower 2.x version went end of life on 20-July-2017. If it is easy enough to call out in the errata docs (and docs) that it has a particular issue, then call it out, but do not worry about EOL Tower versions.


[1] https://access.redhat.com/support/policy/updates/ansible-tower

Comment 41 CFME Bot 2018-07-27 10:38:51 UTC
New commit detected on ManageIQ/manageiq-providers-ansible_tower/master:

https://github.com/ManageIQ/manageiq-providers-ansible_tower/commit/f7c616e8767752551fefb78b5e75bb1fcf58b418
commit f7c616e8767752551fefb78b5e75bb1fcf58b418
Author:     Štěpán Tomsa <stomsa@redhat.com>
AuthorDate: Tue Apr  3 06:34:23 2018 -0400
Commit:     Štěpán Tomsa <stomsa@redhat.com>
CommitDate: Tue Apr  3 06:34:23 2018 -0400

    Save projects’ last update stdout

    Load stdout of projects’ last update job upon collecting. This standard
    output capture can contain useful information. Save this output to a new
    database field.

    Re-record cassettes so they make additional request to grab every
    project’s last update.

    https://bugzilla.redhat.com/show_bug.cgi?id=1513616

 app/models/manageiq/providers/ansible_tower/shared/inventory/parser/automation_manager.rb | 10 +
 spec/support/ansible_shared/automation_manager/refresh_configuartion_script_source.rb | 18 +-
 spec/support/ansible_shared/automation_manager/refresher.rb | 64 +-
 spec/vcr_cassettes/manageiq/providers/ansible_tower/automation_manager/refresher.yml | 2312 +-
 4 files changed, 1894 insertions(+), 510 deletions(-)

Comment 42 Štěpán Tomsa 2018-09-20 12:10:56 UTC
https://github.com/ManageIQ/manageiq-providers-ansible_tower/pull/72 has been merged for some time already. I think this is resolved. Moving to ON_QA and assigning to @tcoufal, who took over this BZ. Please move to VERIFIED if it has been verified. Thanks!

Comment 43 Dmitry Misharov 2018-10-01 07:59:45 UTC
Verified in 5.10.0.17.20180927011235_1b5cf54. It is now possible to check the stdout of an Ansible repository.

Comment 44 errata-xmlrpc 2019-02-07 23:00:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0212

