Description of problem:

If, from a CFME appliance with the embedded ansible role enabled, one attempts to add a repository from an unverified source, the job is simply put in "failed" state. There is no option to ignore SSL verification and no error message describing why cloning the repository failed. Ideally an option to disable SSL verification for a specific repository would be present, or at least some way to surface the error, although the information the Embedded Ansible API endpoint gives about why it failed seems limited at best. I note, however, that I see no way to disable SSL verification via the Tower/AWX API either.

Version-Release number of selected component (if applicable):

Tested on 5.8.2.3 and upstream/latest fine-4 build

How reproducible:

Every time

Steps to Reproduce:

- Verify that as the AWX user you are unable to clone the repository without adding the certificate as trusted (or manually disabling SSL validation):

# su - awx
$ git clone https://insecure-repo/rmanes/manageiq-ansible-playbook-demo.git
Cloning into 'manageiq-ansible-playbook-demo'...
fatal: unable to access 'https://insecure-repo/rmanes/manageiq-ansible-playbook-demo.git/': Peer's certificate issuer has been marked as not trusted by the user.

- Disable SSL validation for the AWX user for git:

$ git config --global http.sslVerify "false"
$ git config -l
http.sslverify=false
$ git clone https://insecure-repo/rmanes/manageiq-ansible-playbook-demo.git
Cloning into 'manageiq-ansible-playbook-demo'...
remote: Counting objects: 253, done.
remote: Compressing objects: 100% (148/148), done.
remote: Total 253 (delta 98), reused 253 (delta 98)
Receiving objects: 100% (253/253), 30.83 KiB | 0 bytes/s, done.
Resolving deltas: 100% (98/98), done.

- Attempt to add a git repository via Embedded Ansible for which the appliance has no SSL certificates to validate it, and watch it enter "Failed" state in the CFME Ops UI.
- Use the embedded ansible API to inspect current projects and determine that the project is in failed state:

irb(main):001:0> ManageIQ::Providers::EmbeddedAnsible::Provider.first.authentications.first.password
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
=> "i39hKxy6RJOAkQyFSAeiRnHe"

# curl -k -u admin:i39hKxy6RJOAkQyFSAeiRnHe https://localhost/ansibleapi/v1/projects/ | python -mjson.tool
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "created": "2017-11-15T16:22:15.048Z",
            "credential": null,
            "description": "",
            "id": 6,
            "last_job_failed": true,
            "last_job_run": "2017-11-15T16:22:23.216858Z",
            "last_update_failed": true,
            "last_updated": "2017-11-15T16:22:23.216858Z",
            "local_path": "_6__test_ssl_repo",
            "modified": "2017-11-15T16:22:21.076Z",
            "name": "test-ssl-repo",
            "next_job_run": null,
            "organization": null,
            "related": {
                "access_list": "/api/v1/projects/6/access_list/",
                "activity_stream": "/api/v1/projects/6/activity_stream/",
                "created_by": "/api/v1/users/1/",
                "last_job": "/api/v1/project_updates/2/",
                "last_update": "/api/v1/project_updates/2/",
                "notification_templates_any": "/api/v1/projects/6/notification_templates_any/",
                "notification_templates_error": "/api/v1/projects/6/notification_templates_error/",
                "notification_templates_success": "/api/v1/projects/6/notification_templates_success/",
                "object_roles": "/api/v1/projects/6/object_roles/",
                "playbooks": "/api/v1/projects/6/playbooks/",
                "project_updates": "/api/v1/projects/6/project_updates/",
                "schedules": "/api/v1/projects/6/schedules/",
                "teams": "/api/v1/projects/6/teams/",
                "update": "/api/v1/projects/6/update/"
            },
            "scm_branch": "",
            "scm_clean": false,
            "scm_delete_on_next_update": false,
            "scm_delete_on_update": false,
            "scm_revision": "",
            "scm_type": "git",
            "scm_update_cache_timeout": 0,
            "scm_update_on_launch": false,
            "scm_url": "https://insecure-repo/rmanes/manageiq-ansible-playbook-demo.git",
            "status": "failed",
            "summary_fields": {
                "created_by": {
                    "first_name": "",
                    "id": 1,
                    "last_name": "",
                    "username": "admin"
                },
                "last_job": {
                    "description": "",
                    "failed": true,
                    "finished": "2017-11-15T16:22:23.216Z",
                    "id": 2,
                    "name": "test-ssl-repo",
                    "status": "failed"
                },
                "last_update": {
                    "description": "",
                    "failed": true,
                    "id": 2,
                    "name": "test-ssl-repo",
                    "status": "failed"
                },
                "object_roles": {
                    "admin_role": {
                        "description": "Can manage all aspects of the project",
                        "id": 35,
                        "name": "Admin"
                    },
                    "read_role": {
                        "description": "May view settings for the project",
                        "id": 36,
                        "name": "Read"
                    },
                    "update_role": {
                        "description": "May update project or inventory or group using the configured source update system",
                        "id": 38,
                        "name": "Update"
                    },
                    "use_role": {
                        "description": "Can use the project in a job template",
                        "id": 37,
                        "name": "Use"
                    }
                },
                "user_capabilities": {
                    "delete": true,
                    "edit": true,
                    "schedule": true,
                    "start": true
                }
            },
            "timeout": 0,
            "type": "project",
            "url": "/api/v1/projects/6/"
        }
    ]
}

# curl -k -u admin:i39hKxy6RJOAkQyFSAeiRnHe https://localhost/ansibleapi/v1/job_templates/ | python -mjson.tool
{
    "count": 0,
    "next": null,
    "previous": null,
    "results": []
}

Actual results:

Repository enters "failed" state when SSL verification fails, with no further information.

Expected results:

Repository should either produce some message about why it failed (SSL verification failure) and/or have an option to disable SSL verification.

Additional info:

Since CFME is a consumer of the Tower API in this case, and I can't find any Tower API call for creating a project while disabling SSL verification when cloning the repo (http://docs.ansible.com/ansible-tower/latest/html/towerapi/projects.html), I suspect this eventually ends up being a Tower issue first unless there is some suitable workaround or I've missed something.

This situation is avoided by ensuring that the correct SSL certificates for each repository are stored on the CFME appliance with the embedded ansible role enabled.
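The project check in the reproduction steps could also be scripted. What follows is only a minimal sketch, assuming the body of GET /ansibleapi/v1/projects/ has already been fetched (for example with the curl call above) and has the shape shown in this report. `failed_projects` is a hypothetical helper name, not part of ManageIQ or any Tower client library.

```ruby
require "json"

# Hypothetical helper: lists every project whose last SCM update failed.
# Expects the raw JSON body of GET /ansibleapi/v1/projects/.
def failed_projects(body)
  projects = JSON.parse(body).fetch("results", [])
  failed = projects.select { |p| p["status"] == "failed" || p["last_update_failed"] }
  failed.map do |p|
    {
      "name"        => p["name"],
      "scm_url"     => p["scm_url"],
      # Link to the project update that holds the failure details.
      "last_update" => p.dig("related", "last_update")
    }
  end
end

# Trimmed copy of the response shown in this report.
sample = <<~JSON
  {"count": 1, "results": [{"id": 6, "name": "test-ssl-repo", "status": "failed",
    "last_update_failed": true,
    "scm_url": "https://insecure-repo/rmanes/manageiq-ansible-playbook-demo.git",
    "related": {"last_update": "/api/v1/project_updates/2/"}}]}
JSON

failed_projects(sample).each do |p|
  puts "#{p['name']} (#{p['scm_url']}) failed; details at #{p['last_update']}"
end
```

The `last_update` link is the useful part: it points at the project update whose details may explain the failure.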
Sending your way for assignment. Bronagh
James, can you see if we can get any more information about this error?
Currently, Tower doesn't provide a concise error message about what caused the repo clone/sync to fail. So we'll need either 1) Tower to return e.g. a one-line message about the failure, or 2) to retrieve the detailed Tower log for that update and let the user read the log to find out what caused the failure.

The following is a sample of a failed repo clone/sync.

===============================
Using /etc/ansible/ansible.cfg as config file

PLAY [all] *********************************************************************

TASK [delete project directory before update] **********************************
skipping: [localhost] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

TASK [update project using git and accept hostkey] *****************************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "failed to add bar hostkey: getaddrinfo bar: Name or service not known\r\n"}

NO MORE HOSTS LEFT *************************************************************
        to retry, use: --limit @project_update.retry

PLAY RECAP *********************************************************************
localhost : ok=0 changed=0 unreachable=0 failed=1
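Until Tower returns a one-line message itself, one could be derived on the consumer side from the log. This is a rough sketch, assuming the Ansible output keeps the `fatal: [host]: FAILED! => {json}` shape seen in the sample above. `failure_messages` is a hypothetical name and is not how Tower or ManageIQ does it.

```ruby
require "json"

# Hypothetical helper: pulls the "msg" of each failed task out of playbook stdout.
# Assumes failed tasks are reported on one line as: fatal: [host]: FAILED! => {json}
def failure_messages(stdout)
  payloads = stdout.scan(/^fatal: \[[^\]]+\]: FAILED! => (\{.*\})\s*$/).map(&:first)
  messages = payloads.map do |json|
    begin
      msg = JSON.parse(json)["msg"]
      msg && msg.strip
    rescue JSON::ParserError
      nil # payload is not valid JSON; skip it rather than guess
    end
  end
  messages.compact
end

# The failing task from the sample log above.
sample = <<~'LOG'
  TASK [update project using git and accept hostkey] *****************************
  fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "failed to add bar hostkey: getaddrinfo bar: Name or service not known\r\n"}

  PLAY RECAP *********************************************************************
LOG

puts failure_messages(sample)
```

Parsing the JSON payload instead of cutting the line with a regex keeps escaped characters in the message intact.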
Am I correct in thinking there is no setting in Tower to disable SSL verification upon cloning, that can be set in standalone Tower or Embedded Ansible? I am trying to determine if there is any workaround we can set via any means to disable SSL verification.
Note that in /ansibleapi via Tower you can actually retrieve some information from project_updates/#{id} that confirms this is the issue:

# curl -k -H "Content-Type: application/json" -u admin:Dp7C3cH60RkPhF9ijQeQrueH https://localhost/ansibleapi/v1/project_updates/3/ | python -mjson.tool
- - - - - - - - 8< - - - - - - - -
    "id": 3,
- - - - - - - - 8< - - - - - - - -
    "name": "test-ansible-repo",
    "project": 9,
    "related": {
        "cancel": "/api/v1/project_updates/3/cancel/",
        "created_by": "/api/v1/users/1/",
        "notifications": "/api/v1/project_updates/3/notifications/",
        "project": "/api/v1/projects/9/",
        "stdout": "/api/v1/project_updates/3/stdout/",
        "unified_job_template": "/api/v1/projects/9/"
    },
    "result_stdout": "Using /etc/ansible/ansible.cfg as config file\r\n\r\nPLAY [all] *********************************************************************\r\n\r\nTASK [delete project directory before update] **********************************\r\nskipping: [localhost]\r\n\r\nTASK [update project using git and accept hostkey] *****************************\r\nskipping: [localhost]\r\n\r\nTASK [Set the git repository version] ******************************************\r\nskipping: [localhost]\r\n\r\nTASK [update project using git] ************************************************\r\nfatal: [localhost]: FAILED! => {\"changed\": false, \"cmd\": \"/usr/bin/git clone --origin origin https://example.com/rmanes/manageiq-ansible-playbook-demo.git /var/lib/awx/projects/_9__test_ansible_repo\", \"failed\": true, \"msg\": \"fatal: unable to access 'https://example.com/rmanes/manageiq-ansible-playbook-demo.git/': Peer's certificate issuer has been marked as not trusted by the user.\", \"rc\": 128, \"stderr\": \"fatal: unable to access 'https://example.com/rmanes/manageiq-ansible-playbook-demo.git/': Peer's certificate issuer has been marked as not trusted by the user.\\n\", \"stderr_lines\": [\"fatal: unable to access 'https://example.com/rmanes/manageiq-ansible-playbook-demo.git/': Peer's certificate issuer has been marked as not trusted by the user.\"], \"stdout\": \"Cloning into '/var/lib/awx/projects/_9__test_ansible_repo'...\\n\", \"stdout_lines\": [\"Cloning into '/var/lib/awx/projects/_9__test_ansible_repo'...\"]}\r\n\r\nPLAY RECAP *********************************************************************\r\nlocalhost : ok=0 changed=0 unreachable=0 failed=1 \r\n\r\n",
- - - - - - - - 8< - - - - - - - -
    "url": "/api/v1/project_updates/3/"
}
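For anyone scripting this lookup: the "related" links Tower returns start with /api/v1/, while the curl calls in this report reach the same resources under /ansibleapi/v1/ on the appliance. The sketch below assumes that prefix mapping holds in general, which is inferred from this report and not confirmed anywhere. It builds, but does not send, the authenticated request for a project's last update. Both method names are hypothetical.

```ruby
require "net/http"
require "uri"

# Assumption: Tower's /api/... paths are served under /ansibleapi/... on the appliance.
def embedded_ansible_uri(host, related_path)
  URI::HTTPS.build(host: host, path: related_path.sub(%r{\A/api/}, "/ansibleapi/"))
end

# Builds (without sending) a GET for the project's last update, using basic auth
# like the curl calls above. `project` is one entry from the projects listing.
def last_update_request(host, project, user, password)
  uri = embedded_ansible_uri(host, project.fetch("related").fetch("last_update"))
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password)
  request["Accept"] = "application/json"
  [uri, request]
end

project = { "related" => { "last_update" => "/api/v1/project_updates/3/" } }
uri, _request = last_update_request("localhost", project, "admin", "secret")
puts uri
```

Sending the request is left out on purpose. The curl calls above use -k because the appliance certificate is not trusted, and how to handle that in a script is a separate decision.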
One possibility would be to add

> "AWX_TASK_ENV": {
>     "GIT_SSL_NO_VERIFY": "True"
> }

to /api/v1/settings/jobs/. This will set this environment variable in all job runs, including SCM updates.
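If someone wants to try this, the request body could be built as in the rough sketch below. It assumes the current jobs settings have already been fetched as a hash and that the endpoint accepts a JSON body containing only the changed key; neither assumption is confirmed in this thread. `jobs_settings_patch` is a hypothetical helper name.

```ruby
require "json"

# Hypothetical helper: builds a JSON body that adds GIT_SSL_NO_VERIFY to AWX_TASK_ENV
# while keeping whatever variables are already configured there.
def jobs_settings_patch(current_settings)
  env = current_settings.fetch("AWX_TASK_ENV", {})
  JSON.generate({ "AWX_TASK_ENV" => env.merge("GIT_SSL_NO_VERIFY" => "True") })
end

# Example with an existing variable that must survive the change (made-up value).
puts jobs_settings_patch({ "AWX_TASK_ENV" => { "HOME" => "/var/lib/awx" } })
```

Merging into the existing hash matters because sending only the new key would presumably replace the whole AWX_TASK_ENV value. Keep in mind this applies to every job run, not to a single repository.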
Graham: I personally think that globally bypassing security would be a bad thing to do. Even an option to bypass SSL verification for a single repo should be used with caution.
We might have problems getting stdout of the project update. There is a bug in Tower/AWX and sometimes there is only "stdout capture is missing". This issue (https://github.com/ansible/awx/issues/200) should be related, but it is not fixed yet.
Štěpán, Could you confirm which version of Tower you're encountering https://github.com/ansible/awx/issues/200 on? When older versions of Tower ran jobs, they wrote the stdout to a file, and requests to e.g., GET /api/v1/project_updates/#{id}/stdout/ would read that file from the filesystem. The "stdout capture is missing" message you're seeing means the file isn't there, which can be caused by a number of bugs that Tower has had over various releases.
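On the consumer side, a guard along these lines would keep the placeholder from being shown to users as if it were real output. It is only a sketch and assumes the placeholder contains the exact phrase quoted in this thread; the full wording Tower returns is not shown here. `usable_stdout` is a hypothetical name.

```ruby
# Hypothetical helper: returns the captured stdout, or nil when there is nothing
# worth showing. Assumes the placeholder includes the phrase quoted in this thread.
def usable_stdout(result_stdout)
  return nil if result_stdout.nil? || result_stdout.strip.empty?
  return nil if result_stdout.include?("stdout capture is missing")

  result_stdout
end

puts usable_stdout("stdout capture is missing").inspect
puts usable_stdout("PLAY [all] ***").inspect
```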
Štěpán, Also, if it's easier, I'm open to chatting about this on RH Slack or IRC to speed up troubleshooting. Let me know - I'm on RH Slack and am ryanpetrello on freenode.
Ryan, We encounter this bug with the rather old Tower 3.0.2. I’ll try to reproduce it with a more recent instance (3.2.2) to see whether the issue is present there as well. However, it looks like we can avoid encountering this bug at all. We are triggering a project update right after a project is created, even though a newly created project updates itself automatically. This manual update job does not have its stdout captured, but the automatic one does. I guess that if we got rid of this manual update, we would work around the Tower/AWX bug even for older releases.
Štěpán, Good to know - let me know if you find yourself stuck and need some additional help!
I can confirm that the stdout capture bug does not happen in newer Tower 3.2.2. Still it would be good to get rid of the unnecessary update triggered from our side. That would also get around the stdout issue for older Tower instances. I’d do that as a separate task, not mixing it with this BZ. Thanks Ryan for help!
https://github.com/ansible/ansible_tower_client_ruby/pull/102
https://github.com/ManageIQ/manageiq-providers-ansible_tower/pull/72
As CFME Bot already mentioned, I packed up some PRs that solve the issue as it is for now:

* https://github.com/ManageIQ/manageiq/pull/17290
* https://github.com/ManageIQ/manageiq-schema/pull/187
* https://github.com/ansible/ansible_tower_client_ruby/pull/102
* https://github.com/ManageIQ/manageiq-providers-ansible_tower/pull/72
* https://github.com/ManageIQ/manageiq-ui-classic/pull/3762

For improvements like compatibility with older Tower versions and create/refresh refactoring I’ll create new BZs.
https://github.com/ansible/ansible_tower_client_ruby/pull/103
New commit detected on ansible/ansible_tower_client_ruby/master:

https://github.com/ManageIQ/ansible_tower_client/commit/7aadb7ebe918ebad117eb03e8bff18a1a364e642
commit 7aadb7ebe918ebad117eb03e8bff18a1a364e642
Author:     Štěpán Tomsa <stomsa>
AuthorDate: Tue Apr 3 06:41:57 2018 -0400
Commit:     Štěpán Tomsa <stomsa>
CommitDate: Tue Apr 3 06:41:57 2018 -0400

    Add Project#last_update

    Add a project model method that allows to fetch its last update. This is useful to find some more information about why the update failed. ManageIQ Ansible Tower provider is going to fetch the stdout capture from this resource.

    https://bugzilla.redhat.com/show_bug.cgi?id=1513616

 CHANGELOG.md | 3 +
 lib/ansible_tower_client/base_models/project.rb | 5 +
 spec/factories/responses.rb | 9 +-
 spec/project_spec.rb | 24 +
 4 files changed, 40 insertions(+), 1 deletion(-)
Status update:

These PRs are waiting to be merged. Those marked as ready were reviewed and I pushed the requested changes. If these are accepted by the reviewers, the PRs are really ready to be merged.

* https://github.com/ManageIQ/manageiq-schema/pull/187 schema update is ready to be merged
* https://github.com/ManageIQ/manageiq/pull/17290 model update is waiting for the aforementioned schema to be merged, otherwise ready
* https://github.com/ManageIQ/manageiq-providers-ansible_tower/pull/73 is required for the following PR and is ready to be merged
* https://github.com/ManageIQ/manageiq-providers-ansible_tower/pull/72 is waiting for the aforementioned PR and for the aforementioned model update to be merged, otherwise ready
* https://github.com/ManageIQ/manageiq-ui-classic/pull/3762 is waiting for the aforementioned model update to be merged, otherwise ready

These PRs have been merged:

* https://github.com/ansible/ansible_tower_client_ruby/pull/102 Tower API client method
* https://github.com/ansible/ansible_tower_client_ruby/pull/103 Version release.
This issue is tracking the merging progress: https://github.com/ManageIQ/manageiq/issues/17307
https://github.com/ManageIQ/manageiq-providers-ansible_tower/pull/82
I’d like to ask Brad: Do we still support Tower v2? Unfortunately I found out during testing that my PR would not work in a specific case when using this legacy version of Tower. Thanks!
I would say we should be guided by the Tower lifecycle[1]; the last Tower 2.x version went end of life on 20-July-2017. If it is easy enough to call out in the errata docs (and docs) that it has a particular issue, then call it out, but do not worry about EOL Tower versions.

[1] https://access.redhat.com/support/policy/updates/ansible-tower
New commit detected on ManageIQ/manageiq-providers-ansible_tower/master:

https://github.com/ManageIQ/manageiq-providers-ansible_tower/commit/f7c616e8767752551fefb78b5e75bb1fcf58b418
commit f7c616e8767752551fefb78b5e75bb1fcf58b418
Author:     Štěpán Tomsa <stomsa>
AuthorDate: Tue Apr 3 06:34:23 2018 -0400
Commit:     Štěpán Tomsa <stomsa>
CommitDate: Tue Apr 3 06:34:23 2018 -0400

    Save projects’ last update stdout

    Load stdout of projects’ last update job upon collecting. This standard output capture can contain useful information. Save this output to a new database field.

    Re-record cassettes so they make additional request to grab every project’s last update.

    https://bugzilla.redhat.com/show_bug.cgi?id=1513616

 app/models/manageiq/providers/ansible_tower/shared/inventory/parser/automation_manager.rb | 10 +
 spec/support/ansible_shared/automation_manager/refresh_configuartion_script_source.rb | 18 +-
 spec/support/ansible_shared/automation_manager/refresher.rb | 64 +-
 spec/vcr_cassettes/manageiq/providers/ansible_tower/automation_manager/refresher.yml | 2312 +-
 4 files changed, 1894 insertions(+), 510 deletions(-)
https://github.com/ManageIQ/manageiq-providers-ansible_tower/pull/72 has been merged for some time already. I think this is resolved. Moving to ON_QA and assigning to @tcoufal, who took over this BZ. Please move to VERIFIED if it has been verified. Thanks!
Verified in 5.10.0.17.20180927011235_1b5cf54. It is now possible to check the stdout of an Ansible repository.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:0212