Description of problem:
When deploying Skydive, the agent image is not pulled by Docker because the templates reuse the analyzer's tag for the agent image.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.1-0.20181013060908.el7ost.noarch

How reproducible:
All the time

Steps to Reproduce:
1. Deploy Skydive with the default templates, using the latest tag:
~~~
DockerSkydiveAgentImage: satellite:5000/lab-osp14_containers-skydive-agent:latest
DockerSkydiveAnalyzerImage: satellite:5000/lab-osp14_containers-skydive-analyzer:latest
~~~

Actual results:
~~~
"fatal: [lab -l-rh-cmp-0]: FAILED! => {\"changed\": true, \"cmd\": \"docker pull satellite:5000/lab-osp14_containers-skydive-agent:14.0-46\", \"delta\": \"0:00:00.132607\", \"end\": \"2019-02-07 22:40:26.701776\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2019-02-07 22:40:26.569169\", \"stderr\": \"error parsing HTTP 404 response body: invalid character '<' looking for beginning of value : \\\"<!DOCTYPE html>\\\\n<html>\\\\n<head>\\\\n <title>The page you were looking for doesn't exist (404)</title>\\\\n
~~~

Expected results:
It should download the right image.

Additional info:
[1] Apparently caused by the Skydive template using the same tag for the agent image as for the analyzer image (a sketch of the kind of change follows after [2]).
[2] The agent and analyzer repositories do not have the same tags available.

[1]
~~~
/usr/share/openstack-tripleo-heat-templates/extraconfig/services/skydive-agent.yaml
[...]
    skydive_docker_image_tag: {{skydive_analyzer_docker_image | regex_replace(".*:")}}
[...]
~~~

[2]
~~~
$ skopeo inspect docker://satellite:5000/lab-osp14_containers-skydive-agent:latest | jq .RepoTags[]
"14.0-47"
"14.0-48"
"14.0"
"latest"

$ skopeo inspect docker://satellite:5000/lab-osp14_containers-skydive-analyzer:latest | jq .RepoTags[]
"14.0-45"
"14.0-46"
"14.0"
"latest"
~~~
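For reference, a minimal sketch of the kind of change that would avoid the mismatch: derive the agent tag from the agent image rather than from the analyzer image. The skydive_agent_docker_image variable name is an assumption, chosen by analogy with skydive_analyzer_docker_image; this is not the actual patch.

~~~
# /usr/share/openstack-tripleo-heat-templates/extraconfig/services/skydive-agent.yaml (sketch)
[...]
    # Take the tag from the agent image (assumed variable name) instead of the
    # analyzer image, so the two repositories no longer need to share tags.
    skydive_docker_image_tag: {{skydive_agent_docker_image | regex_replace(".*:")}}
[...]
~~~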
It looks like the main issue is a configuration issue:

~~~
parameter_defaults:
  SkydiveVars:
    globals:
      skydive_listen_ip: 192.168.4.6
~~~

This IP (192.168.4.6) does not seem to be reachable by the agents. The Skydive Ansible playbooks include a check that verifies the analyzer/API is available, and according to the Skydive playbook logs that check fails. I would try not specifying any IP, or using 0.0.0.0, to test (sketch below).

Regarding the Docker image tag, I do not see why, when 'satellite:5000/lab-osp14_containers-skydive-agent:latest' is specified, the log shows 'satellite:5000/lab-osp14_containers-skydive-agent:14.0-46'. Was the installation re-triggered with another tag specified? Since the analyzer and the agents appear to be started, docker pull succeeded at least once according to the logs and the processes reported, so I don't think the main issue is the Docker tag.
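A minimal sketch of the suggested test, reusing the SkydiveVars structure shown above; 0.0.0.0 is the test value proposed here, not a validated fix:

~~~
parameter_defaults:
  SkydiveVars:
    globals:
      # Listen on all interfaces so the agents can reach the analyzer/API;
      # alternatively, remove skydive_listen_ip entirely to use the default.
      skydive_listen_ip: 0.0.0.0
~~~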
Adding these parameters could help:

~~~
parameter_defaults:
  SkydiveVars:
    analyzers:
      skydive_analyzer_docker_extra_env: "--net=host"
  ControllerExtraConfig:
    tripleo::firewall::firewall_rules:
      '600 allow skydive etcd':
        dport:
          - 12379
          - 12380
~~~
The customer has retried with the recommended change but it still fails. I believe this is because the tenant creation and the other Keystone operations are run on all 3 controllers; since they fail on 2 of the 3 hosts (you can't create the same tenant multiple times), those 2 hosts are dropped from the rest of the play.

[1] The task that fails
[2] The logs from the playbook

The Keystone operations shouldn't be executed on all 3 controllers; we should probably run them once with "delegate_to: localhost" (see the sketch after the logs below).

[1]
~~~
- name: Create a Skydive tenant
  environment:
    OS_AUTH_TOKEN: ""
    OS_AUTH_URL: "{{ os_auth_url }}"
    OS_USERNAME: "{{ os_username }}"
    OS_PASSWORD: "{{ os_password }}"
    OS_PROJECT_NAME: "{{ os_tenant_name }}"
    OS_USER_DOMAIN_NAME: "{{ os_user_domain_name }}"
    OS_PROJECT_DOMAIN_NAME: "{{ os_project_domain_name }}"
    OS_IDENTITY_API_VERSION: "{{ os_identity_api_version }}"
  os_project:
    name: "{{ skydive_auth_os_tenant_name }}"
    description: "Skydive admin users"
    domain_id: "{{ skydive_auth_os_domain_id }}"
    enabled: True
    state: present
~~~

[2]
~~~
TASK [skydive_analyzer : Create a Skydive tenant] ******************************
Tuesday 05 March 2019  15:46:59 -0800 (0:00:01.025)       0:03:00.335 *********
fatal: [oc-l-rh-ocld-0 -> localhost]: FAILED! => {"changed": false, "extra_data": null, "msg": "ConflictException: 409"}
changed: [oc-l-rh-ocld-1 -> localhost]
fatal: [oc-l-rh-ocld-2 -> localhost]: FAILED! => {"changed": false, "extra_data": null, "msg": "ConflictException: 409"}

TASK [skydive_analyzer : Create a Skydive keystone API user] *******************
Tuesday 05 March 2019  15:47:04 -0800 (0:00:04.998)       0:03:05.334 *********
changed: [oc-l-rh-ocld-1 -> localhost]

TASK [skydive_analyzer : Set skydive Keystone API user role] *******************
Tuesday 05 March 2019  15:47:09 -0800 (0:00:05.325)       0:03:10.659 *********
changed: [oc-l-rh-ocld-1 -> localhost]

TASK [skydive_analyzer : Create a Skydive keystone service user] ***************
Tuesday 05 March 2019  15:47:15 -0800 (0:00:05.771)       0:03:16.431 *********
changed: [oc-l-rh-ocld-1 -> localhost]

TASK [skydive_analyzer : Set skydive Keystone service user role] ***************
Tuesday 05 March 2019  15:47:20 -0800 (0:00:05.056)       0:03:21.487 *********
changed: [oc-l-rh-ocld-1 -> localhost]

TASK [skydive_analyzer : Make the docker image available] **********************
Tuesday 05 March 2019  15:47:26 -0800 (0:00:05.241)       0:03:26.729 *********

TASK [skydive_common : Install Docker] *****************************************
Tuesday 05 March 2019  15:47:26 -0800 (0:00:00.512)       0:03:27.242 *********
ok: [oc-l-rh-ocld-1]

TASK [skydive_common : Enable Docker service] **********************************
Tuesday 05 March 2019  15:47:29 -0800 (0:00:03.390)       0:03:30.632 *********
ok: [oc-l-rh-ocld-1]

TASK [skydive_common : Pull skydive image] *************************************
Tuesday 05 March 2019  15:47:30 -0800 (0:00:00.566)       0:03:31.198 *********
changed: [oc-l-rh-ocld-1]
~~~
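A minimal sketch of the kind of change suggested above, combining delegate_to with run_once so the Keystone call is made only once per play. This is illustrative only, not the actual upstream fix; the task and variables are taken from the snippet in [1], with the remaining OS_* environment variables omitted for brevity:

~~~
- name: Create a Skydive tenant
  os_project:
    name: "{{ skydive_auth_os_tenant_name }}"
    description: "Skydive admin users"
    domain_id: "{{ skydive_auth_os_domain_id }}"
    enabled: true
    state: present
  environment:
    OS_AUTH_URL: "{{ os_auth_url }}"
    OS_USERNAME: "{{ os_username }}"
    OS_PASSWORD: "{{ os_password }}"
    OS_PROJECT_NAME: "{{ os_tenant_name }}"
    # ... other OS_* variables as in [1]
  # Run the Keystone operation once, from the deployment host,
  # instead of repeating it on every controller in the play.
  delegate_to: localhost
  run_once: true
~~~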
A "delegate_to" is already set here: https://github.com/skydive-project/skydive/blob/master/contrib/ansible/roles/skydive_analyzer/tasks/main.yml#L30 I'll check one more time...
This is interesting: it's clearly running on all controllers instead of the undercloud. It looks like since Ansible 2.5 [1] we need to use an import (rather than a dynamic include) if we want attribute inheritance (sketch below). This was reported upstream here [2].

[1] https://docs.ansible.com/ansible/devel/porting_guides/porting_guide_2.5.html#dynamic-includes-and-attribute-inheritance
[2] https://github.com/ansible/ansible/issues/37995
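A minimal sketch of the behavior described in [1], using a hypothetical keystone.yml task file (not the Skydive role's actual layout): attributes set on a dynamic include apply only to the include task itself, while a static import propagates them to the imported tasks.

~~~
# Since Ansible 2.5, attributes on a dynamic include are NOT inherited
# by the tasks inside the included file:
- include_tasks: keystone.yml   # hypothetical file name
  delegate_to: localhost
  run_once: true                # applies to the include itself, not the contained tasks

# A static import propagates the attributes to every task in the file:
- import_tasks: keystone.yml    # hypothetical file name
  delegate_to: localhost
  run_once: true                # inherited by each task in keystone.yml
~~~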
Thanks David, indeed something changed. I'm about to submit a fix upstream for that, and we will then backport it.
*** Bug 1677607 has been marked as a duplicate of this bug. ***
*** Bug 1679851 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0944
Hi Mark,

The keystone changes have been backported and should be part of the next release.
The firewall rules will be added by default in OSP15.

Thanks,
Sylvain
Addressed in https://bugzilla.redhat.com/show_bug.cgi?id=1722053