Bug 1676915
Summary: | Skydive agent's deployment fails because it uses the same tag as the analyzer | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | David Vallee Delisle <dvd>
Component: | skydive | Assignee: | safchain
Status: | CLOSED ERRATA | QA Contact: | safchain
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 14.0 (Rocky) | CC: | fbaudin, marjones, mburns, mkaliyam, nplanel, psahoo, rheslop, rsafrono, safchain, sbaubeau
Target Milestone: | --- | Keywords: | Reopened, Triaged, ZStream
Target Release: | 14.0 (Rocky) | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | skydive-0.20.4-1.el7ost.x86_64.rpm | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-06-19 12:41:34 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
David Vallee Delisle
2019-02-13 15:21:33 UTC
It looks like the main issue is a configuration issue:

~~~
parameter_defaults:
  SkydiveVars:
    globals:
      skydive_listen_ip: 192.168.4.6
~~~

This IP (192.168.4.6) does not seem to be reachable by the agents. The Skydive ansible playbooks check whether the analyzer/API is available, and according to the Skydive playbook logs it is not. I would try not specifying any IP, or using 0.0.0.0, as a test.

As for the docker image tag, I do not see why, when specifying 'satellite:5000/lab-osp14_containers-skydive-agent:latest', we get 'satellite:5000/lab-osp14_containers-skydive-agent:14.0-46' in the log. Was the installation re-triggered with another tag specified? Since the analyzer and the agents seem to be started, docker pull succeeded at least once, per the log and the processes reported, so I don't think the main issue is due to the docker tag.

Adding these parameters could help:

~~~
parameter_defaults:
  SkydiveVars:
    analyzers:
      skydive_analyzer_docker_extra_env: "--net=host"
  ControllerExtraConfig:
    tripleo::firewall::firewall_rules:
      '600 allow skydive etcd':
        dport:
          - 12379
          - 12380
~~~

Customer has retried with the recommended change but it still fails. I believe this is because the tenant and other operations are run on all 3 controllers, and because the task fails on 2 out of 3 hosts (you can't create a tenant multiple times), those 2 hosts are ignored for the rest of the play.

[1] The task that fails
[2] The logs from the playbook

I believe the keystone operations shouldn't be executed on all 3 controllers. We should probably "delegate_to: localhost".

[1]
~~~
- name: Create a Skydive tenant
  environment:
    OS_AUTH_TOKEN: ""
    OS_AUTH_URL: "{{ os_auth_url }}"
    OS_USERNAME: "{{ os_username }}"
    OS_PASSWORD: "{{ os_password }}"
    OS_PROJECT_NAME: "{{ os_tenant_name }}"
    OS_USER_DOMAIN_NAME: "{{ os_user_domain_name }}"
    OS_PROJECT_DOMAIN_NAME: "{{ os_project_domain_name }}"
    OS_IDENTITY_API_VERSION: "{{ os_identity_api_version }}"
  os_project:
    name: "{{ skydive_auth_os_tenant_name }}"
    description: "Skydive admin users"
    domain_id: "{{ skydive_auth_os_domain_id }}"
    enabled: True
    state: present
~~~

[2]
~~~
TASK [skydive_analyzer : Create a Skydive tenant] ******************************
Tuesday 05 March 2019 15:46:59 -0800 (0:00:01.025) 0:03:00.335 *********
fatal: [oc-l-rh-ocld-0 -> localhost]: FAILED! => {"changed": false, "extra_data": null, "msg": "ConflictException: 409"}
changed: [oc-l-rh-ocld-1 -> localhost]
fatal: [oc-l-rh-ocld-2 -> localhost]: FAILED! => {"changed": false, "extra_data": null, "msg": "ConflictException: 409"}

TASK [skydive_analyzer : Create a Skydive keystone API user] *******************
Tuesday 05 March 2019 15:47:04 -0800 (0:00:04.998) 0:03:05.334 *********
changed: [oc-l-rh-ocld-1 -> localhost]

TASK [skydive_analyzer : Set skydive Keystone API user role] *******************
Tuesday 05 March 2019 15:47:09 -0800 (0:00:05.325) 0:03:10.659 *********
changed: [oc-l-rh-ocld-1 -> localhost]

TASK [skydive_analyzer : Create a Skydive keystone service user] ***************
Tuesday 05 March 2019 15:47:15 -0800 (0:00:05.771) 0:03:16.431 *********
changed: [oc-l-rh-ocld-1 -> localhost]

TASK [skydive_analyzer : Set skydive Keystone service user role] ***************
Tuesday 05 March 2019 15:47:20 -0800 (0:00:05.056) 0:03:21.487 *********
changed: [oc-l-rh-ocld-1 -> localhost]

TASK [skydive_analyzer : Make the docker image available] **********************
Tuesday 05 March 2019 15:47:26 -0800 (0:00:05.241) 0:03:26.729 *********

TASK [skydive_common : Install Docker] *****************************************
Tuesday 05 March 2019 15:47:26 -0800 (0:00:00.512) 0:03:27.242 *********
ok: [oc-l-rh-ocld-1]

TASK [skydive_common : Enable Docker service] **********************************
Tuesday 05 March 2019 15:47:29 -0800 (0:00:03.390) 0:03:30.632 *********
ok: [oc-l-rh-ocld-1]

TASK [skydive_common : Pull skydive image] *************************************
Tuesday 05 March 2019 15:47:30 -0800 (0:00:00.566) 0:03:31.198 *********
changed: [oc-l-rh-ocld-1]
~~~

There is already a "delegate_to" here: https://github.com/skydive-project/skydive/blob/master/contrib/ansible/roles/skydive_analyzer/tasks/main.yml#L30

I'll check one more time...

This is interesting: it's clearly running on all controllers instead of the undercloud, though.

It looks like since Ansible 2.5 [1] we need to use an import instead of an include if we want attribute inheritance. This was reported upstream here [2].

[1] https://docs.ansible.com/ansible/devel/porting_guides/porting_guide_2.5.html#dynamic-includes-and-attribute-inheritance
[2] https://github.com/ansible/ansible/issues/37995

Thanks David,

Indeed something changed. I'm about to submit a fix upstream for that and we will then backport it.

*** Bug 1677607 has been marked as a duplicate of this bug. ***

*** Bug 1679851 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0944

Hi Mark,

The keystone changes have been backported and should be part of the next release. The firewall rules will be added by default in OSP15.

Thanks,
Sylvain

Addressed in https://bugzilla.redhat.com/show_bug.cgi?id=1722053
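
For reference, the attribute-inheritance behaviour discussed in this bug can be sketched as below. This is only an illustration, not the fix that was merged upstream: the file name `keystone.yml` is hypothetical, and adding `run_once` is an assumption based on the 409 conflicts in the log, where the tenant creation is delegated to localhost but still runs once per controller.

~~~
# Option A (sketch): static import. Keywords set on import_tasks are
# inherited by every task in the imported file. With a dynamic
# include_tasks they would apply only to the include statement itself.
- import_tasks: keystone.yml      # hypothetical file name
  delegate_to: localhost
  run_once: true                  # assumption: avoids one run per controller

# Option B (sketch): set the keywords directly on the task.
- name: Create a Skydive tenant
  os_project:
    name: "{{ skydive_auth_os_tenant_name }}"
    description: "Skydive admin users"
    domain_id: "{{ skydive_auth_os_domain_id }}"
    enabled: True
    state: present
  delegate_to: localhost
  run_once: true                  # assumption, as above
~~~

With `run_once`, Ansible executes the task for a single host in the batch and applies the result to all hosts, so the tenant would be created only once and the remaining controllers would not be dropped from the play by a conflict error.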