Bug 1761612
| Summary: | The GUI installer is unable to override the prometheus port setting of ceph-ansible | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Paul Cuzner <pcuzner> |
| Component: | Ceph-Ansible | Assignee: | Dimitri Savineau <dsavinea> |
| Status: | CLOSED ERRATA | QA Contact: | Ameena Suhani S H <amsyedha> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.0 | CC: | aschoen, ceph-eng-bugs, ceph-qe-bugs, dsavinea, gabrioux, gmeno, kdreyer, nthomas, tchandra, tserlin, vashastr, ykaul |
| Target Milestone: | rc | | |
| Target Release: | 4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-ansible-4.0.3-1.el8cp | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-01-31 12:47:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1787068 | | |
| Bug Blocks: | | | |
| Attachments: | | | |

Description
Paul Cuzner
2019-10-14 21:27:41 UTC
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

By embedding the tasks from dashboard.yml into site.yml, the override for prometheus_port works, both from the CLI (ansible-playbook site.yml) and from ansible-runner (GUI).

In addition, I'm also seeing the grafana_server_addr not being set as expected. In my environment I have an installer machine, where ansible is running from, which is also the target for the grafana and prometheus containers. At the end of the play the containers are running on the installer machine, but the settings applied to ceph and the datasource defined in grafana all point to the mgr, not the installer host.

I have attached my group_vars and inventory.

Created attachment 1626180 [details]
inventory file
Created attachment 1626181 [details]
group vars - all.yml
In my test environment, I'm using two machines: an installer, and an all-in-one node for all ceph daemons. The installer is used for the grafana-server group.

Looking at the ceph config keys:

[root@rhcs4-aio ~]# ceph config get mgr.rhcs4-aio mgr/dashboard/PROMETHEUS_API_HOST
[root@rhcs4-aio ~]# ceph config get mgr.rhcs4-aio mgr/dashboard/GRAFANA_API_URL
http://10.90.90.165:3000/

In my case 10.90.90.165 is the IP for the host running ceph mgr; 10.90.90.163 is actually where the prometheus & grafana containers are deployed to. From the rhcs4-aio box (.165), the rhcs4-installer name resolves to .163 correctly.

As a consequence, the grafana dashboard integration is also broken.

Other relevant versions:

sh-4.4# rpm -q ansible
ansible-2.8.3-1.el8ae.noarch
sh-4.4# rpm -q ansible-runner
ansible-runner-1.3.4-2.el8ar.noarch

> Looking at the ceph config keys
> [root@rhcs4-aio ~]# ceph config get mgr.rhcs4-aio mgr/dashboard/PROMETHEUS_API_HOST

The prometheus api host key was added by [1] and has been present since v4.0.0.

> [root@rhcs4-aio ~]# ceph config get mgr.rhcs4-aio mgr/dashboard/GRAFANA_API_URL
> http://10.90.90.165:3000/

This was fixed by [2] and has been present since v4.0.0.

Except for those two issues, the prometheus_port override should already work in rc16. I'll try to do a deployment using rc16.

[1] https://github.com/ceph/ceph-ansible/commit/74ab59c4f33d534cfbca4055c1f494a670be40e2
[2] https://github.com/ceph/ceph-ansible/commit/9bb11c7b2a17db56cfcd7284d2190af36e17bba6

The prometheus_port override works for me with rc16:
1 installer node with grafana/prometheus stack and running ceph-ansible
1 aio node with mon/mgr/osd/rgw/mds
$ ansible --version
ansible 2.8.3
$ grep prometheus_port group_vars/all.yml
prometheus_port: 9095
$ sudo ss -lntup|grep prometheus
tcp LISTEN 0 128 :::9095 :::* users:(("prometheus",pid=15481,fd=7))
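The manual check above (grep the configured port, then look for the listening socket) could be scripted. The following is only a sketch: the function name `listening_port` is hypothetical, and it assumes the seven-column `ss -lntup` layout shown in the output above, which may differ between `ss` versions.

```python
import re


def listening_port(ss_line, process_name):
    """Return the local port from one `ss -lntup` line if it belongs to
    process_name, otherwise None. Assumes the column layout seen in this thread."""
    fields = ss_line.split()
    if len(fields) < 7 or fields[1] != "LISTEN":
        return None
    # The process name is the first quoted string in users:(("name",pid=...,fd=...)).
    match = re.search(r'users:\(\("([^"]+)"', fields[6])
    if match is None or match.group(1) != process_name:
        return None
    # The local address is the fifth column; the port follows the last colon.
    return int(fields[4].rsplit(":", 1)[1])


# Line copied from the ss output in this comment.
SS_LINE = 'tcp LISTEN 0 128 :::9095 :::* users:(("prometheus",pid=15481,fd=7))'
EXPECTED_PORT = 9095  # prometheus_port value from group_vars/all.yml in this comment

port = listening_port(SS_LINE, "prometheus")
print("override applied" if port == EXPECTED_PORT else f"unexpected port: {port}")
```

Comparing the parsed port against the value in group_vars/all.yml makes a lost override visible immediately, which matters here because the default port does not necessarily clash with anything.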
Good to know - thanks for testing. Is this rhel8 with python3? My ansible version on rhel8 is 2.8.5

[root@rhcs4-installer inventory]# ansible --version
ansible 2.8.5
config file = /etc/ansible/ansible.cfg
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.6/site-packages/ansible
executable location = /usr/bin/ansible
python version = 3.6.8 (default, Apr 3 2019, 17:26:03) [GCC 8.2.1 20180905 (Red Hat 8.2.1-3)]
[root@rhcs4-installer inventory]# rpm -q ansible
ansible-2.8.5-2.el8ae.noarch

I tried 4.02 from brew, and can confirm the issues with the prometheus/grafana definitions are resolved; the grafana integration is working as expected. The prometheus_port though didn't take my override, but given the default is now 9092 I didn't have a port clash.

> Is this rhel8 with python3? My ansible version on rhel8 is 2.8.5

No, it was on CentOS 7 with python2, but it doesn't change anything. I also tried with ansible 2.8.5, with success.

> The prometheus_port though didn't take my override, but given the default is now 9092 I didn't have a port clash.

Could you share how you run the ansible-playbook command? (i.e. where the inventory file, the [group|host]_vars directories, etc. are located)

Ok, so import_playbook doesn't use the [group|host]_vars the same way depending on the location of the ansible inventory.
The group_vars directory is used when it's present (there are more scenarios, see [1]):
- in the same directory as the inventory file
- in the same directory as the playbook file
When the inventory file is in the ceph-ansible directory then everything works perfectly (that's what I'm always using).
-----------------
$ grep prometheus_port group_vars/all.yml
prometheus_port: 9099
$ ansible-playbook -i hosts site.yml
PLAY [mons] *******************************************
TASK [ceph-prometheus : prometheus_port variable] *****
ok: [rhcs4-aio] => {
"msg": 9099
}
PLAY [grafana-server] *********************************
TASK [ceph-prometheus : prometheus_port variable] *****
ok: [rhcs4-installer] => {
"msg": 9099
}
-----------------
But if the inventory file directory doesn't contain a group_vars directory then the overrides for the dashboard playbook will be lost (because there's no group_vars in the infrastructure-playbooks directory).
-----------------
$ ansible-playbook -i /tmp/hosts site.yml
PLAY [mons] *******************************************
TASK [ceph-prometheus : prometheus_port variable] *****
ok: [rhcs4-aio] => {
"msg": 9099
}
PLAY [grafana-server] *********************************
TASK [ceph-prometheus : prometheus_port variable] *****
ok: [rhcs4-installer] => {
"msg": 9092
}
-----------------
So we need to either change the dashboard.yml file location (so it is not under the infrastructure-playbooks directory) or duplicate the code from that playbook in both the site and site-container playbooks.
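The lookup behaviour described in this comment can be modelled roughly as follows. This is a simplified sketch, not Ansible's actual implementation: it only considers the two locations listed above (next to the inventory file and next to the playbook file), the helper names are hypothetical, and the `existing_dirs` set stands in for a real filesystem check. The /usr/share/ceph-ansible path is the one reported in this thread.

```python
from pathlib import PurePosixPath


def group_vars_candidates(inventory_file, playbook_file):
    """Return the group_vars directories adjacent to the inventory file and
    to the playbook file (the two locations discussed in this comment)."""
    inventory_dir = PurePosixPath(inventory_file).parent
    playbook_dir = PurePosixPath(playbook_file).parent
    candidates = []
    for directory in (inventory_dir / "group_vars", playbook_dir / "group_vars"):
        if str(directory) not in candidates:
            candidates.append(str(directory))
    return candidates


def override_visible(inventory_file, playbook_file, existing_dirs):
    """True if at least one candidate group_vars directory exists."""
    return any(
        d in existing_dirs
        for d in group_vars_candidates(inventory_file, playbook_file)
    )


CEPH_ANSIBLE = "/usr/share/ceph-ansible"
# Assumption: group_vars exists only in the ceph-ansible directory.
EXISTING = {CEPH_ANSIBLE + "/group_vars"}

SITE = CEPH_ANSIBLE + "/site.yml"
DASHBOARD = CEPH_ANSIBLE + "/infrastructure-playbooks/dashboard.yml"

for inventory in (CEPH_ANSIBLE + "/hosts", "/tmp/hosts"):
    for playbook in (SITE, DASHBOARD):
        visible = override_visible(inventory, playbook, EXISTING)
        print(f"inventory={inventory} playbook={playbook} override_visible={visible}")
```

Under these assumptions, only the combination of an inventory outside the ceph-ansible directory and the imported dashboard playbook leaves the overrides unreachable, which matches the 9099 versus 9092 output above.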
@Paul Could you confirm that there's no group_vars directory in the inventory file directory ?
[1] https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#ansible-variable-precedence
In my case the inventory is in /usr/share/ansible-runner-service/inventory/hosts, and group_vars and host_vars are coming from /usr/share/ceph-ansible as normal. From the CLI I was running the site.yml play as follows:

cd /usr/share/ceph-ansible
ansible-playbook -i /usr/share/ansible-runner-service/inventory/hosts site.yml

group_vars and host_vars are within /usr/share/ceph-ansible

Would you please let us know what tagged version on the stable-4.0 branch contains the complete fixes for this BZ?

There's no tag upstream yet. It will be present in v4.0.3

@Ken, shouldn't this BZ be targeted to something other than 4.* ?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312