Description of problem:
If the GUI detects that the metrics target is the same host being used for the installation, it attempts to define prometheus_port = 9095 in all.yml to avoid a port conflict between Prometheus and the Cockpit UI. This override no longer works.

Version-Release number of selected component (if applicable):
cockpit-ceph-installer-0.9-1
ceph-ansible-4.0.0-0.1.rc16.el8cp.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install RHCS with the metrics host the same as the installation host.

Actual results:
- Prometheus fails to start.
- Dashboard settings are incorrect.
- Dashboard/Grafana integration fails: it shows no data, or errors.
- The Grafana datasource is incorrect.

Expected results:
The Prometheus configuration should work correctly.

Additional info:
Upstream issue raised: https://github.com/ceph/ceph-ansible/issues/4601
By embedding the tasks from dashboard.yml into site.yml, the prometheus_port override works, both from the CLI (ansible-playbook site.yml) and from ansible-runner (GUI).

In addition, I'm seeing grafana_server_addr not being set as expected. In my environment I have an installer machine, which Ansible runs from and which is also the target for the Grafana and Prometheus containers. At the end of the play the containers are running on the installer machine, but the settings applied to Ceph and the datasource defined in Grafana all point to the mgr, not the installer host.

I've attached my group_vars and inventory.
Created attachment 1626180: inventory file
Created attachment 1626181: group vars (all.yml)
In my test environment I'm using two machines: an installer, and an all-in-one node for all Ceph daemons. The installer is used for the grafana-server group.

Looking at the ceph config keys:

[root@rhcs4-aio ~]# ceph config get mgr.rhcs4-aio mgr/dashboard/PROMETHEUS_API_HOST
[root@rhcs4-aio ~]# ceph config get mgr.rhcs4-aio mgr/dashboard/GRAFANA_API_URL
http://10.90.90.165:3000/

In my case 10.90.90.165 is the IP of the host running the Ceph mgr, while 10.90.90.163 is where the Prometheus and Grafana containers are actually deployed. From the rhcs4-aio box (.165), the rhcs4-installer name resolves correctly to .163.

As a consequence, the Grafana dashboard integration is also broken.
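A quick way to spot this kind of mismatch is to compare the host inside the GRAFANA_API_URL value with the host that actually runs the grafana-server group. The sketch below is illustrative only: `url_host` and `check_grafana_url` are hypothetical helper names (not part of ceph-ansible or the ceph CLI), and it assumes a plain scheme://host:port/path URL without IPv6 literals.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of ceph-ansible): extract the host from a
# dashboard URL and compare it with the host expected to run Grafana.
# Assumes a simple scheme://host:port/path URL; IPv6 literals are not handled.

url_host() {
  local url="$1"
  url="${url#*://}"   # drop the scheme, e.g. "http://"
  url="${url%%/*}"    # drop any path, e.g. "/"
  echo "${url%%:*}"   # drop the port, e.g. ":3000"
}

check_grafana_url() {
  local url="$1" expected="$2" host
  host="$(url_host "$url")"
  if [ "$host" = "$expected" ]; then
    echo "OK: $host"
  else
    echo "MISMATCH: $host (expected $expected)"
  fi
}
```

For example, `check_grafana_url "$(ceph config get mgr.rhcs4-aio mgr/dashboard/GRAFANA_API_URL)" 10.90.90.163` would report a mismatch with the values shown above, since the URL points at .165.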
Other relevant versions:

sh-4.4# rpm -q ansible
ansible-2.8.3-1.el8ae.noarch
sh-4.4# rpm -q ansible-runner
ansible-runner-1.3.4-2.el8ar.noarch
> Looking at the ceph config keys
> [root@rhcs4-aio ~]# ceph config get mgr.rhcs4-aio mgr/dashboard/PROMETHEUS_API_HOST

The Prometheus API host key was added by [1] and is present since v4.0.0.

> [root@rhcs4-aio ~]# ceph config get mgr.rhcs4-aio mgr/dashboard/GRAFANA_API_URL
> http://10.90.90.165:3000/

This was fixed by [2] and is present since v4.0.0.

Except for those two issues, the prometheus_port override should already work in rc16. I'll try a deployment using rc16.

[1] https://github.com/ceph/ceph-ansible/commit/74ab59c4f33d534cfbca4055c1f494a670be40e2
[2] https://github.com/ceph/ceph-ansible/commit/9bb11c7b2a17db56cfcd7284d2190af36e17bba6
The prometheus_port override works for me with rc16:
- 1 installer node with the grafana/prometheus stack, running ceph-ansible
- 1 aio node with mon/mgr/osd/rgw/mds

$ ansible --version
ansible 2.8.3
$ grep prometheus_port group_vars/all.yml
prometheus_port: 9095
$ sudo ss -lntup|grep prometheus
tcp LISTEN 0 128 :::9095 :::* users:(("prometheus",pid=15481,fd=7))
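To repeat this check against whichever all.yml is actually in use, the configured value can be read from the vars file and then compared with what ss reports. A minimal sketch, assuming prometheus_port is written as a plain top-level `key: value` line; `configured_prometheus_port` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the prometheus_port value from a group_vars file.
# Assumes a top-level "prometheus_port: <number>" line; comments, quoting and
# nested YAML are not handled. Prints nothing if the key is absent.
configured_prometheus_port() {
  awk -F':[[:space:]]*' '$1 == "prometheus_port" { print $2; exit }' "$1"
}
```

For example, `port="$(configured_prometheus_port group_vars/all.yml)"` followed by `sudo ss -lntup | grep prometheus | grep ":$port "` should only print a LISTEN line if Prometheus actually picked up the override.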
Good to know, thanks for testing. Is this RHEL 8 with Python 3? My Ansible version on RHEL 8 is 2.8.5:

[root@rhcs4-installer inventory]# ansible --version
ansible 2.8.5
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.6/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.6.8 (default, Apr 3 2019, 17:26:03) [GCC 8.2.1 20180905 (Red Hat 8.2.1-3)]
[root@rhcs4-installer inventory]# rpm -q ansible
ansible-2.8.5-2.el8ae.noarch
I tried 4.0.2 from brew and can confirm the issues with the prometheus/grafana definitions are resolved; the Grafana integration is working as expected. The prometheus_port override still wasn't picked up, but since the default is now 9092 I didn't get a port clash.
> Is this rhel8 with python3? My ansible version on rhel8 is 2.8.5

No, it was on CentOS 7 with Python 2, but that doesn't change anything. I also tried with Ansible 2.8.5, successfully.

> The prometheus_port though didn't take my override, but given the default is now 9092 I didn't have a port clash.

Could you share how you run the ansible-playbook command? (i.e. where the inventory file and the [group|host]_vars directories are located, etc.)
OK, so import_playbook doesn't use the [group|host]_vars the same way, depending on the location of the Ansible inventory.

The group_vars directory is used when it's present (there are more scenarios, per [1]):
- in the same directory as the inventory file
- in the same directory as the playbook file

When the inventory file is in the ceph-ansible directory, everything works perfectly (that's what I always use).

-----------------
$ grep prometheus_port group_vars/all.yml
prometheus_port: 9099
$ ansible-playbook -i hosts site.yml

PLAY [mons] *******************************************

TASK [ceph-prometheus : prometheus_port variable] *****
ok: [rhcs4-aio] => {
    "msg": 9099
}

PLAY [grafana-server] *********************************

TASK [ceph-prometheus : prometheus_port variable] *****
ok: [rhcs4-installer] => {
    "msg": 9099
}
-----------------

But if the inventory file's directory doesn't contain a group_vars directory, the overrides for the dashboard playbook are lost (because there's no group_vars in the infrastructure-playbooks directory).

-----------------
$ ansible-playbook -i /tmp/hosts site.yml

PLAY [mons] *******************************************

TASK [ceph-prometheus : prometheus_port variable] *****
ok: [rhcs4-aio] => {
    "msg": 9099
}

PLAY [grafana-server] *********************************

TASK [ceph-prometheus : prometheus_port variable] *****
ok: [rhcs4-installer] => {
    "msg": 9092
}
-----------------

So we need to either move dashboard.yml out of the infrastructure-playbooks directory, or duplicate its code in both the site and site-container playbooks.

@Paul, could you confirm that there's no group_vars directory in the inventory file's directory?

[1] https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#ansible-variable-precedence
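The lookup described above can be approximated with a small script that, for a given inventory file and playbook, lists which of the two group_vars locations actually exist. This is a simplified sketch only: it models just the two locations named above (next to the inventory file, next to the playbook file), not Ansible's full precedence rules, and `visible_group_vars` and `check_playbook` are hypothetical names.

```shell
#!/usr/bin/env bash
# Simplified model (hypothetical helpers): a play sees group_vars only from
# the inventory file's directory or from its own playbook's directory.
# Ansible's real variable precedence has more cases than this.

visible_group_vars() {
  local inventory="$1" playbook="$2" dir
  for dir in "$(dirname "$inventory")" "$(dirname "$playbook")"; do
    if [ -d "$dir/group_vars" ]; then
      echo "$dir/group_vars"
    fi
  done | sort -u   # inventory and playbook may share one directory
}

check_playbook() {
  local inventory="$1" playbook="$2" found
  found="$(visible_group_vars "$inventory" "$playbook")"
  if [ -n "$found" ]; then
    echo "OK: $playbook uses $found"
  else
    echo "NO GROUP_VARS: $playbook"
  fi
}
```

Run from the ceph-ansible directory, `check_playbook /tmp/hosts site.yml` finds ./group_vars, while `check_playbook /tmp/hosts infrastructure-playbooks/dashboard.yml` finds nothing (assuming no /tmp/group_vars), which matches the lost override in the second run.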
In my case the inventory is in /usr/share/ansible-runner-service/inventory/hosts, and group_vars and host_vars come from /usr/share/ceph-ansible as normal.

From the CLI I was running the site.yml play as follows:

cd /usr/share/ceph-ansible
ansible-playbook -i /usr/share/ansible-runner-service/inventory/hosts site.yml

group_vars and host_vars are within /usr/share/ceph-ansible.
Would you please let us know what tagged version on the stable-4.0 branch contains the complete fixes for this BZ?
There's no tag upstream yet. It will be present in v4.0.3
@Ken, shouldn't this BZ be targeted to something other than 4.*?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0312