Created attachment 1456686
playbook_run.log and hosts inventory file.

Description of problem:
Installation of the Prometheus-based, containerized ceph-metrics fails at the following tasks:
1. TASK [ceph-node-exporter : Open ports for node_exporter]
2. RUNNING HANDLER [ceph-node-exporter : Restart service]

Version-Release number of selected component (if applicable):
[root@magna025 cephmetrics-ansible]# rpm -qa | grep ansible
ansible-2.4.5.0-1.el7ae.noarch
cephmetrics-ansible-2.0-1.el7cp.x86_64
ceph-ansible-3.1.0-0.1.rc9.el7cp.noarch
[root@magna025 cephmetrics-ansible]# rpm -qa | grep node
prometheus-node-exporter-0.15.2-2.el7cp.x86_64

container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana

How reproducible:
1/1

Steps to Reproduce:
1. Deploy a 3.1 Ceph cluster with BlueStore.
2. Install cephmetrics-ansible on the ceph-ansible node.
3. Edit the container name in the main.yml file located at /usr/share/cephmetrics-ansible/roles/ceph-grafana/defaults:
   container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana
4. Add the ceph-grafana node to the Ansible hosts inventory file.
5. Run the cephmetrics-ansible -v playbook.yml

Actual results:
The playbook fails at the following tasks:

1. TASK [ceph-node-exporter : Open ports for node_exporter]
failed: [magna025] (item=9100/tcp) => {"changed": false, "failed": true, "item": "9100/tcp", "msg": "ERROR: Exception caught: ALREADY_ENABLED: '9100:tcp' Permanent and Non-Permanent(immediate) operation"}

2. RUNNING HANDLER [ceph-node-exporter : Restart service]
fatal: [magna048]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'service_name'\n\nThe error appears to have been in '/usr/share/cephmetrics-ansible/roles/ceph-node-exporter/handlers/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Restart service\n ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'service_name'"}

Expected results:
The playbook should complete successfully without failures.

Additional info:
As a workaround, I tried the following:
1. Re-ran the playbook; it succeeded.
2. Confirmed that prometheus-node-exporter.service is running on the cephmetrics-ansible node.
3. Manually restarted prometheus-node-exporter.service on the Ceph cluster nodes.
The installation succeeded on re-run of the playbook.
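The manual restart in step 3 of the workaround could also be done with a small play instead of logging in to each node. This is only a sketch: the unit name is taken from the report, but targeting `all` hosts is an assumption, since the report does not say which inventory groups hold the Ceph cluster nodes.

```yaml
---
# Sketch of workaround step 3: restart the node exporter on every host.
# "hosts: all" is an assumption; narrow it to the groups in your inventory.
- hosts: all
  become: true
  tasks:
    - name: Restart prometheus-node-exporter
      service:
        name: prometheus-node-exporter   # unit name as given in the report
        state: restarted
```

Running it against the same hosts inventory file used for the cephmetrics-ansible playbook should be equivalent to restarting the unit by hand on each node.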
Customers/testers shouldn't be modifying any of the files that we're actually shipping. Setting site-specific values should be done using yaml files in /path/to/inventory_dir/group_vars/; could you try that please?
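As a rough illustration of the suggestion above, a site-specific override could live in a group_vars file next to the inventory rather than in the shipped role defaults. The file name (ceph-grafana.yml, matching an inventory group assumed to be called ceph-grafana) and the top-level placement of the key are assumptions; check the role's defaults for the exact variable structure it expects.

```yaml
---
# /path/to/inventory_dir/group_vars/ceph-grafana.yml
# Assumed file name: it must match the inventory group that holds the
# ceph-grafana node. The key is copied from the report; whether the role
# reads it at top level or inside a nested dictionary is not confirmed here.
container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana
```

With an override like this in place, /usr/share/cephmetrics-ansible/roles/ceph-grafana/defaults/main.yml can stay as shipped.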
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2819