Bug 1598339 - [cephmetrics] Prometheus based container ceph-metrics installation fails
Summary: [cephmetrics] Prometheus based container ceph-metrics installation fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Metrics
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 3.1
Assignee: Boris Ranto
QA Contact: Pratik Surve
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-05 07:26 UTC by Madhavi Kasturi
Modified: 2018-09-26 18:23 UTC
CC List: 6 users

Fixed In Version: cephmetrics-2.0-3.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-26 18:22:08 UTC
Embargoed:


Attachments
playbook_run.log and hosts inventory file. (16.41 KB, application/x-gzip)
2018-07-05 07:26 UTC, Madhavi Kasturi


Links
Red Hat Product Errata RHBA-2018:2819 (last updated 2018-09-26 18:23:34 UTC)

Description Madhavi Kasturi 2018-07-05 07:26:30 UTC
Created attachment 1456686
playbook_run.log and hosts inventory file.

Description of problem:
The Prometheus-based containerized ceph-metrics installation fails at the following tasks:
1. TASK [ceph-node-exporter : Open ports for node_exporter]
2. RUNNING HANDLER [ceph-node-exporter : Restart service]

Version-Release number of selected component (if applicable):
[root@magna025 cephmetrics-ansible]# rpm -qa | grep ansible
ansible-2.4.5.0-1.el7ae.noarch
cephmetrics-ansible-2.0-1.el7cp.x86_64
ceph-ansible-3.1.0-0.1.rc9.el7cp.noarch

[root@magna025 cephmetrics-ansible]# rpm -qa | grep node
prometheus-node-exporter-0.15.2-2.el7cp.x86_64

container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana

How reproducible:
1/1

Steps to Reproduce:
1. Deploy a 3.1 Ceph cluster with BlueStore.
2. Install cephmetrics-ansible on the ceph-ansible node.
3. Edit the container name in the main.yml file located at /usr/share/cephmetrics-ansible/roles/ceph-grafana/defaults:
    container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana
4. Add the ceph-grafana node to the Ansible hosts inventory file.
5. From /usr/share/cephmetrics-ansible, run the playbook: ansible-playbook -v playbook.yml

Actual results:
The playbook fails at the following two tasks; a sketch of safer task variants follows the error output below.
1. TASK [ceph-node-exporter : Open ports for node_exporter]
failed: [magna025] (item=9100/tcp) => {"changed": false, "failed": true, "item": "9100/tcp", "msg": "ERROR: Exception caught: ALREADY_ENABLED: '9100:tcp' Permanent and Non-Permanent(immediate) operation"}

2. RUNNING HANDLER [ceph-node-exporter : Restart service]
fatal: [magna048]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'service_name'\n\nThe error appears to have been in '/usr/share/cephmetrics-ansible/roles/ceph-node-exporter/handlers/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Restart service\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'service_name'"}

Expected results:
The playbook should complete successfully without failures.

Additional info:
As a workaround, the following was tried:
1. Re-running the playbook; the installation then succeeded.
2. Confirming that prometheus-node-exporter.service is running on the cephmetrics-ansible node.
3. Manually restarting prometheus-node-exporter.service on the Ceph cluster nodes (a sketch of this as a short play follows).
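The manual restart in item 3 can also be expressed as a small play; a minimal sketch, assuming a hypothetical inventory group named ceph_nodes:

# Hedged sketch of the manual workaround; "ceph_nodes" is a placeholder
# for whatever groups the real inventory uses.
- hosts: ceph_nodes
  become: true
  tasks:
    - name: Restart prometheus-node-exporter on every cluster node
      service:
        name: prometheus-node-exporter
        state: restarted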

Comment 2 Zack Cerza 2018-07-10 18:22:29 UTC
Customers/testers shouldn't be modifying any of the files that we're actually shipping. Setting site-specific values should be done using yaml files in /path/to/inventory_dir/group_vars/; could you try that please?
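
For example, a minimal sketch of such an override, using the placeholder inventory path from the comment above and the value from step 3 of the reproduction steps:

# /path/to/inventory_dir/group_vars/all.yml (path is the placeholder
# above; any group_vars file that applies to the grafana host works).
# Group vars take precedence over role defaults, so this overrides the
# ceph-grafana default without editing shipped files.
container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana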

Comment 9 errata-xmlrpc 2018-09-26 18:22:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819

