Bug 1598339 - [cephmetrics] Prometheus based container ceph-metrics installation fails
Summary: [cephmetrics] Prometheus based container ceph-metrics installation fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Metrics
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 3.1
Assignee: Boris Ranto
QA Contact: Pratik Surve
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-05 07:26 UTC by Madhavi Kasturi
Modified: 2018-09-26 18:23 UTC
CC List: 6 users

Fixed In Version: cephmetrics-2.0-3.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-26 18:22:08 UTC
Embargoed:


Attachments
playbook_run.log and hosts inventory file. (16.41 KB, application/x-gzip)
2018-07-05 07:26 UTC, Madhavi Kasturi


Links
Red Hat Product Errata RHBA-2018:2819 (last updated 2018-09-26 18:23:34 UTC)

Description Madhavi Kasturi 2018-07-05 07:26:30 UTC
Created attachment 1456686
playbook_run.log and hosts inventory file.

Description of problem:
The Prometheus-based containerized ceph-metrics installation fails at the following tasks:
1. TASK [ceph-node-exporter : Open ports for node_exporter]
2. RUNNING HANDLER [ceph-node-exporter : Restart service]

Version-Release number of selected component (if applicable):
[root@magna025 cephmetrics-ansible]# rpm -qa | grep ansible
ansible-2.4.5.0-1.el7ae.noarch
cephmetrics-ansible-2.0-1.el7cp.x86_64
ceph-ansible-3.1.0-0.1.rc9.el7cp.noarch

[root@magna025 cephmetrics-ansible]# rpm -qa | grep node
prometheus-node-exporter-0.15.2-2.el7cp.x86_64

container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana

How reproducible:
1/1

Steps to Reproduce:
1. Deploy a 3.1 Ceph cluster with BlueStore.
2. Install cephmetrics-ansible on the ceph-ansible node.
3. Edit the container name in the main.yml file located at /usr/share/cephmetrics-ansible/roles/ceph-grafana/defaults:
    container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana
4. Add the ceph-grafana node to the Ansible hosts inventory file.
5. From /usr/share/cephmetrics-ansible, run the playbook: ansible-playbook -v playbook.yml

Actual results:
The playbook fails at the following two tasks; a sketch of safer task variants follows the error output below.
1. TASK [ceph-node-exporter : Open ports for node_exporter]
failed: [magna025] (item=9100/tcp) => {"changed": false, "failed": true, "item": "9100/tcp", "msg": "ERROR: Exception caught: ALREADY_ENABLED: '9100:tcp' Permanent and Non-Permanent(immediate) operation"}

2. RUNNING HANDLER [ceph-node-exporter : Restart service]
fatal: [magna048]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'service_name'\n\nThe error appears to have been in '/usr/share/cephmetrics-ansible/roles/ceph-node-exporter/handlers/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Restart service\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'service_name'"}

Expected results:
The playbook should complete successfully without failures.

Additional info:
As a workaround, the following was tried:
1. Re-running the playbook; the installation then succeeded.
2. Confirming that prometheus-node-exporter.service is running on the cephmetrics-ansible node.
3. Manually restarting prometheus-node-exporter.service on the Ceph cluster nodes (a sketch of this as a short play follows).
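The manual restart in item 3 can also be expressed as a small play; a minimal sketch, assuming a hypothetical inventory group named ceph_nodes:

# Hedged sketch of the manual workaround; "ceph_nodes" is a placeholder
# for whatever groups the real inventory uses.
- hosts: ceph_nodes
  become: true
  tasks:
    - name: Restart prometheus-node-exporter on every cluster node
      service:
        name: prometheus-node-exporter
        state: restarted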

Comment 2 Zack Cerza 2018-07-10 18:22:29 UTC
Customers/testers shouldn't be modifying any of the files that we're actually shipping. Setting site-specific values should be done using yaml files in /path/to/inventory_dir/group_vars/; could you try that please?
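
For example, a minimal sketch of such an override, using the placeholder inventory path from the comment above and the value from step 3 of the reproduction steps:

# /path/to/inventory_dir/group_vars/all.yml (path is the placeholder
# above; any group_vars file that applies to the grafana host works).
# Group vars take precedence over role defaults, so this overrides the
# ceph-grafana default without editing shipped files.
container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana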

Comment 9 errata-xmlrpc 2018-09-26 18:22:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819

