
Bug 1598339

Summary: [cephmetrics] Prometheus based container ceph-metrics installation fails
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Madhavi Kasturi <mkasturi>
Component: Ceph-Metrics
Assignee: Boris Ranto <branto>
Status: CLOSED ERRATA
QA Contact: Pratik Surve <prsurve>
Severity: urgent
Priority: urgent
Version: 3.1
CC: branto, ceph-eng-bugs, gmeno, hnallurv, mkasturi, prsurve
Target Milestone: rc
Target Release: 3.1
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Fixed In Version: cephmetrics-2.0-3.el7cp
Doc Type: If docs needed, set a value
Last Closed: 2018-09-26 18:22:08 UTC
Type: Bug

Attachments: playbook_run.log and hosts inventory file.

Description Madhavi Kasturi 2018-07-05 07:26:30 UTC
Created attachment 1456686
playbook_run.log and hosts inventory file.

Description of problem:
The Prometheus-based, containerized ceph-metrics installation fails at the following tasks:
1. TASK [ceph-node-exporter : Open ports for node_exporter]
2. RUNNING HANDLER [ceph-node-exporter : Restart service]

Version-Release number of selected component (if applicable):
[root@magna025 cephmetrics-ansible]# rpm -qa | grep ansible
ansible-2.4.5.0-1.el7ae.noarch
cephmetrics-ansible-2.0-1.el7cp.x86_64
ceph-ansible-3.1.0-0.1.rc9.el7cp.noarch

[root@magna025 cephmetrics-ansible]# rpm -qa | grep node
prometheus-node-exporter-0.15.2-2.el7cp.x86_64

container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana

How reproducible:
1/1

Steps to Reproduce:
1. Deploy a 3.1 ceph cluster with bluestore.
2. Install cephmetrics-ansible on the ceph-ansible node.
3. Edit the container name in the main.yml file located at /usr/share/cephmetrics-ansible/roles/ceph-grafana/defaults
    container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana
4. Add the ceph-grafana node to the ansible hosts inventory file (a sketch of a possible entry follows the steps).
5. Run the playbook from the cephmetrics-ansible directory: ansible-playbook -v playbook.yml
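
For step 4, a minimal sketch of a possible hosts inventory entry; the group name "ceph-grafana" and the host shown are illustrative assumptions, not taken from the attached inventory file:

    # Hypothetical INI-style inventory fragment (group name is assumed)
    [ceph-grafana]
    magna025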

Actual results:
The playbook fails at the following tasks (hedged sketches of the likely task and handler follow the error output):
1. TASK [ceph-node-exporter : Open ports for node_exporter]
failed: [magna025] (item=9100/tcp) => {"changed": false, "failed": true, "item": "9100/tcp", "msg": "ERROR: Exception caught: ALREADY_ENABLED: '9100:tcp' Permanent and Non-Permanent(immediate) operation"}

2. RUNNING HANDLER [ceph-node-exporter : Restart service]
fatal: [magna048]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'service_name'\n\nThe error appears to have been in '/usr/share/cephmetrics-ansible/roles/ceph-node-exporter/handlers/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Restart service\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'service_name'"}
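
For context, a minimal sketch of what the failing task and handler might look like, reconstructed only from the error output above; the firewalld/service module choices and the node_exporter variable are assumptions, not the shipped cephmetrics-ansible source:

    # Sketch of the "Open ports for node_exporter" task. Older firewalld
    # modules (as in Ansible 2.4) could raise ALREADY_ENABLED when a port
    # was already open with both permanent and immediate set, which
    # matches failure 1.
    - name: Open ports for node_exporter
      firewalld:
        port: "{{ item }}"
        permanent: true
        immediate: true
        state: enabled
      with_items:
        - 9100/tcp

    # Sketch of the failing handler. The traceback shows a lookup of a
    # 'service_name' key that the dict does not define on some hosts; the
    # dict name "node_exporter" here is hypothetical.
    - name: Restart service
      service:
        name: "{{ node_exporter.service_name }}"
        state: restarted

Read this way, both failures would be consistent with the re-run workaround noted below: on a second pass the port task reports no change, so the handler is never notified.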

Expected results:
The playbook should complete without failures.

Additional info:
As a workaround, the following was tried:
1. Re-ran the playbook; the installation then succeeded.
2. Confirmed that prometheus-node-exporter.service is running on the cephmetrics-ansible node.
3. Manually restarted prometheus-node-exporter.service on the ceph cluster nodes.
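
A hypothetical ad-hoc equivalent of workaround step 3, assuming it is run from the admin node against the inventory's built-in "all" group; this is an illustration, not the command the reporter used:

    ansible all -m service -a "name=prometheus-node-exporter state=restarted"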

Comment 2 Zack Cerza 2018-07-10 18:22:29 UTC
Customers/testers shouldn't be modifying any of the files that we're actually shipping. Setting site-specific values should be done using yaml files in /path/to/inventory_dir/group_vars/; could you try that please?
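
A minimal sketch of that approach: override the shipped default from a group_vars file instead of editing roles/ceph-grafana/defaults/main.yml. The file name all.yml is an assumption; any group the target hosts belong to would work.

    # /path/to/inventory_dir/group_vars/all.yml (hypothetical file name)
    # Overrides the role default without touching shipped files:
    container_name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/grafana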

Comment 9 errata-xmlrpc 2018-09-26 18:22:08 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819