Bug 1378826
| Summary: | 1.3.3: Diamond service is not started in one of the OSD nodes in RC Build | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Ramakrishnan Periyasamy <rperiyas> |
| Component: | Calamari | Assignee: | Andrew Schoen <aschoen> |
| Calamari sub component: | Back-end | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Status: | CLOSED WONTFIX | Docs Contact: | Bara Ancincova <bancinco> |
| Severity: | medium | | |
| Priority: | unspecified | CC: | anharris, aschoen, asriram, ceph-eng-bugs, gmeno, hnallurv, kdreyer, rperiyas |
| Version: | 1.3.3 | | |
| Target Milestone: | rc | | |
| Target Release: | 1.3.4 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Known Issue |
| Doc Text: | (see below) | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-02-20 20:50:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1372735 | | |

Doc Text:

.The "diamond" service sometimes does not start
After installing Red Hat Ceph Storage and connecting the Ceph nodes to the Calamari server, in some cases, the `diamond` service does not start on certain Ceph nodes despite the `diamond` package being installed. As a consequence, graphs in the Calamari UI are not generated for such nodes.
To work around this issue, run the following command from the administration node as root or with `sudo`:
----
salt '*' state.highstate
----

Description
Ramakrishnan Periyasamy
2016-09-23 10:41:33 UTC
Andrew, would you please take a look at this setup today?

(In reply to Gregory Meno from comment #5)
> Andrew would you please take a look at this setup today?

Yeah, I can take a look.

I was able to get diamond started on all nodes and reporting to the web UI by doing the following steps:

On the Admin/Calamari node:
- Unauthorized all nodes with `sudo salt-key -D -y`

On all the OSD and MON nodes:
- `sudo yum remove diamond salt-minion`
- `sudo rm -rf /etc/salt`
- `sudo rm /var/lock/subsys/diamond`

On the Admin/Calamari node:
- Ran `ceph-deploy calamari connect` for all nodes in the cluster
- Logged into the web UI and accepted the keys

I've left the cluster in this working state for you to inspect and verify. I'm not exactly sure why diamond did not start on one of the OSDs before, but I suspect that node might not have been properly cleaned up from previous tests.

I have installed Ceph and Calamari in a new setup (all servers were re-imaged) and am still observing the same problem. After accepting the keys in the Calamari UI, the diamond service does not start on some nodes.

How Reproducible: 2/2

This time the diamond service did not start on 3 servers (1 MON and 2 OSD machines). Andrew has taken a look at this setup.

I took a look at the nodes where diamond did not start and found this in the salt logs:
----
Sep 29 15:14:21 magna107 salt-minion[30739]: [WARNING ] The minion function caused an exception
Sep 29 15:14:21 magna107 salt-minion[30739]: Traceback (most recent call last):
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/minion.py", line 796, in _thread_return
Sep 29 15:14:21 magna107 salt-minion[30739]: return_data = func(*args, **kwargs)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/modules/state.py", line 275, in highstate
Sep 29 15:14:21 magna107 salt-minion[30739]: force=kwargs.get('force', False)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/state.py", line 2497, in call_highstate
Sep 29 15:14:21 magna107 salt-minion[30739]: self.load_dynamic(matches)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/state.py", line 2081, in load_dynamic
Sep 29 15:14:21 magna107 salt-minion[30739]: refresh=False)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/modules/saltutil.py", line 343, in sync_all
Sep 29 15:14:21 magna107 salt-minion[30739]: ret['modules'] = sync_modules(saltenv, False)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/modules/saltutil.py", line 228, in sync_modules
Sep 29 15:14:21 magna107 salt-minion[30739]: ret = _sync('modules', saltenv)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/modules/saltutil.py", line 82, in _sync
Sep 29 15:14:21 magna107 salt-minion[30739]: os.makedirs(mod_dir)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib64/python2.7/os.py", line 157, in makedirs
Sep 29 15:14:21 magna107 salt-minion[30739]: mkdir(name, mode)
Sep 29 15:14:21 magna107 salt-minion[30739]: OSError: [Errno 17] File exists: '/var/cache/salt/minion/extmods/modules'
----
This traceback looks identical to one found in this upstream issue: http://tracker.ceph.com/issues/8780#note-1
This seems like a Salt issue to me. I'd recommend adding documentation noting that running `salt '*' state.highstate` from the admin node works around this situation when it occurs. Also, just restarting the salt-minion service on the affected nodes seems to fix the issue.
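A minimal sketch of the two remedies described above, assuming systemd-managed services as on RHEL 7 (the commands are illustrative; only the `salt '*' state.highstate` run is the documented workaround):

----
# From the admin/Calamari node: re-apply the highstate to all minions
# (the documented workaround).
sudo salt '*' state.highstate

# Or, on an affected OSD/MON node: restart the minion, then check
# whether the diamond service came up.
sudo systemctl restart salt-minion
sudo systemctl status diamond
----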
QE ran `salt '*' state.highstate` from the admin node and it resolved the issue.

This BZ needs to be release noted for 1.3.3 with the workaround mentioned above.

Doc text looks good.

The doc text looks good to me as well.

Looks like it got added to the 1.3.3 release notes.
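For reference, one way to confirm the workaround took effect across the cluster is to query the minions from the admin node with Salt's standard execution functions (this verification step is an assumption and not part of the report):

----
# Confirm every minion is responding.
sudo salt '*' test.ping

# Report whether the diamond service is running on each node.
sudo salt '*' service.status diamond
----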