Bug 1378826 - 1.3.3: Diamond service is not started in one of the OSD nodes in RC Build
Summary: 1.3.3: Diamond service is not started in one of the OSD nodes in RC Build
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Calamari
Version: 1.3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 1.3.4
Assignee: Andrew Schoen
QA Contact: ceph-qe-bugs
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1372735
 
Reported: 2016-09-23 10:41 UTC by Ramakrishnan Periyasamy
Modified: 2018-02-20 20:50 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.The "diamond" service sometimes does not start After installing Red Hat Ceph Storage and connecting the Ceph nodes to the Calamari server, in some cases, the `diamond` service does not start on certain Ceph nodes despite the `diamond` package being installed. As a consequence, graphs in the Calamari UI are not generated for such nodes. To work around this issue, run the following command from the administration node as root or with `sudo`: ---- salt '*' state.highstate ----
Clone Of:
Environment:
Last Closed: 2018-02-20 20:50:50 UTC
Embargoed:



Description Ramakrishnan Periyasamy 2016-09-23 10:41:33 UTC
Description of problem:

The diamond service is not started on one of the OSD nodes in the 1.3.3 RC build. On all the other Ceph nodes it was running.

After running "salt '*' state.highstate" command from admin node, diamond service was running in all nodes and in Calamari UI graphs were generated.

Version-Release number of selected component (if applicable):
diamond-3.4.67-4.el7cp.noarch

How reproducible:
1/1

Steps to Reproduce:
1. Installed Ceph and Calamari
2. Ran "ceph-deploy calamari connect" for all nodes in the cluster from the admin node (see the sketch after this list)
3. Logged into the Calamari UI and accepted the keys
4. The diamond service was not started on one of the OSD nodes (out of 6 total nodes: 3 MON and 3 OSD), but the diamond package was installed.
5. Graphs are not generated for that particular node in the Calamari UI
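A minimal shell sketch of these setup and verification steps, assuming hypothetical hostnames mon1-mon3 and osd1-osd3 for the six cluster nodes:

   # From the admin node, after installing Ceph and Calamari:
   ceph-deploy calamari connect mon1 mon2 mon3 osd1 osd2 osd3

   # Accept the minion keys in the Calamari UI, then check diamond on each node:
   sudo service diamond status    # or: systemctl status diamond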

Actual results:
The diamond service is not started on one of the OSD nodes

Expected results:
The diamond service should be started on all cluster nodes

Additional info:
Tried the following workaround:
   Run the "salt '*' state.highstate" command from the admin node. After this, the diamond service was running on all nodes and graphs were generated in the Calamari UI.

Comment 5 Christina Meno 2016-09-23 17:12:55 UTC
Andrew would you please take a look at this setup today?

Comment 6 Andrew Schoen 2016-09-23 18:24:40 UTC
(In reply to Gregory Meno from comment #5)
> Andrew would you please take a look at this setup today?

Yeah, I can take a look.

Comment 7 Andrew Schoen 2016-09-23 21:32:06 UTC
I was able to get diamond started on all nodes and reporting to the web UI by doing the following steps:

On the Admin/Calamari node:

- Deleted the keys for all nodes with `sudo salt-key -D -y`

On all the OSD and MON nodes:

- `sudo yum remove diamond salt-minion`
- `sudo rm -rf /etc/salt`
- `sudo rm /var/lock/subsys/diamond`

On the Admin/Calamari node:

- Ran `ceph-deploy calamari connect` for all nodes in the cluster
- Logged into the Web UI and accepted the keys

I've left the cluster in this working state for you to inspect and verify. I'm not exactly sure why diamond did not start on one of the OSDs before but I suspect it was because that node might not have been properly cleaned up from previous tests.
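A consolidated sketch of the recovery steps above; the hostnames are hypothetical and all commands assume root or sudo access:

   # On the admin/Calamari node: delete all accepted minion keys
   sudo salt-key -D -y

   # On every OSD and MON node: remove the packages and leftover state
   sudo yum remove diamond salt-minion
   sudo rm -rf /etc/salt
   sudo rm /var/lock/subsys/diamond

   # Back on the admin node: reconnect all nodes, then accept the keys in the web UI
   ceph-deploy calamari connect mon1 mon2 mon3 osd1 osd2 osd3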

Comment 9 Ramakrishnan Periyasamy 2016-09-29 17:01:35 UTC
I have installed Ceph and Calamari in a new setup (all servers were re-imaged) and am still observing the same problem. After accepting the keys in the Calamari UI, the diamond service is not starting on some nodes.

How Reproducible:
2/2

This time the diamond service did not start on 3 servers (1 MON and 2 OSD machines).
Andrew has taken a look at this setup.

Comment 10 Andrew Schoen 2016-09-29 17:17:25 UTC
I took a look at the nodes where diamond did not start and found this in the salt logs.

Sep 29 15:14:21 magna107 salt-minion[30739]: [WARNING ] The minion function caused an exception
Sep 29 15:14:21 magna107 salt-minion[30739]: Traceback (most recent call last):
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/minion.py", line 796, in _thread_return
Sep 29 15:14:21 magna107 salt-minion[30739]: return_data = func(*args, **kwargs)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/modules/state.py", line 275, in highstate
Sep 29 15:14:21 magna107 salt-minion[30739]: force=kwargs.get('force', False)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/state.py", line 2497, in call_highstate
Sep 29 15:14:21 magna107 salt-minion[30739]: self.load_dynamic(matches)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/state.py", line 2081, in load_dynamic
Sep 29 15:14:21 magna107 salt-minion[30739]: refresh=False)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/modules/saltutil.py", line 343, in sync_all
Sep 29 15:14:21 magna107 salt-minion[30739]: ret['modules'] = sync_modules(saltenv, False)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/modules/saltutil.py", line 228, in sync_modules
Sep 29 15:14:21 magna107 salt-minion[30739]: ret = _sync('modules', saltenv)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib/python2.7/site-packages/salt/modules/saltutil.py", line 82, in _sync
Sep 29 15:14:21 magna107 salt-minion[30739]: os.makedirs(mod_dir)
Sep 29 15:14:21 magna107 salt-minion[30739]: File "/usr/lib64/python2.7/os.py", line 157, in makedirs
Sep 29 15:14:21 magna107 salt-minion[30739]: mkdir(name, mode)
Sep 29 15:14:21 magna107 salt-minion[30739]: OSError: [Errno 17] File exists: '/var/cache/salt/minion/extmods/modules'

This traceback looks identical to one found in this upstream issue: http://tracker.ceph.com/issues/8780#note-1

This seems like a salt issue to me. I'd recommend adding documentation to note that running "salt '*' state.highstate" from the admin node works around this situation when it occurs. Also, just restarting the salt-minion on the affected nodes seems to fix the issue.
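A sketch of the two workarounds mentioned above; the affected node name is hypothetical:

   # Option 1: re-run highstate for all minions from the admin node
   sudo salt '*' state.highstate

   # Option 2: restart the salt-minion on each affected node (e.g. magna107)
   sudo systemctl restart salt-minion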

Comment 11 Harish NV Rao 2016-09-29 17:32:47 UTC
QE ran "salt '*' state.highstate" from admin node and it resolved the issue

This BZ needs to be release noted for 1.3.3 with the workaround mentioned above.

Comment 13 Ramakrishnan Periyasamy 2016-09-30 11:24:00 UTC
Doc text looks good

Comment 14 Andrew Schoen 2016-09-30 13:36:58 UTC
The doc text looks good to me as well.

Comment 15 Christina Meno 2017-05-01 14:03:58 UTC
Looks like it got added to the 1.3.3 release notes

