Description of problem:
"ceph-deploy calamari connect" does not install the diamond package and also fails to start the salt-minion service; as a result, monitoring does not work.
Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.3.1
How reproducible:
Always (100%)
Steps to Reproduce:
1. Run ceph-deploy calamari connect --master <FQDN calamari node> <node1> ... <nodeN>
2. Check the ceph-deploy log; it does not install the diamond package.
3. salt-minion is installed, but the salt-minion service is not running on the OSD and MON nodes.
Workaround: manually install the diamond package and start the salt-minion and diamond services, then check "salt-key -L" on the master; if any keys are not accepted, run:
salt-key -a <hostname of minion, either an OSD or a MON node>
Expected results:
ceph-deploy should install the diamond package and start the salt-minion and diamond services on all the nodes.
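The key check in the workaround above can be scripted; a small sketch (the `awk` filter and the simulated sample output are illustrative assumptions, not taken from this bug; on a real master you would pipe live `salt-key -L` output through it):

```shell
# Extract hostnames listed under "Unaccepted Keys:" from salt-key -L
# style output; the next section heading ("Rejected Keys:", etc.) ends the run.
list_unaccepted() {
    awk '/^Unaccepted Keys:/ {f=1; next} /^[A-Za-z]+ Keys:/ {f=0} f'
}

# Simulated salt-key -L output, since no live salt master is assumed here:
printf 'Accepted Keys:\nmon1\nUnaccepted Keys:\nosd1\nosd2\nRejected Keys:\n' \
    | list_unaccepted
```

On a live master, `salt-key -L | list_unaccepted` would print the minions that still need `salt-key -a <hostname>`.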
Tried this on RHEL7.2 with 0.94.5-15.el7cp.x86_64.
The diamond packages are properly installed and the salt-minion services are running, but the diamond process failed to start, so calamari was not generating the graphs.
Followed the workaround from https://bugzilla.redhat.com/show_bug.cgi?id=1310829 and the diamond process started; everything works fine.
Hence moving this bug back to ASSIGNED.
My concern is that https://github.com/ceph/calamari/blob/master/salt/srv/salt/diamond.sls should be taking care of this when ceph-deploy calamari connect runs,
so the upstream patch to ceph-deploy is evidence that we need to investigate why that salt state in calamari isn't running successfully.
Andrew will be reproducing this; if we already have details, please share them.
I've opened a PR upstream to address this: https://github.com/ceph/calamari/pull/488
I think you mean accept the salt-minion keys AFTER running ceph-deploy calamari connect.
That is the whole point of this command -- to install salt-minion and configure it to know where calamari is running. AFTER running it go to the calamari web UI and accept the new nodes in the manage tab.
(In reply to Gregory Meno from comment #18)
> I think you mean accept the salt-minion keys AFTER running ceph-deploy
> calamari connect.
Yes, sorry. You'll want to run ceph-deploy calamari connect and then accept the new nodes in the web UI. The salt provided by calamari will then install diamond and start it.
Perhaps this ticket could be resolved by updating the docs with what to expect from the ceph-deploy calamari connect command and detailing what needs to happen after that command? If these docs already exist, I've not been able to find them.
Hi Andrew, Greg,
I reran the calamari connect with the steps as mentioned by Andrew.
1. yum remove diamond and salt on all the nodes.
2. Ran calamari connect on the master.
3. accepted the keys on the WebGUI.
The status now was:
Diamond was not started on any node, salt-minion was started.
Subject: Unit diamond.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Unit diamond.service has begun starting up.
Sep 14 04:47:10 magna105 diamond: Starting diamond: ERROR: Config file: /etc/diamond/diamond.conf does not exist.
Sep 14 04:47:10 magna105 diamond: Usage: diamond [options]
Sep 14 04:47:10 magna105 diamond: Options:
Sep 14 04:47:10 magna105 diamond: -h, --help show this help message and exit
Sep 14 04:47:10 magna105 diamond: -c CONFIGFILE, --configfile=CONFIGFILE
Sep 14 04:47:10 magna105 diamond: config file
Sep 14 04:47:10 magna105 diamond: -f, --foreground run in foreground
Sep 14 04:47:10 magna105 diamond: -l, --log-stdout log to stdout
Sep 14 04:47:10 magna105 diamond: -p PIDFILE, --pidfile=PIDFILE
Sep 14 04:47:10 magna105 diamond: pid file
Sep 14 04:47:10 magna105 diamond: -r COLLECTOR, --run=COLLECTOR
Sep 14 04:47:10 magna105 diamond: run a given collector once and exit
Sep 14 04:47:10 magna105 diamond: -v, --version display the version and exit
Sep 14 04:47:10 magna105 diamond: --skip-pidfile Skip creating PID file
Sep 14 04:47:10 magna105 diamond: -u USER, --user=USER Change to specified unprivilegd user
Sep 14 04:47:10 magna105 diamond: -g GROUP, --group=GROUP
Sep 14 04:47:10 magna105 diamond: Change to specified unprivilegd group
Sep 14 04:47:10 magna105 diamond: --skip-change-user Skip changing to an unprivilegd user
Sep 14 04:47:10 magna105 diamond: --skip-fork Skip forking (damonizing) process
Sep 14 04:47:10 magna105 diamond: [17B blob data]
Sep 14 04:47:10 magna105 systemd: PID file /var/run/diamond.pid not readable (yet?) after start.
Sep 14 04:47:10 magna105 systemd: Failed to start LSB: System statistics collector for Graphite.
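The systemd failure above comes from diamond aborting at startup when its config file is missing. The pre-start check can be sketched as follows (the function name and simplified logic are assumptions modeled on the error message in the log, not the actual init script):

```shell
# Simplified sketch of a pre-start config check, modeled on the
# "Config file: ... does not exist" error in the log above.
check_diamond_conf() {
    if [ ! -f "$1" ]; then
        echo "ERROR: Config file: $1 does not exist."
        return 1
    fi
    return 0
}

# With the file absent, as on magna105 before the highstate run,
# the check fails and the service never starts:
check_diamond_conf /tmp/no-such/diamond.conf || true
```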
4. Ran salt '*' state.highstate on the master node.
Diamond now started.
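For reference, the file the highstate lays down is diamond's main config. A minimal sketch of its general shape is below; the handler class path, host placeholder, and values are illustrative assumptions, not the actual file calamari ships:

```ini
# /etc/diamond/diamond.conf -- illustrative minimal sketch, not the shipped file
[server]
handlers = diamond.handler.graphitehandler.GraphiteHandler
pid_file = /var/run/diamond.pid

[handlers]
[[GraphiteHandler]]
host = <calamari master FQDN>
port = 2003

[collectors]
[[default]]
interval = 10
```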
Does the "/etc/diamond/diamond.conf" file get created when the "salt '*' state.highstate" command is run?
If so, we need to document this sequence of steps.
Yes, it is the salt provided by calamari-server that ensures diamond is installed, places its diamond.conf file, and starts the diamond service. You should not need to run that manually, though; once the salt-minions are installed and the keys are accepted, they should take care of it all.
I logged onto magna105 and noticed a few things I have questions about.
1) On magna105 I noticed that calamari-server was not installed. Did you install calamari-server on all nodes? calamari-server needs to be installed on all nodes, not just the master.
2) did you verify with ``salt-key -L`` on the master node that the keys were actually accepted after doing so through the web UI?
3) which nodes is your master node? what other nodes were you using in this test?
4) when removing diamond before this test did you make sure that ``/var/lock/subsys/diamond`` was removed from all nodes as well?
5) what was the exact ceph-deploy calamari connect command that you ran?
Another thing to mention is that once the salt minions are connected to the master, it will take a minute or so for them to respond and get everything installed.
I've figured out that the nodes you used for this test were:
I'm going to take these nodes today and try to recreate.
I followed a similar set of steps from what you mentioned. However in addition I also did these:
1. yum remove salt on the master. This removes calamari-client, calamari-server, salt, salt-master, and salt-minion.
2. rm -rf /etc/salt so that all the minion files and the keys are deleted.
And then did the same set of steps, worked great.
I will be moving this bug to Doc.
On further consideration, no doc changes are needed.
The existing workflow from 1.3.2 should work.
Moving the bug to Verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.