Bug 1302721 - "ceph-deploy calamari connect" does not install the diamond package and fails to start the salt-minion service
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Installer
Version: 1.3.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 1.3.3
Assignee: Andrew Schoen
QA Contact: Tejas
URL:
Whiteboard:
Depends On:
Blocks: 1348597
 
Reported: 2016-01-28 13:01 UTC by Vikhyat Umrao
Modified: 2019-12-16 05:19 UTC (History)
15 users

Fixed In Version: RHEL: ceph-deploy-1.5.36-1.el7cp Ubuntu: ceph-deploy_1.5.36-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-29 12:56:14 UTC
Target Upstream Version:


Attachments


Links
System ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 16651 None None None 2016-07-12 05:08:09 UTC
Red Hat Knowledge Base (Solution) 2152951 None None None 2016-05-20 04:09:54 UTC
Red Hat Product Errata RHSA-2016:1972 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage 1.3.3 security, bug fix, and enhancement update 2016-09-29 16:51:21 UTC

Description Vikhyat Umrao 2016-01-28 13:01:28 UTC
Description of problem:
"ceph-deploy calamari connect" does not install the diamond package and also fails to start the salt-minion service; because of that, monitoring does not work.

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.3.1
ceph-0.94.3-3.el7cp.x86_64

salt-2014.1.5-3.el7cp.noarch
salt-master-2014.1.5-3.el7cp.noarch
salt-minion-2014.1.5-3.el7cp.noarch

calamari-server-1.3-11.el7cp.x86_64
calamari-clients-1.3-2.el7cp.x86_64

diamond-3.4.67-4.el7cp.noarch

How reproducible:
Always (100%)

Steps to Reproduce:
1. Run ceph-deploy calamari connect --master <FQDN calamari node> <node1> ... <nodeN>
2. Check the ceph-deploy log; it does not install the diamond package.
3. It installs salt-minion, but the salt-minion service is not running on the OSD and MON nodes.

Actual results:
We need to manually install the diamond package, start the salt-minion and diamond services by hand, and then check "salt-key -L"; if the keys are not accepted, we need to run:

salt-key -a <hostname of minion, can be an osd or mon node>
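The manual workaround above can be sketched as a short shell sequence. This is only an illustration of the steps in the report; the `<minion-hostname>` placeholder stands for each affected OSD/MON node:

```shell
# On each OSD/MON node: install diamond and start the services
# that "ceph-deploy calamari connect" should have handled.
yum install -y diamond
service salt-minion start
service diamond start

# On the Calamari master: list the salt keys and accept any
# minion whose key is still pending.
salt-key -L
salt-key -a <minion-hostname>   # repeat for each unaccepted minion
```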

Expected results:
ceph-deploy should install the diamond package and start the salt-minion and diamond services on all the nodes.

Comment 10 shylesh 2016-09-06 09:49:23 UTC
Tried this on RHEL7.2 with 0.94.5-15.el7cp.x86_64.
Diamond packages are properly installed and salt-minions services are running but failed to start diamond process hence calamari was not generating the graphs.

Followed the workaround from https://bugzilla.redhat.com/show_bug.cgi?id=1310829 and diamond process started , everything works fine.

Hence moving this bug back to ASSIGNED.

Comment 11 Christina Meno 2016-09-07 17:01:54 UTC
My concern is that https://github.com/ceph/calamari/blob/master/salt/srv/salt/diamond.sls should be taking care of this when ceph-deploy calamari connect runs.

So that upstream patch to ceph-deploy is evidence that we need to investigate why that salt state in calamari isn't running successfully.

Andrew will be reproducing; if we already have the details, please share them.

Comment 12 Andrew Schoen 2016-09-07 19:09:00 UTC
I've opened a PR upstream to address this: https://github.com/ceph/calamari/pull/488

Comment 18 Christina Meno 2016-09-08 21:56:19 UTC
I think you mean accept the salt-minion keys AFTER running ceph-deploy calamari connect.

That is the whole point of this command -- to install salt-minion and configure it to know where calamari is running. AFTER running it go to the calamari web UI and accept the new nodes in the manage tab.

Comment 19 Andrew Schoen 2016-09-09 14:01:17 UTC
(In reply to Gregory Meno from comment #18)
> I think you mean accept the salt-minion keys AFTER running ceph-deploy
> calamari connect.

Yes, sorry. You'll want to run ceph-deploy calamari connect and then accept the new nodes in the web UI. The salt provided by calamari will then install diamond and start it.

Comment 20 Andrew Schoen 2016-09-09 15:16:44 UTC
Perhaps this ticket could be resolved by updating the docs with what to expect from the ceph-deploy calamari connect command and detailing what needs to happen after that command? If these docs already exist, I've not been able to find them.

Comment 21 Tejas 2016-09-14 05:16:08 UTC
Hi Andrew, Greg,

     I reran the calamari connect with the steps as mentioned by Andrew.

1. Ran yum remove for diamond and salt on all the nodes.
2. Ran calamari connect on the master.
3. Accepted the keys in the web GUI.

The status now was:
Diamond was not started on any node, salt-minion was started.

Subject: Unit diamond.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit diamond.service has begun starting up.
Sep 14 04:47:10 magna105 diamond[6680]: Starting diamond: ERROR: Config file: /etc/diamond/diamond.conf does not exist.
Sep 14 04:47:10 magna105 diamond[6680]: Usage: diamond [options]
Sep 14 04:47:10 magna105 diamond[6680]: Options:
Sep 14 04:47:10 magna105 diamond[6680]: -h, --help            show this help message and exit
Sep 14 04:47:10 magna105 diamond[6680]: -c CONFIGFILE, --configfile=CONFIGFILE
Sep 14 04:47:10 magna105 diamond[6680]: config file
Sep 14 04:47:10 magna105 diamond[6680]: -f, --foreground      run in foreground
Sep 14 04:47:10 magna105 diamond[6680]: -l, --log-stdout      log to stdout
Sep 14 04:47:10 magna105 diamond[6680]: -p PIDFILE, --pidfile=PIDFILE
Sep 14 04:47:10 magna105 diamond[6680]: pid file
Sep 14 04:47:10 magna105 diamond[6680]: -r COLLECTOR, --run=COLLECTOR
Sep 14 04:47:10 magna105 diamond[6680]: run a given collector once and exit
Sep 14 04:47:10 magna105 diamond[6680]: -v, --version         display the version and exit
Sep 14 04:47:10 magna105 diamond[6680]: --skip-pidfile        Skip creating PID file
Sep 14 04:47:10 magna105 diamond[6680]: -u USER, --user=USER  Change to specified unprivilegd user
Sep 14 04:47:10 magna105 diamond[6680]: -g GROUP, --group=GROUP
Sep 14 04:47:10 magna105 diamond[6680]: Change to specified unprivilegd group
Sep 14 04:47:10 magna105 diamond[6680]: --skip-change-user    Skip changing to an unprivilegd user
Sep 14 04:47:10 magna105 diamond[6680]: --skip-fork           Skip forking (damonizing) process
Sep 14 04:47:10 magna105 diamond[6680]: [17B blob data]
Sep 14 04:47:10 magna105 systemd[1]: PID file /var/run/diamond.pid not readable (yet?) after start.
Sep 14 04:47:10 magna105 systemd[1]: Failed to start LSB: System statistics collector for Graphite.



4. Ran salt '*' state.highstate on the master node.
   Diamond then started.

Does the /etc/diamond/diamond.conf file get created when the "salt '*' state.highstate" command is run?
If so, we need to document this sequence of steps.


Thanks,
Tejas
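The sequence Tejas describes, including the highstate step that finally placed diamond.conf, can be sketched as follows. This is a reconstruction for illustration; `<calamari-fqdn>` and the node names are placeholders:

```shell
# On every cluster node: remove the packages from the earlier attempt.
yum remove -y diamond salt

# From the admin node: reconnect the nodes to Calamari.
ceph-deploy calamari connect --master <calamari-fqdn> <node1> <node2>

# ... accept the new minion keys in the Calamari web UI ...

# On the Calamari master: apply the highstate, which (re)installs
# diamond, writes /etc/diamond/diamond.conf, and starts the service.
salt '*' state.highstate

# Verify on any node that the config now exists and diamond is running.
test -f /etc/diamond/diamond.conf && service diamond status
```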

Comment 22 Andrew Schoen 2016-09-14 14:17:16 UTC
Tejas,

Yes, it is the salt provided by calamari-server that ensures diamond is installed, places its diamond.conf file, and starts the diamond service. You should not need to run that manually, though; once the salt minions are installed and the keys are accepted, they should take care of it all.

I logged onto magna105 and noticed a few things I have questions about.

1) On magna105 I noticed that calamari-server was not installed. Did you install calamari-server on all nodes? calamari-server needs to be installed on all nodes, not just the master.

2) Did you verify with ``salt-key -L`` on the master node that the keys were actually accepted after doing so through the web UI?

3) Which node is your master node? What other nodes were you using in this test?

4) When removing diamond before this test, did you make sure that ``/var/lock/subsys/diamond`` was removed from all nodes as well?

5) What was the exact ceph-deploy calamari connect command that you ran?

Another thing to mention is that once the salt minions are connected to the master it will take a minute or so for the minions to respond and get everything installed.
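The checks Andrew asks for in points 2 and 4 can be run as a short shell sequence (a sketch using the commands and paths named in the comment):

```shell
# On the master: confirm the minion keys appear under "Accepted Keys",
# not "Unaccepted Keys", after approving them in the web UI.
salt-key -L

# On each node: a stale LSB lock file can make the diamond init script
# believe the service is already running, so confirm it is gone.
ls -l /var/lock/subsys/diamond 2>/dev/null && echo "stale lock file present"
```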

Comment 23 Andrew Schoen 2016-09-14 14:44:28 UTC
Tejas,

I've figured out that the nodes you used for this test were:

magna104.ceph.redhat.com
magna105.ceph.redhat.com
magna107.ceph.redhat.com
magna108.ceph.redhat.com

I'm going to take these nodes today and try to recreate.

Comment 27 Tejas 2016-09-15 07:09:03 UTC
hi Andrew, 

I followed a similar set of steps to what you mentioned. However, in addition I also did the following:
1. yum remove salt on the master. This removes calamari-client, calamari-server, salt, salt-master, and salt-minion.
2. rm -rf /etc/salt, so that all the minion files and the keys are deleted.

Then I ran the same set of steps, and everything worked.

I will be moving this bug to Doc.

Thanks,
Tejas

Comment 28 Tejas 2016-09-15 13:10:02 UTC
On further consideration, no doc changes are needed.
The existing workflow from 1.3.2 should work.

Moving the bug to Verified state.

Thanks,
Tejas

Comment 32 errata-xmlrpc 2016-09-29 12:56:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-1972.html

