Bug 1322907 - Calamari is not started after Mon install/configure
Summary: Calamari is not started after Mon install/configure
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Calamari
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
: 2.0
Assignee: Christina Meno
QA Contact: Rachana Patel
Depends On:
Blocks: 1291304
Reported: 2016-03-31 15:36 UTC by Nishanth Thomas
Modified: 2016-08-23 19:35 UTC
CC List: 11 users

Fixed In Version: calamari-server-1.4.0-0.5.rc8.el7cp
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2016-08-23 19:35:14 UTC
Target Upstream Version:

Attachments
/var/log/message from ceph-installer node (864.32 KB, text/plain)
2016-03-31 17:42 UTC, Shubhendu Tripathi
no flags
Ceph-installer logs (1.46 MB, text/plain)
2016-04-13 17:06 UTC, Nishanth Thomas
no flags

System                 | ID             | Private | Priority | Status       | Summary                                                 | Last Updated
Red Hat Product Errata | RHBA-2016:1755 | 0       | normal   | SHIPPED_LIVE | Red Hat Ceph Storage 2.0 bug fix and enhancement update | 2016-08-23 23:23:52 UTC

Description Nishanth Thomas 2016-03-31 15:36:30 UTC
ceph-installer is not starting calamari-lite after MON install/configure.

Comment 2 Christina Meno 2016-03-31 16:24:20 UTC

Can you provide credentials to a machine where this is happening?
Or, at a minimum, provide the output/error log of:
sudo calamari-ctl initialize
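A minimal Python sketch for collecting that output, assuming you want stdout and stderr merged into one log that can be attached here. The helper name run_and_capture is hypothetical; it only wraps the standard subprocess module.

```python
import subprocess
from typing import List, Tuple


def run_and_capture(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd and return (exit code, combined stdout and stderr).

    stderr is merged into stdout so the log reads in the order the
    command wrote it, which is what is useful to paste into a bug.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the stdout stream
        universal_newlines=True,   # decode output to str
    )
    return proc.returncode, proc.stdout
```

On the affected node, call run_and_capture(["sudo", "calamari-ctl", "initialize"]) and attach the returned text together with the exit code.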

Comment 3 Shubhendu Tripathi 2016-03-31 17:42:44 UTC
Created attachment 1142319 [details]
/var/log/message from ceph-installer node

Comment 4 Shubhendu Tripathi 2016-03-31 17:47:47 UTC
Attached the /var/log/messages content from the ceph-installer node where I ran the following curl commands. I truncated the top lines, which were older entries.

1. /api/mon/install/
2. /api/osd/install/
3. /api/mon/configure/
4. /api/osd/configure

I can see some messages about calamari-ctl, but nothing like "supervisorctl restart".

You can refer to the setup at: (ceph-installer node; the curl commands were run from this node only; credentials root/redhat)

mon node:
osd node:

Comment 5 Christina Meno 2016-03-31 19:33:27 UTC
same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1322905

Comment 6 Christina Meno 2016-03-31 19:34:15 UTC

*** This bug has been marked as a duplicate of bug 1322905 ***

Comment 7 Christina Meno 2016-04-01 21:35:07 UTC
There is an additional issue here:
calamari-ctl initialize still calls salt-call, which means it requires at least salt-minion and salt-common.

A workaround is to re-run calamari-ctl initialize after the storage console agent has been bootstrapped.

The longer-term fix is to rework calamari-ctl initialize so that it does not need salt. There should be code upstream to handle this.
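Until calamari-ctl initialize stops depending on salt, a deployment script could check for salt-call before (re-)running it. This is a sketch under the assumption that salt-call being on PATH is a good enough signal that salt-minion and salt-common are installed; missing_binaries and ready_for_initialize are hypothetical names, not part of calamari or ceph-installer.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumption: salt-call on PATH means the salt packages that
# calamari-ctl initialize relies on are installed.
REQUIRED_BINARIES = ("salt-call",)


def missing_binaries(
    names: Iterable[str] = REQUIRED_BINARIES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the entries of `names` that `which` cannot locate."""
    return [name for name in names if which(name) is None]


def ready_for_initialize(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """True if it looks safe to (re-)run `calamari-ctl initialize`."""
    return not missing_binaries(which=which)
```

The `which` parameter defaults to shutil.which and exists only so the check can be exercised without touching the real PATH.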

Comment 8 Nishanth Thomas 2016-04-04 10:16:41 UTC
calamari-ctl initialize is called after bootstrapping is done through ceph-installer. The flow is roughly:

1. Set up Ansible communication (bootstrap Ansible).
2. Install and configure the agent packages.
3. Install MON. Calamari gets installed as part of this step.
4. Configure MON. Ideally, calamari-ctl initialize, a supervisord restart, and supervisorctl restart all should have happened before this.

So the 4th step is done after the agent has been bootstrapped. Why, then, is calamari not started properly as part of MON configure?

Comment 9 Christina Meno 2016-04-05 22:26:12 UTC
OK, I've reproduced it.

Calamari is crashing because mon_remote isn't getting proper Ceph cluster maps.
This wasn't a problem when we scheduled with salt.
I need to make mon_remote tolerate broken clusters better.

This is the error I'm seeing:
Traceback (most recent call last):
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 798, in _run
    server_heartbeat, cluster_heartbeat = get_heartbeats()
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 577, in get_heartbeats
    cluster_heartbeat[fsid] = cluster_status(cluster_handle, fsid_names[fsid])
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 644, in cluster_status
    mds_epoch = status['mdsmap']['epoch']
KeyError: 'mdsmap'
<MsgGenerator at 0x2e060f0> failed with KeyError
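The failing line indexes status['mdsmap'] directly. One way to make that tolerant of a cluster that has not reported an MDS map is to look the section up with dict.get and let the caller decide what a missing epoch means. This is a hypothetical helper (map_epoch), not the fix that shipped in calamari-server; nothing about the status layout is assumed beyond the status['mdsmap']['epoch'] access shown in the traceback.

```python
import logging
from typing import Any, Dict, Optional

log = logging.getLogger(__name__)


def map_epoch(status: Dict[str, Any], map_name: str) -> Optional[int]:
    """Return status[map_name]['epoch'], or None if it is absent.

    Unlike status['mdsmap']['epoch'], this does not raise KeyError
    when a new or broken cluster has not reported that map.
    """
    section = status.get(map_name)
    if not isinstance(section, dict):
        log.warning("cluster status has no usable %r section", map_name)
        return None
    return section.get("epoch")
```

A caller such as cluster_status could then skip or defer the MDS part of the heartbeat when map_epoch(status, 'mdsmap') is None, instead of letting the greenlet die.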

Comment 10 Christina Meno 2016-04-05 22:27:56 UTC
Just so you know, Nishanth:

calamari-ctl initialize happens during MON configure.

Comment 11 Christina Meno 2016-04-05 22:57:33 UTC
Would you please pm_ack this?

Comment 13 Christina Meno 2016-04-11 22:20:51 UTC
This is fixed when you bootstrap the Red Hat Storage Console agent first,
as described in https://bugzilla.redhat.com/show_bug.cgi?id=1322907#c8

I'm going to open another bz about calamari-ctl initialize requiring salt-minion to work.

Comment 14 Tamil 2016-04-13 16:15:41 UTC
Nishanth, can you please update the bug with the steps you used to reproduce it? Any logs from your side would be helpful too.

Comment 15 Nishanth Thomas 2016-04-13 16:59:12 UTC
 curl -d "{\"calamari\": true, \"hosts\": [\"dhcp46-139.lab.eng.blr.redhat.com\"],\"redhat_storage\":false,\"redhat_use_cdn\":true}" http://dhcp46-65.lab.eng.blr.redhat.com:8181/api/mon/install/

curl -d "{\"calamari\": true, \"host\": \"dhcp46-139.lab.eng.blr.redhat.com\", \"fsid\": \"deedcb4c-a67a-4997-93a6-92149ad2622a\", \"interface\": \"eth0\", \"monitor_secret\": \"AQA7P8dWAAAAABAAH/tbiZQn/40Z8pr959UmEA==\", \"cluster_network\": \"\", \"public_network\": \"\", \"redhat_storage\": false}" http://dhcp46-65.lab.eng.blr.redhat.com:8181/api/mon/configure/
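The hand-escaped JSON in these curl commands is easy to get wrong. A sketch that builds the same two POST requests with Python's json and urllib.request modules, reusing the hosts and values above. installer_request is a hypothetical helper, and the application/json Content-Type header is an addition (plain curl -d sends application/x-www-form-urlencoded) that is assumed to be acceptable to the installer.

```python
import json
import urllib.request
from typing import Any, Dict

# Hosts copied from the curl commands above.
INSTALLER = "http://dhcp46-65.lab.eng.blr.redhat.com:8181"
MON_HOST = "dhcp46-139.lab.eng.blr.redhat.com"


def installer_request(path: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the ceph-installer API."""
    return urllib.request.Request(
        INSTALLER + path,
        data=json.dumps(payload).encode("utf-8"),  # json.dumps handles quoting
        headers={"Content-Type": "application/json"},  # assumption, see above
        method="POST",
    )


mon_install = installer_request("/api/mon/install/", {
    "calamari": True,
    "hosts": [MON_HOST],
    "redhat_storage": False,
    "redhat_use_cdn": True,
})

mon_configure = installer_request("/api/mon/configure/", {
    "calamari": True,
    "host": MON_HOST,
    "fsid": "deedcb4c-a67a-4997-93a6-92149ad2622a",
    "interface": "eth0",
    "monitor_secret": "AQA7P8dWAAAAABAAH/tbiZQn/40Z8pr959UmEA==",
    "cluster_network": "",
    "public_network": "",
    "redhat_storage": False,
})

# Sending requires network access to the installer node, e.g.:
# with urllib.request.urlopen(mon_install) as resp:
#     print(resp.status, resp.read().decode())
```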

 supervisorctl status
calamari-lite                    FATAL      Exited too quickly (process log may have details)

There is nothing in the calamari logs.
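To spot this state from a script rather than by eye, the supervisorctl status text can be parsed line by line. A sketch, assuming each line is a program name, a state such as RUNNING or FATAL, and a free-form description; parse_supervisorctl_status and not_running are hypothetical names.

```python
from typing import List, NamedTuple


class ProgramStatus(NamedTuple):
    name: str
    state: str
    description: str


def parse_supervisorctl_status(output: str) -> List[ProgramStatus]:
    """Split each status line into name, state and description."""
    programs = []
    for line in output.splitlines():
        parts = line.split(None, 2)  # at most 3 fields; description keeps its spaces
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        description = parts[2] if len(parts) == 3 else ""
        programs.append(ProgramStatus(parts[0], parts[1], description))
    return programs


def not_running(programs: List[ProgramStatus]) -> List[ProgramStatus]:
    """Return the programs whose state is anything other than RUNNING."""
    return [p for p in programs if p.state != "RUNNING"]
```

Applied to the output above, not_running would report calamari-lite in state FATAL.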

Comment 16 Nishanth Thomas 2016-04-13 17:06:35 UTC
Created attachment 1146916 [details]
Ceph-installer logs

Comment 17 Christina Meno 2016-04-14 03:06:07 UTC
From /var/log/messages, it looks to me like we're trying to run calamari-ctl initialize before salt is running. I'll try to reproduce it; the fix is easy if this is the problem.
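If that ordering is the cause, one option is to wait for salt to be up before calling calamari-ctl initialize. A sketch, assuming the service is managed by systemd under the unit name salt-minion; unit_is_active relies on systemctl is-active --quiet exiting with status 0 only for an active unit, and wait_until is a hypothetical retry helper.

```python
import subprocess
import time
from typing import Callable


def unit_is_active(unit: str) -> bool:
    """`systemctl is-active --quiet UNIT` exits 0 only if the unit is active."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def wait_until(check: Callable[[], bool], attempts: int = 10, delay: float = 3.0) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, run calamari-ctl initialize only if wait_until(lambda: unit_is_active("salt-minion")) returns True, and otherwise report that salt never came up.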

Comment 20 Ken Dreyer (Red Hat) 2016-05-09 18:47:37 UTC
Looks like this was a calamari bug after all. Moving back to the RH Ceph product.

Comment 22 errata-xmlrpc 2016-08-23 19:35:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

