Bug 1322907 - Calamari is not started after Mon install/configure
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Calamari
Version: 2.0
Hardware: Unspecified Unspecified
Priority: urgent Severity: urgent
Target Milestone: rc
Target Release: 2.0
Assigned To: Gregory Meno
QA Contact: Rachana Patel
Keywords: Reopened
Blocks: 1291304
Reported: 2016-03-31 11:36 EDT by Nishanth Thomas
Modified: 2016-08-23 15:35 EDT

Fixed In Version: calamari-server-1.4.0-0.5.rc8.el7cp
Doc Type: Bug Fix
Last Closed: 2016-08-23 15:35:14 EDT
Type: Bug


Attachments
/var/log/message from ceph-installer node (864.32 KB, text/plain)
2016-03-31 13:42 EDT, Shubhendu Tripathi
Ceph-installer logs (1.46 MB, text/plain)
2016-04-13 13:06 EDT, Nishanth Thomas


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2016:1755
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Ceph Storage 2.0 bug fix and enhancement update
Last Updated: 2016-08-23 19:23:52 EDT

Description Nishanth Thomas 2016-03-31 11:36:30 EDT
ceph-installer does not start calamari-lite after mon install/configure.
Comment 2 Gregory Meno 2016-03-31 12:24:20 EDT
Nishanth,

Can you provide me with credentials to a machine where this is happening?
Or, at a minimum, provide the output / error log of
sudo calamari-ctl initialize
Comment 3 Shubhendu Tripathi 2016-03-31 13:42 EDT
Created attachment 1142319
/var/log/message from ceph-installer node
Comment 4 Shubhendu Tripathi 2016-03-31 13:47:47 EDT
Attached the /var/log/messages content from the ceph-installer node where I tried executing curl commands against the endpoints listed below. I truncated the older lines at the top.

1. /api/mon/install/
2. /api/osd/install/
3. /api/mon/configure/
4. /api/osd/configure

I can see some messages regarding calamari-ctl, but nothing about "supervisorctl restart".

You can refer to the setup at 10.70.47.73 (the ceph-installer node; the curl commands were run from this node only. Credentials: root/redhat)

mon node: 10.70.46.204
osd node: 10.70.46.150
Comment 5 Gregory Meno 2016-03-31 15:33:27 EDT
same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1322905
Comment 6 Gregory Meno 2016-03-31 15:34:15 EDT

*** This bug has been marked as a duplicate of bug 1322905 ***
Comment 7 Gregory Meno 2016-04-01 17:35:07 EDT
There is an additional issue here:
calamari-ctl initialize still calls salt-call, which means it requires at least salt-minion and salt-common.

So a workaround is to just re-run calamari-ctl initialize after the storage console agent gets bootstrapped.

The longer-term fix is to rework calamari-ctl initialize to not need salt. There should be code upstream to handle this.
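
The workaround above depends on salt-call being available before calamari-ctl initialize runs. A minimal sketch of a pre-flight check, assuming the only thing checked is whether the named executables are on PATH. The helper name missing_commands is hypothetical and not part of Calamari or ceph-installer:

```python
import shutil


def missing_commands(commands):
    """Return the subset of `commands` that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


# Hypothetical pre-flight check: calamari-ctl initialize still shells out
# to salt-call, so both should be present before it is run.
required = ["calamari-ctl", "salt-call"]
missing = missing_commands(required)
if missing:
    print("not ready, missing: " + ", ".join(missing))
else:
    print("prerequisites found; safe to run: sudo calamari-ctl initialize")
```

This only confirms that the binaries are installed. It does not verify that salt-minion is actually running.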
Comment 8 Nishanth Thomas 2016-04-04 06:16:41 EDT
calamari-ctl initialize is called after bootstrapping is done through ceph-installer. The flow is something like this:

1. Set up the Ansible communication (bootstrap Ansible)
2. Install and configure the agent packages
3. Install MON. Calamari gets installed as part of this
4. Configure MON. Ideally calamari-ctl initialize, the supervisord restart, and supervisorctl restart should all have happened before this.

This way, the 4th step is done after bootstrapping the agent. So why is it not started properly as part of MON configure?
Comment 9 Gregory Meno 2016-04-05 18:26:12 EDT
ok I've got it reproduced.

calamari is crashing because mon_remote isn't getting proper ceph cluster maps.
This wasn't a problem when we scheduled with salt.
I need to make mon_remote tolerate broken clusters better.

This is the error I'm seeing:
Traceback (most recent call last):
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 798, in _run
    server_heartbeat, cluster_heartbeat = get_heartbeats()
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 577, in get_heartbeats
    cluster_heartbeat[fsid] = cluster_status(cluster_handle, fsid_names[fsid])
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 644, in cluster_status
    mds_epoch = status['mdsmap']['epoch']
KeyError: 'mdsmap'
<MsgGenerator at 0x2e060f0> failed with KeyError
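
The KeyError above comes from indexing status['mdsmap'] directly. A minimal sketch of a more tolerant lookup, assuming status is a plain dict parsed from the cluster status. The helper name map_epoch, the None fallback, and the sample dict are illustrative assumptions, not the actual Calamari fix:

```python
def map_epoch(status, map_name, default=None):
    """Return status[map_name]['epoch'], or `default` if the map or its epoch is absent."""
    section = status.get(map_name)
    if not isinstance(section, dict):
        return default
    return section.get("epoch", default)


# Made-up status dict with no 'mdsmap' entry, mirroring the failure in the traceback.
status = {"monmap": {"epoch": 1}}
print(map_epoch(status, "monmap"))  # 1
print(map_epoch(status, "mdsmap"))  # None
```

The caller then has to decide what a missing epoch means, for example skipping that map in the heartbeat rather than crashing the greenlet.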
Comment 10 Gregory Meno 2016-04-05 18:27:56 EDT
just so you know Nishanth.

calamari-ctl initialize happens on mon configure.
Comment 11 Gregory Meno 2016-04-05 18:57:33 EDT
would you please pm_ack this
Comment 13 Gregory Meno 2016-04-11 18:20:51 EDT
This is fixed when you bootstrap the Red Hat Storage Console agent first,
as described in https://bugzilla.redhat.com/show_bug.cgi?id=1322907#c8


I'm going to open another bz about calamari-ctl initialize requiring salt-minion to work.
Comment 14 Tamil 2016-04-13 12:15:41 EDT
Nishanth, can you please update the bug with the steps you used to reproduce it? Any logs from your side would be helpful too.
Comment 15 Nishanth Thomas 2016-04-13 12:59:12 EDT
 curl -d "{\"calamari\": true, \"hosts\": [\"dhcp46-139.lab.eng.blr.redhat.com\"],\"redhat_storage\":false,\"redhat_use_cdn\":true}" http://dhcp46-65.lab.eng.blr.redhat.com:8181/api/mon/install/

curl -d "{\"calamari\": true, \"host\": \"dhcp46-139.lab.eng.blr.redhat.com\", \"fsid\": \"deedcb4c-a67a-4997-93a6-92149ad2622a\", \"interface\": \"eth0\", \"monitor_secret\": \"AQA7P8dWAAAAABAAH/tbiZQn/40Z8pr959UmEA==\", \"cluster_network\": \"10.70.44.0/22\", \"public_network\": \"10.70.44.0/22\", \"redhat_storage\": false}" http://dhcp46-65.lab.eng.blr.redhat.com:8181/api/mon/configure/

 supervisorctl status
calamari-lite                    FATAL      Exited too quickly (process log may have details)


Nothing in calamari logs
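
The two requests above hand-escape their JSON bodies. A minimal sketch that builds the same bodies with json.dumps and prepares, but does not send, the POST requests using only the standard library. The URLs, field names and values are copied from the curl commands above; the helper name build_request is hypothetical, and the monitor secret is replaced by a placeholder:

```python
import json
import urllib.request

# ceph-installer endpoint and MON host, taken from the curl commands above.
INSTALLER = "http://dhcp46-65.lab.eng.blr.redhat.com:8181"
MON_HOST = "dhcp46-139.lab.eng.blr.redhat.com"


def build_request(path, payload):
    """Prepare a POST request with a JSON-encoded body; nothing is sent here."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(INSTALLER + path, data=body, method="POST")


install = build_request("/api/mon/install/", {
    "calamari": True,
    "hosts": [MON_HOST],
    "redhat_storage": False,
    "redhat_use_cdn": True,
})

configure = build_request("/api/mon/configure/", {
    "calamari": True,
    "host": MON_HOST,
    "fsid": "deedcb4c-a67a-4997-93a6-92149ad2622a",
    "interface": "eth0",
    "monitor_secret": "<monitor secret>",  # placeholder, not a real key
    "cluster_network": "10.70.44.0/22",
    "public_network": "10.70.44.0/22",
    "redhat_storage": False,
})

for req in (install, configure):
    print(req.get_method(), req.full_url)
    # To actually send the request: urllib.request.urlopen(req)
```

Building the body from a dict avoids the backslash escaping in the shell commands and makes it harder to mistype a key such as "hosts" versus "host".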
Comment 16 Nishanth Thomas 2016-04-13 13:06 EDT
Created attachment 1146916
Ceph-installer logs
Comment 17 Gregory Meno 2016-04-13 23:06:07 EDT
/var/log/messages looks to me like we're trying to run calamari-ctl initialize before salt is running. I'll try to reproduce; the fix is easy if this is the problem.
Comment 20 Ken Dreyer (Red Hat) 2016-05-09 14:47:37 EDT
Looks like this was a calamari bug after all. Moving back to the RH Ceph product.
Comment 22 errata-xmlrpc 2016-08-23 15:35:14 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html
