Bug 1322907

Summary: Calamari is not started after Mon install/configure
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Nishanth Thomas <nthomas>
Component: Calamari Assignee: Christina Meno <gmeno>
Calamari sub component: Back-end QA Contact: Rachana Patel <racpatel>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: ceph-eng-bugs, federico, hnallurv, icolle, kdreyer, nlevine, nthomas, sankarshan, shtripat, tmuthami, vsarmila
Version: 2.0 Keywords: Reopened
Target Milestone: rc   
Target Release: 2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: calamari-server-1.4.0-0.5.rc8.el7cp Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-23 19:35:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1291304    
Attachments:
Description Flags
/var/log/message from ceph-installer node none
Ceph-installer logs none

Description Nishanth Thomas 2016-03-31 15:36:30 UTC
ceph-installer does not start calamari-lite after mon install/configure.

Comment 2 Christina Meno 2016-03-31 16:24:20 UTC
Nishanth,

Can you give me credentials for a machine where this is happening?
Or, at a minimum, send me the output and error log of:
sudo calamari-ctl initialize

Comment 3 Shubhendu Tripathi 2016-03-31 17:42:44 UTC
Created attachment 1142319 [details]
/var/log/message from ceph-installer node

Comment 4 Shubhendu Tripathi 2016-03-31 17:47:47 UTC
Attached the /var/log/messages content from the ceph-installer node, where I tried executing the curl commands for the endpoints below. I truncated the older lines at the top.

1. /api/mon/install/
2. /api/osd/install/
3. /api/mon/configure/
4. /api/osd/configure

I can see some messages about calamari-ctl, but nothing about a "supervisorctl restart".

You can refer to the setup at 10.70.47.73 (the ceph-installer node; I ran the curl commands from this node only). Credentials: root/redhat

mon node: 10.70.46.204
osd node: 10.70.46.150

Comment 5 Christina Meno 2016-03-31 19:33:27 UTC
same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1322905

Comment 6 Christina Meno 2016-03-31 19:34:15 UTC

*** This bug has been marked as a duplicate of bug 1322905 ***

Comment 7 Christina Meno 2016-04-01 21:35:07 UTC
There is an additional issue here:
calamari-ctl initialize still calls salt-call, which means it requires at least salt-minion and salt-common.

So a workaround is to re-run calamari-ctl initialize after the storage console agent is bootstrapped.

The longer-term fix is to rework calamari-ctl initialize to not need salt. There should be code upstream to handle this.
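
The workaround above could be scripted roughly as follows. This is a minimal sketch, not part of calamari or ceph-installer: the function name rerun_initialize and its injectable which/run parameters are hypothetical, and it assumes that finding salt-call on PATH is a good enough sign that the agent has been bootstrapped.

```python
import shutil
import subprocess


def rerun_initialize(which=shutil.which, run=subprocess.call):
    """Re-run `calamari-ctl initialize` only if salt-call is on PATH.

    Returns True if the command ran and exited 0, False otherwise.
    """
    # initialize shells out to salt-call, so check for it first.
    # Assumption: its presence on PATH means salt-minion/salt-common are installed.
    if which("salt-call") is None:
        return False
    # Same command as quoted earlier in this bug, run through sudo.
    return run(["sudo", "calamari-ctl", "initialize"]) == 0
```

The which and run parameters exist only so the check can be exercised without touching the real system.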

Comment 8 Nishanth Thomas 2016-04-04 10:16:41 UTC
calamari-ctl initialize is called after bootstrapping is done through ceph-installer. The flow is something like this:

1. Set up the Ansible communication (bootstrap Ansible).
2. Install and configure the agent packages.
3. Install MON. Calamari gets installed as part of this.
4. Configure MON. Ideally calamari-ctl initialize, the supervisord restart, and supervisorctl restart should all have happened before this.

So the 4th step is done after bootstrapping the agent. Why, then, is Calamari not started properly as part of MON configure?

Comment 9 Christina Meno 2016-04-05 22:26:12 UTC
OK, I've reproduced it.

calamari is crashing because mon_remote isn't getting proper ceph cluster maps.
This wasn't a problem when we scheduled with salt.
I need to make mon_remote tolerate broken clusters better.

This is the error I'm seeing:
Traceback (most recent call last):
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 798, in _run
    server_heartbeat, cluster_heartbeat = get_heartbeats()
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 577, in get_heartbeats
    cluster_heartbeat[fsid] = cluster_status(cluster_handle, fsid_names[fsid])
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 644, in cluster_status
    mds_epoch = status['mdsmap']['epoch']
KeyError: 'mdsmap'
<MsgGenerator at 0x2e060f0> failed with KeyError
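
For illustration, the KeyError comes from indexing a map that is absent from the status output. A tolerant lookup might look like the sketch below. This is not the actual calamari fix: map_epoch is a hypothetical helper, and the status dict is a made-up stand-in for whatever cluster_status really receives.

```python
def map_epoch(status, map_name):
    """Return status[map_name]['epoch'], or None if the map is missing."""
    section = status.get(map_name)
    # A missing or malformed map yields None instead of raising KeyError.
    if not isinstance(section, dict):
        return None
    return section.get("epoch")


# Hypothetical status for a cluster that reports no MDS map yet.
status = {"osdmap": {"epoch": 12}}
print(map_epoch(status, "osdmap"))  # 12
print(map_epoch(status, "mdsmap"))  # None
```

Callers would then have to decide what a None epoch means for the heartbeat, which the traceback alone does not tell us.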

Comment 10 Christina Meno 2016-04-05 22:27:56 UTC
Just so you know, Nishanth: calamari-ctl initialize happens on mon configure.

Comment 11 Christina Meno 2016-04-05 22:57:33 UTC
Would you please pm_ack this?

Comment 13 Christina Meno 2016-04-11 22:20:51 UTC
This is fixed when you bootstrap the Red Hat Storage Console agent first,
as described in https://bugzilla.redhat.com/show_bug.cgi?id=1322907#c8

I'm going to open another BZ about calamari-ctl initialize requiring salt-minion to work.

Comment 14 Tamil 2016-04-13 16:15:41 UTC
Nishanth, can you please update the bug with the steps you used to reproduce it? Any logs from your side would be helpful too.

Comment 15 Nishanth Thomas 2016-04-13 16:59:12 UTC
 curl -d "{\"calamari\": true, \"hosts\": [\"dhcp46-139.lab.eng.blr.redhat.com\"],\"redhat_storage\":false,\"redhat_use_cdn\":true}" http://dhcp46-65.lab.eng.blr.redhat.com:8181/api/mon/install/

curl -d "{\"calamari\": true, \"host\": \"dhcp46-139.lab.eng.blr.redhat.com\", \"fsid\": \"deedcb4c-a67a-4997-93a6-92149ad2622a\", \"interface\": \"eth0\", \"monitor_secret\": \"AQA7P8dWAAAAABAAH/tbiZQn/40Z8pr959UmEA==\", \"cluster_network\": \"10.70.44.0/22\", \"public_network\": \"10.70.44.0/22\", \"redhat_storage\": false}" http://dhcp46-65.lab.eng.blr.redhat.com:8181/api/mon/configure/

 supervisorctl status
calamari-lite                    FATAL      Exited too quickly (process log may have details)


Nothing in the calamari logs.
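
When repeating this check across nodes, the supervisorctl status line can be parsed instead of read by eye. The sketch below assumes the whitespace-separated "name state description" layout shown in this comment; parse_supervisor_status is a hypothetical helper, not part of supervisor or calamari.

```python
def parse_supervisor_status(output):
    """Map each program name to its state from `supervisorctl status` text."""
    states = {}
    for line in output.splitlines():
        # First field is the program name, second is the state;
        # anything after that is free-form description.
        parts = line.split(None, 2)
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


# The status line reported in this comment.
sample = "calamari-lite                    FATAL      Exited too quickly (process log may have details)"
states = parse_supervisor_status(sample)
print(states["calamari-lite"])  # FATAL
```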

Comment 16 Nishanth Thomas 2016-04-13 17:06:35 UTC
Created attachment 1146916 [details]
Ceph-installer logs

Comment 17 Christina Meno 2016-04-14 03:06:07 UTC
/var/log/messages looks to me like we're trying to run calamari-ctl initialize before salt is running. I'll try to reproduce; the fix is easy if this is the problem.

Comment 20 Ken Dreyer (Red Hat) 2016-05-09 18:47:37 UTC
Looks like this was a calamari bug after all. Moving back to the RH Ceph product.

Comment 22 errata-xmlrpc 2016-08-23 19:35:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html