Bug 1322907

Summary:

Calamari is not started after Mon install/configure

Product:

[Red Hat Storage] Red Hat Ceph Storage

Reporter:

Nishanth Thomas <nthomas>

Component:

Calamari

Assignee:

Christina Meno <gmeno>

Calamari sub component:

Back-end

QA Contact:

Rachana Patel <racpatel>

Status:

CLOSED ERRATA

Docs Contact:

Severity:

urgent

Priority:

urgent

CC:

ceph-eng-bugs, federico, hnallurv, icolle, kdreyer, nlevine, nthomas, sankarshan, shtripat, tmuthami, vsarmila

Version:

2.0

Keywords:

Reopened

Target Milestone:

Target Release:

2.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

calamari-server-1.4.0-0.5.rc8.el7cp

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2016-08-23 19:35:14 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1291304

Attachments:

Description	Flags
/var/log/message from ceph-installer node	none
Ceph-installer logs	none

Description Nishanth Thomas 2016-03-31 15:36:30 UTC

ceph-installer does not start calamari-lite after MON install/configure.

Comment 2 Christina Meno 2016-03-31 16:24:20 UTC

Nishanth,

Can you give me credentials to a machine where this is happening?
Or, at a minimum, the output and error log of:
sudo calamari-ctl initialize

Comment 3 Shubhendu Tripathi 2016-03-31 17:42:44 UTC

Created attachment 1142319 [details]
/var/log/message from ceph-installer node

Comment 4 Shubhendu Tripathi 2016-03-31 17:47:47 UTC

Attached the /var/log/messages content from the ceph-installer node where I ran curl commands against the endpoints below. I truncated the older lines at the top.

1. /api/mon/install/
2. /api/osd/install/
3. /api/mon/configure/
4. /api/osd/configure

I can see some messages about calamari-ctl, but nothing about a "supervisorctl restart".

You can refer to the setup at 10.70.47.73 (the ceph-installer node; the curl commands were run from this node only. Credentials: root/redhat).

mon node: 10.70.46.204
osd node: 10.70.46.150

Comment 5 Christina Meno 2016-03-31 19:33:27 UTC

same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1322905

Comment 6 Christina Meno 2016-03-31 19:34:15 UTC


*** This bug has been marked as a duplicate of bug 1322905 ***

Comment 7 Christina Meno 2016-04-01 21:35:07 UTC

There is an additional issue here:
calamari-ctl initialize still calls salt-call, which means it requires at least salt-minion and salt-common.

So a workaround is to just re-run calamari-ctl initialize after the storage console agent gets bootstrapped.

The longer-term fix is to rework calamari-ctl initialize to not need salt. There should be code upstream to handle this.
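
The workaround can be sketched as a small guard. This is only an illustration: it assumes that finding salt-call on PATH is an adequate sign that the agent has been bootstrapped (the thread does not say how to detect this), and `initialize_when_salt_ready` is a hypothetical name, not part of calamari or ceph-installer.

```python
import shutil
import subprocess


def initialize_when_salt_ready(which=shutil.which, run=subprocess.call):
    """Run `calamari-ctl initialize` only if salt-call can be found.

    Returns the command's exit code, or None when salt-call is missing.
    `which` and `run` are injectable so the guard can be exercised
    without touching the system. Hypothetical helper for illustration.
    """
    if which("salt-call") is None:
        # Assumption: no salt-call on PATH means the agent is not bootstrapped yet.
        return None
    return run(["calamari-ctl", "initialize"])
```

Calling `initialize_when_salt_ready()` with no arguments would run the real command, which needs root privileges on the MON node.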

Comment 8 Nishanth Thomas 2016-04-04 10:16:41 UTC

calamari-ctl initialize is called after bootstrapping is done through ceph-installer. The flow is something like this:

1. Set up the Ansible communication (bootstrap Ansible)
2. Install and configure the agent packages
3. Install MON. Calamari gets installed as part of this
4. Configure MON. Ideally calamari-ctl initialize, the supervisord restart, and supervisorctl restart should all have happened before this.

So the 4th step is done after bootstrapping the agent. Why, then, is calamari not started properly as part of MON configure?

Comment 9 Christina Meno 2016-04-05 22:26:12 UTC

ok I've got it reproduced.

calamari is crashing because mon_remote isn't getting proper ceph cluster maps.
This wasn't a problem when we scheduled with salt.
I need to make mon_remote tolerate broken clusters better.

This is the error I'm seeing:
Traceback (most recent call last):
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 798, in _run
    server_heartbeat, cluster_heartbeat = get_heartbeats()
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 577, in get_heartbeats
    cluster_heartbeat[fsid] = cluster_status(cluster_handle, fsid_names[fsid])
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 644, in cluster_status
    mds_epoch = status['mdsmap']['epoch']
KeyError: 'mdsmap'
<MsgGenerator at 0x2e060f0> failed with KeyError
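
The KeyError above comes from indexing status['mdsmap'] directly. Below is a minimal sketch of a more tolerant lookup, assuming `status` is a plain dict as the traceback suggests. `safe_epoch` is a hypothetical helper for illustration and is not the fix that shipped in calamari.

```python
def safe_epoch(status, map_name):
    """Return status[map_name]['epoch'], or None when the map is missing.

    Hypothetical helper for illustration; not part of calamari_common.
    """
    section = status.get(map_name)
    if not isinstance(section, dict):
        return None
    return section.get("epoch")


# A status dict from a cluster that reports no mdsmap.
status = {"osdmap": {"epoch": 12}}
mds_epoch = safe_epoch(status, "mdsmap")  # None instead of raising KeyError
osd_epoch = safe_epoch(status, "osdmap")
```

Returning None leaves it to the caller to decide how to report a cluster with a missing map, instead of letting the greenlet die.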

Comment 10 Christina Meno 2016-04-05 22:27:56 UTC

just so you know Nishanth.

calamari-ctl initialize happens on mon configure.

Comment 11 Christina Meno 2016-04-05 22:57:33 UTC

would you please pm_ack this

Comment 13 Christina Meno 2016-04-11 22:20:51 UTC

This is fixed when you bootstrap the Red Hat Storage Console agent first,
as described in https://bugzilla.redhat.com/show_bug.cgi?id=1322907#c8


I'm going to open another bz about calamari-ctl initialize requiring salt-minion to work.

Comment 14 Tamil 2016-04-13 16:15:41 UTC

Nishanth, can you please update the bug with the steps you used to reproduce it? Any logs from your side would be helpful too.

Comment 15 Nishanth Thomas 2016-04-13 16:59:12 UTC

 curl -d "{\"calamari\": true, \"hosts\": [\"dhcp46-139.lab.eng.blr.redhat.com\"],\"redhat_storage\":false,\"redhat_use_cdn\":true}" http://dhcp46-65.lab.eng.blr.redhat.com:8181/api/mon/install/

curl -d "{\"calamari\": true, \"host\": \"dhcp46-139.lab.eng.blr.redhat.com\", \"fsid\": \"deedcb4c-a67a-4997-93a6-92149ad2622a\", \"interface\": \"eth0\", \"monitor_secret\": \"AQA7P8dWAAAAABAAH/tbiZQn/40Z8pr959UmEA==\", \"cluster_network\": \"10.70.44.0/22\", \"public_network\": \"10.70.44.0/22\", \"redhat_storage\": false}" http://dhcp46-65.lab.eng.blr.redhat.com:8181/api/mon/configure/
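
For anyone scripting the reproduction, the first curl call can be sketched in Python. The field names and URLs are copied from the command above; `build_mon_install_request` is a hypothetical helper, and the request is only built, not sent.

```python
import json
import urllib.request


def build_mon_install_request(installer_url, hosts):
    """Build (but do not send) the POST issued by the first curl command.

    Hypothetical helper; the payload fields mirror the curl -d body.
    """
    payload = {
        "calamari": True,
        "hosts": hosts,
        "redhat_storage": False,
        "redhat_use_cdn": True,
    }
    body = json.dumps(payload).encode("utf-8")
    # Supplying data makes urllib issue a POST, as curl -d does.
    return urllib.request.Request(installer_url + "/api/mon/install/", data=body)


request = build_mon_install_request(
    "http://dhcp46-65.lab.eng.blr.redhat.com:8181",
    ["dhcp46-139.lab.eng.blr.redhat.com"],
)
# urllib.request.urlopen(request) would send it; left out so the sketch has no side effects.
```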

 supervisorctl status
calamari-lite                    FATAL      Exited too quickly (process log may have details)


Nothing in calamari logs
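
The FATAL state shown above can also be checked from a script. This is a minimal sketch, assuming `supervisorctl status` prints one process per line as name, state, then description, as in the output above. `parse_supervisor_status` is a hypothetical helper.

```python
def parse_supervisor_status(output):
    """Map each process name to its state from `supervisorctl status` text.

    Hypothetical helper; assumes whitespace-separated "name state description".
    """
    states = {}
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


sample = "calamari-lite                    FATAL      Exited too quickly (process log may have details)"
states = parse_supervisor_status(sample)
calamari_failed = states.get("calamari-lite") == "FATAL"
```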

Comment 16 Nishanth Thomas 2016-04-13 17:06:35 UTC

Created attachment 1146916 [details]
Ceph-installer logs

Comment 17 Christina Meno 2016-04-14 03:06:07 UTC

From /var/log/messages, it looks to me like we're trying to run calamari-ctl initialize before salt is running. I'll try to reproduce; the fix is easy if this is the problem.

Comment 20 Ken Dreyer (Red Hat) 2016-05-09 18:47:37 UTC

Looks like this was a calamari bug after all. Moving back to the RH Ceph product.

Comment 22 errata-xmlrpc 2016-08-23 19:35:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html