Bug 1322907
| Field | Value |
|---|---|
| Summary | Calamari is not started after Mon install/configure |
| Product | [Red Hat Storage] Red Hat Ceph Storage |
| Component | Calamari |
| Calamari sub component | Back-end |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Version | 2.0 |
| Keywords | Reopened |
| Target Milestone | rc |
| Target Release | 2.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | calamari-server-1.4.0-0.5.rc8.el7cp |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2016-08-23 19:35:14 UTC |
| Reporter | Nishanth Thomas <nthomas> |
| Assignee | Christina Meno <gmeno> |
| QA Contact | Rachana Patel <racpatel> |
| CC | ceph-eng-bugs, federico, hnallurv, icolle, kdreyer, nlevine, nthomas, sankarshan, shtripat, tmuthami, vsarmila |
| Bug Blocks | 1291304 |
Description
Nishanth Thomas
2016-03-31 15:36:30 UTC
Nishanth, can you provide me credentials to a machine where this is happening, or at a minimum the output / error log of `sudo calamari-ctl initialize`?

Created attachment 1142319 [details]: /var/log/messages from the ceph-installer node

Attached the /var/log/messages content from the ceph-installer node where the curl commands were executed (older lines at the top were truncated):

1. /api/mon/install/
2. /api/osd/install/
3. /api/mon/configure/
4. /api/osd/configure/

I can see some messages regarding calamari-ctl, but nothing like a "supervisorctl restart". You can refer to the setup at 10.70.47.73 (the ceph-installer node; the curl commands were run from this node only; credentials root/redhat). mon node: 10.70.46.204, osd node: 10.70.46.150.

Same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1322905

*** This bug has been marked as a duplicate of bug 1322905 ***

There is an additional issue here: calamari-ctl initialize still calls salt-call, which means it requires at least salt-minion and salt-common. A workaround is to simply re-run calamari-ctl initialize after the storage console agent gets bootstrapped. The longer-term fix is to rework calamari-ctl initialize to not need salt; there should be code upstream to handle this.

calamari-ctl initialize is called after bootstrapping is done through ceph-installer. The flow is something like this:

1. Set up the ansible communication (bootstrap ansible).
2. Install and configure the agent packages.
3. Install MON. Calamari gets installed as part of this.
4. Configure MON. Ideally `calamari-ctl initialize`, the supervisord restart, and `supervisorctl restart` should all have happened before this.

This way the 4th step is done after bootstrapping the agent. So why is it not started properly as part of MON configure?

OK, I've got it reproduced. Calamari is crashing because mon_remote isn't getting proper ceph cluster maps. This wasn't a problem when we scheduled with salt. I need to make mon_remote tolerate broken clusters better.
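The workaround described above is to re-run `calamari-ctl initialize` once the storage console agent has been bootstrapped. As a purely illustrative sketch (the helper name and retry policy are assumptions, not part of Calamari or ceph-installer), that step could be wrapped in a retry loop so it succeeds once its prerequisites come up:

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=1.0):
    """Run *cmd*, retrying on a non-zero exit status.

    Useful when a step such as `calamari-ctl initialize` can fail until a
    prerequisite (here: the bootstrapped agent) is in place.
    """
    result = None
    for attempt in range(attempts):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts - 1:
            time.sleep(delay)
    return result


# Hypothetical usage, mirroring the workaround described above:
# run_with_retries(["calamari-ctl", "initialize"], attempts=5, delay=30)
```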
This is the error I'm seeing:

```
Traceback (most recent call last):
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 798, in _run
    server_heartbeat, cluster_heartbeat = get_heartbeats()
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 577, in get_heartbeats
    cluster_heartbeat[fsid] = cluster_status(cluster_handle, fsid_names[fsid])
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/remote/mon_remote.py", line 644, in cluster_status
    mds_epoch = status['mdsmap']['epoch']
KeyError: 'mdsmap'
<MsgGenerator at 0x2e060f0> failed with KeyError
```

Just so you know, Nishanth: calamari-ctl initialize happens on mon configure. Would you please pm_ack this?

This is fixed when you bootstrap the Red Hat Storage Console agent first, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1322907#c8. I'm going to open another bz about calamari-ctl initialize requiring salt-minion to work.

Nishanth, can you please update the bug with the steps you used to reproduce it? Any logs from your side would be helpful too.
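The KeyError arises because cluster_status indexes `status['mdsmap']['epoch']` unconditionally, and a cluster with no MDS map (for example a broken or partially configured cluster) lacks that key. A minimal sketch of a tolerant lookup, assuming only the status-dict shape visible in the traceback (the helper name is hypothetical, not the actual fix shipped in Calamari):

```python
def mds_epoch_from_status(status):
    """Return the MDS map epoch, or None when 'mdsmap' is absent.

    The traceback above comes from indexing status['mdsmap']['epoch']
    unconditionally; a .get()-based lookup tolerates its absence.
    """
    mdsmap = status.get('mdsmap')
    if mdsmap is None:
        return None
    return mdsmap.get('epoch')
```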
The reproduction steps:

```shell
curl -d "{\"calamari\": true, \"hosts\": [\"dhcp46-139.lab.eng.blr.redhat.com\"],\"redhat_storage\":false,\"redhat_use_cdn\":true}" \
  http://dhcp46-65.lab.eng.blr.redhat.com:8181/api/mon/install/

curl -d "{\"calamari\": true, \"host\": \"dhcp46-139.lab.eng.blr.redhat.com\", \"fsid\": \"deedcb4c-a67a-4997-93a6-92149ad2622a\", \"interface\": \"eth0\", \"monitor_secret\": \"AQA7P8dWAAAAABAAH/tbiZQn/40Z8pr959UmEA==\", \"cluster_network\": \"10.70.44.0/22\", \"public_network\": \"10.70.44.0/22\", \"redhat_storage\": false}" \
  http://dhcp46-65.lab.eng.blr.redhat.com:8181/api/mon/configure/
```

After which:

```
supervisorctl status calamari-lite
FATAL    Exited too quickly (process log may have details)
```

Nothing in the calamari logs.

Created attachment 1146916 [details]: Ceph-installer logs
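The JSON bodies in the curl commands above can also be built programmatically. This is a hedged sketch: the endpoint and field names are taken verbatim from the curl reproduction, but the helper itself is hypothetical and not part of ceph-installer:

```python
import json


def mon_install_payload(host, calamari=True, redhat_storage=False,
                        redhat_use_cdn=True):
    """Build the JSON body for POST /api/mon/install/.

    Field names mirror the curl reproduction above; this helper is an
    illustration, not a real ceph-installer client.
    """
    return json.dumps({
        "calamari": calamari,
        "hosts": [host],
        "redhat_storage": redhat_storage,
        "redhat_use_cdn": redhat_use_cdn,
    })
```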
/var/log/messages makes it seem to me like we're trying to run calamari-ctl initialize before salt is running. I'll try to reproduce; the fix is easy if this is the problem.

Looks like this was a calamari bug after all. Moving back to the RH Ceph product.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html