Bug 1303241 - cthulhu crashes when osd_metadata is missing from the OSD_map
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Calamari
Version: 1.3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 1.3.2
Assigned To: Gregory Meno
QA Contact: ceph-qe-bugs
Keywords: Rebase
Depends On:
Blocks:
Reported: 2016-01-29 17:01 EST by Rachana Patel
Modified: 2016-02-29 09:45 EST
CC List: 4 users

See Also:
Fixed In Version: RHEL: calamari-server-1.3.3-1.el7cp Ubuntu: calamari_1.3.3-2redhat1trusty
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-29 09:45:37 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Rachana Patel 2016-01-29 17:01:13 EST
Description of problem:
=======================
cthulhu crashes when osd_metadata is missing from the OSD_map

Version-Release number of selected component (if applicable):
==============================================================
ceph-deploy-1.5.27.4-1.el7cp.noarch
ceph-common-0.94.5-4.el7cp.x86_64
calamari-server-1.3.2-2.el7cp.x86_64
calamari-clients-1.3-2.el7cp.x86_64
salt-2014.1.5-3.el7cp.noarch
salt-minion-2014.1.5-3.el7cp.noarch
salt-master-2014.1.5-3.el7cp.noarch



How reproducible:
=================
always


Steps to Reproduce:
==================
1. The cluster was running ceph-1.3.1 on RHEL 7.1.
2. Started upgrading it to the 1.3.2 z-stream build.
3. After upgrading the Calamari server, the web browser shows "server error - 500".

cthulhu log:
[c1@magna048 ~]$ sudo tail -f /var/log/calamari/cthulhu.log
    cluster_monitor.inject_sync_object(None, sync_type, version, msgpack.unpackb(latest_record.data))
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/cluster_monitor.py", line 351, in inject_sync_object
    new_object = self._sync_objects.on_fetch_complete(minion_id, sync_type, version, data)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/cluster_monitor.py", line 138, in on_fetch_complete
    new_object = self.set_map(sync_type, version, data)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/cluster_monitor.py", line 56, in set_map
    so = self._objects[typ] = typ(version, map_data)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/types.py", line 60, in __init__
    self.metadata_by_id = dict([(m['osd'], m) for m in data['osd_metadata']])
KeyError: 'osd_metadata'
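The failing line builds a dict keyed by OSD id from data['osd_metadata'] and raises KeyError when the OSD map it receives has no such key; the frames above show the map being replayed from a stored record (latest_record.data) in inject_sync_object. The sketch below reproduces the failure in isolation and shows one defensive alternative that falls back to an empty list. OsdMapSketch and original_metadata_by_id are hypothetical names, and treating a missing key as "no metadata" is an assumption; this is not necessarily how the upstream fix in wip-fix-osd-metadata handles it.

```python
# Sketch, runs on Python 2.7 and 3. OsdMapSketch and original_metadata_by_id
# are hypothetical names for illustration, not Calamari's real classes.


def original_metadata_by_id(data):
    """Same expression as calamari_common/types.py line 60 in the traceback."""
    return dict([(m['osd'], m) for m in data['osd_metadata']])


class OsdMapSketch(object):
    """Hypothetical stand-in modelling only the osd_metadata handling."""

    def __init__(self, version, data):
        self.version = version
        self.data = data
        # Assumption: a map without 'osd_metadata' (possibly one stored by an
        # older release) is treated as having no per-OSD metadata instead of
        # aborting construction with KeyError.
        self.metadata_by_id = dict(
            (m['osd'], m) for m in data.get('osd_metadata', [])
        )


if __name__ == '__main__':
    legacy_map = {'epoch': 1, 'osds': [{'osd': 0, 'up': 1}]}  # no 'osd_metadata'
    try:
        original_metadata_by_id(legacy_map)
    except KeyError as exc:
        print('original expression raises KeyError: %s' % exc)
    print(OsdMapSketch(1, legacy_map).metadata_by_id)
```

Whether an empty metadata map is safe depends on how callers use metadata_by_id; code that later looks up a specific OSD id would still need its own fallback.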



Actual results:
===============
cthulhu crashed with KeyError: 'osd_metadata' because the OSD map data it was loading had no osd_metadata key.





Additional info:
===================
[c1@magna048 ~]$ sudo tail -f /var/log/calamari/calamari.log 
    reply_event = bufchan.recv(timeout)
  File "/opt/calamari/venv/lib/python2.7/site-packages/zerorpc/channel.py", line 267, in recv
    event = self._input_queue.get(timeout=timeout)
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/queue.py", line 200, in get
    result = waiter.get()
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/hub.py", line 568, in get
    return self.hub.switch()
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/hub.py", line 331, in switch
    return greenlet.switch(self)
LostRemote: Lost remote after 10s heartbeat
Comment 2 Ken Dreyer (Red Hat) 2016-02-03 21:41:14 EST
Gregory's work in progress is @ https://github.com/ceph/calamari/tree/wip-fix-osd-metadata
Comment 3 Ken Dreyer (Red Hat) 2016-02-04 17:45:24 EST
Fixed in v1.3.3 upstream
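To tell whether a host already has a build containing the fix, the installed calamari-server version can be compared with the first fixed release, 1.3.3 (RHEL: calamari-server-1.3.3-1.el7cp). A minimal sketch, assuming RPM-style name-version-release[.arch] strings like those in the package list of the description; parse_version and has_osd_metadata_fix are hypothetical helpers, and the comparison only handles dotted numeric versions, not full rpmvercmp semantics or Debian strings such as calamari_1.3.3-2redhat1trusty.

```python
import re

# First calamari-server release listed as containing the fix.
FIXED_VERSION = (1, 3, 3)

# name-version-release[.arch], e.g. "calamari-server-1.3.2-2.el7cp.x86_64".
# The release group also absorbs the optional ".arch" suffix.
_NVR_RE = re.compile(r'^(?P<name>.+)-(?P<version>[0-9][0-9.]*)-(?P<release>[^-]+)$')


def parse_version(nvr):
    """Return the upstream version of an RPM-style string as a tuple of ints."""
    match = _NVR_RE.match(nvr)
    if match is None:
        raise ValueError('unrecognised package string: %r' % (nvr,))
    parts = match.group('version').strip('.').split('.')
    return tuple(int(p) for p in parts)


def has_osd_metadata_fix(nvr):
    """True if the version is at least FIXED_VERSION (numeric comparison only)."""
    return parse_version(nvr) >= FIXED_VERSION
```

For the reporter's package, has_osd_metadata_fix('calamari-server-1.3.2-2.el7cp.x86_64') returns False, while 'calamari-server-1.3.3-1.el7cp' returns True.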
Comment 8 errata-xmlrpc 2016-02-29 09:45:37 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0313
