Bug 1303241

Summary: cthulhu crashes when osd_metadata is missing from the OSD_map
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Rachana Patel <racpatel>
Component: Calamari Assignee: Christina Meno <gmeno>
Calamari sub component: Web UI QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: unspecified CC: ceph-eng-bugs, flucifre, hnallurv, kdreyer
Version: 1.3.2 Keywords: Rebase
Target Milestone: rc   
Target Release: 1.3.2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: RHEL: calamari-server-1.3.3-1.el7cp Ubuntu: calamari_1.3.3-2redhat1trusty Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-02-29 14:45:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Rachana Patel 2016-01-29 22:01:13 UTC
Description of problem:
=======================
cthulhu crashes when osd_metadata is missing from the OSD_map

Version-Release number of selected component (if applicable):
==============================================================
ceph-deploy-1.5.27.4-1.el7cp.noarch
ceph-common-0.94.5-4.el7cp.x86_64
calamari-server-1.3.2-2.el7cp.x86_64
calamari-clients-1.3-2.el7cp.x86_64
salt-2014.1.5-3.el7cp.noarch
salt-minion-2014.1.5-3.el7cp.noarch
salt-master-2014.1.5-3.el7cp.noarch



How reproducible:
=================
always


Steps to Reproduce:
==================
1. The cluster was running ceph-1.3.1 on RHEL-7.1.
2. Started upgrading it to the 1.3.2 z build.
3. After upgrading the Calamari server, the web browser shows an error: server error - 500.

cthulhu log:
[c1@magna048 ~]$ sudo tail -f /var/log/calamari/cthulhu.log
    cluster_monitor.inject_sync_object(None, sync_type, version, msgpack.unpackb(latest_record.data))
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/cluster_monitor.py", line 351, in inject_sync_object
    new_object = self._sync_objects.on_fetch_complete(minion_id, sync_type, version, data)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/cluster_monitor.py", line 138, in on_fetch_complete
    new_object = self.set_map(sync_type, version, data)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/cluster_monitor.py", line 56, in set_map
    so = self._objects[typ] = typ(version, map_data)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/types.py", line 60, in __init__
    self.metadata_by_id = dict([(m['osd'], m) for m in data['osd_metadata']])
KeyError: 'osd_metadata'



Actual results:
===============
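cthulhu crashes with KeyError: 'osd_metadata' because osd_metadata is missing from the OSD map, and the Calamari web UI returns server error 500.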
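
The traceback points at a direct lookup of data['osd_metadata'] in calamari_common/types.py. As a minimal sketch of how such a lookup could tolerate OSD maps that lack the key, one option is to fall back to an empty list. This is not the upstream fix shipped in v1.3.3; the function name index_osd_metadata and the sample maps below are hypothetical and only illustrate the failure mode.

```python
def index_osd_metadata(data):
    """Build {osd_id: metadata} from an OSD map dict.

    Hypothetical helper: returns an empty dict when the map carries no
    'osd_metadata' key instead of raising KeyError.
    """
    # dict.get() with a default avoids the KeyError seen in the traceback.
    return dict((m['osd'], m) for m in data.get('osd_metadata', []))


# Hypothetical sample data: a map without the key and a map with it.
map_without_metadata = {'osds': [{'osd': 0}]}
map_with_metadata = {
    'osds': [{'osd': 0}],
    'osd_metadata': [{'osd': 0, 'hostname': 'node1'}],
}

empty_index = index_osd_metadata(map_without_metadata)
full_index = index_osd_metadata(map_with_metadata)
```

With a fallback like this, a map synced before the upgrade yields an empty index rather than crashing the monitor; whether an empty index is acceptable to the rest of cthulhu is an assumption this sketch does not verify.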





Additional info:
===================
[c1@magna048 ~]$ sudo tail -f /var/log/calamari/calamari.log 
    reply_event = bufchan.recv(timeout)
  File "/opt/calamari/venv/lib/python2.7/site-packages/zerorpc/channel.py", line 267, in recv
    event = self._input_queue.get(timeout=timeout)
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/queue.py", line 200, in get
    result = waiter.get()
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/hub.py", line 568, in get
    return self.hub.switch()
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/hub.py", line 331, in switch
    return greenlet.switch(self)
LostRemote: Lost remote after 10s heartbeat

Comment 2 Ken Dreyer (Red Hat) 2016-02-04 02:41:14 UTC
Gregory's work in progress is @ https://github.com/ceph/calamari/tree/wip-fix-osd-metadata

Comment 3 Ken Dreyer (Red Hat) 2016-02-04 22:45:24 UTC
Fixed in v1.3.3 upstream

Comment 8 errata-xmlrpc 2016-02-29 14:45:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0313