Bug 1420675 - Ceph Installer REST Apis unable to get the cluster details, on an upgraded Ceph setup
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Calamari
Version: 2.2
Hardware: Unspecified
OS: Linux
Target Milestone: rc
: 2.2
Assignee: Boris Ranto
QA Contact: Tejas
Depends On:
Reported: 2017-02-09 09:44 UTC by Tejas
Modified: 2017-03-14 15:49 UTC
9 users

Fixed In Version: RHEL: calamari-server-1.5.1-1.el7cp Ubuntu: calamari_1.5.1-2redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2017-03-14 15:49:10 UTC
Target Upstream Version:


System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0514 0 normal SHIPPED_LIVE Red Hat Ceph Storage 2.2 bug fix and enhancement update 2017-03-21 07:24:26 UTC

Description Tejas 2017-02-09 09:44:16 UTC
Description of problem:
      On a Ceph cluster that was upgraded from 1.3.3 to 2.2, the Ceph Installer APIs are unable to fetch the cluster details from the Calamari server on the MON node.
I am actually trying to import the cluster into USM. The calamari service is running successfully on the MONs.

GET /api/v2/cluster
Vary: Accept
Content-Type: text/html; charset=utf-8


The cluster API is returning a blank response.

Version-Release number of selected component (if applicable):
ceph version 10.2.5-22.el7cp (5cec6848b914e87dd6178e559dedae8a37cc08a3)

How reproducible:
Not sure

Steps to Reproduce:
1. Create a 1.3.3 Ceph cluster with 3 MONs and 3 OSDs.
2. Upgrade the cluster to Ceph 2.2 using the documented procedure.
3. Execute take_over_existing_cluster.yml on the upgraded cluster from a different Ansible node.
4. Add a MON to this cluster using site.yml.
5. Try to import the cluster into USM.

Additional info:

The system is still in the same state:



Console node:

Comment 3 Christina Meno 2017-02-09 17:05:21 UTC
2017-02-09 06:03:56,930 - ERROR - calamari Uncaught exception
Traceback (most recent call last):
  File "/opt/calamari/venv/bin/calamari-lite", line 9, in <module>
    load_entry_point('calamari-lite==0.1', 'console_scripts', 'calamari-lite')()
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_lite-0.1-py2.7.egg/calamari_lite/server.py", line 140, in main
    cthulhu = Manager()
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/manager.py", line 193, in __init__
    self.eventer = Eventer(self)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/eventer.py", line 77, in __init__
    self.caller = salt.client.Caller(mopts=__opts__)
TypeError: __init__() got an unexpected keyword argument 'mopts'

Comment 5 Christina Meno 2017-02-09 17:29:12 UTC
I'm not sure that the traceback in the log is the cause.

Comment 6 Christina Meno 2017-02-09 17:40:52 UTC
The root cause is probably that there is no admin socket in /var/run/ceph.

Comment 7 Christina Meno 2017-02-09 17:51:46 UTC
Rebooting magna046 caused a socket to be created.
Something strange is going on.

Comment 8 Christina Meno 2017-02-09 17:55:12 UTC
And Calamari works fine on magna046 after opening the firewall.

Comment 9 Christina Meno 2017-02-09 18:04:01 UTC
The shell history on one monitor suggests that it likely wasn't rebooted after the upgrade.

Tejas, is this the case?

[root@magna052 ubuntu]# history
    1  subscription-manager repos --enable=rhel-7-server-rhceph-1.3-mon-rpms
    2  systemctl start firewalld
    3  systemctl enable firewalld
    4  systemctl status firewalld.service
    5   firewall-cmd --zone=public --add-port=6789/tcp
    6  firewall-cmd --zone=public --add-port=6789/tcp --permanent
    7   systemctl enable ntpd.service
    8  systemctl start ntpd
    9  ntpq -p
   10  yum-config-manager --disable epel
   11  setenforce 1
   12  uname -a
   13  systemctl status salt-minion.service
   14  subscription-manager repos --disable=rhel-7-server-rhceph-1.3-mon-rpms --disable=rhel-7-server-rhceph-1.3-installer-rpms --disable=rhel-7-server-rhceph-1.3-calamari-rpms
   15  systemctl status ceph-mon.magna052.1486542313.319673641.service 
   16  systemctl stop ceph-mon.magna052.1486542313.319673641.service 
   17  yum update ceph-mon
   18  chown -R ceph:ceph /var/lib/ceph/mon
   19  chown -R ceph:ceph /var/log/ceph
   20  chown -R ceph:ceph /var/run/ceph
   21  chown -R ceph:ceph /etc/ceph
   22  touch /.autorelabel
   23  udevadm trigger
   24  systemctl enable ceph-mon.target
   25  systemctl enable ceph-mon@magna052
   26  systemctl status ceph-mon@magna052.service 
   27  systemctl start ceph-mon@magna052.service 
   28  systemctl status ceph-mon@magna052.service 
   29  ps -ef | grep ceph
   30  systemctl status ntpd.service 
   31  systemctl start ntpd.service 
   32  ntpq -p
   33  free -h
   34  salt-call --local pillar.items | grep ceph.heartbeat
   35  ceph -s
   36  free -h
   37  curl magna028.ceph.redhat.com:8181/setup/agent/ | bash
   38  rpm -qa | grep salt
   39  rpm -qa | grep agent
   40  systemctl restart salt-minion.service
   41  history
[root@magna052 ubuntu]#

Comment 12 Boris Ranto 2017-02-10 12:51:00 UTC
OK, I found the cause of this. We use this regexp (calamari_common/remote/mon_remote.py:service_status):

   match = re.match("^(.*)-(.*)\.(.*).asok$", os.path.basename(socket_path))

to get the cluster_name, service_type, and service_id, but the second group (service_type) matches all the way past the dots, so we get a weird type and id when using FQDNs. I'm currently looking at ways to fix this regexp so that it stops matching at the first dot.
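The mismatch is easy to reproduce in isolation. A minimal sketch, assuming a hypothetical admin socket path on a MON whose hostname is an FQDN (the "<cluster>-<type>.<id>.asok" naming follows the Ceph admin socket convention):

```python
import os
import re

# Hypothetical socket path for illustration; magna052 is one of the
# MON hosts from this report, here with its full domain name.
socket_path = "/var/run/ceph/ceph-mon.magna052.ceph.redhat.com.asok"

# The original greedy pattern from calamari_common/remote/mon_remote.py:
match = re.match(r"^(.*)-(.*)\.(.*).asok$", os.path.basename(socket_path))
cluster_name, service_type, service_id = match.groups()

# The second group swallows the dots in the FQDN, so the parsed
# service type and id are nonsense:
print(cluster_name)  # ceph
print(service_type)  # mon.magna052.ceph.redhat
print(service_id)    # com
```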

Comment 13 Boris Ranto 2017-02-10 13:07:33 UTC
It turns out we just need to change the line to read

    match = re.match("^(.*)-([^\.]*)\.(.*).asok$", os.path.basename(socket_path))

and it looks like everything works OK (the cluster was discovered, the OSDs were there, ...).
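Running the fixed pattern against the same hypothetical FQDN socket name shows why the change works: service_type may no longer contain a dot, so the match stops at the first dot and the full hostname lands in service_id.

```python
import os
import re

socket_path = "/var/run/ceph/ceph-mon.magna052.ceph.redhat.com.asok"

# Fixed pattern: [^\.]* forbids dots inside the service_type group.
match = re.match(r"^(.*)-([^\.]*)\.(.*).asok$", os.path.basename(socket_path))
cluster_name, service_type, service_id = match.groups()

print(cluster_name)  # ceph
print(service_type)  # mon
print(service_id)    # magna052.ceph.redhat.com
```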

Upstream PR:


Comment 14 Christina Meno 2017-02-13 16:46:52 UTC

Comment 17 Tejas 2017-02-17 11:04:13 UTC
Verified on calamari build:

Comment 19 errata-xmlrpc 2017-03-14 15:49:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

