Bug 1420675 - Ceph Installer REST Apis unable to get the cluster details, on an upgraded Ceph setup
Summary: Ceph Installer REST Apis unable to get the cluster details, on an upgraded Ceph setup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Calamari
Version: 2.2
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 2.2
Assignee: Boris Ranto
QA Contact: Tejas
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-09 09:44 UTC by Tejas
Modified: 2017-03-14 15:49 UTC
CC: 9 users

Fixed In Version: RHEL: calamari-server-1.5.1-1.el7cp Ubuntu: calamari_1.5.1-2redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-14 15:49:10 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2017:0514 (SHIPPED_LIVE): Red Hat Ceph Storage 2.2 bug fix and enhancement update, last updated 2017-03-21 07:24:26 UTC

Description Tejas 2017-02-09 09:44:16 UTC
Description of problem:
      On a Ceph cluster that has been upgraded from 1.3.3 to 2.2, the Ceph Installer REST APIs are unable to fetch the cluster details from the Calamari server on the MON node.
I am trying to import the cluster into USM. The Calamari service is running successfully on the MONs.

GET /api/v2/cluster
HTTP 200 OK
Vary: Accept
Content-Type: text/html; charset=utf-8
Allow: GET, HEAD, OPTIONS

[]

The cluster API is returning an empty list.
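For reference, this check can be scripted; a minimal sketch in Python (the MON hostname is one of the nodes listed under Additional info below, while port 8002 and the use of the requests library are assumptions for illustration, so adjust them to the actual Calamari endpoint):

    # Minimal sketch: query the Calamari cluster endpoint and flag an empty result.
    # Host and port are illustrative assumptions; point this at the real Calamari API.
    import requests

    CALAMARI_URL = "http://magna031.ceph.redhat.com:8002/api/v2/cluster"

    resp = requests.get(CALAMARI_URL, timeout=10)
    resp.raise_for_status()

    clusters = resp.json()
    if not clusters:
        print("Cluster API returned an empty list -- no cluster discovered")
    else:
        for cluster in clusters:
            print("Discovered cluster:", cluster.get("name"), cluster.get("id"))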


Version-Release number of selected component (if applicable):
ceph version 10.2.5-22.el7cp (5cec6848b914e87dd6178e559dedae8a37cc08a3)
calamari-server-1.5.0-1.el7cp.x86_64
salt-minion-2015.5.5-1.el7.noarch
salt-selinux-0.0.45-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch


How reproducible:
Not sure

Steps to Reproduce:
1. Create a 1.3.3 Ceph cluster with 3 MONs and 3 OSDs.
2. Upgrade the cluster to Ceph 2.2 using the documented procedure.
3. Execute take_over_existing_cluster.yml on the upgraded cluster from a different Ansible node.
4. Add a MON to this cluster using site.yml.
5. Try to import the cluster into USM.




Additional info:

The system is still in the same state:

mons:
magna031
magna046
magna052

osds:
magna058
magna061
magna063

Console node:
magna028

Comment 3 Christina Meno 2017-02-09 17:05:21 UTC
2017-02-09 06:03:56,930 - ERROR - calamari Uncaught exception
Traceback (most recent call last):
  File "/opt/calamari/venv/bin/calamari-lite", line 9, in <module>
    load_entry_point('calamari-lite==0.1', 'console_scripts', 'calamari-lite')()
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_lite-0.1-py2.7.egg/calamari_lite/server.py", line 140, in main
    cthulhu = Manager()
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/manager.py", line 193, in __init__
    self.eventer = Eventer(self)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/eventer.py", line 77, in __init__
    self.caller = salt.client.Caller(mopts=__opts__)
TypeError: __init__() got an unexpected keyword argument 'mopts'

Comment 5 Christina Meno 2017-02-09 17:29:12 UTC
not sure that the traceback in the log is the cause

Comment 6 Christina Meno 2017-02-09 17:40:52 UTC
The root cause is probably that there is no admin socket in /var/run/ceph.
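For context, Calamari discovers running services by scanning these admin sockets; a minimal sketch of that check (run on a MON node, /var/run/ceph being the default socket directory mentioned above):

    # Minimal sketch: list the Ceph admin sockets that Calamari's service scan
    # would see on this node. /var/run/ceph is the default socket directory.
    import glob
    import os

    SOCKET_DIR = "/var/run/ceph"

    sockets = glob.glob(os.path.join(SOCKET_DIR, "*.asok"))
    if not sockets:
        print("No admin sockets found in", SOCKET_DIR)
    for path in sockets:
        print("admin socket:", os.path.basename(path))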

Comment 7 Christina Meno 2017-02-09 17:51:46 UTC
rebooting magna046 caused a socket to be created
something strange is up

Comment 8 Christina Meno 2017-02-09 17:55:12 UTC
Calamari works fine on magna046 after opening the firewall.

Comment 9 Christina Meno 2017-02-09 18:04:01 UTC
The shell history on one monitor suggests that it likely wasn't rebooted after the upgrade.

Tejas, is this the case?

[root@magna052 ubuntu]# history
    1  subscription-manager repos --enable=rhel-7-server-rhceph-1.3-mon-rpms
    2  systemctl start firewalld
    3  systemctl enable firewalld
    4  systemctl status firewalld.service
    5   firewall-cmd --zone=public --add-port=6789/tcp
    6  firewall-cmd --zone=public --add-port=6789/tcp --permanent
    7   systemctl enable ntpd.service
    8  systemctl start ntpd
    9  ntpq -p
   10  yum-config-manager --disable epel
   11  setenforce 1
   12  uname -a
   13  systemctl status salt-minion.service
   14  subscription-manager repos --disable=rhel-7-server-rhceph-1.3-mon-rpms --disable=rhel-7-server-rhceph-1.3-installer-rpms --disable=rhel-7-server-rhceph-1.3-calamari-rpms
   15  systemctl status ceph-mon.magna052.1486542313.319673641.service 
   16  systemctl stop ceph-mon.magna052.1486542313.319673641.service 
   17  yum update ceph-mon
   18  chown -R ceph:ceph /var/lib/ceph/mon
   19  chown -R ceph:ceph /var/log/ceph
   20  chown -R ceph:ceph /var/run/ceph
   21  chown -R ceph:ceph /etc/ceph
   22  touch /.autorelabel
   23  udevadm trigger
   24  systemctl enable ceph-mon.target
   25  systemctl enable ceph-mon@magna052
   26  systemctl status ceph-mon 
   27  systemctl start ceph-mon 
   28  systemctl status ceph-mon 
   29  ps -ef | grep ceph
   30  systemctl status ntpd.service 
   31  systemctl start ntpd.service 
   32  ntpq -p
   33  free -h
   34  salt-call --local pillar.items | grep ceph.heartbeat
   35  ceph -s
   36  free -h
   37  curl magna028.ceph.redhat.com:8181/setup/agent/ | bash
   38  rpm -qa | grep salt
   39  rpm -qa | grep agent
   40  systemctl restart salt-minion.service
   41  history
[root@magna052 ubuntu]#

Comment 12 Boris Ranto 2017-02-10 12:51:00 UTC
OK, I found the cause of this. We use this regexp (calamari_common/remote/mon_remote.py:service_status):

   match = re.match("^(.*)-(.*)\.(.*).asok$", os.path.basename(socket_path))

to get the cluster_name, service_type and service_id, but the greedy second group (service_type) matches all the way to the last dot, so we get a weird type and id when the socket names use FQDNs. I'm currently looking at ways to fix this regexp so that it only matches up to the first dot.
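For illustration, a quick reproduction of the misparse (the FQDN-based socket name below is a hypothetical example of what an upgraded node can produce):

    # Sketch: the greedy second group swallows most of the FQDN.
    import os
    import re

    socket_path = "/var/run/ceph/ceph-mon.magna052.ceph.redhat.com.asok"
    match = re.match(r"^(.*)-(.*)\.(.*).asok$", os.path.basename(socket_path))
    print(match.groups())
    # -> ('ceph', 'mon.magna052.ceph.redhat', 'com')
    # i.e. service_type and service_id are nonsense for FQDN socket names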

Comment 13 Boris Ranto 2017-02-10 13:07:33 UTC
It turns out we just need to change the line to read

    match = re.match("^(.*)-([^\.]*)\.(.*).asok$", os.path.basename(socket_path))

and it looks like everything works OK (the cluster was discovered, the OSDs were there, ...)
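With the same hypothetical FQDN-based socket name as in the sketch above, the fixed pattern parses as expected:

    # The [^\.]* group cannot cross a dot, so the FQDN stays in service_id.
    import os
    import re

    socket_path = "/var/run/ceph/ceph-mon.magna052.ceph.redhat.com.asok"
    match = re.match(r"^(.*)-([^\.]*)\.(.*).asok$", os.path.basename(socket_path))
    print(match.groups())
    # -> ('ceph', 'mon', 'magna052.ceph.redhat.com')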

Upstream PR:

https://github.com/ceph/calamari/pull/502

Comment 14 Christina Meno 2017-02-13 16:46:52 UTC
https://github.com/ceph/calamari/releases/tag/v1.5.1

Comment 17 Tejas 2017-02-17 11:04:13 UTC
Verified on calamari build:
calamari-server-1.5.2-1.el7cp.x86_64

Comment 19 errata-xmlrpc 2017-03-14 15:49:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0514.html

