Bug 1420675
| Summary: | Ceph Installer REST APIs unable to get the cluster details, on an upgraded Ceph setup | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Tejas <tchandra> |
| Component: | Calamari | Assignee: | Boris Ranto <branto> |
| Calamari sub component: | Back-end | QA Contact: | Tejas <tchandra> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | unspecified | CC: | adeza, aschoen, ceph-eng-bugs, gmeno, hnallurv, kdreyer, nthomas, sankarshan, tchandra |
| Version: | 2.2 | ||
| Target Milestone: | rc | ||
| Target Release: | 2.2 | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | RHEL: calamari-server-1.5.1-1.el7cp Ubuntu: calamari_1.5.1-2redhat1xenial | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-03-14 15:49:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
2017-02-09 06:03:56,930 - ERROR - calamari Uncaught exception
Traceback (most recent call last):
File "/opt/calamari/venv/bin/calamari-lite", line 9, in <module>
load_entry_point('calamari-lite==0.1', 'console_scripts', 'calamari-lite')()
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_lite-0.1-py2.7.egg/calamari_lite/server.py", line 140, in main
cthulhu = Manager()
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/manager.py", line 193, in __init__
self.eventer = Eventer(self)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_cthulhu-0.1-py2.7.egg/cthulhu/manager/eventer.py", line 77, in __init__
self.caller = salt.client.Caller(mopts=__opts__)
TypeError: __init__() got an unexpected keyword argument 'mopts'
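The `mopts` TypeError in the traceback comes from calling `salt.client.Caller` with a keyword argument that the installed Salt release's `__init__` does not accept. A minimal, self-contained sketch of that failure mode (`FakeCaller` is a hypothetical stand-in for the Salt class, not the real API):

```python
# FakeCaller mimics a salt.client.Caller whose __init__ takes no 'mopts'
# keyword, which is what produces the TypeError seen in the calamari log.
class FakeCaller(object):
    def __init__(self, c_path=None):
        self.c_path = c_path

opts = {"id": "magna052"}
try:
    caller = FakeCaller(mopts=opts)   # what eventer.py attempted
except TypeError as e:
    print(e)                          # unexpected keyword argument 'mopts'
    caller = FakeCaller()             # fall back to default options
```

This is only an illustration of the version mismatch; as noted below, the traceback turned out not to be the root cause of the empty cluster list.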
Not sure that the traceback in the log is the cause. The root cause is probably a missing admin socket in /var/run/ceph; rebooting magna046 caused a socket to be created. Something strange is up: calamari works fine on magna046 after opening the firewall. The history on one monitor suggests it likely wasn't rebooted after the upgrade.
Tejas is this the case?
[root@magna052 ubuntu]# history
1 subscription-manager repos --enable=rhel-7-server-rhceph-1.3-mon-rpms
2 systemctl start firewalld
3 systemctl enable firewalld
4 systemctl status firewalld.service
5 firewall-cmd --zone=public --add-port=6789/tcp
6 firewall-cmd --zone=public --add-port=6789/tcp --permanent
7 systemctl enable ntpd.service
8 systemctl start ntpd
9 ntpq -p
10 yum-config-manager --disable epel
11 setenforce 1
12 uname -a
13 systemctl status salt-minion.service
14 subscription-manager repos --disable=rhel-7-server-rhceph-1.3-mon-rpms --disable=rhel-7-server-rhceph-1.3-installer-rpms --disable=rhel-7-server-rhceph-1.3-calamari-rpms
15 systemctl status ceph-mon.magna052.1486542313.319673641.service
16 systemctl stop ceph-mon.magna052.1486542313.319673641.service
17 yum update ceph-mon
18 chown -R ceph:ceph /var/lib/ceph/mon
19 chown -R ceph:ceph /var/log/ceph
20 chown -R ceph:ceph /var/run/ceph
21 chown -R ceph:ceph /etc/ceph
22 touch /.autorelabel
23 udevadm trigger
24 systemctl enable ceph-mon.target
25 systemctl enable ceph-mon@magna052
26 systemctl status ceph-mon
27 systemctl start ceph-mon
28 systemctl status ceph-mon
29 ps -ef | grep ceph
30 systemctl status ntpd.service
31 systemctl start ntpd.service
32 ntpq -p
33 free -h
34 salt-call --local pillar.items | grep ceph.heartbeat
35 ceph -s
36 free -h
37 curl magna028.ceph.redhat.com:8181/setup/agent/ | bash
38 rpm -qa | grep salt
39 rpm -qa | grep agent
40 systemctl restart salt-minion.service
41 history
[root@magna052 ubuntu]#
OK, I found the cause of this. We use this regexp (calamari_common/remote/mon_remote.py:service_status):
match = re.match("^(.*)-(.*)\.(.*).asok$", os.path.basename(socket_path))
to get the cluster_name, service_type, and service_id, but the greedy second group (service_type) matches all the way to the last dot, so we get a weird type and id when using FQDNs. I'm currently looking at ways to fix this regexp so it matches only up to the first dot.
It turns out we just need to change the line to read
match = re.match("^(.*)-([^\.]*)\.(.*).asok$", os.path.basename(socket_path))
and it looks like everything works OK (the cluster was discovered, the OSDs were there, ...)
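To illustrate the difference, here is a small, self-contained sketch; the socket filename is a hypothetical example patterned on the FQDN hosts in this report, with the format `<cluster>-<service_type>.<service_id>.asok`:

```python
import os
import re

# Hypothetical admin-socket path for a mon running on an FQDN host.
socket_path = "/var/run/ceph/ceph-mon.magna052.ceph.redhat.com.asok"
name = os.path.basename(socket_path)

# Original pattern: the greedy second group swallows the dots in the FQDN,
# leaving a bogus service_type and service_id.
buggy = re.match(r"^(.*)-(.*)\.(.*).asok$", name)
print(buggy.groups())  # ('ceph', 'mon.magna052.ceph.redhat', 'com')

# Fixed pattern: the second group excludes dots, so it stops at the first one.
fixed = re.match(r"^(.*)-([^\.]*)\.(.*).asok$", name)
print(fixed.groups())  # ('ceph', 'mon', 'magna052.ceph.redhat.com')
```

On a short (non-FQDN) hostname both patterns happen to agree, which is why the bug only surfaced on clusters using fully qualified domain names.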
Upstream PR:
https://github.com/ceph/calamari/pull/502
Verified on calamari build: calamari-server-1.5.2-1.el7cp.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2017-0514.html
Description of problem:
On a Ceph cluster which was upgraded from 1.3.3 to 2.2, the ceph-installer APIs are not able to fetch the cluster details from the calamari server on the MON node. I am trying to import the cluster into USM. The calamari service is running successfully on the MONs.

GET /api/v2/cluster
HTTP 200 OK
Vary: Accept
Content-Type: text/html; charset=utf-8
Allow: GET, HEAD, OPTIONS
[]

The cluster API is returning a blank list.

Version-Release number of selected component (if applicable):
ceph version 10.2.5-22.el7cp (5cec6848b914e87dd6178e559dedae8a37cc08a3)
calamari-server-1.5.0-1.el7cp.x86_64
salt-minion-2015.5.5-1.el7.noarch
salt-selinux-0.0.45-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch

How reproducible:
Not sure

Steps to Reproduce:
1. Create a 1.3.3 ceph cluster with 3 MONs and 3 OSDs.
2. Upgrade the cluster to Ceph 2.2 using the documented procedure.
3. Execute take_over_existing_cluster.yml on the upgraded cluster, from a different ansible node.
4. Add a MON to this cluster using site.yml.
5. Try to import the cluster into USM.

Additional info:
The system is still in the same state:
mons: magna031 magna046 magna052
osds: magna058 magna061 magna063
Console node: magna028