Description of problem:
On a cluster created from Ubuntu nodes, calamari-lite is not properly started on any monitor. The problem might be with supervisor.service, because it is also not running and is disabled.

Version-Release number of selected component (if applicable):
USM server (RHEL 7.2):
  ceph-ansible-1.0.5-25.el7scon.noarch
  ceph-installer-1.0.12-4.el7scon.noarch
  rhscon-ceph-0.0.32-1.el7scon.x86_64
  rhscon-core-0.0.33-1.el7scon.x86_64
  rhscon-core-selinux-0.0.33-1.el7scon.noarch
  rhscon-ui-0.0.47-1.el7scon.noarch
Ceph MON (Ubuntu 16.04):
  calamari-server 1.4.5-2redhat1xenial
  ceph-base 10.2.2-16redhat1xenial
  ceph-common 10.2.2-16redhat1xenial
  ceph-mon 10.2.2-16redhat1xenial
  libcephfs1 10.2.2-16redhat1xenial
  python-cephfs 10.2.2-16redhat1xenial
  rhscon-agent 0.0.14-2redhat1xenial

How reproducible:
100%

Steps to Reproduce:
1. Prepare a set of nodes (one RHEL 7.2 and at least 5 Ubuntu 16.04).
2. Install and configure the USM server on the RHEL node and configure rhscon-agent on the Ubuntu nodes.
3. Create a Ceph cluster via the USM web UI.
4. Check whether calamari-lite is running on one of the Ceph MON nodes:
  # supervisorctl status calamari-lite
  # systemctl status supervisor.service

Actual results:
calamari-lite (and also supervisor.service) is not running, and supervisor.service is not enabled to start after a machine reboot.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# supervisorctl status calamari-lite
unix:///var/run/supervisor.sock no such file
# systemctl status supervisor.service
● supervisor.service - Supervisor process control system for UNIX
   Loaded: loaded (/lib/systemd/system/supervisor.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: http://supervisord.org
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expected results:
calamari-lite is properly configured and running on one Ceph MON, as required, and is also configured to start automatically after a machine reboot.

Additional info:
I'm not 100% sure which component is responsible for configuring and starting calamari and related services, so if the problem is, for example, in ceph-installer or ceph-ansible, please reassign this bug to the proper component. It might be related to Bug 1305259.
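
The check from the reproduction steps can be scripted when several MON nodes have to be inspected. The following is only a minimal sketch: `parse_unit_status` and `SAMPLE_STATUS` are hypothetical names introduced here, and the parser assumes the `systemctl status` layout shown in the actual results of this report.

```python
# Hypothetical helper, not part of calamari, ceph-installer or USM.
# SAMPLE_STATUS is the `systemctl status supervisor.service` output
# quoted in this bug report.
SAMPLE_STATUS = """\
● supervisor.service - Supervisor process control system for UNIX
   Loaded: loaded (/lib/systemd/system/supervisor.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: http://supervisord.org
"""


def parse_unit_status(status_text):
    """Extract enablement and activity state from `systemctl status` text.

    Returns a dict with the keys "enabled" and "active"; a value stays
    None when the corresponding line is missing or has an unexpected form.
    """
    state = {"enabled": None, "active": None}
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("Loaded:"):
            # "Loaded: loaded (<unit file>; <enablement>; vendor preset: ...)"
            inside = line.partition("(")[2].rpartition(")")[0]
            parts = [part.strip() for part in inside.split(";")]
            if len(parts) > 1:
                state["enabled"] = parts[1]
        elif line.startswith("Active:"):
            # "Active: <state> (<sub-state>) ..."
            fields = line.split()
            if len(fields) > 1:
                state["active"] = fields[1]
    return state


if __name__ == "__main__":
    print(parse_unit_status(SAMPLE_STATUS))
```

For a plain yes/no answer on a live node, `systemctl is-enabled supervisor.service` and `systemctl is-active supervisor.service` are simpler than parsing the status text.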
Just a note: I also noticed that the supervisor service is named differently on RHEL and on Ubuntu. On RHEL it is supervisord.service, but on Ubuntu it is supervisor.service.
Yes Daniel, that is the root cause of this issue. https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/calamari_ctl.py#L260 tries to start supervisord (as specified in /opt/calamari/salt-local/services.sls), whereas on Ubuntu the service is named supervisor.
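
To illustrate the mismatch, here is a minimal sketch of selecting the service name per distribution. It is not the code in calamari_ctl.py and not necessarily how the fix was implemented; `SUPERVISOR_UNIT_BY_DISTRO`, `parse_os_release` and `supervisor_unit_name` are hypothetical names, and the sketch assumes the distribution can be identified from the `ID` field of /etc/os-release.

```python
# Hypothetical sketch; only the two distributions named in this bug
# report are mapped (RHEL -> supervisord, Ubuntu -> supervisor).
SUPERVISOR_UNIT_BY_DISTRO = {
    "rhel": "supervisord",
    "ubuntu": "supervisor",
}


def parse_os_release(text):
    """Parse /etc/os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        fields[key.strip()] = value.strip().strip('"').strip("'")
    return fields


def supervisor_unit_name(os_release_text):
    """Return the supervisor service name for the given os-release text."""
    distro_id = parse_os_release(os_release_text).get("ID", "").lower()
    if distro_id not in SUPERVISOR_UNIT_BY_DISTRO:
        raise ValueError("no supervisor service name known for %r" % distro_id)
    return SUPERVISOR_UNIT_BY_DISTRO[distro_id]
```

On a node, this could be called as `supervisor_unit_name(open("/etc/os-release").read())`. Raising on an unknown distribution is a deliberate choice in this sketch, so that a wrong service name is not silently assumed.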
https://github.com/ceph/calamari/releases/tag/v1.4.6
Tested on:
USM Server (RHEL 7.2):
  ceph-ansible-1.0.5-31.el7scon.noarch
  ceph-installer-1.0.14-1.el7scon.noarch
  rhscon-ceph-0.0.36-1.el7scon.x86_64
  rhscon-core-0.0.36-1.el7scon.x86_64
  rhscon-core-selinux-0.0.36-1.el7scon.noarch
  rhscon-ui-0.0.50-1.el7scon.noarch
Ceph MON (Ubuntu 16.04):
  ii calamari-server 1.4.7-2redhat1xenial amd64 Inktank package containing the Calamari management server
  ii ceph-base 10.2.2-23redhat1xenial amd64 common ceph daemon libraries and management tools
  ii ceph-common 10.2.2-23redhat1xenial amd64 common utilities to mount and interact with a ceph storage cluster
  ii ceph-mon 10.2.2-23redhat1xenial amd64 monitor server for the ceph storage system
  ii libcephfs1 10.2.2-23redhat1xenial amd64 Ceph distributed file system client library
  ii python-cephfs 10.2.2-23redhat1xenial amd64 Python libraries for the Ceph libcephfs library
  ii rhscon-agent 0.0.16-2redhat1xenial all SKYNET is the event agent for SKYRING. Each storage node managed

The supervisor service and calamari-lite are properly running on one Ceph MON.
>> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html