| Summary: | OSD fails to start when custom cluster name contains numbers | | |
|---|---|---|---|
| Product: | Red Hat Storage Console | Reporter: | Nishanth Thomas <nthomas> |
| Component: | ceph-installer | Assignee: | Andrew Schoen <aschoen> |
| Status: | CLOSED ERRATA | QA Contact: | Daniel Horák <dahorak> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 2 | CC: | adeza, aschoen, ceph-eng-bugs, dahorak, kdreyer, mkudlej, nthomas, sankarshan, sds-qe-bugs |
| Target Milestone: | --- | | |
| Target Release: | 2 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-ansible-1.0.5-32.el7scon | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-08-23 19:49:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
This is caused by using the custom cluster name 'mine123'. The method ceph-ansible uses to get the OSD IDs fails to parse a custom cluster name that includes numbers: it returns the ID '123' twice, which is the failure you are seeing. Here is what ceph-ansible does to determine the OSD IDs:

```
[root@dhcp47-41 ~]# ls /var/lib/ceph/osd
mine123-0  mine123-1
[root@dhcp47-41 ~]# ls /var/lib/ceph/osd/ | grep -oh '[0-9]*'
123
0
123
1
```

Nishanth, can you try again and either use the default cluster name or one without numbers in it? Thanks.

I made a PR upstream to address the issue of retrieving OSD IDs when the cluster name includes numbers: https://github.com/ceph/ceph-ansible/pull/750

I have tried with the default cluster name and this issue is not seen.

The fix is not correct, because it has problems with cluster names containing '-' (dash), for example:

- MyCluster-01
- my-cluster

A dash should be supported in a Ceph cluster name, because it is mentioned in the documentation[1]:

~~~~~~~~~~~~~~~~~~~~~~~~~~
For example, when you run multiple clusters in a federated architecture, the cluster name (e.g., us-west, us-east) identifies the cluster for the current CLI session.
~~~~~~~~~~~~~~~~~~~~~~~~~~

(A prefix-stripping approach that avoids both failure modes is sketched after this comment thread.)

Tested on:

USM Server/ceph-installer server (RHEL 7.2):
```
ceph-ansible-1.0.5-31.el7scon.noarch
ceph-installer-1.0.14-1.el7scon.noarch
rhscon-ceph-0.0.39-1.el7scon.x86_64
rhscon-core-0.0.39-1.el7scon.x86_64
rhscon-core-selinux-0.0.39-1.el7scon.noarch
rhscon-ui-0.0.51-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-master-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch
```

Ceph MON (RHEL 7.2):
```
calamari-server-1.4.8-1.el7cp.x86_64
ceph-base-10.2.2-32.el7cp.x86_64
ceph-common-10.2.2-32.el7cp.x86_64
ceph-mon-10.2.2-32.el7cp.x86_64
ceph-selinux-10.2.2-32.el7cp.x86_64
libcephfs1-10.2.2-32.el7cp.x86_64
python-cephfs-10.2.2-32.el7cp.x86_64
rhscon-agent-0.0.16-1.el7scon.noarch
rhscon-core-selinux-0.0.39-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-minion-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch
```

Ceph OSD (RHEL 7.2):
```
ceph-base-10.2.2-32.el7cp.x86_64
ceph-common-10.2.2-32.el7cp.x86_64
ceph-osd-10.2.2-32.el7cp.x86_64
ceph-selinux-10.2.2-32.el7cp.x86_64
libcephfs1-10.2.2-32.el7cp.x86_64
python-cephfs-10.2.2-32.el7cp.x86_64
rhscon-agent-0.0.16-1.el7scon.noarch
rhscon-core-selinux-0.0.39-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-minion-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch
```

>> moving back to ASSIGNED

[1] http://docs.ceph.com/docs/master/install/manual-deployment/

This upstream PR fixed the dashes-in-a-cluster-name issue: https://github.com/ceph/ceph-ansible/pull/816

Retested with the names mentioned in comment 8 and it works as expected.
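Both failure modes above (digits and dashes in the cluster name) come from extracting OSD IDs with a character-class pattern instead of stripping the known cluster-name prefix. The following is a minimal sketch of the prefix-stripping idea, assuming the cluster name is available in a `CLUSTER` shell variable; it is an illustration only, not the actual change merged in the PRs above:

```bash
#!/usr/bin/env bash
# Derive OSD IDs by stripping the known "<cluster>-" prefix from the
# directory names under /var/lib/ceph/osd, rather than grepping for
# digits. This works for cluster names containing digits and/or dashes.
CLUSTER="my-cluster-01"   # assumed to be known to the caller

for dir in /var/lib/ceph/osd/"${CLUSTER}"-*; do
    [ -d "$dir" ] || continue
    base="${dir##*/}"               # e.g. "my-cluster-01-0"
    osd_id="${base#"${CLUSTER}"-}"  # e.g. "0"
    echo "$osd_id"
done
```

With `CLUSTER=mine123` and the directories from the listing above (`mine123-0`, `mine123-1`), this yields `0` and `1` rather than the spurious `123` entries, and a name like `my-cluster` no longer confuses the extraction because the dash is part of the stripped prefix rather than part of a matched pattern.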
Tested on:

USM Server/ceph-installer server (RHEL 7.2):
```
ceph-ansible-1.0.5-32.el7scon.noarch
ceph-installer-1.0.14-1.el7scon.noarch
rhscon-ceph-0.0.39-1.el7scon.x86_64
rhscon-core-0.0.39-1.el7scon.x86_64
rhscon-core-selinux-0.0.39-1.el7scon.noarch
rhscon-ui-0.0.51-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-master-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch
```

Ceph MON (RHEL 7.2):
```
calamari-server-1.4.8-1.el7cp.x86_64
ceph-base-10.2.2-33.el7cp.x86_64
ceph-common-10.2.2-33.el7cp.x86_64
ceph-mon-10.2.2-33.el7cp.x86_64
ceph-selinux-10.2.2-33.el7cp.x86_64
libcephfs1-10.2.2-33.el7cp.x86_64
python-cephfs-10.2.2-33.el7cp.x86_64
rhscon-agent-0.0.16-1.el7scon.noarch
rhscon-core-selinux-0.0.39-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-minion-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch
```

Ceph OSD (RHEL 7.2):
```
ceph-base-10.2.2-33.el7cp.x86_64
ceph-common-10.2.2-33.el7cp.x86_64
ceph-osd-10.2.2-33.el7cp.x86_64
ceph-selinux-10.2.2-33.el7cp.x86_64
libcephfs1-10.2.2-33.el7cp.x86_64
python-cephfs-10.2.2-33.el7cp.x86_64
rhscon-agent-0.0.16-1.el7scon.noarch
rhscon-core-selinux-0.0.39-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-minion-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch
```

>> VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754
Hitting this issue while creating the OSDs. It is particularly seen when more than one OSD is created per host. Can you please have a look? It looks like the OSDs are getting created, but the task returns a failure.

Error:

```
TASK: [ceph-osd | start and add that the osd service(s) to the init sequence (for or after infernalis)] ***
ok: [dhcp47-41.lab.eng.blr.redhat.com] => (item=123) => {"changed": false, "enabled": true, "item": "123", "name": "ceph-osd@123", "state": "started"}
ok: [dhcp47-41.lab.eng.blr.redhat.com] => (item=0) => {"changed": false, "enabled": true, "item": "0", "name": "ceph-osd@0", "state": "started"}
failed: [dhcp47-41.lab.eng.blr.redhat.com] => (item=123) => {"changed": false, "failed": true, "item": "123"}
msg: Job for ceph-osd failed because start of the service was attempted too often. See "systemctl status ceph-osd" and "journalctl -xe" for details. To force a start use "systemctl reset-failed ceph-osd" followed by "systemctl start ceph-osd" again.
```
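For manual recovery on an affected host, the log's own suggestion can be applied to the systemd template units. A minimal sketch, assuming the unit names shown in the task output above (`ceph-osd@123` being the bogus unit produced by the mis-parsed ID); this is a workaround for the stuck units, not the fix for the parsing bug itself:

```bash
# Clear the failed state of the bogus unit created from the mis-parsed
# ID, then make sure the real OSD units are running. Unit names are
# taken from the task output above; adjust the IDs for your host.
systemctl reset-failed ceph-osd@123.service
systemctl start ceph-osd@0.service
systemctl start ceph-osd@1.service
```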