When using filestore with the following devices list, ceph-ansible 3.2.24, and the rhceph-3-rhel7:3-32 container (78e0950c3de6), a Ceph deployment fails while waiting for all of the OSDs to start:

devices:
  - /dev/loop3
  - /dev/loop4
  - /dev/loop5

The OSDs do not start because the ceph-osd-run.sh script [1] trims only one character from the device name, which produces an invalid device and the following error when the OSD is started manually:

[root@overcloud-ceph-leaf1-0 ~]# /usr/share/ceph-osd-run.sh 1
2019-09-04 20:11:24  /entrypoint.sh: static: does not generate config
2019-09-04 20:11:24  /entrypoint.sh: ERROR: you either provided a non-existing device or no device at all.
2019-09-04 20:11:24  /entrypoint.sh: You must provide a device to build your OSD ie: /dev/sdb
[root@overcloud-ceph-leaf1-0 ~]#

If I modify the deployed script to trim two characters instead of one [2], the OSD starts fine. Running the commands directly shows what happens for the given block devices [3]. We get a non-existent DATA_PART of /dev/loop3p:

[root@overcloud-ceph-leaf1-0 ~]# DATA_PART=$(docker run --rm --ulimit nofile=1024:1024 --privileged=true -v /dev/:/dev/ -v /etc/ceph:/etc/ceph:z --entrypoint ceph-disk 10.37.168.131:8787/rhceph/rhceph-3-rhel7:3-32 list | grep ", osd\.1," | awk '{ print $1 }')
[root@overcloud-ceph-leaf1-0 ~]# OSD_DEVICE=${DATA_PART:0:-1}
[root@overcloud-ceph-leaf1-0 ~]# echo $OSD_DEVICE
/dev/loop3p
[root@overcloud-ceph-leaf1-0 ~]#

I assume the block devices you tested with, e.g. /dev/vdb, did not have this issue, but the loopback devices do:

[root@overcloud-ceph-leaf1-0 ~]# echo $OSD_DEVICE
/dev/loop3p
[root@overcloud-ceph-leaf1-0 ~]# OSD_DEVICE=${DATA_PART:0:-2}
[root@overcloud-ceph-leaf1-0 ~]# echo $OSD_DEVICE
/dev/loop3
[root@overcloud-ceph-leaf1-0 ~]#

Additional checking in the shell script for device names of this form would solve this bug for devices that match this naming syntax.
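The trimming behavior from the transcript above can be reproduced in isolation with bash substring expansion; this is a standalone sketch in which the /dev paths are treated purely as strings, so no block devices are needed:

```shell
#!/usr/bin/env bash
# Why a fixed one-character trim breaks for loop partitions:
# loop3p1 -> loop3 needs two characters removed, sdb1 -> sdb only one.
DATA_PART=/dev/loop3p1        # partition name as reported by ceph-disk list
echo "${DATA_PART:0:-1}"      # /dev/loop3p  (invalid device, "p" left behind)
echo "${DATA_PART:0:-2}"      # /dev/loop3   (correct parent device)

DATA_PART=/dev/sdb1           # plain-disk partition, e.g. /dev/vdb1 behaves the same
echo "${DATA_PART:0:-1}"      # /dev/sdb     (one-character trim is correct here)
```

This is why the deployment works with /dev/vdb-style devices but fails with loop devices: the partition suffix is one character longer on loop devices.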
We're working around it simply by switching the deployment from filestore to bluestore. Just wanted to report the bug for completeness.

[1] https://github.com/ceph/ceph-ansible/blob/v3.2.24/roles/ceph-osd/templates/ceph-osd-run.sh.j2#L23

[2]
[fultonj@skagra tmp]$ diff -u old new
--- old	2019-09-04 16:54:23.085337059 -0400
+++ new	2019-09-04 16:54:31.829142391 -0400
@@ -15,7 +15,7 @@
   if [[ "${DATA_PART}" =~ ^/dev/(cciss|nvme) ]]; then
     OSD_DEVICE=${DATA_PART:0:-2}
   else
-    OSD_DEVICE=${DATA_PART:0:-1}
+    OSD_DEVICE=${DATA_PART:0:-2}
   fi
}
[fultonj@skagra tmp]$

[3]
[root@overcloud-ceph-leaf1-0 ~]# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 931.5G  0 disk
├─sda1        8:1    0     1M  0 part
└─sda2        8:2    0 931.5G  0 part /
loop3         7:3    0    20G  0 loop
├─loop3p1   259:1    0    15G  0 loop
└─loop3p2   259:0    0     5G  0 loop
loop4         7:4    0    20G  0 loop
├─loop4p1   259:3    0    15G  0 loop
└─loop4p2   259:2    0     5G  0 loop
loop5         7:5    0    20G  0 loop
├─loop5p1   259:5    0    15G  0 loop
└─loop5p2   259:4    0     5G  0 loop
loop6         7:6    0    20G  0 loop
├─loop6p1   259:7    0    15G  0 loop
└─loop6p2   259:6    0     5G  0 loop
[root@overcloud-ceph-leaf1-0 ~]#
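Rather than trimming a fixed number of characters, the script could resolve a partition name to its parent device by pattern, handling plain disks and loop/nvme/cciss partitions uniformly. A minimal sketch of the idea; the function name and regex are illustrative and not from ceph-ansible:

```shell
#!/usr/bin/env bash
# Hedged sketch: derive the parent device from a partition name by
# pattern instead of a fixed-width trim. Not the shipped fix.
strip_partition() {
  local part=$1
  # Devices whose partitions use a "p" separator: loop3p1, nvme0n1p1, cciss/c0d0p1
  local re='^(/dev/(loop[0-9]+|nvme[0-9]+n[0-9]+|cciss/c[0-9]+d[0-9]+))p[0-9]+$'
  if [[ "$part" =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "${part%%[0-9]*}"   # plain disks: /dev/sdb1 -> /dev/sdb
  fi
}

strip_partition /dev/loop3p1    # /dev/loop3
strip_partition /dev/sdb1       # /dev/sdb
strip_partition /dev/nvme0n1p2  # /dev/nvme0n1
```

This avoids hard-coding a trim length per device family, which is how the one-off diff in [2] (which would in turn break plain-disk names) can be generalized.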
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:4353