Bug 1749097 - ceph-ansible filestore fails to start containerized OSD when using block device like /dev/loop3
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: z2
Target Release: 3.3
Assignee: Dimitri Savineau
QA Contact: Vasishta
Blocks: 1578730
 
Reported: 2019-09-04 21:15 UTC by John Fulton
Modified: 2019-12-19 17:59 UTC
CC List: 13 users

Fixed In Version: RHEL: ceph-ansible-3.2.30-1.el7cp Ubuntu: ceph-ansible_3.2.30-2redhat1
Last Closed: 2019-12-19 17:59:09 UTC




Links
GitHub: ceph/ceph-ansible pull 4470, "ceph-osd: handle loop devices with containers" (closed), last updated 2019-11-19 14:18:40 UTC
Red Hat Product Errata: RHSA-2019:4353, last updated 2019-12-19 17:59:28 UTC

Description John Fulton 2019-09-04 21:15:06 UTC
When using filestore with the following devices list, ceph-ansible 3.2.24, and the rhceph-3-rhel7:3-32 container (78e0950c3de6), a Ceph deployment fails while waiting for all of the OSDs to start.

devices:
 - /dev/loop3
 - /dev/loop4
 - /dev/loop5

The OSDs are not starting because the ceph-osd-run.sh script [1] trims only one character from the device name, which results in an invalid device and the following error when attempting to start the OSD manually:

[root@overcloud-ceph-leaf1-0 ~]# /usr/share/ceph-osd-run.sh 1
2019-09-04 20:11:24  /entrypoint.sh: static: does not generate config
2019-09-04 20:11:24  /entrypoint.sh: ERROR: you either provided a non-existing device or no device at all.
2019-09-04 20:11:24  /entrypoint.sh: You must provide a device to build your OSD ie: /dev/sdb
[root@overcloud-ceph-leaf1-0 ~]# 

If I modify the deployed script to trim two characters instead of one [2], the OSD starts fine.

Running the commands directly shows what happens for the given block devices [3]: we get a non-existent OSD_DEVICE of /dev/loop3p:

[root@overcloud-ceph-leaf1-0 ~]# DATA_PART=$(docker run --rm --ulimit nofile=1024:1024 --privileged=true -v /dev/:/dev/ -v /etc/ceph:/etc/ceph:z --entrypoint ceph-disk 10.37.168.131:8787/rhceph/rhceph-3-rhel7:3-32 list | grep ", osd\.1," | awk '{ print $1 }')
[root@overcloud-ceph-leaf1-0 ~]# OSD_DEVICE=${DATA_PART:0:-1}
[root@overcloud-ceph-leaf1-0 ~]# echo $OSD_DEVICE
/dev/loop3p
[root@overcloud-ceph-leaf1-0 ~]# 

I assume that the block devices you tested with, e.g. /dev/vdb, didn't have this issue, but the loopback devices did.

[root@overcloud-ceph-leaf1-0 ~]# echo $OSD_DEVICE
/dev/loop3p
[root@overcloud-ceph-leaf1-0 ~]# OSD_DEVICE=${DATA_PART:0:-2}
[root@overcloud-ceph-leaf1-0 ~]# echo $OSD_DEVICE
/dev/loop3
[root@overcloud-ceph-leaf1-0 ~]# 
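
For contrast, the one-character trim is fine for a plain disk partition; a minimal illustration (the /dev/vdb1 value is assumed for illustration, not taken from this environment):

DATA_PART=/dev/vdb1;    echo "${DATA_PART:0:-1}"   # /dev/vdb    (valid)
DATA_PART=/dev/loop3p1; echo "${DATA_PART:0:-1}"   # /dev/loop3p (does not exist)
DATA_PART=/dev/loop3p1; echo "${DATA_PART:0:-2}"   # /dev/loop3  (valid)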

Additional checking in the shell script would solve this bug for devices whose partition names follow this "p<N>" suffix scheme.
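
One possible shape for such a check, sketched here as an illustration only (this is not necessarily the fix that was merged upstream): strip the partition suffix based on the naming scheme instead of a fixed character count, then verify the result is actually a block device:

get_osd_device() {
  # Assumes DATA_PART names a partition, as reported by 'ceph-disk list'.
  # Kernel block devices whose base name ends in a digit (loop, nvme,
  # cciss, mmcblk) insert a "p" before the partition number; all other
  # devices append the number directly.
  if [[ "${DATA_PART}" =~ ^/dev/.*[0-9]p[0-9]+$ ]]; then
    OSD_DEVICE=${DATA_PART%p[0-9]*}
  else
    OSD_DEVICE=${DATA_PART%%[0-9]*}
  fi
  # Fail early instead of passing a bogus device to the container.
  [[ -b "${OSD_DEVICE}" ]] || { echo "ERROR: ${OSD_DEVICE} is not a block device" >&2; return 1; }
}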

We're working around it simply by switching the deployment from filestore to bluestore. Just wanted to report the bug for completeness.


[1] https://github.com/ceph/ceph-ansible/blob/v3.2.24/roles/ceph-osd/templates/ceph-osd-run.sh.j2#L23

[2] 
[fultonj@skagra tmp]$ diff -u old new
--- old	2019-09-04 16:54:23.085337059 -0400
+++ new	2019-09-04 16:54:31.829142391 -0400
@@ -15,7 +15,7 @@
   if [[ "${DATA_PART}" =~ ^/dev/(cciss|nvme) ]]; then
     OSD_DEVICE=${DATA_PART:0:-2}
   else
-    OSD_DEVICE=${DATA_PART:0:-1}
+    OSD_DEVICE=${DATA_PART:0:-2}
   fi
 }
 
[fultonj@skagra tmp]$ 
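
Note that trimming a fixed two characters still assumes a single-digit partition number (it would mangle something like /dev/loop3p10). An alternative sketch that avoids string slicing entirely is to ask the kernel for the parent device via lsblk's PKNAME column, assuming util-linux lsblk is available on the host:

OSD_DEVICE=/dev/$(lsblk -no pkname "${DATA_PART}")   # e.g. /dev/loop3p1 -> /dev/loop3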

[3] 
[root@overcloud-ceph-leaf1-0 ~]# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda         8:0    0 931.5G  0 disk 
├─sda1      8:1    0     1M  0 part 
└─sda2      8:2    0 931.5G  0 part /
loop3       7:3    0    20G  0 loop 
├─loop3p1 259:1    0    15G  0 loop 
└─loop3p2 259:0    0     5G  0 loop 
loop4       7:4    0    20G  0 loop 
├─loop4p1 259:3    0    15G  0 loop 
└─loop4p2 259:2    0     5G  0 loop 
loop5       7:5    0    20G  0 loop 
├─loop5p1 259:5    0    15G  0 loop 
└─loop5p2 259:4    0     5G  0 loop 
loop6       7:6    0    20G  0 loop 
├─loop6p1 259:7    0    15G  0 loop 
└─loop6p2 259:6    0     5G  0 loop 
[root@overcloud-ceph-leaf1-0 ~]#
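
For reference, a partitioned loop device like the ones above can be recreated as follows (file path and sizes are illustrative, not taken from this environment):

truncate -s 20G /var/tmp/osd.img
losetup --find --partscan --show /var/tmp/osd.img           # prints e.g. /dev/loop3
parted -s /dev/loop3 mklabel gpt mkpart part1 0% 75% mkpart part2 75% 100%
lsblk /dev/loop3                                            # shows loop3p1 (15G) and loop3p2 (5G)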

Comment 6 errata-xmlrpc 2019-12-19 17:59:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:4353

