Bug 1850377

Summary: `ceph-osd-run.sh` shall error gracefully when OSD_DEVICE can't be determined
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Harald Klein <hklein>
Component: Ceph-Ansible
Assignee: Dimitri Savineau <dsavinea>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: low
Priority: low
Version: 3.3
CC: aschoen, bniver, ceph-eng-bugs, ceph-qe-bugs, dsavinea, gabrioux, gmeno, nthomas, pdhange, tchandra, tserlin, ykaul
Target Milestone: z6
Target Release: 3.3
Hardware: All
OS: Linux
Fixed In Version: RHEL: ceph-ansible-3.2.46-1.el7cp; Ubuntu: ceph-ansible_3.2.46-2redhat1
Last Closed: 2020-08-18 18:05:58 UTC
Type: Bug

Description Harald Klein 2020-06-24 07:46:26 UTC
Description of problem:

When OSD_DEVICE cannot be determined, `ceph-osd-run.sh` attempts to start the container with an invalid image name.

Version-Release number of selected component (if applicable):

Red Hat Ceph Storage 3.3

How reproducible:

This occurs whenever no OSD devices can be found, e.g. after a failed re-deployment of the OSDs.

Actual results:

dockerd-current[1234]: time="2020-06-22T20:10:16.462419051+02:00" level=error msg="Handler for POST /v1.26/containers/create returned error: No such image: 2020-06-22:latest"
ceph-osd-run.sh[123456]: Unable to find image '2020-06-22:latest' locally

Expected results:

Handle the case where OSD_DEVICE is empty gracefully: the script should not attempt to start the container with an invalid image name, and it should print a meaningful error message instead.


Additional info:

OSD_DEVICE is derived by this function in `ceph-osd-run.sh`, which looks up the OSD's data partition in the `ceph-disk list` output and then strips the trailing partition number:

---->8----
function id_to_device () {
  DATA_PART=$(docker run --rm --ulimit nofile=1024:4096 --privileged=true -v /dev/:/dev/ -v /etc/ceph:/etc/ceph:z --entrypoint ceph-disk registry.access.redhat.com/rhceph/rhceph-3-rhel7:3-32 list | grep ", osd\.${1}," | awk '{ print $1 }')
  if [[ "${DATA_PART}" =~ ^/dev/(cciss|nvme|loop) ]]; then
    OSD_DEVICE=${DATA_PART:0:-2}
  else
    OSD_DEVICE=${DATA_PART:0:-1}
  fi
}
----8<-----
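
One way to satisfy the expected behavior would be to bail out as soon as the `ceph-disk list` lookup returns nothing, before any substring manipulation or container start is attempted. The sketch below is only an illustration against the function quoted above; it is not necessarily how the fix shipped in ceph-ansible 3.2.46 handles it:

---->8----
function id_to_device () {
  DATA_PART=$(docker run --rm --ulimit nofile=1024:4096 --privileged=true -v /dev/:/dev/ -v /etc/ceph:/etc/ceph:z --entrypoint ceph-disk registry.access.redhat.com/rhceph/rhceph-3-rhel7:3-32 list | grep ", osd\.${1}," | awk '{ print $1 }')
  # Guard: if ceph-disk did not report a data partition for osd.$1, stop here
  # with a clear message instead of computing a bogus OSD_DEVICE.
  if [[ -z "${DATA_PART}" ]]; then
    echo "ERROR: unable to determine the data partition for osd.${1} from 'ceph-disk list'; not starting the container." >&2
    exit 1
  fi
  if [[ "${DATA_PART}" =~ ^/dev/(cciss|nvme|loop) ]]; then
    OSD_DEVICE=${DATA_PART:0:-2}
  else
    OSD_DEVICE=${DATA_PART:0:-1}
  fi
}
----8<-----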

This is the output of ceph-disk list:
---->8-----
/dev/sda :
 /dev/sda2 other, iso9660
 /dev/sda1 other, vfat
 /dev/sda3 other, xfs, mounted on /
/dev/sdb :
 /dev/sdb1 ceph data, prepared, cluster ceph, journal /dev/sdg1
/dev/sdc :
 /dev/sdc1 ceph data, prepared, cluster ceph, journal /dev/sdg2
/dev/sdd :
 /dev/sdd1 ceph data, prepared, cluster ceph, journal /dev/sdg3
/dev/sde :
 /dev/sde1 ceph data, prepared, cluster ceph, journal /dev/sdg4
/dev/sdf other, unknown
/dev/sdg :
 /dev/sdg1 ceph journal, for /dev/sdb1
 /dev/sdg2 ceph journal, for /dev/sdc1
 /dev/sdg3 ceph journal, for /dev/sdd1
 /dev/sdg4 ceph journal, for /dev/sde1
 /dev/sdg5 ceph journal, for /dev/sdh1
/dev/sdh :
 /dev/sdh1 ceph data, prepared, cluster ceph, journal /dev/sdg5
----8<-----
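
Note that none of the partitions in this listing carries an ", osd.<id>," field (the data partitions are only "prepared", not active), so the `grep ", osd\.${1},"` in id_to_device matches nothing and DATA_PART comes back empty. This is easy to confirm by re-running the same filter against a saved copy of the listing; the file name below is just a placeholder for this illustration:

---->8----
# Re-run only the parsing step from id_to_device against a saved copy of the
# 'ceph-disk list' output shown above (ceph-disk-list.txt is hypothetical).
DATA_PART=$(grep ", osd\.2," ceph-disk-list.txt | awk '{ print $1 }')
echo "DATA_PART='${DATA_PART}'"   # prints DATA_PART='' for the output above
----8<-----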

Comment 8 errata-xmlrpc 2020-08-18 18:05:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 3.3 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3504