Bug 1850377

Summary: `ceph-osd-run.sh` shall error gracefully when OSD_DEVICE can't be determined
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Harald Klein <hklein>
Component: Ceph-Ansible
Assignee: Dimitri Savineau <dsavinea>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: low
Priority: low
Version: 3.3
CC: aschoen, bniver, ceph-eng-bugs, ceph-qe-bugs, dsavinea, gabrioux, gmeno, nthomas, pdhange, tchandra, tserlin, ykaul
Target Milestone: z6
Target Release: 3.3
Hardware: All
OS: Linux
Fixed In Version: RHEL: ceph-ansible-3.2.46-1.el7cp; Ubuntu: ceph-ansible_3.2.46-2redhat1
Last Closed: 2020-08-18 18:05:58 UTC
Type: Bug

Description Harald Klein 2020-06-24 07:46:26 UTC
Description of problem:

When OSD_DEVICE cannot be determined, `ceph-osd-run.sh` attempts to start the container with an invalid image name.

Version-Release number of selected component (if applicable):

Red Hat Ceph Storage 3.3

How reproducible:

This occurs whenever no OSD devices can be found, e.g. after a failed re-deployment of the OSDs.

Actual results:

dockerd-current[1234]: time="2020-06-22T20:10:16.462419051+02:00" level=error msg="Handler for POST /v1.26/containers/create returned error: No such image: 2020-06-22:latest"
ceph-osd-run.sh[123456]: Unable to find image '2020-06-22:latest' locally

Expected results:

Handle the case where OSD_DEVICE is empty gracefully: the script should not attempt to start the container with an invalid image name, and it should print a meaningful error message instead.


Additional info:

OSD_DEVICE is derived by this function in `ceph-osd-run.sh`, which looks up the OSD's data partition in the `ceph-disk list` output and then strips the trailing partition number:

---->8----
function id_to_device () {
  DATA_PART=$(docker run --rm --ulimit nofile=1024:4096 --privileged=true -v /dev/:/dev/ -v /etc/ceph:/etc/ceph:z --entrypoint ceph-disk registry.access.redhat.com/rhceph/rhceph-3-rhel7:3-32 list | grep ", osd\.${1}," | awk '{ print $1 }')
  if [[ "${DATA_PART}" =~ ^/dev/(cciss|nvme|loop) ]]; then
    OSD_DEVICE=${DATA_PART:0:-2}
  else
    OSD_DEVICE=${DATA_PART:0:-1}
  fi
}
----8<-----
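
One way to satisfy the expected behavior would be to bail out as soon as the `ceph-disk list` lookup returns nothing, before any substring manipulation or container start is attempted. The sketch below is only an illustration against the function quoted above; it is not necessarily how the fix shipped in ceph-ansible 3.2.46 handles it:

---->8----
function id_to_device () {
  DATA_PART=$(docker run --rm --ulimit nofile=1024:4096 --privileged=true -v /dev/:/dev/ -v /etc/ceph:/etc/ceph:z --entrypoint ceph-disk registry.access.redhat.com/rhceph/rhceph-3-rhel7:3-32 list | grep ", osd\.${1}," | awk '{ print $1 }')
  # Guard: if ceph-disk did not report a data partition for osd.$1, stop here
  # with a clear message instead of computing a bogus OSD_DEVICE.
  if [[ -z "${DATA_PART}" ]]; then
    echo "ERROR: unable to determine the data partition for osd.${1} from 'ceph-disk list'; not starting the container." >&2
    exit 1
  fi
  if [[ "${DATA_PART}" =~ ^/dev/(cciss|nvme|loop) ]]; then
    OSD_DEVICE=${DATA_PART:0:-2}
  else
    OSD_DEVICE=${DATA_PART:0:-1}
  fi
}
----8<-----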

This is the output of ceph-disk list:
---->8-----
/dev/sda :
 /dev/sda2 other, iso9660
 /dev/sda1 other, vfat
 /dev/sda3 other, xfs, mounted on /
/dev/sdb :
 /dev/sdb1 ceph data, prepared, cluster ceph, journal /dev/sdg1
/dev/sdc :
 /dev/sdc1 ceph data, prepared, cluster ceph, journal /dev/sdg2
/dev/sdd :
 /dev/sdd1 ceph data, prepared, cluster ceph, journal /dev/sdg3
/dev/sde :
 /dev/sde1 ceph data, prepared, cluster ceph, journal /dev/sdg4
/dev/sdf other, unknown
/dev/sdg :
 /dev/sdg1 ceph journal, for /dev/sdb1
 /dev/sdg2 ceph journal, for /dev/sdc1
 /dev/sdg3 ceph journal, for /dev/sdd1
 /dev/sdg4 ceph journal, for /dev/sde1
 /dev/sdg5 ceph journal, for /dev/sdh1
/dev/sdh :
 /dev/sdh1 ceph data, prepared, cluster ceph, journal /dev/sdg5
----8<-----
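
Note that none of the partitions in this listing carries an ", osd.<id>," field (the data partitions are only "prepared", not active), so the `grep ", osd\.${1},"` in id_to_device matches nothing and DATA_PART comes back empty. This is easy to confirm by re-running the same filter against a saved copy of the listing; the file name below is just a placeholder for this illustration:

---->8----
# Re-run only the parsing step from id_to_device against a saved copy of the
# 'ceph-disk list' output shown above (ceph-disk-list.txt is hypothetical).
DATA_PART=$(grep ", osd\.2," ceph-disk-list.txt | awk '{ print $1 }')
echo "DATA_PART='${DATA_PART}'"   # prints DATA_PART='' for the output above
----8<-----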

Comment 8 errata-xmlrpc 2020-08-18 18:05:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 3.3 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3504