Bug 1850377 - `ceph-osd-run.sh` shall error gracefully when OSD_DEVICE can't be determined
Summary: `ceph-osd-run.sh` shall error gracefully when OSD_DEVICE can't be determined
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.3
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: z6
Target Release: 3.3
Assignee: Dimitri Savineau
QA Contact: Vasishta
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-24 07:46 UTC by Harald Klein
Modified: 2020-08-18 18:06 UTC
CC List: 12 users

Fixed In Version: RHEL: ceph-ansible-3.2.46-1.el7cp Ubuntu: ceph-ansible_3.2.46-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-18 18:05:58 UTC
Embargoed:


Links
GitHub ceph/ceph-ansible pull 5467 (closed): ceph-osd: exit gracefully when no data partition. Last updated 2021-01-20 09:47:10 UTC
Red Hat Product Errata RHSA-2020:3504. Last updated 2020-08-18 18:06:29 UTC

Description Harald Klein 2020-06-24 07:46:26 UTC
Description of problem:

When OSD_DEVICE can't be determined, `ceph-osd-run.sh` attempts to start the container with an invalid image name.

Version-Release number of selected component (if applicable):

Red Hat Ceph Storage 3.3

How reproducible:

This will occur when there are no OSD devices, e.g. after a failed re-deployment of OSDs.

Actual results:

dockerd-current[1234]: time="2020-06-22T20:10:16.462419051+02:00" level=error msg="Handler for POST /v1.26/containers/create returned error: No such image: 2020-06-22:latest"
ceph-osd-run.sh[123456]: Unable to find image '2020-06-22:latest' locally

Expected results:

Handle the case where OSD_DEVICE is empty gracefully: the script should not attempt to start the container with an invalid image name, but instead print a meaningful error message and exit.
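
For illustration only, a minimal guard of this kind (a sketch, not the actual patch) could be placed before the docker run that starts the OSD container:

---->8----
# Sketch only: abort instead of starting the container with a bogus image name
# when the preceding lookup left OSD_DEVICE empty.
if [[ -z "${OSD_DEVICE}" ]]; then
  echo "ceph-osd-run.sh: unable to determine OSD_DEVICE, not starting the OSD container" >&2
  exit 1
fi
----8<-----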


Additional info:

The OSD_DEVICE is derived here:

---->8----
function id_to_device () {
  DATA_PART=$(docker run --rm --ulimit nofile=1024:4096 --privileged=true -v /dev/:/dev/ -v /etc/ceph:/etc/ceph:z --entrypoint ceph-disk registry.access.redhat.com/rhceph/rhceph-3-rhel7:3-32 list | grep ", osd\.${1}," | awk '{ print $1 }')
  if [[ "${DATA_PART}" =~ ^/dev/(cciss|nvme|loop) ]]; then
    OSD_DEVICE=${DATA_PART:0:-2}
  else
    OSD_DEVICE=${DATA_PART:0:-1}
  fi
}
----8<-----
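
The linked pull request ("ceph-osd: exit gracefully when no data partition") points at exactly this lookup. A rough sketch of such a check added to the function above (the error message wording is mine; the merged change may differ):

---->8----
function id_to_device () {
  # Same ceph-disk lookup as in the current script; DATA_PART ends up empty
  # when ceph-disk lists no data partition for the requested OSD id.
  DATA_PART=$(docker run --rm --ulimit nofile=1024:4096 --privileged=true -v /dev/:/dev/ -v /etc/ceph:/etc/ceph:z --entrypoint ceph-disk registry.access.redhat.com/rhceph/rhceph-3-rhel7:3-32 list | grep ", osd\.${1}," | awk '{ print $1 }')
  if [[ -z "${DATA_PART}" ]]; then
    echo "ERROR: no data partition found for osd.${1}, cannot determine OSD_DEVICE" >&2
    exit 1
  fi
  if [[ "${DATA_PART}" =~ ^/dev/(cciss|nvme|loop) ]]; then
    OSD_DEVICE=${DATA_PART:0:-2}
  else
    OSD_DEVICE=${DATA_PART:0:-1}
  fi
}
----8<-----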

This is the output of ceph-disk list:
---->8-----
/dev/sda :
 /dev/sda2 other, iso9660
 /dev/sda1 other, vfat
 /dev/sda3 other, xfs, mounted on /
/dev/sdb :
 /dev/sdb1 ceph data, prepared, cluster ceph, journal /dev/sdg1
/dev/sdc :
 /dev/sdc1 ceph data, prepared, cluster ceph, journal /dev/sdg2
/dev/sdd :
 /dev/sdd1 ceph data, prepared, cluster ceph, journal /dev/sdg3
/dev/sde :
 /dev/sde1 ceph data, prepared, cluster ceph, journal /dev/sdg4
/dev/sdf other, unknown
/dev/sdg :
 /dev/sdg1 ceph journal, for /dev/sdb1
 /dev/sdg2 ceph journal, for /dev/sdc1
 /dev/sdg3 ceph journal, for /dev/sdd1
 /dev/sdg4 ceph journal, for /dev/sde1
 /dev/sdg5 ceph journal, for /dev/sdh1
/dev/sdh :
 /dev/sdh1 ceph data, prepared, cluster ceph, journal /dev/sdg5
----8<-----
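
For illustration (osd.99 is an arbitrary id that does not appear above): when no ", osd.<id>," line matches, the grep/awk pipeline yields an empty DATA_PART, so the substring expansions in id_to_device have nothing to strip and OSD_DEVICE is left empty:

---->8----
# Simulate the lookup against a line from the ceph-disk output above.
DATA_PART=$(printf '/dev/sdf other, unknown\n' | grep ", osd\.99," | awk '{ print $1 }')
echo "DATA_PART='${DATA_PART}'"   # prints DATA_PART=''
# "${DATA_PART:0:-1}" on an empty string fails (bash reports
# "substring expression < 0"), leaving OSD_DEVICE empty/unset.
----8<-----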

Comment 8 errata-xmlrpc 2020-08-18 18:05:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 3.3 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3504

