Bug 1422191 - OSPD doesn't notify when it fails to create OSDs due to lack of disks in Ceph storage node
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-ceph
Version: 11.0 (Ocata)
Hardware: x86_64 Linux
Priority: medium
Severity: high
Target Milestone: Upstream M3
Target Release: 11.0 (Ocata)
Assigned To: John Fulton
QA Contact: Yogev Rabl
Docs Contact: Derek
Depends On:
Blocks:
Reported: 2017-02-14 11:55 EST by Yogev Rabl
Modified: 2017-05-17 15:59 EDT
CC: 9 users

See Also:
Fixed In Version: puppet-ceph-2.3.0-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-05-17 15:59:41 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1665697 None None None 2017-02-17 11:16 EST
OpenStack gerrit 435618 None None None 2017-02-17 17:36 EST
Ceph Project Bug Tracker 18976 None None None 2017-02-17 13:11 EST
Red Hat Product Errata RHEA-2017:1245 normal SHIPPED_LIVE Red Hat OpenStack Platform 11.0 Bug Fix and Enhancement Advisory 2017-05-17 19:01:50 EDT

Description Yogev Rabl 2017-02-14 11:55:46 EST
Description of problem:
OSPD didn't raise any error or warning when updating an Overcloud to increase the number of OSDs from 3 per node to 11, even though each Ceph storage node had only 9 disks available to run OSDs on.
The update ended successfully, though not all of the OSDs that were set in the environment file were active.

The environment file was set with 11 OSDs per node: 
  ExtraConfig:
    ceph::profile::params::osds:
     '/dev/vdb':
       journal:
     '/dev/vdc':
       journal:
     '/dev/vdd':
       journal:
     '/dev/vde':
       journal:
     '/dev/vdf':
       journal:
     '/dev/vdg':
       journal:
     '/dev/vdh':
       journal:
     '/dev/vdi':
       journal:
     '/dev/vdj':
       journal:
     '/dev/vdk':
       journal:
     '/dev/vdl':
       journal:
Only 9 disks (/dev/vdb through /dev/vdj) were actually available for OSDs on each node.

Version-Release number of selected component (if applicable):

openstack-tripleo-validations-5.3.1-0.20170125194508.6b928f1.el7ost.noarch
openstack-tripleo-common-5.7.1-0.20170126235054.c75d3c6.el7ost.noarch
puppet-tripleo-6.1.0-0.20170127040716.d427c2a.el7ost.noarch
openstack-tripleo-puppet-elements-6.0.0-0.20170126053436.688584c.el7ost.noarch
openstack-tripleo-0.0.8-0.2.4de13b3git.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-0.20170127041112.ce54697.el7ost.1.noarch
openstack-tripleo-ui-2.0.1-0.20170126144317.f3bd97e.el7ost.noarch
python-tripleoclient-6.0.1-0.20170127055753.8ea289c.el7ost.noarch
openstack-tripleo-image-elements-6.0.0-0.20170126135810.00b9869.el7ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Deploy an Overcloud with 3 OSDs on each Ceph storage node
2. Update the Overcloud with a new storage environment file that sets more OSDs than there are disks in the Ceph storage nodes.


Actual results:
The update of the Overcloud finished successfully.

Expected results:
The update fails with an error that not all of the OSDs were initialized.

Additional info:
Comment 1 John Fulton 2017-02-17 09:23:37 EST
We can add a test in puppet-ceph's osd.pp to make it fail if any of the OSDs on the list fail to be activated. Here's an example from another tool: 

https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-osd/tasks/activate_osds.yml#L61-L66

Users should specify an accurate list of the disks they want. They can use something like the following: 

 http://tripleo.org/advanced_deployment/node_specific_hieradata.html

or, if they have heterogeneous hardware, even: 

 https://github.com/RHsyseng/hci/tree/master/other-scenarios/mixed-nodes

So the next step is to look at how this scenario is slipping by the following conditionals: 

https://github.com/openstack/puppet-ceph/blob/master/manifests/osd.pp#L201-L206
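
A minimal sketch of the kind of guard discussed above (illustrative only, not the merged patch; the resource title and the $data variable holding the disk path are assumptions):

 # Fail the Puppet run when a path that looks like a device is not actually
 # a block device on the node, instead of silently falling back to a
 # directory-backed OSD. $data is assumed to hold the disk path passed to
 # ceph::osd, e.g. '/dev/vdk'.
 if $data =~ /^\/dev\// {
   exec { "check-osd-device-${data}":
     # /bin/false makes the resource (and therefore the deployment step)
     # fail whenever the 'unless' test reports that the device is missing.
     command => '/bin/false',
     unless  => "/usr/bin/test -b ${data}",
     path    => ['/bin', '/usr/bin'],
   }
 }

With a guard like this, a stack update that lists /dev/vdk and /dev/vdl on a node with only nine disks would fail at the Puppet step instead of completing with directory-backed OSDs.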
Comment 3 John Fulton 2017-02-17 11:26:40 EST
What you get in this scenario is a working directory-backed OSD, not the block-device-backed OSD the user intended (and they did intend a block device if they passed /dev/foo along with a list of other block devices). 

[root@osd ~]# ls -laF /dev/sdq
total 28
drwxr-xr-x.  3 ceph ceph  220 Feb 17 10:10 ./
drwxr-xr-x. 22 root root 3180 Feb 17 10:10 ../
-rw-r--r--.  1 root root  189 Feb 17 10:10 activate.monmap
-rw-r--r--.  1 ceph ceph   37 Feb 17 10:10 ceph_fsid
drwxr-xr-x.  3 ceph ceph   80 Feb 17 10:10 current/
-rw-r--r--.  1 ceph ceph   37 Feb 17 10:10 fsid
-rw-r--r--.  1 ceph ceph    0 Feb 17 10:10 journal
-rw-r--r--.  1 ceph ceph   21 Feb 17 10:10 magic
-rw-r--r--.  1 ceph ceph    4 Feb 17 10:10 store_version
-rw-r--r--.  1 ceph ceph   53 Feb 17 10:10 superblock
-rw-r--r--.  1 ceph ceph    2 Feb 17 10:10 whoami
Comment 4 John Fulton 2017-03-01 11:36:34 EST
There was an update requested on this: 

- I have a proposed fix https://review.openstack.org/#/c/435618
- I just need to update the unit test so it can pass CI and merge
- I will get this done before the end of March so I can focus on some higher priority items.
Comment 5 John Fulton 2017-03-18 12:08:46 EDT
Update: the proposed upstream fix [1] has passed CI and received positive reviews so far. 

[1]  https://review.openstack.org/#/c/435618/
Comment 6 John Fulton 2017-03-20 13:25:49 EDT
https://review.openstack.org/#/c/435618 has merged upstream.
Comment 10 Yogev Rabl 2017-04-18 11:14:53 EDT
Verified on puppet-ceph-2.3.0-4.el7ost.noarch.
Comment 11 errata-xmlrpc 2017-05-17 15:59:41 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245
