Bug 1422191
| Summary: | OSPD doesn't notify when it fails to create OSDs due to lack of disks in Ceph storage node | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Yogev Rabl <yrabl> |
| Component: | puppet-ceph | Assignee: | John Fulton <johfulto> |
| Status: | CLOSED ERRATA | QA Contact: | Yogev Rabl <yrabl> |
| Severity: | high | Docs Contact: | Derek <dcadzow> |
| Priority: | medium | | |
| Version: | 11.0 (Ocata) | CC: | gfidente, jjoyce, johfulto, jomurphy, jschluet, mburns, rhel-osp-director-maint, slinaber, tvignaud |
| Target Milestone: | Upstream M3 | | |
| Target Release: | 11.0 (Ocata) | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | puppet-ceph-2.3.0-2.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-05-17 19:59:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Yogev Rabl
2017-02-14 16:55:46 UTC
We can add a test in puppet-ceph's osd.pp to make it fail if any of the OSDs in the list fails to activate. Here's an example from another tool: https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-osd/tasks/activate_osds.yml#L61-L66

Users should specify an accurate list of the disks they want. They can use something like http://tripleo.org/advanced_deployment/node_specific_hieradata.html, or even https://github.com/RHsyseng/hci/tree/master/other-scenarios/mixed-nodes if they have heterogeneous hardware.

So the next step is to look at how this scenario slips past the following conditionals: https://github.com/openstack/puppet-ceph/blob/master/manifests/osd.pp#L201-L206

What you get in this scenario is a working directory-backed OSD, not the block-device-backed OSD the user intended (and they did intend it, since they passed /dev/foo along with a list of other block devices):

```
[root@osd ~]# ls -laF /dev/sdq
total 28
drwxr-xr-x.  3 ceph ceph  220 Feb 17 10:10 ./
drwxr-xr-x. 22 root root 3180 Feb 17 10:10 ../
-rw-r--r--.  1 root root  189 Feb 17 10:10 activate.monmap
-rw-r--r--.  1 ceph ceph   37 Feb 17 10:10 ceph_fsid
drwxr-xr-x.  3 ceph ceph   80 Feb 17 10:10 current/
-rw-r--r--.  1 ceph ceph   37 Feb 17 10:10 fsid
-rw-r--r--.  1 ceph ceph    0 Feb 17 10:10 journal
-rw-r--r--.  1 ceph ceph   21 Feb 17 10:10 magic
-rw-r--r--.  1 ceph ceph    4 Feb 17 10:10 store_version
-rw-r--r--.  1 ceph ceph   53 Feb 17 10:10 superblock
-rw-r--r--.  1 ceph ceph    2 Feb 17 10:10 whoami
```

There was an update requested on this:
- I have a proposed fix: https://review.openstack.org/#/c/435618
- I just need to update the unit test so it can pass CI and merge.
- I will get this done before the end of March so I can focus on some higher-priority items.

Update: the proposed upstream fix [1] passed CI and has received positive reviews so far.

[1] https://review.openstack.org/#/c/435618/

https://review.openstack.org/#/c/435618 has merged upstream.
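The failure mode described here, where a device path that doesn't exist as a block device silently becomes a directory-backed OSD, can be caught with a simple pre-flight check before the disk is prepared. The sketch below is illustrative only (the function name and messages are hypothetical, not the actual puppet-ceph fix in the review above); it just shows the `test -b` idea that distinguishes a real block device from a stray directory like the /dev/sdq listing above:

```shell
# Minimal sketch, assuming we only want to fail fast when a configured
# OSD path is not actually a block device.
check_osd_device() {
    disk="$1"
    if [ ! -b "$disk" ]; then
        echo "ERROR: ${disk} is not a block device; refusing to create an OSD on it" >&2
        return 1
    fi
}

# /dev/sdq in the listing above is really a directory, so a check like
# this would have aborted the deployment instead of silently succeeding.
# /tmp stands in for such a directory here:
if ! check_osd_device /tmp 2>/dev/null; then
    echo "deployment would abort here"   # prints "deployment would abort here"
fi
```

A check of this shape placed ahead of the osd.pp conditionals would surface the misconfiguration as a hard failure rather than a quietly wrong OSD type.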
Verified on puppet-ceph-2.3.0-4.el7ost.noarch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245