Bug 1508663

Summary: 0 OSDs during containerized non-colocated Ceph deployment using 2.4-4 ceph-docker image (colocated doesn't have this issue)
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: John Fulton <johfulto>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED CURRENTRELEASE
QA Contact: Yogev Rabl <yrabl>
Severity: high
Priority: unspecified
Version: 2.4
CC: adeza, aschoen, ceph-eng-bugs, dang, dwilson, gfidente, gmeno, hchen, hnallurv, jefbrown, jim.curtis, jschluet, kdreyer, nthomas, pprakash, sankarshan, tserlin, yrabl
Target Milestone: rc
Target Release: 3.0
Hardware: x86_64
OS: Linux
Fixed In Version: RHEL: ceph-ansible-3.0.9-1.el7cp; Ubuntu: ceph-ansible_3.0.9-2redhat1
Last Closed: 2018-02-14 15:40:09 UTC
Type: Bug

Description John Fulton 2017-11-01 22:58:45 UTC
Problem overview:
- Deployed a 5-node physical overcloud (3 Monitor/Controller nodes, 1 Compute node, 2 OSD nodes)
- Used the OSP12 10.31.1 puddle (passed phase2)
- Requested 24 OSDs (12 physical HDDs plus 3 physical SSDs per OSD node; see the layout sketch after this list)
- Received 0 OSDs

Workaround:
- Redeploy with exactly the same configuration, but
- Change the ceph-docker container image from 2.4-4 to candidate-50944-20171027124759
- Observe 24 OSDs
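
For reference, the kind of non-collocated layout involved here is expressed to ceph-ansible 3.0 roughly as sketched below. This is only an illustrative sketch: the device paths are hypothetical (not taken from this environment), and in an OSP12 deployment these variables are generated by TripleO rather than written by hand.

# Illustrative sketch only; hypothetical device paths, not this environment's
osd_scenario: non-collocated
devices:               # data HDDs (12 per OSD node in this deployment)
  - /dev/sdb
  - /dev/sdc
  # ... one entry per HDD
dedicated_devices:     # journal SSDs; one entry per data device, SSDs repeated
  - /dev/sdm
  - /dev/sdm
  # ... same length as 'devices'

With the 2.4-4 ceph-docker image this layout produced 0 OSDs; with the candidate image (and later with the ceph-ansible fix below) all 24 came up.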

Comment 5 John Fulton 2017-11-01 23:26:30 UTC
dwilson already answered the question asked in comment #4 at the following: 

 https://bugzilla.redhat.com/show_bug.cgi?id=1507644#c10

He confirms that he hit the same issue in an independent test on his own physical hardware and that the same workaround worked for him.

Comment 10 Sébastien Han 2017-11-02 15:35:31 UTC
will be in 3.0.9

Comment 13 John Fulton 2017-11-03 11:26:12 UTC
Seb solved this [0] with a ceph-ansible change alone [1], so we don't have to change the image.


[0]
[heat-admin@overcloud-controller-0 ~]$ ceph -s
    cluster 394fc43c-be6a-11e7-8f1b-525400330666
     health HEALTH_OK
     monmap e2: 3 mons at {overcloud-controller-0=192.168.1.36:6789/0,overcloud-controller-1=192.168.1.25:6789/0,overcloud-controller-2=192.168.1.37:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
      fsmap e5: 1/1/1 up {0=overcloud-controller-1=up:active}, 2 up:standby
     osdmap e96: 24 osds: 24 up, 24 in
            flags sortbitwise,require_jewel_osds,recovery_deletes
      pgmap v210: 1600 pgs, 8 pools, 2068 bytes data, 20 objects
            951 MB used, 26813 GB / 26814 GB avail
                1600 active+clean
[heat-admin@overcloud-controller-0 ~]$ sudo su -
[root@overcloud-controller-0 ~]# docker images | grep ceph 
docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7                                latest              039247b48eb4        2 weeks ago         529.9 MB
[root@overcloud-controller-0 ~]# 

[1]
(undercloud) [stack@hci-director ceph-ansible]$ git log 
commit a7d1947aa6f7e5fdfb43043ee858ed2167e7240a
Author: Sébastien Han <seb>
Date:   Thu Nov 2 16:17:38 2017 +0100

    osd: enhance backward compatibility
    
    During the initial implementation of this 'old' thing we were falling
    into this issue without noticing
    https://github.com/moby/moby/issues/30341 and where blindly using --rm,
    now this is fixed the prepare container disappears and thus activation
    fail.
    I'm fixing this for old jewel images.
    
    Signed-off-by: Sébastien Han <seb>
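
For readers who do not follow the moby link: as the commit message describes, the jewel-era flow ran a short-lived "prepare" container with --rm before activating the OSD. The sketch below only illustrates that failure mode; the privilege flags, mounts, environment variable, entrypoint argument, and device are assumptions, not the actual ceph-ansible task content.

# Hypothetical illustration of the behavior the commit message describes;
# the flags, mounts, argument, and device name are assumptions.
docker run --rm --privileged=true --net=host \
    -v /dev:/dev -v /etc/ceph:/etc/ceph -v /var/lib/ceph:/var/lib/ceph \
    -e OSD_DEVICE=/dev/sdb \
    docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest \
    osd_ceph_disk_prepare
# Older docker did not reliably remove containers started with --rm
# (https://github.com/moby/moby/issues/30341), so the prepare container
# lingered and the activation step could still rely on it. Once docker
# honored --rm, the prepare container vanished on exit and activation
# failed, which is how the 2.4-4 image deployment ended up with 0 OSDs.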

Comment 14 Sébastien Han 2017-11-03 12:20:20 UTC
Fix is in: https://github.com/ceph/ceph-ansible/releases/tag/v3.0.9

Ken, please build a package, thanks.

Comment 21 Yogev Rabl 2017-11-17 02:08:57 UTC
verified on ceph-ansible-3.0.11-1.el7cp.noarch

Comment 22 Ken Dreyer (Red Hat) 2018-02-14 15:40:09 UTC
RHCEPH 3.0 shipped in https://access.redhat.com/errata/RHBA-2017:3387