Bug 1398236

Summary: No osd after successful deployment overcloud
Product: Red Hat OpenStack
Reporter: M.Rembas <malgorzata.rembas>
Component: ceph
Assignee: Sébastien Han <shan>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Yogev Rabl <yrabl>
Severity: unspecified
Docs Contact:
Priority: high
Version: 9.0 (Mitaka)
CC: dcain, gcase, gfidente, jdonohue, jdurgin, johfulto, jomurphy, lhh, malgorzata.rembas, nlevine, seb, srevivo
Target Milestone: ---
Flags: malgorzata.rembas: needinfo-
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-10 01:29:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1373538, 1400606
Attachments:
  storage environment (flags: none)
  dump_of_os-collect-config (flags: none)

Description M.Rembas 2016-11-24 10:29:26 UTC
Created attachment 1223732 [details]
storage environment

Description of problem:
In a setup with 3 storage nodes and 15 OSDs, all flash, with a pre-partitioned NVMe disk as the journal, on every deployment the OSD disks are not prepared and activated in the Ceph cluster.


How reproducible:


Steps to Reproduce:
1. Deploy the overcloud with Heat environment files: -e /home/stack/templates/storage-environment.yaml, with wipe_disk.yaml included
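
For reference, the deploy invocation would have been roughly of the following shape; the exact command line, and whether wipe_disk.yaml was passed with its own -e or pulled in from storage-environment.yaml, are assumptions here rather than details taken from this environment:

  openstack overcloud deploy --templates \
    -e /home/stack/templates/storage-environment.yaml \
    -e /home/stack/templates/wipe_disk.yaml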


Actual results:
Ceph status after successful deployment:
[root@overcloud-cephstorage-0 heat-admin]# ceph -s
    cluster 16eb2f62-ac3c-11e6-807b-001e67e2527d
     health HEALTH_ERR
            664 pgs stuck inactive
            664 pgs stuck unclean
            no osds
     monmap e1: 3 mons at {overcloud-controller-0=192.168.1.18:6789/0,overcloud-controller-1=192.168.1.14:6789/0,overcloud-controller-2=192.168.1.15:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
     osdmap e4: 0 osds: 0 up, 0 in
      pgmap v5: 664 pgs, 4 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 664 creating

Drive listing after deployment:
/dev/nvme0n1 :
 /dev/nvme0n1p1 ceph journal 
 /dev/nvme0n1p2 ceph journal
 /dev/nvme0n1p3 ceph journal
 /dev/nvme0n1p4 ceph journal
 /dev/nvme0n1p5 ceph journal
/dev/sda other, unknown
/dev/sdb other, unknown
/dev/sdc other, unknown
/dev/sdd other, unknown
/dev/sde other, unknown
/dev/sdf other, unknown
 /dev/sdf1 other, iso9660
 /dev/sdf2 other, ext4, mounted on /


Expected results:

I expect Ceph to prepare the disks for the cluster and activate all the OSDs; instead I only have partitioned disks.


Additional info:

Comment 1 seb 2017-01-04 16:13:51 UTC
what's the partition status of the not-configured drives?
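
For reference, the partition status can be checked on a storage node with the standard tools; the device names below are only examples taken from the listing earlier in this bug:

  ceph-disk list            # how ceph-disk classifies every device
  parted /dev/sda print     # partition table of one of the unconfigured data disks
  sgdisk -p /dev/nvme0n1    # GPT layout of the pre-partitioned journal device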

Comment 2 M.Rembas 2017-01-13 13:55:22 UTC
(In reply to seb from comment #1)
> what's the partition status of the not-configured drives?

/dev/sda other, unknown
/dev/sdb other, unknown
/dev/sdc other, unknown
/dev/sdd other, unknown
/dev/sde other, unknown
/dev/sdf other, unknown

They are not prepared for the Ceph cluster; I need to manually run:
ceph-disk prepare .......
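
The arguments above are elided; for a single data disk with a journal partition on the NVMe device, the general form (an assumption based on the drive listing above, not the command actually run here) would be something like:

  ceph-disk prepare --cluster ceph /dev/sda /dev/nvme0n1p1
  ceph-disk activate /dev/sda1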

Comment 3 seb 2017-01-16 13:05:23 UTC
It looks like the OSDs have been skipped. Is it possible to get some debug logs from the puppet execution? Otherwise, this is going to be difficult to debug...
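
On a TripleO overcloud node the relevant material is usually the os-collect-config journal plus an sosreport; treat the exact commands below as a suggestion, not a definitive list of what is needed:

  journalctl -u os-collect-config > os-collect-config.log
  sosreport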

Comment 4 M.Rembas 2017-01-16 15:38:48 UTC
Hi, thank you for the answer, but the setup was deployed over a month ago and none of the puppet log files have been preserved.

Comment 5 seb 2017-01-17 10:53:08 UTC
If there is any way you can reproduce this, let me know. Is it still an issue? Have you had successful deployments since then?

If so, please close this bug :).

Comment 6 M.Rembas 2017-02-01 08:45:04 UTC
Hello
We are starting to deploy a new stack; please let me know exactly which logs you would like to have.

Regards
M.Rembas

Comment 7 seb 2017-02-02 13:37:32 UTC
I don't know yet, but are you still experiencing the same problem on a new deployment?

Comment 8 M.Rembas 2017-02-17 14:42:51 UTC
Yes, still the same issue with the OSDs.
The fresh installation looks like this:
[root@overcloud-cephstorage-0 heat-admin]# ceph -s
    cluster 9f526dec-f517-11e6-8c38-001e67e2527d
     health HEALTH_ERR
            664 pgs stuck inactive
            664 pgs stuck unclean
            no osds
     monmap e1: 3 mons at {overcloud-controller-0=192.168.1.19:6789/0,overcloud-controller-1=192.168.1.14:6789/0,overcloud-controller-2=192.168.1.17:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
     osdmap e4: 0 osds: 0 up, 0 in
      pgmap v5: 664 pgs, 4 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 664 creating

Comment 9 M.Rembas 2017-02-17 14:45:23 UTC
Created attachment 1252704 [details]
dump_of_os-collect-config

logs from ceph node after fresh deployment

Comment 10 Giulio Fidente 2017-07-19 14:55:31 UTC
(In reply to M.Rembas from comment #9)
> Created attachment 1252704 [details]
> dump_of_os-collect-config
> 
> logs from ceph node after fresh deployment

Can you attach the custom env file where you specify the ceph::profile::params::osds list, and report the version of puppet-ceph installed on the overcloud nodes?
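
For reference, the stanza being asked about would typically look like the snippet below in the custom env file; the disk-to-journal mapping here is only an illustration based on the drive listing earlier in this bug, not the reporter's actual file. The package version can be read on an overcloud node with rpm:

  parameter_defaults:
    ExtraConfig:
      ceph::profile::params::osds:
        '/dev/sda':
          journal: '/dev/nvme0n1p1'
        '/dev/sdb':
          journal: '/dev/nvme0n1p2'

  rpm -q puppet-ceph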

Comment 11 John Fulton 2017-08-10 01:29:23 UTC
We don't have the needinfo requested on 2017-07-19 so I am going to go ahead and close this bug. Feel free to re-open if you have the requested information.