Bug 1398236 - No OSDs after successful overcloud deployment
Summary: No OSDs after successful overcloud deployment
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: ceph
Version: 9.0 (Mitaka)
Hardware: x86_64
OS: Linux
Priority: high
Severity: unspecified
Target Milestone: ---
Assignee: Sébastien Han
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks: 1373538 1400606
 
Reported: 2016-11-24 10:29 UTC by M.Rembas
Modified: 2020-01-03 09:09 UTC (History)
CC: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-10 01:29:23 UTC
Target Upstream Version:
Embargoed:
malgorzata.rembas: needinfo-


Attachments (Terms of Use)
storage environment (2.86 KB, text/plain)
2016-11-24 10:29 UTC, M.Rembas
no flags Details
dump_of_os-collect-config (626.11 KB, text/x-vhdl)
2017-02-17 14:45 UTC, M.Rembas
no flags Details

Description M.Rembas 2016-11-24 10:29:26 UTC
Created attachment 1223732 [details]
storage environment

Description of problem:
In a setup with 3 storage nodes and 15 OSDs, all full flash, with a pre-partitioned NVMe disk as journal, the OSD disks are not prepared and activated in the Ceph cluster on any deployment.


How reproducible:


Steps to Reproduce:
1. Deploy the overcloud with Heat environment files: -e /home/stack/templates/storage-environment.yaml, which includes wipe_disk.yaml (a sketch of the deploy command follows below).
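For context, a minimal sketch of the deploy command this step describes, assuming the stock tripleo-heat-templates and only the one extra environment file; the exact command line used on this setup is not recorded in the bug:

  openstack overcloud deploy --templates \
    -e /home/stack/templates/storage-environment.yaml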


Actual results:
Ceph status after successful deployment:
[root@overcloud-cephstorage-0 heat-admin]# ceph -s
    cluster 16eb2f62-ac3c-11e6-807b-001e67e2527d
     health HEALTH_ERR
            664 pgs stuck inactive
            664 pgs stuck unclean
            no osds
     monmap e1: 3 mons at {overcloud-controller-0=192.168.1.18:6789/0,overcloud-controller-1=192.168.1.14:6789/0,overcloud-controller-2=192.168.1.15:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
     osdmap e4: 0 osds: 0 up, 0 in
      pgmap v5: 664 pgs, 4 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 664 creating

Drive listing after deployment:
/dev/nvme0n1 :
 /dev/nvme0n1p1 ceph journal 
 /dev/nvme0n1p2 ceph journal
 /dev/nvme0n1p3 ceph journal
 /dev/nvme0n1p4 ceph journal
 /dev/nvme0n1p5 ceph journal
/dev/sda other, unknown
/dev/sdb other, unknown
/dev/sdc other, unknown
/dev/sdd other, unknown
/dev/sde other, unknown
/dev/sdf other, unknown
 /dev/sdf1 other, iso9660
 /dev/sdf2 other, ext4, mounted on /


Expected results:

I expect Ceph to prepare the disks for the cluster and activate all OSD disks; instead, only the journal disk is partitioned.


Additional info:

Comment 1 seb 2017-01-04 16:13:51 UTC
what's the partition status of the not-configured drives?
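For example, the output of the following on one of the storage nodes would help (the drive listing quoted in the description looks like ceph-disk output, but that is an assumption about how it was produced):

  # show how ceph-disk classifies every block device on the node
  ceph-disk list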

Comment 2 M.Rembas 2017-01-13 13:55:22 UTC
(In reply to seb from comment #1)
> what's the partition status of the not-configured drives?

/dev/sda other, unknown
/dev/sdb other, unknown
/dev/sdc other, unknown
/dev/sdd other, unknown
/dev/sde other, unknown
/dev/sdf other, unknown

The drives are not prepared for the Ceph cluster; I need to manually run:
ceph-disk prepare .......
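For illustration, a hedged sketch of what that manual step looks like for one of the data disks, using device names taken from the listing above (the actual arguments used on this setup are not recorded in the bug):

  # prepare /dev/sda as an OSD, pointing it at one of the pre-made NVMe journal partitions
  ceph-disk prepare /dev/sda /dev/nvme0n1p1
  # activate the data partition that ceph-disk prepare creates
  ceph-disk activate /dev/sda1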

Comment 3 seb 2017-01-16 13:05:23 UTC
It looks like the OSDs have been skipped. Is it possible to get some debug logs from the puppet execution? Otherwise, this is going to be difficult to debug...
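On a TripleO overcloud node the puppet runs are driven by os-collect-config, so its journal is usually the place to look; a sketch of how to capture it, run as root on a ceph storage node (the output path is just an example):

  journalctl -u os-collect-config > /tmp/os-collect-config.log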

Comment 4 M.Rembas 2017-01-16 15:38:48 UTC
Hi, thank you for the answer, but the setup was deployed over a month ago and none of the puppet log files have been preserved.

Comment 5 seb 2017-01-17 10:53:08 UTC
If there is any way you can reproduce this, let me know. Is it still an issue? Have you had successful deployments since then?

If so, please close this bug :).

Comment 6 M.Rembas 2017-02-01 08:45:04 UTC
Hello
We are starting to deploy a new stack; please let me know exactly which logs you would like to have.

Regards
M.Rembas

Comment 7 seb 2017-02-02 13:37:32 UTC
I don't know yet, but are you still experiencing the same problem on a new deployment?

Comment 8 M.Rembas 2017-02-17 14:42:51 UTC
Yes, still the same issue with the OSDs. A fresh installation looks like this:
[root@overcloud-cephstorage-0 heat-admin]# ceph -s
    cluster 9f526dec-f517-11e6-8c38-001e67e2527d
     health HEALTH_ERR
            664 pgs stuck inactive
            664 pgs stuck unclean
            no osds
     monmap e1: 3 mons at {overcloud-controller-0=192.168.1.19:6789/0,overcloud-controller-1=192.168.1.14:6789/0,overcloud-controller-2=192.168.1.17:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
     osdmap e4: 0 osds: 0 up, 0 in
      pgmap v5: 664 pgs, 4 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 664 creating

Comment 9 M.Rembas 2017-02-17 14:45:23 UTC
Created attachment 1252704 [details]
dump_of_os-collect-config

Logs from a Ceph node after a fresh deployment.

Comment 10 Giulio Fidente 2017-07-19 14:55:31 UTC
(In reply to M.Rembas from comment #9)
> Created attachment 1252704 [details]
> dump_of_os-collect-config
> 
> logs from ceph node after fresh deployment

Can you attach the custom env file where you specify the ceph::profile::params::osds list, and report the version of puppet-ceph installed on the overcloud nodes?
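For reference, a hedged sketch of how both pieces of information could be gathered (the environment file path is an assumption based on the deploy command in the description):

  # on an overcloud ceph storage node: the installed puppet-ceph version
  rpm -q puppet-ceph
  # on the undercloud: the OSD list from the custom environment file
  grep -A20 'ceph::profile::params::osds' /home/stack/templates/storage-environment.yaml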

Comment 11 John Fulton 2017-08-10 01:29:23 UTC
We don't have the needinfo requested on 2017-07-19, so I am going to go ahead and close this bug. Feel free to re-open if you have the requested information.

