Bug 1304401 - OSPD failed to detect and install ceph storage node
Summary: OSPD failed to detect and install ceph storage node
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Giulio Fidente
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-02-03 13:54 UTC by Yogev Rabl
Modified: 2016-10-13 20:06 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-13 20:06:19 UTC
Target Upstream Version:
Embargoed:


Attachments
installation log (2.61 MB, text/plain)
2016-02-03 13:54 UTC, Yogev Rabl

Description Yogev Rabl 2016-02-03 13:54:26 UTC
Created attachment 1120790 [details]
installation log

Description of problem:
The overcloud installation completed successfully, but one Ceph storage node was not installed properly. The required overcloud topology is:
1 controller
2 compute nodes
3 Ceph storage nodes, each with 3 hard drives: vdb, vdc, vdd.

The ceph.yaml file is configured as follows:
ceph::profile::params::osd_journal_size: 1024
ceph::profile::params::osd_pool_default_pg_num: 128
ceph::profile::params::osd_pool_default_pgp_num: 128
ceph::profile::params::osd_pool_default_size: 3
ceph::profile::params::osd_pool_default_min_size: 1
ceph::profile::params::osds:
     '/dev/vdb':
       journal: ''
     '/dev/vdc':
       journal: ''
     '/dev/vdd':
       journal: ''
ceph::profile::params::manage_repo: false
ceph::profile::params::authentication_type: cephx

ceph_pools:
  - volumes
  - vms
  - images

ceph_classes: []

ceph_osd_selinux_permissive: true
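
For reference (not part of the original report): an environment file like the ceph.yaml above is typically passed to the overcloud deployment with -e. A minimal sketch of such a deploy command, where the file paths and the flavor name are assumptions (the node counts match the topology above), could look like:

$ openstack overcloud deploy --templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
    -e ~/templates/ceph.yaml \
    --control-scale 1 --compute-scale 2 \
    --ceph-storage-scale 3 --ceph-storage-flavor ceph-storage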

The result is that the controller is installed on one of the servers with the 4 hard drives, 2 Ceph storage nodes are installed properly, and the remaining Ceph storage node is set up with no OSDs.

* The installation runs on a virtual setup.

Version-Release number of selected component (if applicable):
python-tripleoclient-0.0.11-5.el7ost.noarch
openstack-tripleo-image-elements-0.9.7-1.el7ost.noarch
openstack-tripleo-common-0.0.2-4.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.2-1.el7ost.noarch
openstack-tripleo-0.0.7-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.7-2.el7ost.noarch


How reproducible:
unknown

Steps to Reproduce:
1. Set ceph.yaml with additional hard drives
2. Install overcloud

Actual results:
The overcloud installation effectively failed: the storage nodes are misconfigured, yet the OSPD reports that the installation finished successfully.

Expected results:
The OSPD should detect the server with the 4 hard drives, and install and run the OSD services on it.

Additional info:
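(Not from the original report: a quick way to verify whether the OSDs actually came up is to check the Ceph cluster from one of the controller nodes.)

$ sudo ceph status    # overall cluster health and the number of OSDs that are up/in
$ sudo ceph osd tree  # lists the OSDs and the host each one runs on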

Comment 2 Giulio Fidente 2016-02-03 15:50:49 UTC
Can you paste the deploy command, attach any customized yaml and the output from 'sudo ceph status' from one of the controller nodes?

Comment 3 Giulio Fidente 2016-02-17 12:00:09 UTC
hi Yogev, is this still a bug? Can you reply to comment #2?

Comment 4 Yogev Rabl 2016-02-18 15:14:44 UTC
(In reply to Giulio Fidente from comment #3)
> hi Yogev, is this still a bug? Can you reply to comment #2?

Hi Giulio,
I found a workaround for this issue:
1) Create a new flavor 
$openstack flavor create --id auto --ram 4096 --disk 10 --vcpus 1 cephStorage

2) Add a property to the flavor
$openstack flavor set --property 'cpu_arch'='x86_64' --property 'capabilities:boot_option'='local' --property 'capabilities:profile'='cephStorage' cephStorage

3) Add a property to the node with ironic
$ ironic node-update <ceph storage node uuid> add properties/capabilities='profile:cephStorage,boot_option:local'

The customized yaml file is in the description of the bug.
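
A sketch of how the flavor/node assignment from this workaround can be double-checked on the undercloud (the node UUID is a placeholder, as above):

$ openstack flavor show cephStorage            # properties should list capabilities:profile='cephStorage'
$ ironic node-show <ceph storage node uuid>    # properties/capabilities should contain profile:cephStorage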

Comment 5 Mike Burns 2016-04-07 21:07:13 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 9 Giulio Fidente 2016-10-11 09:50:53 UTC
If disks were previously formatted by/for a different Ceph cluster, the cluster FSID won't match and the OSP Director won't reuse them.

Before BZ #1370439, the deployment would not fail in such a scenario, but silently discard pre-owned disks.

With recent builds (which include the fix for BZ #1370439), the deployment should instead fail.

Can you retry after formatting the Ceph disks with an empty GPT label during the deployment? This is documented in: https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT
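
For reference, a minimal sketch of what wiping the three data disks to an empty GPT label looks like at the disk level (the linked documentation automates this with a first-boot template; the device names below come from this report and the loop assumes the disks can be safely erased):

# run as root on the Ceph storage node; this destroys all data on the listed disks
for dev in /dev/vdb /dev/vdc /dev/vdd; do
    sgdisk -Z "$dev"                # zap any existing GPT/MBR data structures
    parted -s "$dev" mklabel gpt    # write a fresh, empty GPT label
done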

Comment 10 Yogev Rabl 2016-10-13 07:31:10 UTC
(In reply to Giulio Fidente from comment #9)
> If disks were previously formatted by/for a different Ceph cluster, the
> cluster FSID won't match and the OSP Director won't reuse them.
> 
> Before BZ #1370439, the deployment would not fail in such a scenario, but
> silently discard pre-owned disks.
> 
> With recent builds instead (which include the fix for BZ #1370439), the
> deployment should fail instead.
> 
> Can you retry formatting the Ceph disks with an empty GPT label during the
> deployment, as documented in:
> https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/
> single/red-hat-ceph-storage-for-the-overcloud/
> #Formatting_Ceph_Storage_Nodes_Disks_to_GPT

I have tried it and it worked.

Comment 11 Yogev Rabl 2016-10-13 07:32:08 UTC
A deployment finished successfully after following Giulio's comment.

