Bug 1304401 - OSPD failed to detect and install ceph storage node
Summary: OSPD failed to detect and install ceph storage node
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Giulio Fidente
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-02-03 13:54 UTC by Yogev Rabl
Modified: 2016-10-13 20:06 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-13 20:06:19 UTC
Target Upstream Version:
Embargoed:


Attachments
installation log (2.61 MB, text/plain)
2016-02-03 13:54 UTC, Yogev Rabl

Description Yogev Rabl 2016-02-03 13:54:26 UTC
Created attachment 1120790 [details]
installation log

Description of problem:
The overcloud installation completed successfully, but one Ceph storage node was not installed properly. The required overcloud topology is:
1 controller
2 compute nodes
3 Ceph storage nodes, each with 3 hard drives: vdb, vdc, vdd.

The ceph.yaml file is configured as follows:
ceph::profile::params::osd_journal_size: 1024
ceph::profile::params::osd_pool_default_pg_num: 128
ceph::profile::params::osd_pool_default_pgp_num: 128
ceph::profile::params::osd_pool_default_size: 3
ceph::profile::params::osd_pool_default_min_size: 1
ceph::profile::params::osds:
     '/dev/vdb':
       journal: ''
     '/dev/vdc':
       journal: ''
     '/dev/vdd':
       journal: ''
ceph::profile::params::manage_repo: false
ceph::profile::params::authentication_type: cephx

ceph_pools:
  - volumes
  - vms
  - images

ceph_classes: []

ceph_osd_selinux_permissive: true
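
For reference (not part of the original report): an environment file like the ceph.yaml above is typically passed to the overcloud deployment with -e. A minimal sketch of such a deploy command, where the file paths and the flavor name are assumptions (the node counts match the topology above), could look like:

$ openstack overcloud deploy --templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
    -e ~/templates/ceph.yaml \
    --control-scale 1 --compute-scale 2 \
    --ceph-storage-scale 3 --ceph-storage-flavor ceph-storage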

The result is that the controller is installed on one of the servers with the 4 hard drives, 2 Ceph storage nodes are installed properly, and the remaining Ceph storage node is set up with no OSDs.

* The installation runs on a virtual setup.

Version-Release number of selected component (if applicable):
python-tripleoclient-0.0.11-5.el7ost.noarch
openstack-tripleo-image-elements-0.9.7-1.el7ost.noarch
openstack-tripleo-common-0.0.2-4.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.2-1.el7ost.noarch
openstack-tripleo-0.0.7-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.7-2.el7ost.noarch


How reproducible:
unknown

Steps to Reproduce:
1. Set ceph.yaml with additional hard drives
2. Install overcloud

Actual results:
The overcloud installation effectively failed: the storage nodes are misconfigured, yet the OSPD reports that the installation finished successfully.

Expected results:
The OSPD should detect the server with the 4 hard drives, and install and run the OSD services on it.

Additional info:
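(Not from the original report: a quick way to verify whether the OSDs actually came up is to check the Ceph cluster from one of the controller nodes.)

$ sudo ceph status    # overall cluster health and the number of OSDs that are up/in
$ sudo ceph osd tree  # lists the OSDs and the host each one runs on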

Comment 2 Giulio Fidente 2016-02-03 15:50:49 UTC
Can you paste the deploy command, attach any customized yaml and the output from 'sudo ceph status' from one of the controller nodes?

Comment 3 Giulio Fidente 2016-02-17 12:00:09 UTC
hi Yogev, is this still a bug? Can you reply to comment #2?

Comment 4 Yogev Rabl 2016-02-18 15:14:44 UTC
(In reply to Giulio Fidente from comment #3)
> hi Yogev, is this still a bug? Can you reply to comment #2?

Hi Giulio,
I found a workaround for this issue:
1) Create a new flavor 
$openstack flavor create --id auto --ram 4096 --disk 10 --vcpus 1 cephStorage

2) Add a property to the flavor
$openstack flavor set --property 'cpu_arch'='x86_64' --property 'capabilities:boot_option'='local' --property 'capabilities:profile'='cephStorage' cephStorage

3) Add a property to the node with ironic
$ ironic node-update <ceph storage node uuid> add properties/capabilities='profile:cephStorage,boot_option:local'

The customized yaml file is in the description of the bug.
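
A sketch of how the flavor/node assignment from this workaround can be double-checked on the undercloud (the node UUID is a placeholder, as above):

$ openstack flavor show cephStorage            # properties should list capabilities:profile='cephStorage'
$ ironic node-show <ceph storage node uuid>    # properties/capabilities should contain profile:cephStorage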

Comment 5 Mike Burns 2016-04-07 21:07:13 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 9 Giulio Fidente 2016-10-11 09:50:53 UTC
If disks were previously formatted by/for a different Ceph cluster, the cluster FSID won't match and the OSP Director won't reuse them.

Before BZ #1370439, the deployment would not fail in such a scenario, but silently discard pre-owned disks.

With recent builds (which include the fix for BZ #1370439), the deployment should instead fail.

Can you retry after formatting the Ceph disks with an empty GPT label during the deployment? This is documented in: https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT
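
For reference, a minimal sketch of what wiping the three data disks to an empty GPT label looks like at the disk level (the linked documentation automates this with a first-boot template; the device names below come from this report and the loop assumes the disks can be safely erased):

# run as root on the Ceph storage node; this destroys all data on the listed disks
for dev in /dev/vdb /dev/vdc /dev/vdd; do
    sgdisk -Z "$dev"                # zap any existing GPT/MBR data structures
    parted -s "$dev" mklabel gpt    # write a fresh, empty GPT label
done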

Comment 10 Yogev Rabl 2016-10-13 07:31:10 UTC
(In reply to Giulio Fidente from comment #9)
> If disks were previously formatted by/for a different Ceph cluster, the
> cluster FSID won't match and the OSP Director won't reuse them.
> 
> Before BZ #1370439, the deployment would not fail in such a scenario, but
> silently discard pre-owned disks.
> 
> With recent builds instead (which include the fix for BZ #1370439), the
> deployment should fail instead.
> 
> Can you retry formatting the Ceph disks with an empty GPT label during the
> deployment, as documented in:
> https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/
> single/red-hat-ceph-storage-for-the-overcloud/
> #Formatting_Ceph_Storage_Nodes_Disks_to_GPT

I have tried it and it worked.

Comment 11 Yogev Rabl 2016-10-13 07:32:08 UTC
A deployment finished successfully after following Giulio's comment.

