Bug 1304401 - OSPD failed to detect and install ceph storage node
Status: CLOSED NOTABUG
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Release: 10.0 (Newton)
Assigned To: Giulio Fidente
QA Contact: Yogev Rabl
Reported: 2016-02-03 08:54 EST by Yogev Rabl
Modified: 2016-10-13 16:06 EDT
CC List: 7 users

Doc Type: Bug Fix
Last Closed: 2016-10-13 16:06:19 EDT
Type: Bug

Attachments:
installation log (2.61 MB, text/plain), 2016-02-03 08:54 EST, Yogev Rabl
Description Yogev Rabl 2016-02-03 08:54:26 EST
Created attachment 1120790: installation log

Description of problem:
The overcloud was installed successfully, but one Ceph storage node was not installed properly. The required overcloud topology is:
1 controller
2 compute nodes
3 Ceph storage nodes, each with 3 hard drives: vdb, vdc, vdd

The ceph.yaml file is configured as follows:
ceph::profile::params::osd_journal_size: 1024
ceph::profile::params::osd_pool_default_pg_num: 128
ceph::profile::params::osd_pool_default_pgp_num: 128
ceph::profile::params::osd_pool_default_size: 3
ceph::profile::params::osd_pool_default_min_size: 1
ceph::profile::params::osds:
     '/dev/vdb':
       journal: ''
     '/dev/vdc':
       journal: ''
     '/dev/vdd':
       journal: ''
ceph::profile::params::manage_repo: false
ceph::profile::params::authentication_type: cephx

ceph_pools:
  - volumes
  - vms
  - images

ceph_classes: []

ceph_osd_selinux_permissive: true
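
For context, an environment file like this is passed to the overcloud deploy command with -e. A minimal sketch of such a command, with node counts taken from the topology above (the actual deploy command used here was not attached to this bug; the template path and scale flags are assumptions):

$ openstack overcloud deploy --templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
    -e ~/templates/ceph.yaml \
    --control-scale 1 --compute-scale 2 --ceph-storage-scale 3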

The result is that the controller was installed on one of the servers with the 4 hard drives, 2 Ceph storage nodes were installed properly, and the remaining Ceph storage node was deployed with no OSDs.

* The installation runs on a virtual setup.

Version-Release number of selected component (if applicable):
python-tripleoclient-0.0.11-5.el7ost.noarch
openstack-tripleo-image-elements-0.9.7-1.el7ost.noarch
openstack-tripleo-common-0.0.2-4.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.2-1.el7ost.noarch
openstack-tripleo-0.0.7-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.7-2.el7ost.noarch


How reproducible:
unknown

Steps to Reproduce:
1. Configure ceph.yaml with the additional hard drives
2. Install overcloud

Actual results:
The overcloud installation effectively failed: the storage nodes are misconfigured, yet OSPD reports that the installation finished successfully.

Expected results:
OSPD should detect the server with the 4 hard drives, then install and run the OSD services.

Additional info:
Comment 2 Giulio Fidente 2016-02-03 10:50:49 EST
Can you paste the deploy command, attach any customized yaml and the output from 'sudo ceph status' from one of the controller nodes?
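
For reference, the requested cluster state can be collected on a controller node roughly as follows (a sketch only; the output depends on the deployment):

$ sudo ceph status      # overall health, monitor quorum, and OSD up/in counts
$ sudo ceph osd tree    # lists the OSDs and the host each one is placed on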
Comment 3 Giulio Fidente 2016-02-17 07:00:09 EST
hi Yogev, is this still a bug? Can you reply to comment #2?
Comment 4 Yogev Rabl 2016-02-18 10:14:44 EST
(In reply to Giulio Fidente from comment #3)
> hi Yogev, is this still a bug? Can you reply to comment #2?

Hi Giulio,
I found a workaround for this issue:
1) Create a new flavor 
$ openstack flavor create --id auto --ram 4096 --disk 10 --vcpus 1 cephStorage

2) Add a property to the flavor
$ openstack flavor set --property 'cpu_arch'='x86_64' --property 'capabilities:boot_option'='local' --property 'capabilities:profile'='cephStorage' cephStorage

3) Add a property to the node with ironic
$ ironic node-update <ceph storage node uuid> add properties/capabilities='profile:cephStorage,boot_option:local'

The customized yaml file is in the description of the bug.
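
For completeness, a sketch of how the cephStorage flavor from the workaround is then consumed by the deployment (the flavor and scale flags are standard tripleoclient options of this release; the rest of the command line is assumed, not taken from this bug):

$ openstack overcloud deploy --templates \
    -e ~/templates/ceph.yaml \
    --ceph-storage-flavor cephStorage --ceph-storage-scale 3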
Comment 5 Mike Burns 2016-04-07 17:07:13 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 9 Giulio Fidente 2016-10-11 05:50:53 EDT
If disks were previously formatted by/for a different Ceph cluster, the cluster FSID won't match and the OSP Director won't reuse them.

Before BZ #1370439, the deployment would not fail in such a scenario, but silently discard pre-owned disks.

With recent builds (which include the fix for BZ #1370439), the deployment should fail instead.

Can you retry formatting the Ceph disks with an empty GPT label during the deployment, as documented in: https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT
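
For reference, a minimal sketch of what formatting a disk with an empty GPT label amounts to, using one of the data disks named in this report (destructive; the referenced guide performs this via a first-boot script, and the device name here is only an example):

$ sudo sgdisk -Z /dev/vdb    # zap any existing GPT and MBR data structures
$ sudo sgdisk -og /dev/vdb   # write a fresh protective MBR and an empty GPT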
Comment 10 Yogev Rabl 2016-10-13 03:31:10 EDT
(In reply to Giulio Fidente from comment #9)
> If disks were previously formatted by/for a different Ceph cluster, the
> cluster FSID won't match and the OSP Director won't reuse them.
> 
> Before BZ #1370439, the deployment would not fail in such a scenario, but
> silently discard pre-owned disks.
> 
> With recent builds (which include the fix for BZ #1370439), the deployment
> should fail instead.
> 
> Can you retry formatting the Ceph disks with an empty GPT label during the
> deployment, as documented in:
> https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/
> single/red-hat-ceph-storage-for-the-overcloud/
> #Formatting_Ceph_Storage_Nodes_Disks_to_GPT

I have tried it and it worked.
Comment 11 Yogev Rabl 2016-10-13 03:32:08 EDT
The deployment finished successfully after following Giulio's suggestion in comment #9.
