Bug 1304367

Summary: overcloud deployment finished successfully and Ceph's OSDs are down
Product: Red Hat OpenStack Reporter: Yogev Rabl <yrabl>
Component: openstack-puppet-modules Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA QA Contact: Yogev Rabl <yrabl>
Severity: high Docs Contact:
Priority: high    
Version: 8.0 (Liberty) CC: achernet, alan_bishop, arkady_kanevsky, cdevine, christopher_dearborn, gael_rehault, jguiditt, joherr, John_walsh, kbader, kbasil, kschinck, kurt_hey, mburns, morazi, nlevine, randy_perryman, rhel-osp-director-maint, rsussman, sreichar, wayne_allen, yeylon
Target Milestone: ga   
Target Release: 8.0 (Liberty)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-puppet-modules-7.0.10-1.el7ost Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-07 21:27:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1261979, 1310828    
Attachments:
Description Flags
overcloud deployment log none

Description Yogev Rabl 2016-02-03 12:35:54 UTC
Created attachment 1120745
overcloud deployment log

Description of problem:
The overcloud deployment installed 3 Ceph storage nodes, each with 4 hard drives (1 for the OS, 3 for the OSDs), and finished successfully with a return value of 0.
Although the deployment succeeded, the OSDs are down. The OSD services did not start, and I had to start them manually.

Version-Release number of selected component (if applicable):
openstack-tripleo-image-elements-0.9.7-1.el7ost.noarch
openstack-tripleo-common-0.0.2-4.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.2-1.el7ost.noarch
openstack-tripleo-0.0.7-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.7-2.el7ost.noarch

(though I know the same happens with OSPD 7.2 and 7.3) 

How reproducible:
100%

Steps to Reproduce:
1. Add additional hard drives to the would-be Ceph storage nodes
2. List the additional hard drives in the ceph.yaml file:
ceph::profile::params::osds:
     '/dev/vdb':
       journal: ''
     '/dev/vdc':
       journal: ''
     '/dev/vdd':
       journal: ''
3. Deploy the overcloud

Actual results:
The OSDs are down

[heat-admin@overcloud-controller-0 ~]$ sudo ceph osd tree 
ID WEIGHT  TYPE NAME                        UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.59995 root default                                                       
-2 0.29997     host overcloud-cephstorage-2                                   
 0 0.09999         osd.0                       down        0          1.00000 
 4 0.09999         osd.4                       down        0          1.00000 
 7 0.09999         osd.7                       down        0          1.00000 
-3 0.29997     host overcloud-cephstorage-1                                   
 1 0.09999         osd.1                       down        0          1.00000 
 6 0.09999         osd.6                       down        0          1.00000 
 8 0.09999         osd.8                       down        0          1.00000 
 2       0 osd.2                               down        0          1.00000 
 3       0 osd.3                               down        0          1.00000 
 5       0 osd.5                               down        0          1.00000

Expected results:
All the OSDs should be up
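
A minimal sketch of how this check could be automated after a deployment, assuming the plain-text column layout of `ceph osd tree` shown above. The function name `down_osds` and the `SAMPLE` text are hypothetical and only for illustration (the `up` row in the sample is invented, not taken from this environment).

```python
def down_osds(tree_output):
    """Return the names of OSDs reported as 'down' in `ceph osd tree` text output."""
    down = []
    for line in tree_output.splitlines():
        fields = line.split()
        # OSD rows are assumed to look like:
        #   ID WEIGHT osd.N up|down REWEIGHT PRIMARY-AFFINITY
        # Header, root, and host rows do not have "osd.N" in the third column.
        if len(fields) >= 4 and fields[2].startswith("osd.") and fields[3] == "down":
            down.append(fields[2])
    return down


# Hypothetical sample in the same layout as the output above.
SAMPLE = """\
ID WEIGHT  TYPE NAME                        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.29997 root default
-2 0.29997     host overcloud-cephstorage-2
 0 0.09999         osd.0                       down        0          1.00000
 4 0.09999         osd.4                         up  1.00000          1.00000
 2       0 osd.2                               down        0          1.00000
"""

if __name__ == "__main__":
    failed = down_osds(SAMPLE)
    if failed:
        print("OSDs down: " + ", ".join(failed))
    else:
        print("All OSDs are up")
```

Feeding the function the real output of `sudo ceph osd tree` from a controller node would give a quick pass/fail signal that does not depend on the deployment's return value, which was 0 here even though every OSD was down.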

Additional info:

Comment 2 Alan Bishop 2016-02-05 19:01:06 UTC
Could this be a duplicate of #1298620?

Comment 3 arkady kanevsky 2016-02-16 04:47:59 UTC
Alan,
yes, but this is really the equivalent of BZ 1297251, targeted for OSP8.

Comment 4 Alan Bishop 2016-02-19 13:36:44 UTC
There are many BZs with the same root cause (udev rules cause OSDs to be down after deployment), one of which is 1309926, targeted for OSP8. That BZ is being actively worked on, and an external tracker (https://review.openstack.org/276141) is nearly resolved. I think this BZ should be marked as a duplicate of 1309926.

Comment 5 Emilien Macchi 2016-02-25 22:12:25 UTC
*** Bug 1309926 has been marked as a duplicate of this bug. ***

Comment 8 Yogev Rabl 2016-03-24 15:20:51 UTC
The deployment finished successfully with the OSDs up and running.

version:
openstack-puppet-modules-7.0.15-1.el7ost.noarch

Comment 9 errata-xmlrpc 2016-04-07 21:27:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html