Bug 1410554 - OSPD failed to change the ownership of disks [NEEDINFO]
Summary: OSPD failed to change the ownership of disks
Status: CLOSED DUPLICATE of bug 1445436
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-puppet-modules
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Target Milestone: ---
: 10.0 (Newton)
Assignee: Giulio Fidente
QA Contact: Yogev Rabl
Whiteboard: scale_lab
Depends On:
Blocks: 1481685 1414467
Reported: 2017-01-05 17:55 UTC by Yogev Rabl
Modified: 2017-09-20 14:49 UTC (History)
12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2017-09-20 14:49:21 UTC
Target Upstream Version:
gfidente: needinfo? (yrabl)

Attachments (Terms of Use)
all of the logs, templates and deployment command (10.64 MB, application/x-gzip)
2017-01-05 17:55 UTC, Yogev Rabl

Description Yogev Rabl 2017-01-05 17:55:45 UTC
Created attachment 1237770 [details]
all of the logs, templates and deployment command

Description of problem:
A deployment of 3 controller nodes, 3 compute nodes, and 15 Ceph storage nodes, where each Ceph storage node should run 24 OSDs.
The expected total is therefore 360 OSDs, but the deployment ended in failure with only 355 OSDs.

The failure logs are:
  deploy_stderr: |
    chown -h ceph:ceph /dev/sda
    ceph-disk prepare  --cluster-uuid 7c12ae5a-c871-11e6-9b00-b8ca3a66e37c /dev/sda /dev/nvme0n1
    udevadm settle
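The first line of that stderr is the ownership change that failed. As a minimal sketch (assuming a bash shell with GNU `stat`; `check_ceph_ownership` is a hypothetical helper, not part of the deploy tooling), one could verify what ownership a device actually ended up with:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a device node is owned by ceph:ceph,
# the ownership that the failed "chown -h ceph:ceph /dev/sda" step tried to set.
check_ceph_ownership() {
    local dev="$1"
    local owner
    # GNU stat: %U = owner name, %G = group name
    owner="$(stat -c '%U:%G' "$dev")" || return 2
    if [ "$owner" = "ceph:ceph" ]; then
        echo "ok: $dev owned by ceph:ceph"
    else
        echo "mismatch: $dev owned by $owner"
    fi
}
```

Running a check like this against each OSD device listed in the templates would show which of the 24 disks per node missed the ownership change.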

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Deploy the overcloud in an environment similar to the one described in the attached files

Actual results:
The deployment failed, with fewer OSDs deployed than requested.

Expected results:
The deployment completes successfully with all OSDs active.

Additional info:

Comment 1 John Fulton 2017-01-05 19:50:36 UTC
It looks like puppet-ceph ran into a problem when trying to prepare the OSDs. I've updated this to DFG:Ceph and assigned it to gfidente for now.

Comment 6 John Fulton 2017-01-18 18:26:58 UTC
TripleO reports the Overcloud deploy failed, but look at the numbers: 

- 360 OSDs were requested
- 355 working OSDs were provided, a 98.6% success rate

It reports a failure because it didn't have 100% success, but I suspect that the Ceph cluster and the rest of the Overcloud are still usable; just not with all available OSDs.

We should test whether the deployer can simply re-run the deploy command on the existing overcloud (the way overcloud updates are done) and whether that brings all of the OSDs up. If that doesn't work, then I think the desired behavior would be for the deployer to be able to do this.
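The 98.6% figure above is straightforward to reproduce; a one-liner sketch (assuming `awk` is available, with the OSD counts taken from this report):

```shell
# Success-rate arithmetic from the comment above:
# 355 of the 360 requested OSDs came up.
requested=360
active=355
awk -v r="$requested" -v a="$active" 'BEGIN { printf "%.1f%%\n", a / r * 100 }'
```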

Comment 12 jomurphy 2017-09-20 14:49:21 UTC

*** This bug has been marked as a duplicate of bug 1445436 ***
