Bug 1410554

Summary: OSPD failed to change the ownership of disks
Product: Red Hat OpenStack
Reporter: Yogev Rabl <yrabl>
Component: openstack-puppet-modules
Assignee: Giulio Fidente <gfidente>
Status: CLOSED DUPLICATE
QA Contact: Yogev Rabl <yrabl>
Severity: medium
Docs Contact: Derek <dcadzow>
Priority: medium
Version: 10.0 (Newton)
CC: bengland, jefbrown, johfulto, jomurphy, jtaleric, mburns, nlevine, psanchez, rhel-osp-director-maint, srevivo, twilkins, yrabl
Target Milestone: ---
Keywords: FutureFeature, Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Whiteboard: scale_lab
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2017-09-20 14:49:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 1414467, 1481685
Attachments:
  all of the logs, templates and deployment command (flags: none)

Description Yogev Rabl 2017-01-05 17:55:45 UTC
Created attachment 1237770 [details]
all of the logs, templates and deployment command

Description of problem:
The deployment consists of 3 controller nodes, 3 compute nodes, and 15 Ceph storage nodes, each of which should run 24 OSDs, for a total of 360 OSDs. The deployment ended in failure with only 355 OSDs deployed.

The failure logs are:
  deploy_stderr: |
    ...
            chown -h ceph:ceph /dev/sda
        fi
    fi
    ceph-disk prepare  --cluster-uuid 7c12ae5a-c871-11e6-9b00-b8ca3a66e37c /dev/sda /dev/nvme0n1
    udevadm settle
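
For context, here is a minimal sketch of the per-disk preparation sequence implied by the excerpt above, assuming the truncated "if" blocks guard the ownership change. The device paths and cluster UUID are taken from the log; the stat-based ownership test is a hypothetical reconstruction, not the actual generated script:

    #!/bin/bash
    # Hypothetical reconstruction of the per-disk preparation sequence seen
    # in the deploy_stderr excerpt above.
    set -e

    DISK=/dev/sda            # data disk (from the log)
    JOURNAL=/dev/nvme0n1     # journal device (from the log)
    CLUSTER_UUID=7c12ae5a-c871-11e6-9b00-b8ca3a66e37c

    # Give the ceph user ownership of the raw device before preparing it;
    # per the bug summary, this chown is the step that intermittently fails.
    if [ "$(stat -c %U "$DISK")" != "ceph" ]; then
        chown -h ceph:ceph "$DISK"
    fi

    # Partition and format the disk as an OSD, journaling to the NVMe device.
    ceph-disk prepare --cluster-uuid "$CLUSTER_UUID" "$DISK" "$JOURNAL"

    # Block until udev has processed the new partitions before moving on.
    udevadm settle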


Version-Release number of selected component (if applicable):
openstack-tripleo-puppet-elements-5.1.0-2.el7ost.noarch
openstack-tripleo-ui-1.0.5-3.el7ost.noarch
openstack-tripleo-image-elements-5.1.0-1.el7ost.noarch
openstack-tripleo-heat-templates-5.1.0-7.el7ost.noarch
python-tripleoclient-5.4.0-2.el7ost.noarch
puppet-tripleo-5.4.0-3.el7ost.noarch
openstack-tripleo-common-5.4.0-3.el7ost.noarch
openstack-tripleo-validations-5.1.0-5.el7ost.noarch
openstack-tripleo-0.0.8-0.2.4de13b3git.el7ost.noarch

How reproducible:
25%

Steps to Reproduce:
1. Deploy the overcloud in an environment similar to the one described in the attached files

Actual results:
The deployment failed, with fewer OSDs deployed than requested.

Expected results:
The deployment completes successfully with all OSDs active.

Additional info:

Comment 1 John Fulton 2017-01-05 19:50:36 UTC
It looks like puppet-ceph ran into a problem when trying to prepare the OSDs. I've updated this to DFG:Ceph and assigned it to gfidente for now.

Comment 6 John Fulton 2017-01-18 18:26:58 UTC
TripleO reports the Overcloud deploy failed, but look at the numbers: 

- 360 OSDs were requested
- 355 working OSDs were provided, a 98.6% success rate

It reports a failure because it didn't reach 100% success, but I suspect that the Ceph cluster and the rest of the Overcloud were still usable, just not with all of the requested OSDs.

We should test whether the deployer can simply re-run the deploy command against the existing overcloud (the way overcloud updates are done) and whether that brings the remaining OSDs up. If that doesn't work, then I think being able to do so would be the desired behavior.
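
As a hedged sketch of that test (the environment file below is a placeholder; the real invocation must reuse the exact templates and arguments from the original deployment):

    # Re-run the original deploy command against the existing overcloud,
    # exactly as one would for an overcloud update; Heat converges the
    # existing stack rather than redeploying from scratch.
    source ~/stackrc
    openstack overcloud deploy --templates \
        -e /home/stack/templates/storage-environment.yaml   # placeholder -e files

    # Afterwards, from a controller or Ceph monitor node, check whether
    # the missing OSDs came up:
    ceph osd stat    # a full recovery would report: 360 osds: 360 up, 360 in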

Comment 12 jomurphy 2017-09-20 14:49:21 UTC

*** This bug has been marked as a duplicate of bug 1445436 ***

Comment 13 Red Hat Bugzilla 2023-09-14 03:36:57 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.