Bug 1410554 - OSPD failed to change the ownership of disks
Summary: OSPD failed to change the ownership of disks
Keywords:
Status: CLOSED DUPLICATE of bug 1445436
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-puppet-modules
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Giulio Fidente
QA Contact: Yogev Rabl
Docs Contact: Derek
URL:
Whiteboard: scale_lab
Depends On:
Blocks: 1414467 1481685
 
Reported: 2017-01-05 17:55 UTC by Yogev Rabl
Modified: 2023-09-14 03:36 UTC
CC: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-20 14:49:21 UTC
Target Upstream Version:
Embargoed:


Attachments
all of the logs, templates and deployment command (10.64 MB, application/x-gzip)
2017-01-05 17:55 UTC, Yogev Rabl

Description Yogev Rabl 2017-01-05 17:55:45 UTC
Created attachment 1237770 [details]
all of the logs, templates and deployment command

Description of problem:
A deployment of 3 controller nodes, 3 compute nodes, and 15 Ceph storage nodes, where each Ceph storage node should run 24 OSDs.
The expected total is 360 OSDs, but the deployment ended in failure with only 355 OSDs.
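The expected count works out as follows (a quick sanity check using only the numbers reported above):

```shell
# 15 Ceph storage nodes x 24 OSDs each should give 360 OSDs;
# 355 came up, leaving 5 missing. Counts are from this report.
expected=$((15 * 24))
deployed=355
echo "$expected OSDs expected, $deployed deployed, $((expected - deployed)) missing"
```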

The failure log is:
  deploy_stderr: |
    ...
            chown -h ceph:ceph /dev/sda
        fi
    fi
    ceph-disk prepare  --cluster-uuid 7c12ae5a-c871-11e6-9b00-b8ca3a66e37c /dev/sda /dev/nvme0n1
    udevadm settle
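The snippet above is the generated shell that chowns the disk to ceph:ceph before ceph-disk prepares it. A minimal sketch of the same ownership check as a standalone helper (check_ceph_owner is a hypothetical name, not part of puppet-ceph or tripleo) would be:

```shell
# Hypothetical helper: report device paths whose ownership is not
# ceph:ceph, i.e. the condition the generated chown is meant to fix.
check_ceph_owner() {
    for dev in "$@"; do
        # stat -c '%U:%G' prints "owner:group" (GNU coreutils)
        owner=$(stat -c '%U:%G' "$dev" 2>/dev/null) || continue
        if [ "$owner" != "ceph:ceph" ]; then
            echo "$dev owned by $owner (expected ceph:ceph)"
        fi
    done
}
```

Running this against the OSD and journal devices (here /dev/sda and /dev/nvme0n1) right before `ceph-disk prepare` would show whether the chown actually took effect on the nodes whose OSDs failed.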


Version-Release number of selected component (if applicable):
openstack-tripleo-puppet-elements-5.1.0-2.el7ost.noarch
openstack-tripleo-ui-1.0.5-3.el7ost.noarch
openstack-tripleo-image-elements-5.1.0-1.el7ost.noarch
openstack-tripleo-heat-templates-5.1.0-7.el7ost.noarch
python-tripleoclient-5.4.0-2.el7ost.noarch
puppet-tripleo-5.4.0-3.el7ost.noarch
openstack-tripleo-common-5.4.0-3.el7ost.noarch
openstack-tripleo-validations-5.1.0-5.el7ost.noarch
openstack-tripleo-0.0.8-0.2.4de13b3git.el7ost.noarch

How reproducible:
25%

Steps to Reproduce:
1. Deploy the overcloud on a similar environment as described in the files attached

Actual results:
The deployment failed, with fewer OSDs deployed than requested.

Expected results:
The deployment passes successfully, with all OSDs active.

Additional info:

Comment 1 John Fulton 2017-01-05 19:50:36 UTC
It looks like puppet-ceph ran into a problem when trying to prepare the OSDs. I've updated this to DFG:Ceph and assigned it to gfidente for now.

Comment 6 John Fulton 2017-01-18 18:26:58 UTC
TripleO reports the Overcloud deploy failed, but look at the numbers: 

- 360 OSDs were requested
- 355 working OSDs were provided, a 98.6% success rate
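The quoted rate can be checked directly from the two counts above:

```shell
# 355 of the 360 requested OSDs came up
awk 'BEGIN { printf "%.1f%%\n", 355 / 360 * 100 }'
```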

It reports a failure because it didn't have 100% success, but I suspect that the Ceph cluster and the rest of the Overcloud were still usable, just not with all available OSDs.

We should test whether the deployer can simply re-run the deploy command against the existing overcloud (the way overcloud updates are done) and whether that gets all of the OSDs working. If that doesn't currently work, then I think the desired behavior would be for the deployer to be able to do this.
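A sketch of that verification after a re-run, counting up OSDs from `ceph osd stat` output (the sample line below is illustrative, and the parsing assumes the classic one-line osdmap format):

```shell
# Illustrative sample of `ceph osd stat` output; on a live cluster,
# replace this with: osd_stat=$(ceph osd stat)
osd_stat="osdmap e123: 360 osds: 355 up, 355 in"

up=$(echo "$osd_stat"    | sed -n 's/.* \([0-9][0-9]*\) up.*/\1/p')
total=$(echo "$osd_stat" | sed -n 's/.* \([0-9][0-9]*\) osds.*/\1/p')

if [ "$up" -eq "$total" ]; then
    echo "all $total OSDs up"
else
    echo "$((total - up)) of $total OSDs missing"
fi
```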

Comment 12 jomurphy 2017-09-20 14:49:21 UTC

*** This bug has been marked as a duplicate of bug 1445436 ***

Comment 13 Red Hat Bugzilla 2023-09-14 03:36:57 UTC
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.

