Bug 1410554

Summary:

OSPD failed to change the ownership of disks

Product:

Red Hat OpenStack

Reporter:

Yogev Rabl <yrabl>

Component:

openstack-puppet-modules

Assignee:

Giulio Fidente <gfidente>

Status:

CLOSED DUPLICATE

QA Contact:

Yogev Rabl <yrabl>

Severity:

medium

Docs Contact:

Derek <dcadzow>

Priority:

medium

Version:

10.0 (Newton)

CC:

bengland, jefbrown, johfulto, jomurphy, jtaleric, mburns, nlevine, psanchez, rhel-osp-director-maint, srevivo, twilkins, yrabl

Target Milestone:

---

Keywords:

FutureFeature, Triaged, ZStream

Target Release:

10.0 (Newton)

Hardware:

x86_64

OS:

Linux

Whiteboard:

scale_lab

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2017-09-20 14:49:21 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1414467, 1481685

Attachments:

Description	Flags
all of the logs, templates and deployment command	none

Description Yogev Rabl 2017-01-05 17:55:45 UTC

Created attachment 1237770
all of the logs, templates and deployment command

Description of problem:
The deployment consists of 3 controller nodes, 3 compute nodes and 15 Ceph storage nodes. Each Ceph storage node should run 24 OSDs.
The total should be 360 OSDs, but the deployment ended in failure with only 355 OSDs.

The failure logs are:
  deploy_stderr: |
    ...
            chown -h ceph:ceph /dev/sda
        fi
    fi
    ceph-disk prepare  --cluster-uuid 7c12ae5a-c871-11e6-9b00-b8ca3a66e37c /dev/sda /dev/nvme0n1
    udevadm settle
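
When triaging many such excerpts it may help to pull the affected devices out of the `ceph-disk prepare` line. The following is a minimal sketch, not part of the deployment tooling: the function name `parse_ceph_disk_prepare` is hypothetical, and it assumes that `--cluster-uuid` is the only option carrying a value and that the positional arguments are the data device followed by the journal device, as in the line above.

```python
import shlex


def parse_ceph_disk_prepare(line):
    """Return (data_device, journal_device) from a `ceph-disk prepare` line.

    Hypothetical helper for log triage; it only understands the option
    layout seen in the excerpt from this report.
    """
    tokens = shlex.split(line)
    if tokens[:2] != ["ceph-disk", "prepare"]:
        raise ValueError("not a ceph-disk prepare command")

    positional = []
    args = iter(tokens[2:])
    for token in args:
        if token == "--cluster-uuid":
            next(args, None)  # skip the UUID that follows the option
        elif token.startswith("--"):
            continue  # assumption: any other option is a bare flag
        else:
            positional.append(token)

    data = positional[0] if positional else None
    journal = positional[1] if len(positional) > 1 else None
    return data, journal


# The command line exactly as it appears in deploy_stderr.
line = ("ceph-disk prepare  --cluster-uuid "
        "7c12ae5a-c871-11e6-9b00-b8ca3a66e37c /dev/sda /dev/nvme0n1")
data, journal = parse_ceph_disk_prepare(line)
print(f"data device: {data}, journal device: {journal}")
```

For this excerpt the data device is /dev/sda and the journal device is /dev/nvme0n1.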


Version-Release number of selected component (if applicable):
openstack-tripleo-puppet-elements-5.1.0-2.el7ost.noarch
openstack-tripleo-ui-1.0.5-3.el7ost.noarch
openstack-tripleo-image-elements-5.1.0-1.el7ost.noarch
openstack-tripleo-heat-templates-5.1.0-7.el7ost.noarch
python-tripleoclient-5.4.0-2.el7ost.noarch
puppet-tripleo-5.4.0-3.el7ost.noarch
openstack-tripleo-common-5.4.0-3.el7ost.noarch
openstack-tripleo-validations-5.1.0-5.el7ost.noarch
openstack-tripleo-0.0.8-0.2.4de13b3git.el7ost.noarch

How reproducible:
25%

Steps to Reproduce:
1. Deploy the overcloud in an environment similar to the one described in the attached files

Actual results:
The deployment failed with fewer OSDs deployed than requested (355 of 360).

Expected results:
The deployment completes successfully with all OSDs active.

Additional info:

Comment 1 John Fulton 2017-01-05 19:50:36 UTC

It looks like puppet-ceph ran into a problem when trying to prepare the OSDs. I've updated this to DFG:Ceph and assigned it to gfidente for now.

Comment 6 John Fulton 2017-01-18 18:26:58 UTC

TripleO reports the Overcloud deploy failed, but look at the numbers: 

- 360 OSDs were requested
- 355 OSDs that work were provided, a 98.6% success rate

It reports a failure because it didn't have 100% success, but I suspect that the Ceph cluster and the rest of the Overcloud would still be usable, just not with all available OSDs.
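
The figures quoted in this comment can be checked with a short calculation. This is only an arithmetic sketch using the node and OSD counts from the bug description; the variable names are illustrative.

```python
# Figures taken from the bug description; names are illustrative only.
storage_nodes = 15
osds_per_node = 24
working_osds = 355

requested_osds = storage_nodes * osds_per_node
missing_osds = requested_osds - working_osds
success_rate = 100 * working_osds / requested_osds

print(f"requested={requested_osds} working={working_osds} missing={missing_osds}")
print(f"success rate: {success_rate:.1f}%")
```

So 5 of the 360 requested OSDs are missing, which matches the 98.6% figure.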

We should test whether the deployer can simply re-run the deploy command on the existing overcloud (the way overcloud updates are done) and whether that gets all of the OSDs working. If that doesn't work, then I think the desired behavior would be for the deployer to be able to do this.

Comment 12 jomurphy 2017-09-20 14:49:21 UTC


*** This bug has been marked as a duplicate of bug 1445436 ***

Comment 13 Red Hat Bugzilla 2023-09-14 03:36:57 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days