Bug 1252546 - Ceph pg_num and pgp_num are correctly set in ceph.yaml but the pools always use 64
Summary: Ceph pg_num and pgp_num are correctly set in ceph.yaml but the pools always u...
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: Giulio Fidente
QA Contact: Yogev Rabl
URL:
Whiteboard:
Keywords: Triaged, ZStream
Depends On:
Blocks: 1191185 1243520 1261979 1283721 1310828 1330065
 
Reported: 2015-08-11 16:23 UTC by jliberma@redhat.com
Modified: 2016-06-01 07:33 UTC (History)
CC: 40 users

Clone Of:
Clones: 1330065 (view as bug list)
Last Closed: 2016-04-07 21:38:41 UTC




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:0604 normal SHIPPED_LIVE Red Hat OpenStack Platform 8 director Enhancement Advisory 2016-04-08 01:03:56 UTC
OpenStack gerrit 277508 None None None 2016-02-08 17:39 UTC

Description jliberma@redhat.com 2015-08-11 16:23:07 UTC
Description of problem:

The hieradata ceph.yaml default pg_num and pgp_num sizes are 128.

Recommended placement groups:

             (OSDs * 100)
Total PGs =  ------------  rounded up to nearest power of 2
               pool size

With 40 OSDs, 3 pools and a replica count of 3, that gives (40 * 100) / 3 ≈ 1333 total PGs, or roughly 444 per pool, which rounds up to a pg_num of 512 for each pool.

http://docs.ceph.com/docs/master/rados/operations/placement-groups/#set-the-number-of-placement-groups
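The guideline above can be sketched as a quick calculation. This is illustrative only; the function name and rounding loop are mine, not part of Ceph or the director:

```python
def recommended_pg_num(osds, replicas, pools):
    """Per-pool pg_num from the guideline quoted above:
    (OSDs * 100) / pool size, split across the pools,
    rounded up to the nearest power of two."""
    per_pool = osds * 100 / replicas / pools
    pg_num = 1
    while pg_num < per_pool:
        pg_num *= 2
    return pg_num

# 40 OSDs, replica size 3, 3 pools -> ~444 PGs per pool -> 512
print(recommended_pg_num(40, 3, 3))
```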

Changing the default in ceph.yaml to 512 and deploying from templates creates the stack successfully, but no OSDs are created and Ceph is unusable.

Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.8-44.el7ost.noarch

How reproducible:

Steps to Reproduce:
1. Deploy undercloud
2. Modify ceph.yaml defaults to recommended number of pgs:
ceph::profile::params::osd_pool_default_pg_num: 512
ceph::profile::params::osd_pool_default_pgp_num: 512
ceph::profile::params::osd_pool_default_size: 3
3. Deploy overcloud with ceph storage servers via templates.

Actual results:
The heat stack creates successfully but no OSDs are created. The Ceph service starts and runs but is unusable.

Expected results:
Ceph OSDs are created with the correct pg_num, OR the heat stack create reports failure because no OSDs were created.

Additional info:

This is related to https://bugzilla.redhat.com/show_bug.cgi?id=1252158 but is a separate case.

Comment 4 Giulio Fidente 2015-10-01 10:40:44 UTC
What I am seeing is a little different: the two hiera config parameters seem to work as expected; the deployment succeeded for me and I got the correct values set in ceph.conf.

Yet the pools created by OSPd do not make use of the defaults and enforce a pg_num of 64. Maybe we should update the BZ subject to reflect this status (and attach a fix for it)?

Comment 6 JF Bibeau 2015-11-04 18:42:33 UTC
I just ran into this today; I'm seeing the same behaviour as Giulio.

My OSDs are created properly and /etc/ceph/ceph.conf has the right values for pg_num and pgp_num (as provided in the ceph.yaml template); however, the 4 pools that are created in the overcloud (rbd, images, volumes, vms) all have 64 PGs. It's almost as if the pools were created *before* ceph.conf was updated, or as if their values were hardcoded to 64 at creation time.

Comment 7 Giulio Fidente 2015-11-05 10:05:01 UTC
Looks like there is a default value for pg_num in the puppet module for newly created pools [1], set to 64, which makes it ignore the default we put in the config file; this needs fixing.

1. https://github.com/openstack/puppet-ceph/blob/master/manifests/pool.pp#L48

Comment 8 Giulio Fidente 2015-11-05 10:38:54 UTC
Apparently we can't omit pg_num when creating a new pool, because the 'ceph osd pool create' command requires it as well instead of using the default value from ceph.conf.

Maybe we can make the default value at [2] come from [3], though?

1. http://tracker.ceph.com/issues/13702
2. https://github.com/openstack/puppet-ceph/blob/master/manifests/pool.pp#L48
3. https://github.com/openstack/puppet-ceph/blob/master/manifests/profile/params.pp#L44
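The idea in [2]/[3] above could look roughly like the following. This is a hypothetical sketch only, not the actual puppet-ceph code (and cross-class parameter defaults like this have evaluation-order caveats in Puppet):

```
# Hypothetical sketch: default pg_num/pgp_num/size in ceph::pool from
# ceph::profile::params rather than hardcoding 64 in pool.pp.
define ceph::pool (
  $ensure  = present,
  $pg_num  = $::ceph::profile::params::osd_pool_default_pg_num,
  $pgp_num = $::ceph::profile::params::osd_pool_default_pgp_num,
  $size    = $::ceph::profile::params::osd_pool_default_size,
) {
  # Pool creation logic unchanged: 'ceph osd pool create' is still
  # passed an explicit pg_num, but the value now follows hiera.
}
```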

Comment 10 Pratik Pravin Bandarkar 2015-11-06 10:15:47 UTC
Is there any workaround for the issue? Can we try editing the "$pg_num" value in /usr/share/openstack-puppet/modules/ceph/manifests/pool.pp?

Comment 12 Martin Magr 2015-11-06 10:42:47 UTC
TBH I don't think changing the default value is a proper solution (though it may fix the problem for now). Next time you need to change pg_num you will open the same bug again, really :). Let's fix it properly once and for all. I believe the reuse of values from ceph::profile::params [1] should be moved into the ceph::pool definition itself [2]. Also, I wonder where the ceph_pools resource comes from? There's none in puppet-ceph [3] and none in THT [4]. So I believe this is a bug in the THT manifests.
 
[1] https://github.com/openstack/tripleo-heat-templates/blob/618d14d7cf7f9a6499a7bf75ef2c01337d53715c/puppet/manifests/overcloud_controller_pacemaker.pp#L686
[2] https://github.com/openstack/tripleo-heat-templates/blob/618d14d7cf7f9a6499a7bf75ef2c01337d53715c/puppet/manifests/overcloud_controller_pacemaker.pp#L693
[3] https://github.com/openstack/puppet-ceph/search?utf8=%E2%9C%93&q=ceph_pool
[4] https://github.com/openstack/tripleo-heat-templates/search?utf8=%E2%9C%93&q=ceph_pool

Comment 13 Giulio Fidente 2015-11-06 11:24:11 UTC
hi Martin, thanks for helping.

I can move the hiera calls for the ceph::profile::params into [1], but is that the best approach? Shouldn't the puppet-ceph module use the default instead of hardcoding pg_num to 64 on new pools [2]?

Regarding $ceph_pools, it comes from Heat as hieradata.

1. https://github.com/openstack/tripleo-heat-templates/blob/618d14d7cf7f9a6499a7bf75ef2c01337d53715c/puppet/manifests/overcloud_controller_pacemaker.pp#L693
2. https://github.com/openstack/puppet-ceph/blob/master/manifests/pool.pp#L48

Comment 14 Felipe Alfaro Solana 2015-11-20 09:26:25 UTC
I have proposed https://bugzilla.redhat.com/show_bug.cgi?id=1283721 which should fix this problem. Hopefully it's an easy fix.

Comment 15 Giulio Fidente 2015-11-20 10:44:55 UTC
*** Bug 1283721 has been marked as a duplicate of this bug. ***

Comment 16 Giulio Fidente 2015-11-20 10:56:40 UTC
hi Felipe, thanks for the update! We posted a change to fix this bug, https://review.openstack.org/#/c/242456/, which will be backported to future versions of the OSP director.

I see you are working on a change which will make this even more flexible by allowing different pg_num, pgp_num and size values for each Ceph pool, which I think is great to have. I will update bz #1283721 to track your change there.

Comment 17 jliberma@redhat.com 2015-11-20 14:03:53 UTC
There is a simple algorithm from Inktank in Mojo for calculating sensible pg settings. Should we add some intelligence to the director to set those numbers automatically? It's a function of pool count, OSD count, etc. I can file a separate RFE once this bug is resolved.

Comment 18 Jaromir Coufal 2015-11-30 09:32:45 UTC
That's very reasonable, Jacob. Can you please file a separate RFE for it?

Comment 20 jliberma@redhat.com 2015-11-30 21:08:08 UTC
Filed RFE for automating sensible pg settings.

https://bugzilla.redhat.com/show_bug.cgi?id=1286841

Comment 23 Yogev Rabl 2016-03-03 15:00:05 UTC
Verified on:
python-tripleoclient-0.1.1-4.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.2-3.el7ost.noarch
openstack-tripleo-0.0.7-1.el7ost.noarch
openstack-tripleo-common-0.1.1-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.8-2.el7ost.noarch
openstack-tripleo-image-elements-0.9.7-2.el7ost.noarch
openstack-tripleo-heat-templates-0.8.8-2.el7ost.noarch

Comment 27 errata-xmlrpc 2016-04-07 21:38:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0604.html

