Description of problem:
The hieradata ceph.yaml defaults for pg_num and pgp_num are 128.
Recommended placement groups:

                     (OSDs * 100)
    Total PGs = -----------------------------   rounded up to the nearest power of 2
                (replicas * number of pools)
With 40 OSDs, 3 pools and a replica count of 3, that is (40 * 100) / (3 * 3) = ~444, which rounds up to the nearest power of 2, giving a pg_num of 512.
Changing the default in ceph.yaml to 512 and deploying from templates creates the stack successfully, but no OSDs are created and Ceph is unusable.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Deploy the undercloud.
2. Modify the ceph.yaml defaults to the recommended number of PGs (see the example hieradata below).
3. Deploy the overcloud with Ceph storage servers via templates.
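For step 2 above, the change is roughly the following in the ceph.yaml hieradata (an illustration only; the key names are the puppet-ceph profile parameters and the exact file layout may differ between Director versions):

    ceph::profile::params::osd_pool_default_pg_num: 512
    ceph::profile::params::osd_pool_default_pgp_num: 512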
Actual results:
The heat stack creates successfully but no OSDs are created. The Ceph service starts and runs but is unusable.

Expected results:
Ceph OSDs are created with the correct pg_num, OR the heat stack create reports a failure because no OSDs are created.
This is related to https://bugzilla.redhat.com/show_bug.cgi?id=1252158 but a separate case.
What I am seeing is a little different: the two hiera config parameters seem to work as expected; the deployment succeeded for me and I got the correct values set in ceph.conf.
Yet the pools created by OSPd do not use those defaults and are forced to a pg_num of 64. Maybe we could update the BZ subject to reflect this status (and attach a fix for it)?
I just ran into this today; I'm seeing the same behaviour as Giulio.
My OSDs are created properly and /etc/ceph/ceph.conf has the right values for pg_num and pgp_num (as provided in the ceph.yaml template), however the 4 pools that are created in the overcloud (rbd, images, volumes, vms) all have 64 PGs. It's almost as if the pools were created *before* ceph.conf was updated, or as if their values were hardcoded to 64 upon creation.
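The per-pool values on the running cluster can be checked with, for example (the pool name here is just one of the defaults mentioned above):

    ceph osd pool get volumes pg_num
    ceph osd dump | grep pg_num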
Looks like there is a default value of 64 for pg_num in the puppet module for newly created pools, which makes it ignore the default we put in the config file; this needs fixing.
Apparently we can't omit passing a pg_num when creating a new pool because the 'ceph osd pool create' command requires it as well, instead of using the default value from ceph.conf
Maybe we can make the default value in pool.pp come from ceph::profile::params though?
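(For reference, the CLI wants the PG count as a positional argument, e.g. something like the following, with the pool name and counts chosen arbitrarily:

    ceph osd pool create volumes 512 512
)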
Is there any workaround for the issue? Can we try editing the "$pg_num" value in /usr/share/openstack-puppet/modules/ceph/manifests/pool.pp?
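Or, for a cluster that is already deployed, could we simply raise the values on the existing pools directly, something like the following (keeping in mind that pg_num can only be increased, never decreased)?

    ceph osd pool set volumes pg_num 512
    ceph osd pool set volumes pgp_num 512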
TBH I don't think changing the default value is a proper solution (while it may fix the problem for now). Next time you need to change pg_num you will open the same bug again, really :)? Let's fix it properly once and for all. I believe the reuse of values from ceph::profile::params should be moved into the ceph::pool definition itself. Also, I wonder where the ceph_pools resource comes from? There's none in puppet-ceph and none in THT. So I believe this is a bug in the THT manifests.
Hi Martin, thanks for helping.
I can move the hiera calls for the ceph::profile::params into the pool declaration, but is that the best approach? Shouldn't the puppet-ceph module be using the default instead of hardcoding pg_num to 64 on new pools?
Regarding $ceph_pools, it comes from Heat as hieradata.
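To be clear, I mean roughly something like the following (only a sketch using the pg_num/pgp_num/size parameters of ceph::pool, not the actual patch):

    # sketch: declare the pools from hieradata, passing the profile defaults explicitly
    $ceph_pools = hiera('ceph_pools')
    ceph::pool { $ceph_pools:
      pg_num  => hiera('ceph::profile::params::osd_pool_default_pg_num'),
      pgp_num => hiera('ceph::profile::params::osd_pool_default_pgp_num'),
      size    => hiera('ceph::profile::params::osd_pool_default_size'),
    }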
I have proposed https://bugzilla.redhat.com/show_bug.cgi?id=1283721, which should fix this problem. Hopefully it's very easy.
*** Bug 1283721 has been marked as a duplicate of this bug. ***
Hi Felipe, thanks for the update! We posted a change to fix this bug, https://review.openstack.org/#/c/242456/, and it will be backported to future versions of OSP Director.
I see you are working on a change which will make this even more flexible by allowing a different pg_num, pgp_num and size for each Ceph pool, which I think is great to have. I will update bz #1283721 to track your change there.
There is a simple algorithm from Inktank in Mojo for calculating sensible PG settings. Should we add some intelligence to the director to set those numbers automatically? It's a function of pool count, OSD count, etc. I can file a separate RFE once this bug is resolved.
That's very reasonable, Jacob. Can you please file a separate RFE for it?
Filed RFE for automating sensible pg settings.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.