Bug 1292981

Summary: Support customization of PGs per OSD in director
Product: Red Hat OpenStack
Component: rhosp-director
Version: 7.0 (Kilo)
Target Milestone: Upstream M2
Target Release: 11.0 (Ocata)
Hardware: All
OS: All
Status: CLOSED DUPLICATE
Severity: high
Priority: medium
Keywords: Triaged
Reporter: Dan Yocum <dyocum>
Assignee: John Fulton <johfulto>
QA Contact: Yogev Rabl <yrabl>
CC: jcoufal, johfulto, jomurphy, mburns, nmorell, rhel-osp-director-maint
Type: Bug
Doc Type: Bug Fix
Last Closed: 2017-01-11 16:10:49 UTC
Bug Blocks: 1387433, 1413723

Description Dan Yocum 2015-12-18 23:07:27 UTC
Description of problem:

Director doesn't create enough PGs, even when the hieradata/ceph.yaml file is updated to request 4096.

Version-Release number of selected component (if applicable):

v7.1

How reproducible:

always

Steps to Reproduce:
1. Deploy an overcloud (3 control, 4 ceph, 1 compute); each ceph node has 11 SATA devices for OSDs
2. Run 'ceph health'

Actual results:

HEALTH_WARN too few pgs per osd (19 < min 30)

Expected results:

Not sure.  Not that error.
Additional info:

Comment 2 Mike Burns 2015-12-20 10:34:13 UTC
*** Bug 1292982 has been marked as a duplicate of this bug. ***

Comment 4 Mike Burns 2016-04-07 21:00:12 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 7 seb 2016-10-13 14:03:39 UTC
The PG count is configurable, so we don't enforce any particular value.
If it's too low or too high, that's a misconfiguration of the variable.

Comment 8 John Fulton 2016-10-13 15:16:36 UTC
Two points here. 

A. The admin needs to know how to set this value correctly as described in the doc: https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/paged/storage-strategies-guide/chapter-3-placement-groups-pgs
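
For reference, the rule of thumb in that guide works out to roughly 100 PGs per OSD divided by the replica count, rounded up to the next power of two. A minimal shell sketch using this cluster's OSD count (the ~100 PGs/OSD target and the 3-way replication are assumptions on my part, not values taken from this bug):

 OSDS=48        # OSD count, as reported by 'ceph -s' further down
 REPLICAS=3     # assumed default pool size (3-way replication)
 TOTAL=$(( OSDS * 100 / REPLICAS ))                                # 1600
 POW2=1; while [ $POW2 -lt $TOTAL ]; do POW2=$(( POW2 * 2 )); done
 echo "aim for ~${TOTAL} PGs across all pools, rounded up to ${POW2}"   # 2048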

B. If the admin changes the value, it should be propagated to the overcloud. 

The bug here is that point B above is not working correctly, even with the OSPd 10 puddle 2016-10-07.4.

What's happening instead is that if the value gets updated on OSPd and the deploy is re-run, the value is only updated in the OSD servers' ceph.conf. What should happen is this:

1. The value should be set in the ceph MONITORs' ceph.conf (mons are responsible for creating pools; OSDs don't need to care)

2. A command like `ceph osd pool set $x pg_num $y` needs to be run on one of the ceph monitors for each pool, but only in the case of an update (on initial create this does not need to be run). 

Marking this bug as verified. 

Note that this only affects updates to the Ceph cluster. Newly created Ceph clusters do not have this problem provided that the value was set correctly as described in point A above. However, OSPd should correctly support scenario B and do steps 1 and 2 above.
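
For illustration, steps 1 and 2 done by hand on one of the monitors would look roughly like this (512 is just an example value; "vms" is one of the pools this deployment creates, and a loop over all of the pools is shown in the next comment):

 # step 1: the monitors' ceph.conf should carry the new default
 grep pg_num /etc/ceph/ceph.conf        # expect: osd_pool_default_pg_num = 512

 # step 2: on update, raise pg_num (then pgp_num) on each existing pool
 ceph osd pool set vms pg_num 512
 ceph osd pool set vms pgp_num 512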

Comment 9 John Fulton 2016-10-13 15:18:40 UTC
Here's my testing to back up my claim that it's not doing the right thing, and to provide a workaround until the bug is fixed. 

I have a ceph cluster deployed by OSPd: 

[root@overcloud-controller-0 ~]# ceph -s
    cluster de69d22e-90bb-11e6-b2c6-525400330666
     health HEALTH_WARN
            clock skew detected on mon.overcloud-controller-0
            too few PGs per OSD (14 < min 30)
            Monitor clock skew detected 
     monmap e1: 3 mons at {overcloud-controller-0=172.16.1.12:6789/0,overcloud-controller-1=172.16.1.11:6789/0,overcloud-controller-2=172.16.1.18:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e163: 48 osds: 48 up, 48 in
            flags sortbitwise
      pgmap v2999: 224 pgs, 6 pools, 10247 MB data, 1646 objects
            32631 MB used, 53596 GB / 53628 GB avail
                 224 active+clean
[root@overcloud-controller-0 ~]# 
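
(For what it's worth, the 14 in that warning is consistent with the numbers above, assuming 3-way replication: 224 PGs x 3 replicas / 48 OSDs = 14 PGs per OSD.)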

I updated the PG number setting on the undercloud from 256 to 512: 

[stack@hci-director ~]$ diff custom-templates/custom-hci.yaml ~/backup/custom-templates/custom-hci.yaml 
56c56
<     ceph::profile::params::osd_pool_default_pg_num: 512
---
>     ceph::profile::params::osd_pool_default_pg_num: 256
[stack@hci-director ~]$ 
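
For context, that hieradata key sits in the custom environment file roughly like this; the parameter_defaults/ExtraConfig nesting shown here is the usual TripleO convention and is an assumption on my part, only the pg_num line itself comes from the diff above:

 parameter_defaults:
   ExtraConfig:
     ceph::profile::params::osd_pool_default_pg_num: 512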

My deploy command uses the updated template. 

[stack@hci-director ~]$ cat deploy-hci.sh 
source ~/stackrc
time openstack overcloud deploy --templates ~/templates \
-e ~/templates/environments/puppet-pacemaker.yaml \
-e ~/templates/environments/storage-environment.yaml \
-e ~/templates/environments/network-isolation.yaml \
-e ~/templates/environments/hyperconverged-ceph.yaml \
-e ~/custom-templates/custom-hci.yaml \
--control-flavor control \
--control-scale 3 \
--compute-flavor compute \
--compute-scale 4 \
--ntp-server 10.5.26.10 \
--neutron-bridge-mappings datacentre:br-ex,tenant:br-tenant \
--neutron-network-type vlan \
--neutron-network-vlan-ranges tenant:4051:4060 \
--neutron-disable-tunneling 

[stack@hci-director ~]$

Re-running the deploy to push the configuration update to the overcloud:

[stack@hci-director ~]$ ./deploy-hci.sh 
Removing the current plan files
Uploading new plan files
Started Mistral Workflow. Execution ID: a8b01497-6504-4abb-ac8a-86e5c779b27c
...
2016-10-13 14:39:53Z [AllNodesDeploySteps]: UPDATE_COMPLETE  state changed
2016-10-13 14:40:03Z [overcloud]: UPDATE_COMPLETE  Stack UPDATE completed successfully

 Stack overcloud UPDATE_COMPLETE 

Overcloud Endpoint: http://10.19.139.46:5000/v2.0
Overcloud Deployed

real    17m40.239s
user    0m2.207s
sys     0m0.201s
[stack@hci-director ~]$ 

The OSD node has the new value in hiera: 

[root@overcloud-novacompute-0 ~]# hiera ceph::profile::params::osd_pool_default_pg_num
512
[root@overcloud-novacompute-0 ~]#

The PG number is updated in its ceph.conf: 

[root@overcloud-novacompute-0 ~]# grep pg_num /etc/ceph/ceph.conf
osd_pool_default_pg_num = 512
[root@overcloud-novacompute-0 ~]# 

However, the monitor does not have the new value: 

[root@overcloud-controller-0 hieradata]# hiera ceph::profile::params::osd_pool_default_pg_num
32
[root@overcloud-controller-0 hieradata]# grep pg_num /etc/ceph/ceph.conf
osd_pool_default_pg_num = 32
[root@overcloud-controller-0 hieradata]# ceph osd pool get vms pg_num
pg_num: 32
[root@overcloud-controller-0 hieradata]# 

As a workaround the admin would need to run the following on the ceph cluster: 

 ceph osd pool set $pool pg_num $new_size

When the bug is fixed, ideally we'd want puppet-ceph to know that an update is happening and run the above. 

# for i in rbd images volumes vms; do
 ceph osd pool set $i pg_num 256;
 sleep 10
 ceph osd pool set $i pgp_num 256;
 sleep 10
done
set pool 0 pg_num to 256
set pool 0 pgp_num to 256
set pool 1 pg_num to 256
set pool 1 pgp_num to 256
set pool 2 pg_num to 256
set pool 2 pgp_num to 256
set pool 3 pg_num to 256
set pool 3 pgp_num to 256
...

As per Jacob Liberman: "The sleep statements are intended to ensure the cluster has time to complete the previous action before proceeding. If a large increase is needed increase pg_num in stages."
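
A staged version of the loop above, in the spirit of that advice, might look like the following (the step values are hypothetical; each step must be larger than the pool's current pg_num, since pg_num can only be increased):

 for STEP in 128 256; do
   for i in rbd images volumes vms; do
     ceph osd pool set $i pg_num $STEP
     sleep 10
     ceph osd pool set $i pgp_num $STEP
     sleep 10
   done
 done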

Comment 10 John Fulton 2016-10-13 16:24:33 UTC
The reproduction above was with:

[stack@hci-director ~]$ rpm -q openstack-tripleo-heat-templates puppet-ceph 
openstack-tripleo-heat-templates-5.0.0-0.20161003064637.d636e3a.1.1.el7ost.noarch
puppet-ceph-2.2.0-1.el7ost.noarch
[stack@hci-director ~]$

Comment 11 John Fulton 2016-10-13 21:05:16 UTC
Another way to look at this: do we support updates to these values with OSPd, or is OSPd's job here just to set the value correctly the first time, with the admin updating it later by running `ceph osd pool set $pool pg_num $new_size` as part of normal cloud maintenance? Perhaps supporting the update could be considered an RFE.

Comment 12 John Fulton 2017-01-11 16:10:49 UTC
Update on this: 

- Reviewed this with more senior puppet-ceph devs. The consensus is that TripleO should set the default correctly in the ceph.conf of the monitor nodes, so that when new pools are created they get the new default. 

- If an admin wishes to change the PG count of an existing pool, then they need to update it themselves with `ceph osd pool set $i pg_num $num` and `ceph osd pool set $i pgp_num $num`

- We verified that TripleO supports not only customization of PGs, but also customization of PGs _per pool_, as per bug 1283721; see comment #33 of that bug for evidence that this has been tested. Thus, I'm closing this as a duplicate of said bug: https://bugzilla.redhat.com/show_bug.cgi?id=1283721#c33

*** This bug has been marked as a duplicate of bug 1283721 ***