Bug 1603615 - Ceph PG calculator conflict with mon_max_pg_per_osd
Summary: Ceph PG calculator conflict with mon_max_pg_per_osd
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 3.2
Assignee: Neha Ojha
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-19 18:05 UTC by Ben England
Modified: 2018-08-22 21:25 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-22 21:25:18 UTC
Embargoed:


Attachments
screenshot of Ceph PG calculator output (deleted)
2018-07-19 18:05 UTC, Ben England
screenshot try 2 of Ceph PG calculator (285.07 KB, image/png)
2018-07-19 18:09 UTC, Ben England


Links:
Ceph Project Bug Tracker 25112 (last updated 2018-07-26 00:54:22 UTC)

Description Ben England 2018-07-19 18:05:26 UTC
Description of problem:

The Ceph PG calculator can generate recommendations for pool PG counts that conflict with the mon_max_pg_per_osd parameter. This can cause significant aggravation for the installer, particularly when OpenStack is deploying a Ceph cluster.

Version-Release number of selected component (if applicable):

RHCS 3.1 - ceph-common-12.2.4-10.el7cp.x86_64
RHOSP 13 rc6
RHEL 7.5 - 3.10.0-862.3.3.el7.x86_64

How reproducible:

every time.

Steps to Reproduce:

1. Plug this scenario into the PG calculator at

https://access.redhat.com/labs/cephpgc/

- 1000 OSDs
- 95% space used for the "vms" pool
- 5% space used for the glance "images" pool
- none for any other pool

See attachment for the Ceph PG calculator output.

2. Add up the PG counts for each pool and multiply by 3 (replication count); the total is:

(512 + 32768 + 32768 + 4096) * 3 = 70144 * 3 = 210432

Compare to mon_max_pg_per_osd * 1000 OSDs = 200 * 1000 = 200000, so the recommended totals exceed the limit (the sketch below reproduces this arithmetic).

3. Create the pools with the recommended pg_num values.
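
As a rough sketch of the step 2 arithmetic, using the per-pool pg_num values from the attached screenshot and the default mon_max_pg_per_osd of 200:

    pg_nums = [512, 32768, 32768, 4096]   # pg_num recommended per pool by the calculator
    size = 3                              # replication count
    num_osds = 1000
    mon_max_pg_per_osd = 200              # default limit in this release

    total_pg_replicas = sum(pg_nums) * size     # 70144 * 3 = 210432
    budget = mon_max_pg_per_osd * num_osds      # 200 * 1000 = 200000
    print(total_pg_replicas > budget)           # True -> over the limit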

Actual results:

Pool creation will fail.

Expected results:

PG Calc should never conflict with mon_max_pg_per_osd!

Additional info:

I spoke with Ceph developers at the upstream perf weekly; their conclusion was that we need to start using the ceph-mgr balancer module (which is in Luminous = RHCS 3), and then we wouldn't need so many PGs. But then the PG calculator needs an update at a minimum. I was able to enable the balancer module in the RHOSP 13 ceph-mgr container, but I don't know if it works yet. The RHOSP 13 installer and ceph-ansible certainly do not enable it by default.

http://docs.ceph.com/docs/luminous/mgr/balancer/

My suggestion would be to lower the PG calculator's recommendations, since it was developed before the ceph-mgr balancer module existed. But by how much? I would need more experience with the effectiveness of the balancer module in different-sized configurations before I could give you a clear answer on that.
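
For example, assuming the calculator's recommendations scale roughly linearly with the per-OSD target (they do before power-of-two rounding), halving the default target from 200 to 100 PGs/OSD would bring the scenario above back under the limit. A minimal sketch, where pg_budget_ok is a hypothetical helper and not part of any tool:

    def pg_budget_ok(total_pg_replicas, mon_max_pg_per_osd, num_osds):
        # True if the recommended PG replicas fit under the mon's per-OSD cap.
        return total_pg_replicas <= mon_max_pg_per_osd * num_osds

    print(pg_budget_ok(210432, 200, 1000))       # False: target of 200 PGs/OSD
    print(pg_budget_ok(210432 // 2, 200, 1000))  # True: target halved to 100 PGs/OSD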

background:

change to RHCS 3 that leads to this:
https://ceph.com/community/new-luminous-pg-overdose-protection/

code that implements the mon_max_pg_per_osd check:
https://github.com/ceph/ceph/blob/e59258943bcfe3e52d40a59ff30df55e1e6a3865/src/mon/OSDMonitor.cc#L5670-L5698
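
Conceptually, that check refuses a pool create (or pg_num increase) when the projected number of PG replicas per OSD would exceed mon_max_pg_per_osd. A simplified Python model of the idea (the names below are illustrative, not the actual OSDMonitor symbols):

    def allow_pool_create(existing_pg_replicas, new_pg_num, size, num_osds,
                          mon_max_pg_per_osd=200):
        # Project the cluster-wide PG replicas per OSD if the new pool is created.
        projected = (existing_pg_replicas + new_pg_num * size) / max(num_osds, 1)
        return projected <= mon_max_pg_per_osd

    # e.g. creating the 32768-PG pool after the other three pools from the screenshot:
    print(allow_pool_create((512 + 32768 + 4096) * 3, 32768, 3, 1000))  # False -> rejected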

Comment 3 Ben England 2018-07-19 18:09:42 UTC
Created attachment 1460926 [details]
screenshot try 2 of Ceph PG calculator

shows output of Ceph PG calculator for specified inputs

Comment 4 Josh Durgin 2018-07-25 22:52:57 UTC
Since this is already in the wild in the docs/PG calc, let's increase mon_max_pg_per_osd to 300 to avoid this.
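
For illustration, a cluster-wide ceph.conf override to that effect would look roughly like this (the current default is 200):

    [global]
    mon_max_pg_per_osd = 300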

Comment 5 Greg Farnum 2018-08-08 21:33:38 UTC
Upstream PR: https://github.com/ceph/ceph/pull/23251

Still really need to fix the PG calculator though. In that screenshot it appears to default to recommending a target of 200 PGs/OSD, and so says to create 70144 PGs with 3x replication.
Or else somebody or something else put in the target of 200, in which case we should fix those.

Michael, any thoughts?

Comment 6 Michael J. Kidd 2018-08-08 21:58:38 UTC
I think, then, the best course of action based on all the info (and the new balancer module) is to reduce the default target PGs per OSD to 100 in the PG Calc tool, and remove all mentions of 300 as a target.

If someone is intentionally setting it to 200 and they get that warning, I think the expectation is that the cluster will be expanded soon and they can figure out how to adjust the warning threshold.


Sound reasonable?

Comment 9 Shiyi Yuan 2018-08-10 06:12:04 UTC
Hi Michael,

I have made the changes according to your suggestions; please review them at
https://labsci.usersys.redhat.com/labs/cephpgc/

Thanks!

Comment 10 Michael J. Kidd 2018-08-10 19:43:04 UTC
Hello Shiyi,
  The update looks good to me.  Please push it to the production instance.

Thanks!

Comment 11 Shiyi Yuan 2018-08-13 02:15:16 UTC
Hi Michael,

The new update is now on the production instance.

Thanks!

