Description of problem:

The Ceph PG calculator can generate recommendations for pool PG counts that conflict with the mon_max_pg_per_osd parameter. This causes significant aggravation for the installer, particularly when OpenStack is deploying a Ceph cluster.

Version-Release number of selected component (if applicable):

RHCS 3.1 - ceph-common-12.2.4-10.el7cp.x86_64
RHOSP 13 rc6
RHEL 7.5 - 3.10.0-862.3.3.el7.x86_64

How reproducible:

Every time.

Steps to Reproduce:

1. Plug this scenario into the PG calculator at https://access.redhat.com/labs/cephpgc/
   - 1000 OSDs
   - 95% space used for the "vms" pool
   - 5% space used for the glance "images" pool
   - none for any other pool
   See the attachment for the Ceph PG calculator output.

2. Add up the PG counts for each pool and multiply by 3 (replication count); the total is:
   (512 + 32768 + 32768 + 4096) * 3 = 70144 * 3 = 210432
   Compare that to mon_max_pg_per_osd * 1000 OSDs = 200 * 1000 = 200000.
   (A quick recap of this arithmetic appears at the end of this comment.)

Actual results:

Pool creation will fail.

Expected results:

PG Calc should not conflict with mon_max_pg_per_osd, ever!

Additional info:

I spoke with the Ceph developers at the upstream performance weekly; their conclusion was that we need to start using the ceph-mgr balancer module (which is in Luminous = RHCS 3), and then we wouldn't need so many PGs. But even then the PG calculator needs an update at a minimum. I was able to enable the balancer module in the RHOSP 13 ceph-mgr container, but I don't know yet whether it works. The RHOSP 13 installer and ceph-ansible certainly do not enable it by default.
http://docs.ceph.com/docs/luminous/mgr/balancer/

My suggestion would be to lower the PG calculator's recommendations, since it was developed before the ceph-mgr balancer module existed. But by how much? I would need more experience with the effectiveness of the balancer module in different-sized configurations before I could give a clear answer on this.

Background - the change in RHCS 3 that leads to this:
https://ceph.com/community/new-luminous-pg-overdose-protection/

Code that implements the mon_max_pg_per_osd check:
https://github.com/ceph/ceph/blob/e59258943bcfe3e52d40a59ff30df55e1e6a3865/src/mon/OSDMonitor.cc#L5670-L5698
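For convenience, the comparison from step 2 can be reproduced with a couple of one-liners; the pool PG counts below are simply the values from the attached pgcalc output, nothing cluster-specific:

    # Sum of the four recommended pool pg_num values, times 3x replication:
    echo $(( (512 + 32768 + 32768 + 4096) * 3 ))   # 210432 PG instances
    # Ceiling enforced by PG overdose protection at the default limit:
    echo $(( 200 * 1000 ))                         # 200000 = mon_max_pg_per_osd * 1000 OSDs

Since 210432 > 200000, creating the last of the recommended pools is rejected.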
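For reference, enabling the balancer module per the Luminous docs linked above looks roughly like the following (a sketch only; I have not verified that balancing actually works in the RHOSP 13 ceph-mgr container, and ceph-ansible does not manage any of this yet):

    ceph mgr module enable balancer     # load the module into ceph-mgr
    ceph balancer mode crush-compat     # upmap mode would require all-luminous clients
    ceph balancer on                    # start automatic balancing
    ceph balancer status                # check what the balancer is doing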
Created attachment 1460926 [details]
screenshot try 2 of Ceph PG calculator

Shows the output of the Ceph PG calculator for the specified inputs.
Since this is already out in the wild in the docs and the PG calc, let's increase mon_max_pg_per_osd to 300 to avoid this.
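Until a new default ships, the limit can also be raised per cluster. A minimal sketch of the override in ceph.conf (restart the mon/mgr daemons afterwards for it to take effect):

    [global]
    mon_max_pg_per_osd = 300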
Upstream PR: https://github.com/ceph/ceph/pull/23251

We still really need to fix the PG calculator, though. In that screenshot it appears to default to recommending that the user target 200 PGs/OSD, and so it tells them to create 70144 PGs with 3x replication. Or else somebody or something else put in the target of 200, in which case we should fix those defaults. Michael, any thoughts?
I think, then, that the best course of action based on all the info (and the new balancer module) is to reduce the default target PGs per OSD to 100 in the PG Calc tool, and to remove all mentions of 300 as a target. If someone intentionally sets it to 200 and gets that warning, I think the expectation is that the cluster will be expanded soon and they can figure out how to adjust the warning threshold. Sound reasonable?
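For a rough sense of the headroom that gives, assuming the calculator scales its recommendations roughly linearly with the target, a 100 PGs/OSD target on the same 1000-OSD, 3x-replicated example works out to about:

    # Hypothetical budget with a 100 PGs/OSD target (same 1000 OSDs, 3x replication):
    echo $(( 100 * 1000 / 3 ))    # ~33333 total PGs, roughly half the pgcalc totals above
    echo $(( 200 * 1000 / 3 ))    # ~66666 total PGs allowed by the default mon_max_pg_per_osd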
Hi Michael,

I have made the changes according to your suggestions; please review them at https://labsci.usersys.redhat.com/labs/cephpgc/

Thanks!
Hello Shiyi,

The update looks good to me. Please push it to the production instance.

Thanks!
Hi Michael,

The new update is now on the production instance.

Thanks!