1491007 – [RFE] Implement a checking mechanism to validate PG count to OSD's before allowing PG increases


Keywords:
Status:	CLOSED DUPLICATE of bug 1489064
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RADOS
Sub Component:
Version:	2.3
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	3.0
Assignee:	Josh Durgin
QA Contact:	ceph-qe-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-09-12 18:10 UTC by Mike Hackett
Modified:	2020-12-14 10:00 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-10-16 15:30:00 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	ceph ceph pull 17814	0	None	None	None	2017-10-09 23:01:08 UTC

Description Mike Hackett 2017-09-12 18:10:45 UTC

Description of problem:

Customers have run into problems caused by a very large number of PGs per OSD.
In the latest incident an OSD had a PG count of 19k, which eventually left the cluster unable to recover.

We are looking for a mechanism that checks the PG-to-OSD ratio before allowing a user to create more PGs in a cluster, so that the PG count cannot exceed a safe limit for cluster recovery.
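
A minimal sketch of the kind of check being requested, for illustration only. It is not Ceph code. The names `Pool`, `projected_pgs_per_osd`, `can_create_pool` and the ceiling `MAX_PGS_PER_OSD = 200` are all hypothetical, and the average assumes PG replicas are spread evenly across OSDs, which a real cluster does not guarantee.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical ceiling chosen for illustration; not a value taken from this bug.
MAX_PGS_PER_OSD = 200


@dataclass(frozen=True)
class Pool:
    pg_num: int  # number of placement groups in the pool
    size: int    # replica count, so each PG lands on `size` OSDs


def projected_pgs_per_osd(pools: Iterable[Pool], num_osds: int) -> float:
    """Average number of PG replicas each OSD would carry.

    Assumes an even spread of PG replicas across OSDs.
    """
    if num_osds <= 0:
        raise ValueError("cluster has no OSDs")
    return sum(p.pg_num * p.size for p in pools) / num_osds


def can_create_pool(existing: Iterable[Pool], new_pool: Pool,
                    num_osds: int, limit: int = MAX_PGS_PER_OSD) -> bool:
    """Return True only if adding `new_pool` keeps the average within `limit`."""
    return projected_pgs_per_osd([*existing, new_pool], num_osds) <= limit
```

With these assumptions, a request such as `can_create_pool([], Pool(pg_num=4096, size=3), num_osds=12)` would be refused, because the projected average is far above the illustrative ceiling. A smaller pool on the same OSDs would be allowed.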


Version-Release number of selected component (if applicable):
2.4


Steps to Reproduce:
1. Ceph cluster with 4 OSD nodes.
2. Create a pool with a large number of PGs on it (5k-7k).
3. Put data on the cluster.
4. Remove an OSD node from the cluster so the PGs need to rebalance to another node in the cluster.
5. Add the node back into the cluster and force a rebalance.

Actual results:
Creating the pool with a large number of PGs succeeds without warning.

Expected results:
We should block creating a large number of PGs on a pool when the cluster has only a limited number of OSDs.

Additional info:
BU MOC encountered this issue and it rendered the cluster completely down.
Gamestream also encountered this issue in the past.

Comment 8 Josh Durgin 2017-10-16 15:20:31 UTC

*** Bug 1489064 has been marked as a duplicate of this bug. ***

Comment 9 Josh Durgin 2017-10-16 15:30:00 UTC


*** This bug has been marked as a duplicate of bug 1489064 ***
