Bug 1602913 - Determining and applying expected_num_objects value
Summary: Determining and applying expected_num_objects value
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Documentation
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 3.1
Assignee: John Wilkins
QA Contact: John Harrigan
URL:
Whiteboard:
Depends On:
Blocks: 1581350 1592497 1593418
 
Reported: 2018-07-18 19:49 UTC by John Harrigan
Modified: 2019-02-26 07:25 UTC (History)
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-26 07:25:24 UTC
Embargoed:



Description John Harrigan 2018-07-18 19:49:47 UTC
Description of problem:
RHCS 3.1 supports specifying expected_num_objects, and procedural documentation
should be added to the "Ceph Object Gateway for Production" guide.

Version-Release number of selected component (if applicable):
"Ceph Object Gateway for Production" guide

Additional info:
This is needed to help users avoid Filestore splitting operations,
which can dramatically slow client I/O performance. While this
behaviour can affect all Ceph users, it is especially likely to
impact RGW customers, since they typically have pools with many
objects. Guide users through the procedure for determining the
correct value for expected_num_objects and illustrate it with
several customer use cases.
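
For illustration only, a minimal sketch of the command form involved (the pool
name, PG counts, CRUSH rule, and object count below are placeholders, not
values recommended in this BZ):

  # expected_num_objects is the final argument to 'ceph osd pool create';
  # it pre-splits PG directories at creation time instead of at runtime
  # (pre-splitting also requires a negative filestore merge threshold; see comment 20)
  ceph osd pool create default.rgw.buckets.data 1024 1024 replicated \
      replicated_rule 500000000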

Comment 4 Harish NV Rao 2018-08-09 05:46:27 UTC
@John, will you be verifying this bug?

Comment 5 John Harrigan 2018-08-09 14:39:37 UTC
Doug, how are we going to resolve this if pgcalc does not add support?
Can we recommend values for expected_num_objects, perhaps based on cluster size (small, medium, large)?

Comment 6 John Harrigan 2018-08-09 14:46:57 UTC
(In reply to John Wilkins from comment #3)
> https://access.qa.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-
> single/ceph_object_gateway_for_production/#considering-expected-object-count-
> rgw-adv
> 
There is a gotcha here: expected_num_objects only works correctly as of RHCS 3.1,
so the notes on "RHCS 3.1 and earlier" need to be rephrased.
STATES - 5.1. Considering Expected Object Count (RHCS 3.1 and Earlier)
SHOULD BE - 5.1. Considering Expected Object Count (RHCS 3.1 and Later, When Using Filestore)

> https://access.qa.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-
> single/ceph_object_gateway_for_production/#creating-a-bucket-index-pool-rgw-
> adv
section 5.6.1
STATES - For RHCS 3.1 and earlier releases, 
SHOULD BE - For RHCS 3.1 and later releases, when using Filestore

> https://access.qa.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-
> single/ceph_object_gateway_for_production/#creating-a-data-pool-rgw-adv
section 5.6.2
STATES - For RHCS 3.1 and earlier releases, 
SHOULD BE - For RHCS 3.1 and later releases, when using Filestore

Comment 7 John Harrigan 2018-08-09 14:47:40 UTC
(In reply to Harish NV Rao from comment #4)
> @John, will you be verifying this bug?

Yes, I can do that.

Comment 8 John Harrigan 2018-08-09 14:53:31 UTC
We are still missing an RGW pools creation procedure here, to guide an RGW admin.
This is not trivial.
During Scale Lab testing I used this script:
https://github.com/jharriga/GCrate/blob/master/resetRGW.sh

Perhaps it can be used to document a procedure.
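
For illustration, a rough sketch of the kind of steps such a procedure might
cover (pool names assume the default RGW zone; PG counts and the object count
are placeholders, not taken from the script or this BZ):

  # bucket index pool: comparatively few objects, no pre-splitting needed
  ceph osd pool create default.rgw.buckets.index 64 64 replicated replicated_rule
  ceph osd pool application enable default.rgw.buckets.index rgw

  # bucket data pool: high object count, so pass expected_num_objects
  ceph osd pool create default.rgw.buckets.data 1024 1024 replicated \
      replicated_rule 500000000
  ceph osd pool application enable default.rgw.buckets.data rgw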

Comment 9 John Wilkins 2018-08-10 00:21:31 UTC
(In reply to John Harrigan from comment #8)
> We are still missing an RGW pools creation procedure here, to guide an RGW
> admin.
> This is not trivial
> During Scale Lab testing I used this script
> https://github.com/jharriga/GCrate/blob/master/resetRGW.sh
> 
> Perhaps it can be used to document a procedure

I've fixed the headings. Is there something else you want to do on a pool procedure? The PG calculator isn't something our team controls. 

https://access.qa.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_object_gateway_for_production/#considering-expected-object-count-rgw-adv

https://access.qa.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_object_gateway_for_production/#creating-an-index-pool-rgw-adv

https://access.qa.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_object_gateway_for_production/#creating-a-data-pool-rgw-adv

Comment 10 Douglas Fuller 2018-08-10 13:43:43 UTC
(In reply to John Harrigan from comment #5)
> Doug, how are we going to resolve this if pgcalc does not add support?
> Can we recommend values for expected_num_objects, perhaps based on cluster
> size (small, medium, large)?

That's a good idea. I'll ask Neha to suggest some values.

Comment 11 Neha Ojha 2018-08-10 21:55:46 UTC
We have discussed different values for expected_num_objects, like 1M, 5M, 50M, 500M, but I am not sure if there is a way to determine values based on small, medium and large clusters.

Perhaps, we could use the guidelines that Josh mentioned here: https://bugzilla.redhat.com/show_bug.cgi?id=1592497#c29

"E.g. the default rgw data pool could be created with expected_num_objects = num_osds * 10M"

Comment 12 John Harrigan 2018-08-13 13:07:49 UTC
(In reply to nojha from comment #11)
> We have discussed different values for expected_num_objects, like 1M, 5M,
> 50M, 500M, but I am not sure if there is a way to determine values based on
> small, medium and large clusters.
> 
> Perhaps, we could use the guidelines that Josh mentioned here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1592497#c29
> 
> "E.g. the default rgw data pool could be created with expected_num_objects =
> num_osds * 10M"

In the Scale Lab we had 12x OSD nodes, each with 26 OSD devices (24 HDDs and
2 bucket index OSDs), for a total of 312 OSDs. I ran testing with 500M and for
our workload it mitigated filestore splitting.

We did run into OSD suicide timeouts when expected_num_objects was set very
high (i.e. one trillion).
You need to leave time for the pool creation when using expected_num_objects. My
attempts to determine a pattern for predicting that time lag in a repeatable
manner were not successful.
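
For reference, applying the formula from comment 11 to the Scale Lab cluster
above (arithmetic only; 500M is the value that was actually tested):

  # formula:  312 OSDs * 10,000,000 = 3,120,000,000 (~3.1 billion)
  # tested:   expected_num_objects = 500,000,000 (500M) mitigated splitting
  # caution:  very high values (e.g. one trillion) led to OSD suicide timeouts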

Comment 13 Neha Ojha 2018-08-13 19:08:24 UTC
(In reply to John Harrigan from comment #12)
> (In reply to nojha from comment #11)
> > We have discussed different values for expected_num_objects, like 1M, 5M,
> > 50M, 500M, but I am not sure if there is a way to determine values based on
> > small, medium and large clusters.
> > 
> > Perhaps, we could use the guidelines that Josh mentioned here:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1592497#c29
> > 
> > "E.g. the default rgw data pool could be created with expected_num_objects =
> > num_osds * 10M"
> 
> In the Scale Lab we had 12x OSD nodes, each with 26 OSD devices (24 HDDs and
> 2 bucket index OSDs), for a total of 312 OSDs. I ran testing with 500M and
> for our workload it mitigated filestore splitting.
> 
> We did run into OSD suicide timeouts when expected_num_objects was set very
> high (i.e. one trillion).

Would it make sense to suggest an upper bound for expected_num_objects, based on your experience, John, along with the suggested formula?

> You need to leave time for the pool creation when using expected_num_objects.
> My attempts to determine a pattern for predicting that time lag in a
> repeatable manner were not successful.

Comment 20 John Harrigan 2018-09-05 18:27:33 UTC
Looks pretty close but we should emphasize that setting expected_num_objects is only needed when creating pools which will have high object counts, such as the rgw.data pool

The expected number of objects for this pool. By setting this value (together with a negative filestore merge threshold), the PG folder splitting would happen at the pool creation time, to avoid the latency impact to do a runtime folder splitting. While this behaviour can affect all Ceph users, it is especially likely to impact RGW customers since they likely have pools with many
objects (i.e. “default.rgw.buckets.data").
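
For illustration, a minimal ceph.conf sketch of the "negative filestore merge
threshold" part of that combination (the -10 value is an example, not a
recommendation from this BZ):

  [osd]
  # a negative merge threshold disables PG subdirectory merging and allows
  # directories to be pre-split when a pool is created with expected_num_objects
  filestore merge threshold = -10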


And yes, the above-referenced RED HAT CEPH STORAGE HARDWARE SELECTION GUIDE is a public link.

Comment 22 John Harrigan 2018-09-06 13:08:49 UTC
Looks good. Thanks.

