Bug 2006084 - [Workload-DFG] RGW pool creation code cleanup with pg autoscaler for mostly_omap use case
Summary: [Workload-DFG] RGW pool creation code cleanup with pg autoscaler for mostly_omap use case
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 5.2
Assignee: Matt Benjamin (redhat)
QA Contact: Vivekanandan K
URL:
Whiteboard:
Depends On:
Blocks: 2102272
 
Reported: 2021-09-20 20:19 UTC by Vikhyat Umrao
Modified: 2023-09-15 01:36 UTC
CC: 10 users

Fixed In Version: ceph-16.2.8-22.el8cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-09 17:36:05 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 52673 0 None None None 2021-09-20 20:19:20 UTC
Github ceph ceph pull 46235 0 None open pacific: rgw: remove rgw_rados_pool_pg_num_min and its use on pool creation use the cluster defaults for pg_num_min 2022-05-11 22:01:55 UTC
Red Hat Issue Tracker RHCEPH-1824 0 None None None 2021-09-20 20:21:55 UTC
Red Hat Product Errata RHSA-2022:5997 0 None None None 2022-08-09 17:36:30 UTC

Internal Links: 2006083

Description Vikhyat Umrao 2021-09-20 20:19:20 UTC
Description of problem:
RGW pool creation code cleanup with pg autoscaler for mostly_omap use case
https://tracker.ceph.com/issues/52673

RHCS 5 currently sets the pg-autoscaler bias to 4 only for the `.meta` pool, not for the .log, .index, or any other metadata pools.
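
(For illustration only, not part of the original report: the bias can be inspected and adjusted per pool with the standard ceph CLI; the pool names below assume the default zone layout.)

{code:bash}
# Show the autoscaler's view of every pool, including the BIAS column
ceph osd pool autoscale-status

# Check the bias currently applied to the RGW metadata pools (default-zone pool names assumed)
ceph osd pool get default.rgw.meta pg_autoscale_bias
ceph osd pool get default.rgw.log pg_autoscale_bias
ceph osd pool get default.rgw.buckets.index pg_autoscale_bias

# Raise the bias on another omap-heavy pool, mirroring what RGW currently does for .meta only
ceph osd pool set default.rgw.buckets.index pg_autoscale_bias 4
{code}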

The suggestion is to move this code/logic out of RGW and put it in the Ceph Orchestrator (ceph-adm). Please see https://bugzilla.redhat.com/show_bug.cgi?id=2006083




Version-Release number of selected component (if applicable):
RHCS 5

Comment 1 Sebastian Wagner 2021-09-28 09:14:03 UTC
Why do you think it is better to create pools in cephadm (btw, it's called cephadm, not ceph-adm) instead of RGW? The only reason I can think of why having this in cephadm would be better than within RGW is that it allows users to modify pool arguments, like:

{code:yaml}
# create the realm first
kind: rgw_realm
name: myrealm
---
# create the zone
kind: rgw_zone
name: myzone
realm: myrealm
---
# now deploy the daemons
service_type: rgw
service_id: myrealm.myzone
spec:
  realm: myrealm
  zone: myzone
  pools:
    .meta:
      type: replicated
      replica_count: 3
      pg_autoscaler_bias: 4
    .index:
      ...
{code}

In case we're just talking about improving some default values, we should do that in RGW, right?

Comment 2 Vikhyat Umrao 2021-10-20 19:15:25 UTC
(In reply to Sebastian Wagner from comment #1)
> Why do you think it is better to create pools in cephadm (btw, it's called
> cephadm, not ceph-adm) instead of RGW? The only reason I can think of having
> this in cephadm is better than within RGW is by allowing users to modify
> pool arguments, like

Thanks for pointing out the cephadm naming mistake. It looks like most of us picked up this habit from the ceph-ansible and ceph-deploy days :) and it has been tagging along with us because those two had a '-' in the name. Good that you pointed it out; I will keep it in mind :)

I think the biggest benefit is that all the orchestration and/or installation configuration is kept at the cephadm layer instead of in component code. So yes, we need some pre-defined defaults in cephadm for what we want to do during installation, and for that we have this cephadm bug - https://bugzilla.redhat.com/show_bug.cgi?id=2006083.

This bug is mostly about dropping the code that is not doing this correctly and is causing issues.

@Matt - I think this one needs triage.

> 
> {code:yaml}
> # create the realm first
> kind: rgw_realm
> name: myrealm
> ---
> # create the zone
> kind: rgw_zone
> name: myzone
> realm: myrealm
> ---
> # now deploy the daemons
> service_type: rgw
> service_id: myrealm.myzone
> spec:
>   realm: myrealm
>   zone: myzone
>   pools:
>     .meta:
>       type: replicated
>       replica_count: 3
>       pg_autoscaler_bias: 4
>     .index:
>       ...
> {code}
> 
> In case we're just talking about improving some default values, we should do
> that in RGW, right?
 
As explained above, it is both: improving the defaults and managing all installation-related defaults and configuration at the cephadm layer. The above method looks good to me.

Comment 3 Casey Bodley 2021-10-21 19:18:38 UTC
(In reply to Vikhyat Umrao from comment #2)

this has long been an issue in rgw (see https://tracker.ceph.com/issues/21497 https://tracker.ceph.com/issues/38311 https://tracker.ceph.com/issues/36491 etc), where the pools we rely on don't exist and rgw is unable to create them because of pg limits. this can be a challenge to debug, especially when most S3 operations succeed, but others that need a missing pool fail with strange errors like '416 Requested Range Not Satisfiable'

we do log this error message whenever pool creation fails, but even i wouldn't really know what to do about it:

> ERROR: librados::Rados::pool_create returned (34) Numerical result out of range (this can be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded)
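
(As a rough diagnostic sketch, not from this report: when pool_create fails with that ERANGE error, an operator could check whether the per-OSD PG limit is the constraint, using standard ceph CLI commands.)

{code:bash}
# The per-OSD PG limit that new pool creation must respect
ceph config get mon mon_max_pg_per_osd

# Current PG count per OSD (PGS column), to see how close the cluster already is to the limit
ceph osd df

# pg_num / pg_num_min of the existing pools
ceph osd pool ls detail
{code}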


i remember discussing this issue upstream in a CDM earlier this year. i also took the position that:

1) the orchestrator is better equipped to make decisions about pool sizing and configuration. radosgw itself is stateless and doesn't know what other services are deployed, or whether the cluster can satisfy all of its pool creations
2) the failure to create a pool for rgw should be a deployment error, not a runtime error
3) pool creation adds significant latency (seconds) to s3 requests that have to stop and wait for pool creation; though the impact is negligible in a long-running rgw cluster


if i recall, Josh (cc'ed) and i discussed a compromise where rgw would use the absolute minimum pg sizes when auto-creating its pools, then rely on the autoscaler to converge on a stable configuration. i don't believe any action was taken there, and that still seems unable to provide any guarantees that pool creation will succeed at runtime
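
(For context, and as a sketch only: the GitHub pull request tracked in the Links section above went in a related direction, removing rgw_rados_pool_pg_num_min so that pool creation uses the cluster defaults for pg_num_min. An operator can still pin a small per-pool floor and let the autoscaler grow from there; the pool name below is illustrative.)

{code:bash}
# Pin a small per-pool floor; the autoscaler will not shrink the pool below this
ceph osd pool set default.rgw.buckets.index pg_num_min 8

# Verify what the autoscaler now plans for the pool
ceph osd pool autoscale-status
{code}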


the radosgw-admin commands for zone creation and zone placement could potentially be responsible for this pool creation, and this approach does successfully turn the runtime errors into deployment ones. but the radosgw-admin commands don't know a reasonable set of defaults either, so they would have to take additional arguments to specify them, presumably provided by the orchestrator. we could choose a set of defaults in the orchestrator, with the ability to override them with configuration. and in the event that pool creation does fail during deployment, *something* needs to be responsible for cleaning up pools that it did create

in addition to adding new pool names, the radosgw-admin commands could also remove or change the zone pool names. i'd really prefer not to put radosgw[-admin] in the position of automatically renaming or deleting rados pools
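
(For reference, a minimal sketch assuming the default zone name: the zone's pool names that such commands would have to manage live in the zone configuration, which can already be inspected and rewritten with radosgw-admin.)

{code:bash}
# Dump the zone configuration, including the *_pool names RGW creates on demand
radosgw-admin zone get --rgw-zone=default > zone.json

# After editing pool names in zone.json, write the configuration back
radosgw-admin zone set --rgw-zone=default --infile=zone.json

# In a realm/multisite setup, commit the change to the period
radosgw-admin period update --commit
{code}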

Comment 19 errata-xmlrpc 2022-08-09 17:36:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5997

Comment 20 Red Hat Bugzilla 2023-09-15 01:36:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

