Description of problem:

RGW pool creation code cleanup with the pg autoscaler for the mostly_omap use case: https://tracker.ceph.com/issues/52673

RHCS 5 currently sets the pg-autoscaler bias to 4 only for the `.meta` pool, not for the .log, .index, or any other metadata pools. The suggestion is to move this code/logic out of RGW and into the Ceph Orchestrator (ceph-adm). Please check https://bugzilla.redhat.com/show_bug.cgi?id=2006083

Version-Release number of selected component (if applicable):

RHCS 5
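Additional info:

As a rough sketch of the manual workaround implied above (assuming the default zone pool names such as default.rgw.log and default.rgw.buckets.index; adjust to the zone actually deployed), the same autoscaler bias that RGW applies to the `.meta` pool can be applied to the other omap-heavy metadata pools by hand:

{code:bash}
# Apply the same pg-autoscaler bias that RGW sets on the .meta pool to the
# other omap-heavy metadata pools. Pool names assume the default zone naming
# (default.rgw.*); adjust them to the zone actually deployed.
ceph osd pool set default.rgw.log pg_autoscale_bias 4
ceph osd pool set default.rgw.buckets.index pg_autoscale_bias 4

# Verify that the BIAS column picked up the change.
ceph osd pool autoscale-status
{code}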
Why do you think it is better to create pools in cephadm (btw, it's called cephadm, not ceph-adm) instead of RGW? The only reason I can think of why having this in cephadm would be better than within RGW is that it allows users to modify pool arguments, like:

{code:yaml}
# create the realm first
kind: rgw_realm
name: myrealm
---
# create the zone
kind: rgw_zone
name: myzone
realm: myrealm
---
# now deploy the daemons
service_type: rgw
service_id: myrealm.myzone
spec:
  realm: myrealm
  zone: myzone
  pools:
    .meta:
      type: replicated
      replica_count: 3
      pg_autoscaler_bias: 4
    .index:
      ...
{code}

In case we're just talking about improving some default values, we should do that in RGW, right?
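For what it's worth, the spec keys above are only a proposal, not an existing cephadm interface, but they would essentially expand to the plain pool commands an operator can already run by hand. A minimal sketch of that mapping, with an illustrative pool name:

{code:bash}
# Roughly what the proposed per-pool spec keys would translate to.
# The pool name below is illustrative.
ceph osd pool create myzone.rgw.meta
ceph osd pool set myzone.rgw.meta size 3                 # replica_count: 3
ceph osd pool set myzone.rgw.meta pg_autoscale_bias 4    # pg_autoscaler_bias: 4
ceph osd pool application enable myzone.rgw.meta rgw
{code}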
(In reply to Sebastian Wagner from comment #1)
> Why do you think it is better to create pools in cephadm (btw, it's called
> cephadm, not ceph-adm) instead of RGW? The only reason I can think of why
> having this in cephadm would be better than within RGW is that it allows
> users to modify pool arguments, like:

Thanks for pointing out the mistake in the cephadm naming. It looks like most of us picked up this habit in the ceph-ansible and ceph-deploy days :) and it has tagged along with us because those two had a '-' in the middle of the name. Good that you pointed it out; I will keep it in mind :)

I think the biggest benefit is that all the orchestration and/or installation configuration stays at the cephadm layer instead of in a component's code. So yes, we need some pre-defined defaults in cephadm for what we want to do during installation, and for that we have this cephadm bug: https://bugzilla.redhat.com/show_bug.cgi?id=2006083.

This bug is mostly about dropping the code which is not doing this correctly and is causing issues.

@Matt - I think this one needs a triage.

> {code:yaml}
> # create the realm first
> kind: rgw_realm
> name: myrealm
> ---
> # create the zone
> kind: rgw_zone
> name: myzone
> realm: myrealm
> ---
> # now deploy the daemons
> service_type: rgw
> service_id: myrealm.myzone
> spec:
>   realm: myrealm
>   zone: myzone
>   pools:
>     .meta:
>       type: replicated
>       replica_count: 3
>       pg_autoscaler_bias: 4
>     .index:
>       ...
> {code}
>
> In case we're just talking about improving some default values, we should do
> that in RGW, right?

As explained above, it is both: improving the defaults and managing all installation-related defaults and configuration in cephadm. The above method looks good to me.
(In reply to Vikhyat Umrao from comment #2)
> (In reply to Sebastian Wagner from comment #1)
> > Why do you think it is better to create pools in cephadm (btw, it's called
> > cephadm, not ceph-adm) instead of RGW? The only reason I can think of why
> > having this in cephadm would be better than within RGW is that it allows
> > users to modify pool arguments, like:
>
> Thanks for pointing out the mistake in the cephadm naming. It looks like
> most of us picked up this habit in the ceph-ansible and ceph-deploy days :)
> and it has tagged along with us because those two had a '-' in the middle
> of the name. Good that you pointed it out; I will keep it in mind :)
>
> I think the biggest benefit is that all the orchestration and/or
> installation configuration stays at the cephadm layer instead of in a
> component's code. So yes, we need some pre-defined defaults in cephadm for
> what we want to do during installation, and for that we have this cephadm
> bug: https://bugzilla.redhat.com/show_bug.cgi?id=2006083.
>
> This bug is mostly about dropping the code which is not doing this
> correctly and is causing issues.
>
> @Matt - I think this one needs a triage.
>
> > {code:yaml}
> > # create the realm first
> > kind: rgw_realm
> > name: myrealm
> > ---
> > # create the zone
> > kind: rgw_zone
> > name: myzone
> > realm: myrealm
> > ---
> > # now deploy the daemons
> > service_type: rgw
> > service_id: myrealm.myzone
> > spec:
> >   realm: myrealm
> >   zone: myzone
> >   pools:
> >     .meta:
> >       type: replicated
> >       replica_count: 3
> >       pg_autoscaler_bias: 4
> >     .index:
> >       ...
> > {code}
> >
> > In case we're just talking about improving some default values, we should
> > do that in RGW, right?
>
> As explained above, it is both: improving the defaults and managing all
> installation-related defaults and configuration in cephadm. The above
> method looks good to me.

This has long been an issue in rgw (see https://tracker.ceph.com/issues/21497, https://tracker.ceph.com/issues/38311, https://tracker.ceph.com/issues/36491, etc.), where the pools we rely on don't exist and rgw is unable to create them because of pg limits. This can be a challenge to debug, especially when most S3 operations succeed but others that need a missing pool fail with strange errors like '416 Requested Range Not Satisfiable'.

We do log this error message whenever pool creation fails, but even I wouldn't really know what to do about it:

> ERROR: librados::Rados::pool_create returned (34) Numerical result out of range (this can be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded)

I remember discussing this issue upstream in a CDM earlier this year. I also took the position that:

1) the orchestrator is better equipped to make decisions about pool sizing and configuration. radosgw itself is stateless and doesn't know what other services are deployed, or whether the cluster can satisfy all of its pool creations

2) the failure to create a pool for rgw should be a deployment error, not a runtime error

3) pool creation adds significant latency (seconds) to S3 requests that have to stop and wait for pool creation, though the impact is negligible in a long-running rgw cluster

If I recall, Josh (cc'ed) and I discussed a compromise where rgw would use the absolute minimum pg sizes when auto-creating its pools, then rely on the autoscaler to converge on a stable configuration.
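To make that compromise concrete, the following is only a sketch, assuming the default zone pool names, of what "create with minimal pg counts and let the autoscaler converge" looks like from the CLI:

{code:bash}
# Create an omap-heavy pool with the smallest reasonable pg count so that the
# creation itself is unlikely to trip over mon_max_pg_per_osd, then hint the
# autoscaler and let it grow the pool later. The pool name is illustrative.
ceph osd pool create default.rgw.buckets.index 8 8 replicated
ceph osd pool set default.rgw.buckets.index pg_autoscale_bias 4

# Check how much pg headroom the cluster has, and whether the autoscaler has
# converged on a target.
ceph config get mon mon_max_pg_per_osd
ceph osd pool autoscale-status
{code}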
I don't believe any action was taken there, and that approach still seems unable to provide any guarantees that pool creation will succeed at runtime.

The radosgw-admin commands for zone creation and zone placement could potentially be responsible for this pool creation, and this approach does successfully turn the runtime errors into deployment ones. But the radosgw-admin commands don't know a reasonable set of defaults either, so they would have to take additional arguments to specify them, presumably provided by the orchestrator. We could choose a set of defaults there in the orchestrator, with the ability to override them with configuration. And in the event that pool creation does fail during deployment, *something* needs to be responsible for cleaning up the pools that it did create.

In addition to adding new pool names, the radosgw-admin commands could also remove or change the zone pool names. I'd really prefer not to put radosgw[-admin] in the position of automatically renaming or deleting rados pools.
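For reference, the pool names a zone uses can already be inspected and overridden with radosgw-admin today; the sketch below assumes the default zone and placement names, and the referenced pools would have to be created ahead of time (e.g. by the orchestrator):

{code:bash}
# Inspect the pool names currently configured for the zone
# (assumes the default zone; adjust --rgw-zone as needed).
radosgw-admin zone get --rgw-zone default

# Point the default placement target at pre-created pools so that rgw never
# has to create them at runtime. The pool names here are illustrative and
# must exist before rgw serves requests.
radosgw-admin zone placement modify \
    --rgw-zone default \
    --placement-id default-placement \
    --index-pool default.rgw.buckets.index \
    --data-pool default.rgw.buckets.data \
    --data-extra-pool default.rgw.buckets.non-ec

# In a multisite configuration, commit the change to the period.
radosgw-admin period update --commit
{code}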
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5997
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days