Bug 1905431

Summary: [GSS][RFE] Optimize PG removal for huge number of objects in Red Hat Ceph Storage 4
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Gaurav Sitlani <gsitlani>
Component: RADOS    Assignee: Neha Ojha <nojha>
Status: CLOSED ERRATA QA Contact: Pawan <pdhiran>
Severity: urgent Docs Contact: Amrita <asakthiv>
Priority: urgent    
Version: 4.1    CC: akupczyk, asakthiv, bhubbard, ceph-eng-bugs, dzafman, jdurgin, kchai, mhackett, mmanjuna, mmuench, nojha, pdhiran, racpatel, rzarzyns, sseshasa, tserlin, vereddy, vumrao
Target Milestone: ---    Keywords: FeatureBackport, FutureFeature, Performance
Target Release: 4.2z2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-14.2.11-151.el8cp, ceph-14.2.11-151.el7cp Doc Type: Enhancement
Doc Text:
.Improvement in the efficiency of the PG removal code
Previously, the PG removal code was inefficient because it did not keep a pointer to the last deleted object in the placement group (PG) between passes, which caused an unnecessary iteration over all the remaining objects on each pass. With this release, PG deletion performance is improved and has less impact on client I/O. The parameters `osd_delete_sleep_ssd` and `osd_delete_sleep_hybrid` now have a default value of 1 second. (A command-line sketch for inspecting these settings follows the metadata block below.)
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-06-15 17:13:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1890121    
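
The two delete-sleep settings called out in the Doc Text above can be inspected and overridden with the standard `ceph config` subcommands available on Nautilus-based RHCS 4. A minimal sketch, assuming a cluster already running the fixed packages; the override value shown is only an illustration:

  # Show the defaults reported by the monitors (1 second after this update)
  ceph config get osd osd_delete_sleep_ssd
  ceph config get osd osd_delete_sleep_hybrid

  # Optionally raise the sleep to throttle PG deletion further, at the cost of slower pool removal
  ceph config set osd osd_delete_sleep_hybrid 2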

Description Gaurav Sitlani 2020-12-08 11:03:56 UTC
Description of problem:

This is a request to backport the PG removal optimization from the following upstream pull requests to Red Hat Ceph Storage 4.1:

https://github.com/ceph/ceph/pull/37314
https://github.com/ceph/ceph/pull/37496


Version-Release number of selected component (if applicable):
ceph version 14.2.8-111.el7cp

Steps to Reproduce:

1. Instantiate a test cluster with two pools sharing the same CRUSH rule (or at least the same OSDs), one of them filled with a high number of objects (let's say 100M)
2. Delete the pool containing the 100M objects
3. Observe the read/write latencies increasing over time on the other pool (a rough command-line sketch of these steps follows below)
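
A rough command-line sketch of the reproducer, assuming two replicated pools named pool-big and pool-small that land on the same OSDs; the pool names, PG counts, and bench durations are placeholders, and rados bench is used only as a convenient object generator and latency probe:

  # 1. Two pools backed by the same OSDs (default CRUSH rule)
  ceph osd pool create pool-big 128 128
  ceph osd pool create pool-small 64 64

  # Fill pool-big with many small objects (run long enough to approach the target object count)
  rados bench -p pool-big 3600 write -b 4096 --no-cleanup

  # 2. Delete the heavily populated pool
  ceph config set mon mon_allow_pool_delete true
  ceph osd pool delete pool-big pool-big --yes-i-really-really-mean-it

  # 3. Watch read/write latency on the surviving pool while the deleted pool's PGs are being removed
  rados bench -p pool-small 60 write
  ceph osd perf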

Comment 4 Vikhyat Umrao 2020-12-10 17:56:21 UTC
Earlier reports from workload dfg:

https://bugzilla.redhat.com/show_bug.cgi?id=1770510
https://tracker.ceph.com/issues/47174

Comment 10 Vikhyat Umrao 2021-05-17 11:59:32 UTC
*** Bug 1952920 has been marked as a duplicate of this bug. ***

Comment 18 Amrita 2021-06-02 13:09:15 UTC
Hi Neha,

Could you please provide the doc text? It is needed for inclusion in the 4.2z2 Release Notes.

Thanks
Amrita

Comment 25 errata-xmlrpc 2021-06-15 17:13:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2445

Comment 26 Vikhyat Umrao 2022-05-03 17:56:53 UTC
*** Bug 1770510 has been marked as a duplicate of this bug. ***