Bug 1611763 - RGW Dynamic bucket index resharding keeps resharding same buckets
Summary: RGW Dynamic bucket index resharding keeps resharding same buckets
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Target Release: 3.2
Assignee: J. Eric Ivancich
QA Contact: vidushi
Duplicates: 1644212 (view as bug list)
Depends On:
Reported: 2018-08-02 16:52 UTC by Scoots Hamilton
Modified: 2019-01-03 19:02 UTC
CC: 19 users

Fixed In Version: RHEL: ceph-12.2.8-26.el7cp Ubuntu: ceph_12.2.8-25redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-01-03 19:01:46 UTC
Target Upstream Version:

Attachments (Terms of Use)
Bucket List Pre/Post shard (5.22 KB, application/zip)
2018-08-02 16:52 UTC, Scoots Hamilton

System ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 24551 None None None 2018-08-02 18:00:22 UTC
Red Hat Knowledge Base (Solution) 3625121 None None None 2018-09-26 00:26:17 UTC
Red Hat Product Errata RHBA-2019:0020 None None None 2019-01-03 19:02:01 UTC

Description Scoots Hamilton 2018-08-02 16:52:52 UTC
Created attachment 1472783 [details]
Bucket List Pre/Post  shard

Description of problem:

Our customer opened a case noting that there was an increase in IOPS when they enabled the dynamic sharding feature in their environment. 

The issue appears to be the same as the upstream luminous tracker "Resharding hangs with versioning-enabled buckets", along with other reports in the upstream trackers:


Version-Release number of selected component (if applicable):

How reproducible:


We had the customer capture the bucket list before and after sharding, and the captures show the same buckets being flagged over and over again.

Steps to Reproduce:
1. The customer allows the sharding process to run.
2. The bucket list is checked pre-run.
3. The bucket list is checked post-run.

Actual results:
The buckets in the pre-run and post-run lists are identical in every respect.

Expected results:
The buckets that were queued to be resharded should not appear in the list again with information identical to when they were first flagged.

Additional info:

Comment 16 J. Eric Ivancich 2018-10-03 20:51:16 UTC
The upstream PR is here; it is currently marked DNM (do not merge) while being cleaned up:


Comment 18 J. Eric Ivancich 2018-10-31 19:34:28 UTC
Pushed to ceph-3.2-rhel-patches.

Comment 21 J. Eric Ivancich 2018-10-31 20:31:10 UTC
Here are some of the tests I used to verify the fixes. All tests are based on inserting code at the top of the innermost loop in RGWBucketReshard::do_reshard. In each case we are resharding a bucket with more than 30 objects and with rgw_reshard_bucket_lock_duration set to 30 seconds.

Test 1: Insert sleep(1); -- this tests whether renewing the lock works when resharding takes somewhat longer than expected.

Test 2: Insert sleep(32); -- this tests proper recovery when we're unable to renew the lock before it expires.

Test 3: Insert static int i = 0; if (++i > 10) exit(1); -- this tests crashing of the radosgw code, leaving things in a non-cleaned-up state. I then restart everything and make sure I can read and write the bucket index (e.g., list the bucket, remove an object from the bucket). Furthermore, I checked the radosgw log file to verify that it includes "apparently successfully cleared resharding flags for bucket...".
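For readers unfamiliar with this style of fault injection, the three tests above can be sketched as follows. This is an illustrative model only, not actual RGW code: do_reshard_iteration and simulate_reshard are hypothetical stand-ins for the innermost loop of RGWBucketReshard::do_reshard, and Test 3's exit(1) is replaced by an in-process failure return so the behavior can be demonstrated without killing the process.

```cpp
// Hypothetical stand-in for one pass of the innermost loop in
// RGWBucketReshard::do_reshard. The injected snippets described in the
// tests above go at the top of this function.
bool do_reshard_iteration() {
  // Test 1: sleep(1);  -- resharding runs slower than expected, so the
  //                       lock must be renewed before its 30 s duration ends.
  // Test 2: sleep(32); -- the lock expires (duration is 30 s) and the code
  //                       must recover from failing to renew it.

  // Test 3: crash partway through, leaving a partially-resharded bucket.
  static int i = 0;
  if (++i > 10) {
    return false;  // models exit(1); lets recovery be observed in-process
  }

  // ... copy one bucket-index entry to the new shards ...
  return true;
}

// Drives the loop over a bucket with more than 30 objects.
int simulate_reshard(int num_entries) {
  int completed = 0;
  for (int n = 0; n < num_entries; ++n) {
    if (!do_reshard_iteration()) {
      break;  // the real test then restarts radosgw and checks the log for
              // "apparently successfully cleared resharding flags for bucket..."
    }
    ++completed;
  }
  return completed;
}
```

With the Test 3 injection in place, a run over a 31-object bucket stops after 10 entries, leaving the remainder unprocessed, which is exactly the partially-resharded state the recovery path must clean up.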

Comment 25 Manjunatha 2018-12-26 17:46:05 UTC
*** Bug 1644212 has been marked as a duplicate of this bug. ***

Comment 27 errata-xmlrpc 2019-01-03 19:01:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

