Created attachment 1472783 [details]
Bucket List Pre/Post shard
Description of problem:
Our customer opened a case noting an increase in IOPS after they enabled the dynamic resharding feature in their environment.
The issue appears to be the same as the Luminous tracker "Resharding hangs with versioning-enabled buckets" and others filed upstream:
Version-Release number of selected component (if applicable):
We had the customer capture the bucket list before and after resharding, and the captures show that the same buckets are being flagged for resharding over and over again.
Steps to Reproduce:
1. The customer allows the resharding process to run.
2. The bucket list is checked pre-run.
3. The bucket list is checked post-run.

Actual results:
The buckets in the two lists are identical in every respect.

Expected results:
The buckets that were queued to be resharded should not appear in the list again with information identical to when they were first flagged.
The upstream PR is here; it is currently DNM (do not merge) while it is being cleaned up:
Pushed to ceph-3.2-rhel-patches.
Here are some of the tests I used to verify the fix. All tests are based on inserting code at the top of the innermost loop in RGWBucketReshard::do_reshard. In each case we are also resharding a bucket with more than 30 objects and with rgw_reshard_bucket_lock_duration set to 30 (seconds).
Test 1: Insert sleep(1); -- this tests whether renewing the lock works when the resharding is taking somewhat longer than expected.
Test 2: Insert sleep(32); -- this tests proper recovery when we're unable to renew the lock before it expires.
Test 3: Insert static int i = 0; if (++i > 10) exit(1); -- this simulates a crash of the radosgw process that leaves things in an uncleaned-up state. I then restart everything and verify that I can read and write the bucket index (e.g., list the bucket, remove an object from it). I also checked the radosgw log file to confirm it includes "apparently successfully cleared resharding flags for bucket...".
*** Bug 1644212 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.