Bug 1688378 - ops waiting for resharding to complete may not be able to complete when resharding does complete
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: z2
Target Release: 3.2
Assignee: J. Eric Ivancich
QA Contact: ceph-qe-bugs
Docs Contact: John Brier
URL:
Whiteboard:
Depends On:
Blocks: 1629656
Reported: 2019-03-13 16:11 UTC by J. Eric Ivancich
Modified: 2019-10-17 21:08 UTC
CC List: 11 users

Fixed In Version: RHEL: ceph-12.2.8-106.el7cp Ubuntu: ceph_12.2.8-91redhat1
Doc Type: Bug Fix
Doc Text:
.Operations waiting for resharding to complete are able to complete after resharding
Previously, when using dynamic resharding, some operations that were waiting for resharding to finish failed to complete afterwards. This was caused by a change to the Ceph Object Gateway that automatically cleans up bucket index shards that are no longer used. While that change reduced storage demands and eliminated the need for manual cleanup, it also removed one source of an identifier that operations need in order to complete after resharding. The code has been updated so that the identifier is retrieved from a different source after resharding, and operations requiring it can now complete.
Clone Of:
Environment:
Last Closed: 2019-04-30 15:57:07 UTC
Embargoed:


Links
System                    ID              Private  Priority  Status  Summary  Last Updated
Ceph Project Bug Tracker  38990           0        None      None    None     2019-10-17 21:08:16 UTC
Red Hat Product Errata    RHSA-2019:0911  0        None      None    None     2019-04-30 15:57:22 UTC

Description J. Eric Ivancich 2019-03-13 16:11:55 UTC

Comment 1 J. Eric Ivancich 2019-03-13 16:18:54 UTC
Description of problem: ops waiting for reshard to complete will fail when resharding successfully completes


Version-Release number of selected component (if applicable):


How reproducible:

Reproduced twice by Thomas Serlin (tserlin). Once dynamic resharding was turned off, it no longer reproduced.


Steps to Reproduce:
1. Set up a cluster with dynamic resharding turned on
2. Use the Veeam backup utility to write a backup to the Ceph cluster
3. After about 31G of data has been sent, a reshard starts and one of the ops fails.

Actual results:

The op fails

Expected results:

The op succeeds

Additional info:

This is likely the result of an earlier improvement that removes old bucket index data once resharding completes.

Comment 5 J. Eric Ivancich 2019-04-01 20:04:38 UTC
I tested the bug fix in the following manner:

1. Create test bucket

2. Create 7 jobs that do the following in parallel:
    a. upload file of around 256KB to test bucket
    b. go back to a. Use a counter and a unique tag per job so object names do not collide.

3. Do reshards repeatedly
    a. reshard bucket to a higher shard number
    b. wait for 5 seconds
    c. go back to a.

4. When examining the rgw log there should be no requests with a return status of either 500 or 404.
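
The log check in step 4 can be sketched as a small scanner. This is only an illustration, not the script used for the test: it assumes that each completed request appears in the rgw log on a line containing an "http_status=<code>" token, and the name count_bad_statuses is hypothetical. Adjust the pattern if your log format differs.

```python
import re
from collections import Counter

# Assumption: completed requests are logged with an "http_status=<code>" token.
HTTP_STATUS = re.compile(r"http_status=(\d{3})")


def count_bad_statuses(lines, bad=(404, 500)):
    """Count occurrences of the given HTTP statuses in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = HTTP_STATUS.search(line)
        if match is None:
            continue  # not a request-completion line
        status = int(match.group(1))
        if status in bad:
            counts[status] += 1
    return counts
```

Passing an open log file to count_bad_statuses should give an empty Counter when the fix is in place, since the test expects no 500 or 404 responses.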

Without the bug fix, running the above for 5 minutes, with each reshard increasing the number of shards by 50%, very easily induced the error condition.
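
Steps 2 and 3 of the procedure above can be sketched as a small driver. This is a minimal sketch under stated assumptions, not the script actually used: the names uploader and run_reshard_stress are hypothetical, and put_object and reshard are callables that you supply. Rounding the 50% increase up with ceil is also an assumption.

```python
import math
import threading
import time


def uploader(job_tag, put_object, stop, size=256 * 1024):
    """Step 2: upload objects of about 256KB in a loop until told to stop."""
    payload = b"x" * size
    counter = 0
    while not stop.is_set():
        # A per-job tag plus a counter keeps object names from colliding.
        put_object(f"{job_tag}-{counter:08d}", payload)
        counter += 1


def run_reshard_stress(put_object, reshard, start_shards, duration=300.0,
                       jobs=7, pause=5.0, growth=1.5, size=256 * 1024):
    """Run parallel uploads while resharding repeatedly; return the shard counts used."""
    stop = threading.Event()
    workers = [
        threading.Thread(target=uploader, args=(f"job{i}", put_object, stop, size))
        for i in range(jobs)
    ]
    for worker in workers:
        worker.start()
    shards = start_shards
    history = []
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            # Step 3a: reshard to a higher shard count (50% more by default).
            shards = math.ceil(shards * growth)
            reshard(shards)
            history.append(shards)
            # Step 3b: wait before the next reshard.
            time.sleep(pause)
    finally:
        stop.set()
        for worker in workers:
            worker.join()
    return history
```

In a real run, put_object might wrap an S3 client call such as boto3's put_object(Bucket=..., Key=..., Body=...), and reshard might invoke radosgw-admin bucket reshard with --bucket and --num-shards. Both bindings are assumptions about how one could wire this up. Keeping them as injected callables lets the driver be exercised without a cluster.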

Comment 14 errata-xmlrpc 2019-04-30 15:57:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911

