Bug 1688378
Summary: | ops waiting for resharding to complete may not be able to complete when resharding does complete | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | J. Eric Ivancich <ivancich> |
Component: | RGW | Assignee: | J. Eric Ivancich <ivancich> |
Status: | CLOSED ERRATA | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
Severity: | medium | Docs Contact: | John Brier <jbrier> |
Priority: | low | ||
Version: | 3.2 | CC: | agunn, anharris, cbodley, ceph-eng-bugs, ceph-qe-bugs, jbrier, kbader, mbenjamin, sweil, tserlin, vumrao |
Target Milestone: | z2 | ||
Target Release: | 3.2 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | RHEL: ceph-12.2.8-106.el7cp Ubuntu: ceph_12.2.8-91redhat1 | Doc Type: | Bug Fix |
Doc Text: |
.Operations waiting for resharding to complete are able to complete after resharding
Previously, when using dynamic resharding, some operations that were waiting to complete after resharding failed to complete. This was due to code changes to the Ceph Object Gateway when automatically cleaning up no longer used bucket index shards. While this reduced storage demands and eliminated the need for manual clean up, the process removed one source of an identifier needed for operations to complete after resharding. The code has been updated so that identifier is retrieved from a different source after resharding and operations requiring it can now complete.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2019-04-30 15:57:07 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1629656 |
Description
J. Eric Ivancich
2019-03-13 16:11:55 UTC
Description of problem:
Ops waiting for a reshard to complete will fail when resharding successfully completes.

Version-Release number of selected component (if applicable):

How reproducible:
Has reproduced twice by Thomas Serlin (tserlin). Once dynamic resharding was turned off, it did not reproduce.

Steps to Reproduce:
1. Set up a cluster with dynamic resharding enabled.
2. Use the Veeam backup utility to write a backup to the Ceph cluster.
3. After about 31 GB of data is sent, a reshard will initiate and one of the ops will fail.

Actual results:
The op fails.

Expected results:
The op succeeds.

Additional info:
This is likely a result of a previous improvement in which old bucket index data was removed once resharding completed.

I tested the bug fix in the following manner:
1. Create a test bucket.
2. Create 7 jobs that do the following in parallel:
   a. Upload a file of around 256 KB to the test bucket.
   b. Go back to a.
   Use a counter and a unique tag per job so object names do not collide.
3. Reshard repeatedly:
   a. Reshard the bucket to a higher shard count.
   b. Wait 5 seconds.
   c. Go back to a.
4. When examining the RGW log, there should be no requests with a return status of either 500 or 404.

Without the bug fix, when I ran the above for 5 minutes with each reshard increasing the number of shards by 50%, I could very easily induce the error condition.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911
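The log check in verification step 4 can be sketched as a small script that scans RGW log lines for requests that returned 500 or 404. This is a minimal illustration only; the `http_status=NNN` log line format used below is an assumption, and the regex would need to be adapted to the actual log format your gateway frontend emits.

```python
import re

# Statuses that would indicate an op failed after resharding,
# per step 4 of the verification procedure above.
BAD_STATUSES = {"404", "500"}

# Assumed log format: lines containing "http_status=NNN".
# Adjust this pattern to match your RGW frontend's real log output.
STATUS_RE = re.compile(r"http_status=(\d{3})")

def failed_requests(log_lines):
    """Return the log lines whose HTTP status is 404 or 500."""
    failures = []
    for line in log_lines:
        m = STATUS_RE.search(line)
        if m and m.group(1) in BAD_STATUSES:
            failures.append(line)
    return failures

# Synthetic example lines (not real RGW output):
sample = [
    "req 1 ... PUT /testbucket/obj-1 http_status=200",
    "req 2 ... PUT /testbucket/obj-2 http_status=500",
    "req 3 ... GET /testbucket/obj-1 http_status=404",
]
print(failed_requests(sample))
```

With the fix applied, running this over the full log from the 5-minute test should return an empty list.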