Description of problem:
Ops waiting for a reshard to complete fail when the reshard completes successfully.

Version-Release number of selected component (if applicable):

How reproducible:
Reproduced twice by Thomas Serlin (tserlin). Once dynamic resharding was turned off, it did not reproduce.

Steps to Reproduce:
1. Set up a cluster with dynamic resharding turned on.
2. Use the Veeam backup utility to write a backup to the Ceph cluster.
3. After about 31G of data is sent, a reshard will initiate and one of the ops will fail.

Actual results:
The op fails.

Expected results:
The op succeeds.

Additional info:
This is likely a result of a previous improvement in which old bucket index data is removed once resharding completes.
I tested the bug fix in the following manner:

1. Create a test bucket.
2. Create 7 jobs that do the following in parallel:
   a. Upload a file of around 256KB to the test bucket.
   b. Go back to a.
   Use a counter and a unique tag per job so object names do not collide.
3. Do reshards repeatedly:
   a. Reshard the bucket to a higher shard number.
   b. Wait for 5 seconds.
   c. Go back to a.
4. When examining the rgw log, there should be no requests with a return status of either 500 or 404.

Without the bug fix, when I ran the above for 5 minutes, with each reshard increasing the number of shards by 50%, I could very easily induce the error condition.
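The bookkeeping parts of the test above can be sketched in Python. This is only an illustration, not the script that was actually used: the helper names (object_names, shard_schedule, failed_requests) are hypothetical, and the log check assumes that each completed request in the rgw log carries an "http_status=NNN" field, which may differ by version and log level. The uploads and the reshard calls themselves are left out.

```python
import itertools
import math
import re


def object_names(job_tag):
    """Yield '<tag>-<counter>' names so parallel jobs never collide."""
    for counter in itertools.count():
        yield f"{job_tag}-{counter:08d}"


def shard_schedule(start_shards, rounds, growth=1.5):
    """Return the shard count to request in each reshard round.

    Each round grows the count by `growth` (50% by default) and always
    by at least one shard, so the number is strictly increasing.
    """
    schedule = []
    shards = start_shards
    for _ in range(rounds):
        shards = max(shards + 1, math.ceil(shards * growth))
        schedule.append(shards)
    return schedule


# Assumption: completed requests are logged with "http_status=NNN".
STATUS_RE = re.compile(r"http_status=(\d{3})")


def failed_requests(log_lines, bad_statuses=(404, 500)):
    """Return the log lines whose http_status is in bad_statuses."""
    failures = []
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match and int(match.group(1)) in bad_statuses:
            failures.append(line)
    return failures
```

Each of the 7 upload jobs would draw its names from object_names() with its own tag, the reshard loop would walk shard_schedule() with a 5 second pause between rounds, and the test passes when failed_requests() returns an empty list for the rgw log.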
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:0911