Bug 1663570

Summary: multisite sync errors from operations on a versioning-suspended bucket
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Casey Bodley <cbodley>
Component: RGW-Multisite
Assignee: Casey Bodley <cbodley>
Status: CLOSED ERRATA
QA Contact: Tejas <tchandra>
Severity: high
Docs Contact: Bara Ancincova <bancinco>
Priority: high
Version: 3.2
CC: bancinco, cbodley, ceph-eng-bugs, ceph-qe-bugs, edonnell, hnallurv, ivancich, mbenjamin, mhackett, tchandra, tserlin, vimishra, vumrao
Target Milestone: z1
Target Release: 3.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.8-65.el7cp Ubuntu: ceph_12.2.8-52redhat1xenial
Doc Type: Bug Fix
Doc Text:
.Objects are now synced correctly in versioning-suspended buckets

Due to a bug in multi-site sync of versioning-suspended buckets, certain object versioning attributes were overwritten with incorrect values. Consequently, the objects failed to sync and attempted to retry endlessly, blocking further sync progress. With this update, the sync process no longer overwrites versioning attributes. In addition, any broken attributes are now detected and repaired. As a result, objects are synced correctly in versioning-suspended buckets.
Story Points: ---
Clone Of:
Clones: 1690927
Environment:
Last Closed: 2019-03-07 15:51:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1629656, 1690927

Description Casey Bodley 2019-01-04 21:50:35 UTC
Description of problem:

Overwrites in a versioning-suspended bucket will fail to sync with errors like "cls_rgw_bucket_link_olh() returned r=-125" and block further progress on the bucket.


How reproducible: 100%


Steps to Reproduce:

1. deploy a two-zone multisite configuration
2. on the master zone, create a bucket
3. upload an object "obj"
4. enable versioning on the bucket
5. reupload the same object "obj"
6. suspend versioning on the bucket
7. reupload the same object "obj"
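
Steps 2 through 7 can be scripted against the master zone's S3 endpoint. The sketch below is not part of the original report. It assumes `s3` is an object that behaves like a boto3 S3 client (`create_bucket`, `put_object`, `put_bucket_versioning`). The bucket name `sync-test` and the object bodies are made up for illustration.

```python
def reproduce(s3, bucket="sync-test", key="obj"):
    """Run steps 2-7 of the reproducer against the master zone.

    `s3` is assumed to expose the boto3 S3 client methods used below.
    The bucket name and object bodies are illustrative only.
    """
    def set_versioning(status):
        # status is "Enabled" or "Suspended"
        s3.put_bucket_versioning(
            Bucket=bucket,
            VersioningConfiguration={"Status": status},
        )

    s3.create_bucket(Bucket=bucket)                    # step 2
    s3.put_object(Bucket=bucket, Key=key, Body=b"v1")  # step 3
    set_versioning("Enabled")                          # step 4
    s3.put_object(Bucket=bucket, Key=key, Body=b"v2")  # step 5
    set_versioning("Suspended")                        # step 6
    s3.put_object(Bucket=bucket, Key=key, Body=b"v3")  # step 7
```

With boto3 installed, a client for the master zone could be created with `boto3.client("s3", endpoint_url=...)`, using that zone's RGW endpoint and credentials, and passed in as `s3`.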


Actual results:

Multisite sync on the secondary zone gets stuck attempting to sync the third upload, producing radosgw log errors like "cls_rgw_bucket_link_olh() returned r=-125" and osd log errors like "NOTICE: op.olh_tag (zxopy27aag3jjr38ddtow7517gdpgz4c) != olh.tag (bne5h7ou7gingobf89ae5crr2p3p284y)".
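
To check whether a cluster is hitting this bug, the radosgw and osd logs can be scanned for the two messages quoted above. The following is a minimal sketch, not part of the original report. The regular expressions are derived only from the quoted message text, and real log lines carry additional prefixes that are not shown here. On Linux, errno 125 is ECANCELED.

```python
import re

# Patterns built from the messages quoted in this report.
LINK_OLH_RE = re.compile(r"cls_rgw_bucket_link_olh\(\) returned r=(-?\d+)")
TAG_MISMATCH_RE = re.compile(r"op\.olh_tag \((\w+)\) != olh\.tag \((\w+)\)")


def scan(lines):
    """Return (return_codes, tag_mismatches) found in the given log lines."""
    codes, mismatches = [], []
    for line in lines:
        m = LINK_OLH_RE.search(line)
        if m:
            codes.append(int(m.group(1)))
        m = TAG_MISMATCH_RE.search(line)
        if m:
            # (tag sent with the operation, tag stored on the object)
            mismatches.append((m.group(1), m.group(2)))
    return codes, mismatches


sample = [
    "cls_rgw_bucket_link_olh() returned r=-125",
    "NOTICE: op.olh_tag (zxopy27aag3jjr38ddtow7517gdpgz4c) "
    "!= olh.tag (bne5h7ou7gingobf89ae5crr2p3p284y)",
]
codes, mismatches = scan(sample)
print(codes, mismatches)
```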

Expected results:

Sync in versioning-suspended buckets should succeed and converge on the same objects/versions in both zones.
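
One way to check convergence is to compare the version listings of the bucket in both zones. The sketch below is not part of the original report. It assumes each argument behaves like a boto3 S3 client pointed at one zone, and that keys, version IDs, and ETags are expected to match across zones once sync completes. It reads only the first page of `list_object_versions` and ignores delete markers.

```python
def version_set(s3, bucket):
    """Collect (key, version id, etag) from the first listing page.

    Assumption: `s3` exposes boto3's `list_object_versions`. Pagination
    and delete markers are deliberately ignored in this sketch.
    """
    resp = s3.list_object_versions(Bucket=bucket)
    return {
        (v["Key"], v["VersionId"], v["ETag"])
        for v in resp.get("Versions", [])
    }


def zones_converged(master_s3, secondary_s3, bucket):
    """True when both zones list the same object versions for `bucket`."""
    return version_set(master_s3, bucket) == version_set(secondary_s3, bucket)
```

Because sync is asynchronous, a single `False` result may only mean that the secondary zone has not caught up yet, so the check is more meaningful when repeated after a delay.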

Comment 38 errata-xmlrpc 2019-03-07 15:51:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0475