Bug 1459967

Summary: [RGW]: Data sync issue seen post failover and failback on a multisite environment
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tejas <tchandra>
Component: RGW
Assignee: Casey Bodley <cbodley>
Status: CLOSED CURRENTRELEASE
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium
Docs Contact: Bara Ancincova <bancinco>
Priority: low
Version: 2.3
CC: anharris, cbodley, ceph-eng-bugs, edonnell, hnallurv, kbader, kdreyer, mbenjamin, sweil, uboppana
Target Milestone: rc
Target Release: 3.*
Hardware: Unspecified
OS: Linux
Doc Type: Known Issue
Doc Text:
.Failover and failback cause data sync issues in multi-site environments

In environments using the Ceph Object Gateway multi-site feature, failover and failback cause data sync to stall, and the `radosgw-admin sync status` command reports that `data sync is behind` for an extended period of time. To work around this issue, run the `radosgw-admin data sync init` command and restart the Ceph Object Gateways.
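
A minimal sketch of that workaround, assuming the us-east zone is resyncing from us-west as in the sync status output below, and that the gateways are managed by systemd (instance names are illustrative):

# On the affected site, reinitialize data sync from the peer zone:
radosgw-admin data sync init --source-zone=us-west --cluster master

# Restart the gateways so sync restarts from the reinitialized state:
systemctl restart ceph-radosgw.target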
Last Closed: 2019-03-06 00:09:20 UTC
Type: Bug
Bug Blocks: 1437916, 1494421    
Attachments: console log

Description Tejas 2017-06-08 17:08:33 UTC
Created attachment 1286194 [details]
console log

Description of problem:

   In a two-site setup, after a failover and failback are performed, we see an error like "data is behind on 3 shards".

Version-Release number of selected component (if applicable):
ceph version 10.2.7-29redhat1xenial

How reproducible:
Always

Steps to Reproduce:
1. Create a user and a few buckets from either side.
2. Bring site A (the primary) down and promote site B to master (zone-switch commands are sketched after these steps). Create a new bucket from site B.
3. Bring site A back up and switch it back to master. Now list the contents of the bucket created while A was down.
4. The bucket can be listed, but its contents are not visible from A.
5. Upload another object to the same bucket from B; the data previously written to that bucket then also becomes visible from A.
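
For reference, a minimal sketch of the zone switch in step 2, assuming the us-west/us-east zone names shown in the sync status output below and systemd-managed gateways:

# On site B, promote the surviving zone to master and default:
radosgw-admin zone modify --rgw-zone=us-west --master --default

# Commit the period so the promotion takes effect:
radosgw-admin period update --commit

# Restart the site B gateways to pick up the new period:
systemctl restart ceph-radosgw.target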


Additional info:

status after failback:
radosgw-admin sync status --cluster master
          realm 0b2eced7-a62e-4509-bf5c-97b0273eb333 (movies)
      zonegroup d3177342-0542-48ad-aac9-6d654415769c (us)
           zone 7a75101f-019b-4735-b9e1-ff4f5758ac4c (us-east)
  metadata sync no sync (zone is master)
      data sync source: b7476a47-586d-4c21-a795-57fd32297c61 (us-west)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 3 shards
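
To see more detail on the lagging shards, the per-source data sync state can be dumped (a hedged sketch; whether --shard-id narrows the output to a single shard is an assumption, and the shard id is illustrative):

# Detailed data sync state against the source zone:
radosgw-admin data sync status --source-zone=us-west --cluster master

# Optionally narrow to one shard:
radosgw-admin data sync status --source-zone=us-west --shard-id=42 --cluster master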
I will attach a log of my findings to this BZ.

Comment 9 Erin Donnelly 2017-06-15 12:28:18 UTC
Thanks Casey--updated doc text info.