Bug 1459967 - [RGW]: Data sync issue seen post failover and failback on a multisite environment
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW
Version: 2.3
Hardware: Unspecified
OS: Linux
Target Milestone: rc
Target Release: 3.*
Assignee: Casey Bodley
QA Contact: ceph-qe-bugs
Docs Contact: Bara Ancincova
Depends On:
Blocks: 1437916 1494421
Reported: 2017-06-08 17:08 UTC by Tejas
Modified: 2019-03-06 00:09 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Failover and failback cause data sync issues in multi-site environments

In environments using the Ceph Object Gateway multi-site feature, failover and failback cause data sync to stall: the `radosgw-admin sync status` command reports that `data sync is behind` for an extended period of time. To work around this issue, run the `radosgw-admin data sync init` command and restart the Gateways.
Clone Of:
Last Closed: 2019-03-06 00:09:20 UTC
Target Upstream Version:
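
The workaround in the Doc Text field above can be sketched as a small dry-run helper. This is only an illustration, not a procedure taken from the bug: the `--source-zone` argument, the `ceph-radosgw.target` unit name, and the function name `workaround_commands` are assumptions, and the `--cluster` option simply mirrors the way `radosgw-admin` is invoked later in this report. The helper only builds the command lines; it does not execute anything.

```python
import shlex


def workaround_commands(source_zone, cluster=None, restart_unit="ceph-radosgw.target"):
    """Build the argv lists for the workaround; nothing is executed here."""
    cluster_args = ["--cluster", cluster] if cluster else []
    return [
        # Re-initialize data sync from the given source zone
        # (--source-zone is an assumption; the Doc Text names only the subcommand).
        ["radosgw-admin", "data", "sync", "init",
         f"--source-zone={source_zone}", *cluster_args],
        # Restart the gateways so they pick up the reset sync state.
        # The unit name is an assumption; adjust it to the local deployment.
        ["systemctl", "restart", restart_unit],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for argv in workaround_commands("us-west", cluster="master"):
        print(shlex.join(argv))
```

Printing the commands first leaves the decision to run them, and on which gateway hosts, with the operator.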

Attachments
console log (9.11 KB, text/plain)
2017-06-08 17:08 UTC, Tejas

Description Tejas 2017-06-08 17:08:33 UTC
Created attachment 1286194
console log

Description of problem:

   In a two-site setup, after a failover and failback, sync status reports an error like "data is behind on 3 shards".

Version-Release number of selected component (if applicable):
ceph version 10.2.7-29redhat1xenial

How reproducible:

Steps to Reproduce:
1. Create a user and a few buckets from either site.
2. Bring site A (the primary) down and switch the master zone to site B. Create a new bucket from site B.
3. Bring site A back up and switch A back to master. Now list the contents of the bucket created while A was down.
4. The bucket can be listed, but its contents are not visible from A.
5. Upload another object to the same bucket from B; the data previously written to that bucket then becomes visible from A as well.
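
The report does not say which commands were used to switch zones in steps 2 and 3. As a rough sketch, assuming the usual multi-site procedure of promoting a zone with `radosgw-admin zone modify --master --default` followed by `radosgw-admin period update --commit`, the two switches could be planned as below. The function names and the `secondary` cluster name are hypothetical; steps such as pulling the current period on the recovered site and restarting the gateways are left out.

```python
def promote_zone_commands(zone, cluster=None):
    """Build the argv lists that promote `zone` to master; nothing is executed."""
    cluster_args = ["--cluster", cluster] if cluster else []
    return [
        # Mark the zone as master and default.
        ["radosgw-admin", "zone", "modify", f"--rgw-zone={zone}",
         "--master", "--default", *cluster_args],
        # Commit a new period so the change takes effect.
        ["radosgw-admin", "period", "update", "--commit", *cluster_args],
    ]


def failover_failback_plan(primary_zone, secondary_zone,
                           primary_cluster=None, secondary_cluster=None):
    """Return the command lists for step 2 (failover) and step 3 (failback)."""
    return {
        "failover": promote_zone_commands(secondary_zone, secondary_cluster),
        "failback": promote_zone_commands(primary_zone, primary_cluster),
    }


if __name__ == "__main__":
    # Zone names come from the sync status output in this report;
    # the "secondary" cluster name is a placeholder.
    plan = failover_failback_plan("us-east", "us-west",
                                  primary_cluster="master",
                                  secondary_cluster="secondary")
    for phase, commands in plan.items():
        for argv in commands:
            print(phase, " ".join(argv))
```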

Additional info:

Status after failback:
radosgw-admin sync status --cluster master
          realm 0b2eced7-a62e-4509-bf5c-97b0273eb333 (movies)
      zonegroup d3177342-0542-48ad-aac9-6d654415769c (us)
           zone 7a75101f-019b-4735-b9e1-ff4f5758ac4c (us-east)
  metadata sync no sync (zone is master)
      data sync source: b7476a47-586d-4c21-a795-57fd32297c61 (us-west)
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 3 shards
I will attach a log of my findings to this BZ.
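
To track the symptom over time, the status text above can be checked programmatically. The sketch below extracts the "data is behind on N shards" count from `radosgw-admin sync status` output. It assumes the output keeps the wording shown in this report; the function name `shards_behind` is illustrative, and the sample text is copied from the status output above.

```python
import re

# Sample copied from the "status after failback" output in this report.
SAMPLE_STATUS = """\
          realm 0b2eced7-a62e-4509-bf5c-97b0273eb333 (movies)
      zonegroup d3177342-0542-48ad-aac9-6d654415769c (us)
           zone 7a75101f-019b-4735-b9e1-ff4f5758ac4c (us-east)
  metadata sync no sync (zone is master)
      data sync source: b7476a47-586d-4c21-a795-57fd32297c61 (us-west)
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 3 shards
"""


def shards_behind(status_text):
    """Return the total number of data shards reported as behind (0 if none)."""
    # Sum over all matches in case several source zones are listed.
    counts = re.findall(r"data is behind on (\d+) shards?", status_text)
    return sum(int(n) for n in counts)


if __name__ == "__main__":
    print(shards_behind(SAMPLE_STATUS))
```

Running this check at intervals and recording the count would show whether sync is making progress or is stuck at the same number of shards.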

Comment 9 Erin Donnelly 2017-06-15 12:28:18 UTC
Thanks Casey--updated doc text info.
