.Failover and failback cause data sync issues in multi-site environments
In environments using the Ceph Object Gateway multi-site feature, failover and failback cause data sync to stall. The `radosgw-admin sync status` command reports that `data sync is behind` for an extended period of time.
To work around this issue, run the `radosgw-admin data sync init` command and restart the gateways.
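A minimal sketch of the workaround, assuming the stalled zone is `us-east` syncing from `us-west` (zone names taken from the sync status output in this report) and a systemd-managed gateway; the service instance name is an assumption for your environment:

```shell
# Reinitialize the data sync state on the zone that is stuck
# (here us-east, which pulls data from us-west).
radosgw-admin data sync init --source-zone=us-west

# Restart the Ceph Object Gateway daemon(s) so sync resumes from the
# reinitialized state. The instance name below is a placeholder.
systemctl restart ceph-radosgw@rgw.us-east-1.service
```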
Created attachment 1286194
Description of problem:
In a two-site setup, after a failover and failback are performed, we see an error like "data is behind on 3 shards".
Version-Release number of selected component (if applicable):
ceph version 10.2.7-29redhat1xenial
Steps to Reproduce:
1. Create a user and a few buckets from either side.
2. Bring site A (the primary) down, and promote site B's zone to master. Create a new bucket from site B.
3. Bring site A back up, and switch A back to master. Now list the contents of the bucket created while A was down.
4. The bucket can be listed, but the contents are not visible from A.
5. Upload another object to the same bucket from B; the data previously written to that bucket then also becomes visible from A.
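The failover and failback in steps 2-3 can be sketched with the standard zone promotion commands; the zone names match the sync status output in this report, while the `--cluster` context and bucket operations are assumptions:

```shell
# Failover: while site A is down, promote the secondary zone (us-west)
# to master and commit a new period on site B.
radosgw-admin zone modify --rgw-zone=us-west --master --default
radosgw-admin period update --commit

# ... create a bucket and upload objects through site B's endpoint ...

# Failback: once site A is back up, promote us-east to master again
# and commit the period from site A.
radosgw-admin zone modify --rgw-zone=us-east --master --default
radosgw-admin period update --commit
```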
status after failback:
radosgw-admin sync status --cluster master
          realm 0b2eced7-a62e-4509-bf5c-97b0273eb333 (movies)
      zonegroup d3177342-0542-48ad-aac9-6d654415769c (us)
           zone 7a75101f-019b-4735-b9e1-ff4f5758ac4c (us-east)
  metadata sync no sync (zone is master)
      data sync source: b7476a47-586d-4c21-a795-57fd32297c61 (us-west)
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 3 shards
I will attach a log of my findings to this BZ.
Thanks, Casey. Updated the doc text info.