Description of problem:
Create 100 buckets and upload an object into each bucket on both RGW zones in parallel. The objects sync successfully to the master zone, whereas on the non-master zone the buckets are synced from the master but the objects are not.

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-5.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Start a boto script on each zone that creates 100 buckets and uploads a 1.5G object into each bucket.
2. All buckets and objects are synced to the master zone, but only the buckets are synced to the non-master zone. No objects are created there.

Actual results:

On master:
# radosgw-admin sync status --rgw-zone=us-1 --debug-rgw=0 --debug-ms=0
          realm 1c60c863-689d-441f-b370-62390562e2aa (earth)
      zonegroup 540c9b3f-5eb7-4a67-a581-54bc704ce827 (us)
           zone 505a3a8e-19cf-4295-a43d-559e763891f6 (us-1)
  metadata sync no sync (zone is master)
      data sync source: d48cb942-a5fa-4597-89fd-0bab3bb9c5a3 (us-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

On non-master:
# radosgw-admin sync status --rgw-zone=us-2 --debug-rgw=0 --debug-ms=0
          realm 1c60c863-689d-441f-b370-62390562e2aa (earth)
      zonegroup 540c9b3f-5eb7-4a67-a581-54bc704ce827 (us)
           zone d48cb942-a5fa-4597-89fd-0bab3bb9c5a3 (us-2)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is behind on 1 shards
                oldest incremental change not applied: 2016-06-23 09:57:35.0.097857s
      data sync source: 505a3a8e-19cf-4295-a43d-559e763891f6 (us-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 8 shards
                        oldest incremental change not applied: 2016-06-29 07:34:15.0.194232s

Expected results:
Sync should work in both directions simultaneously.

Additional info:
Will provide the path to the logs.
Created attachment 1174005 [details] boto script to create workload on magna115
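For reference, a minimal sketch of the kind of workload the attached script drives, assuming a boto (v2) connection to the local zone's RGW endpoint. The host, port, credentials, and data file path below are placeholders and not taken from the attachment; the attached script remains the authoritative version.

#!/usr/bin/env python
# Illustrative sketch only -- not the attached script.
# Assumptions: RGW endpoint, credentials, and the path to a pre-created
# ~1.5G file are placeholders to be replaced per zone.
import boto
import boto.s3.connection

ACCESS_KEY = 'REPLACE_ME'
SECRET_KEY = 'REPLACE_ME'
RGW_HOST   = 'rgw.example.local'   # placeholder endpoint for the local zone
RGW_PORT   = 8080                  # placeholder port
DATA_FILE  = '/tmp/obj-1.5G'       # pre-created 1.5G file

conn = boto.connect_s3(
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    host=RGW_HOST,
    port=RGW_PORT,
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

# Create 100 buckets and upload one large object into each.
for i in range(100):
    bucket = conn.create_bucket('sync-test-bucket-%03d' % i)
    key = bucket.new_key('object-%03d' % i)
    key.set_contents_from_filename(DATA_FILE)
    print('uploaded %s/%s' % (bucket.name, key.name))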
It seems that data sync stopped after errors in requests from the non-master to the master (retrieval of all incremental data log shards failed, possibly timing out due to a lack of request-processing threads on the master).
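If that analysis holds, one possible mitigation to test (an assumption on my part, not a confirmed fix) is raising the request-processing thread count on the master zone's gateway in its ceph.conf section, then restarting that radosgw instance. Section name and port below are placeholders:

[client.rgw.master-gateway]
# raise from the release default; value chosen only for illustration
rgw thread pool size = 512
# keep the civetweb frontend thread count in step (port is a placeholder)
rgw frontends = "civetweb port=8080 num_threads=512"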
Tested and verified on 10.2.2-15. Issues related to segfaults in multisite operations are tracked in other BZs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html