Created attachment 1171317 (Peer logs)

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-5.el7cp.x86_64
curl-7.29.0-31.el7

How reproducible:
Always

Steps to Reproduce:
1. Upload an object of about 1.5 GB to the master zone and wait for the upload to finish. Before the object has finished syncing to the peer zone, delete it from the master zone. Checking at this point showed that the object sync operation completed, after which the object started to sync in the reverse direction, re-creating it on the master zone.
2. Introduce a network delay of 200 ms between the two RGW nodes and repeat the same operation with a smaller 500 MB file: create the object on the master zone and delete it before the sync finishes. This time, the sync for the initial object creation never completed, and the two zones remain out of sync:

# radosgw-admin sync status --rgw-zone=us-2 --debug-rgw=0 --debug-ms=0
          realm fedc07d8-a4cc-40c0-b8ad-4e1be8251726 (earth)
      zonegroup 4401713c-7fdf-4619-adea-829c5e7fdd0d (us)
           zone 591f5f4f-2b22-4346-ae9c-45c7e37ad5ac (us-2)
  metadata sync syncing
                full sync: 0/64 shards
                metadata is caught up with master
                incremental sync: 64/64 shards
      data sync source: 38b0ab46-20fd-4c94-9f19-193e86c7e343 (us-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 2 shards
                        oldest incremental change not applied: 2016-06-22 07:37:26.0.924151s
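When re-running the reproducer, it can help to check the sync status output mechanically rather than by eye. The following is a minimal sketch in Python, assuming the text layout of the `radosgw-admin sync status` output shown above. The helper names `data_shards_behind` and `oldest_unapplied_change` are hypothetical and are not part of radosgw-admin or any Ceph tooling.

```python
import re
from typing import Optional

# Hypothetical helpers: they scan text captured from `radosgw-admin sync status`
# for the lines shown in this report. The layout is an assumption, not a stable API.
BEHIND_RE = re.compile(r"data is behind on (\d+) shards?")
OLDEST_RE = re.compile(r"oldest incremental change not applied:\s*(\S+ \S+)")


def data_shards_behind(status_text: str) -> int:
    """Return the number of data shards reported as behind, or 0 if no such line."""
    match = BEHIND_RE.search(status_text)
    return int(match.group(1)) if match else 0


def oldest_unapplied_change(status_text: str) -> Optional[str]:
    """Return the timestamp of the oldest unapplied incremental change, if reported."""
    match = OLDEST_RE.search(status_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Excerpt of the output captured in this report.
    sample = """\
      data sync source: 38b0ab46-20fd-4c94-9f19-193e86c7e343 (us-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 2 shards
                        oldest incremental change not applied: 2016-06-22 07:37:26.0.924151s
"""
    print(data_shards_behind(sample))       # 2
    print(oldest_unapplied_change(sample))  # 2016-06-22 07:37:26.0.924151s
```

A non-zero result that stays constant across repeated runs would correspond to the stuck state described in step 2.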
Created attachment 1171318 (Master zone logs)
Work by Yehuda to address this issue was merged upstream in https://github.com/ceph/ceph/pull/9481. I have a small follow-up fix to that work in https://github.com/ceph/ceph/pull/9851, which has not yet been merged.
Ken, all 9 patches for this fix have been cherry-picked to ceph-2-rhel-patches.
The issue is no longer seen as of ceph 10.2.2-15.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html