Description of problem:
Stop the radosgw process on a non-master zone in a three-way multisite environment, keep doing I/O operations on the other zones, then start the radosgw process again. After the restart, newly created buckets fail to sync to that zone.

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.5-13.el7cp.x86_64

Steps to Reproduce:
1. Configure three-way multisite clusters with one zone in each site.
2. Create buckets and objects and ensure that all of them have synced.
3. Stop radosgw on a non-master zone.
4. Continue writing to the other zones.
5. Start radosgw on that zone again.

Actual results:
Objects written to existing buckets sync to the zone where radosgw was restarted, but bucket creation operations performed after the restart do not sync.

Additional info:
The radosgw-admin sync status command reports these errors:
2017-02-08 08:38:11.720841 7f79648d19c0 -1 ERROR: could not find remote sync shard status for shard_id=122
2017-02-08 08:38:11.720842 7f79648d19c0 -1 ERROR: could not find remote sync shard status for shard_id=123
2017-02-08 08:38:11.720842 7f79648d19c0 -1 ERROR: could not find remote sync shard status for shard_id=124
2017-02-08 08:38:11.720843 7f79648d19c0 -1 ERROR: could not find remote sync shard status for shard_id=125
2017-02-08 08:38:11.720844 7f79648d19c0 -1 ERROR: could not find remote sync shard status for shard_id=126
2017-02-08 08:38:11.720844 7f79648d19c0 -1 ERROR: could not find remote sync shard status for shard_id=127

Logs for shard_id=124:
2017-02-08 07:32:49.105363 7f5af3fff700 10 sync: incremental_sync: shard_id=124 r=-22
2017-02-08 07:32:49.105364 7f5af3fff700 20 cr:s=0x7f5ae80e0d30:op=0x7f5ae8befa60:18RGWDataSyncShardCR: operate() returned r=-22
2017-02-08 07:32:49.105366 7f5af3fff700 20 cr:s=0x7f5ae80e0d30:op=0x7f5ae80e03f0:25RGWDataSyncShardControlCR: operate()
2017-02-08 07:32:49.105372 7f5af3fff700 5 Sync:94b94a1a:data:DataShard:datalog.sync-status.shard.94b94a1a-6aa1-4944-9064-a5ae68bf3811.124:finish
2017-02-08 07:32:49.105374 7f5af3fff700 0 rgw meta sync: ERROR: RGWBackoffControlCR called coroutine returned -22
2017-02-08 07:32:49.105378 7f5af3fff700 20 run: stack=0x7f5ae80e0d30 is io blocked
2017-02-08 07:32:49.105608 7f5af3fff700 20 cr:s=0x7f5ae8bf65f0:op=0x7f5ae81de070:21RGWRadosSetOmapKeysCR: operate()
2017-02-08 07:32:49.105615 7f5af3fff700 20 cr:s=0x7f5ae8bf65f0:op=0x7f5ae81de070:21RGWRadosSetOmapKeysCR: operate()
2017-02-08 07:32:49.105617 7f5af3fff700 20 cr:s=0x7f5ae8bf65f0:op=0x7f5ae81de070:21RGWRadosSetOmapKeysCR: operate()
2017-02-08 07:32:49.105618 7f5af3fff700 20 cr:s=0x7f5ae8bf65f0:op=0x7f5ae81de070:21RGWRadosSetOmapKeysCR: operate()
2017-02-08 07:32:49.105622 7f5af3fff700 20 cr:s=0x7f5ae8bf65f0:op=0x7f5ae8bf6870:13RGWOmapAppend: operate()
2017-02-08 07:32:49.105626 7f5af3fff700 20 run: stack=0x7f5ae8bf65f0 is done
2017-02-08 07:32:49.105628 7f5af3fff700 20 cr:s=0x7f5ae80fbdd0:op=0x7f5ae8bf48c0:18RGWDataSyncShardCR: operate()
2017-02-08 07:32:49.105629 7f5af3fff700 20 collect(): s=0x7f5ae80fbdd0 stack=0x7f5ae8bf65f0 is complete
2017-02-08 07:32:49.105630 7f5af3fff700 20 collect(): s=0x7f5ae80fbdd0 stack=0x7f5ae8bf7c30 is still running
2017-02-08 07:32:49.105631 7f5af3fff700 20 run: stack=0x7f5ae80fbdd0 is_blocked_by_stack()=0 is_sleeping=0 waiting_for_child()=1
2017-02-08 07:32:49.105761 7f5af3fff700 20 cr:s=0x7f5ae8bf7c30:op=0x7f5ae83893c0:22RGWSimpleRadosUnlockCR: operate()
2017-02-08 07:32:49.105767 7f5af3fff700 20 cr:s=0x7f5ae8bf7c30:op=0x7f5ae83893c0:22RGWSimpleRadosUnlockCR: operate()
2017-02-08 07:32:49.105769 7f5af3fff700 20 cr:s=0x7f5ae8bf7c30:op=0x7f5ae83893c0:22RGWSimpleRadosUnlockCR: operate()
2017-02-08 07:32:49.105771 7f5af3fff700 20 cr:s=0x7f5ae8bf7c30:op=0x7f5ae83893c0:22RGWSimpleRadosUnlockCR: operate()
2017-02-08 07:32:49.105777 7f5af3fff700 20 cr:s=0x7f5ae8bf7c30:op=0x7f5ae8bf7380:20RGWContinuousLeaseCR: operate()
2017-02-08 07:32:49.105778 7f5af3fff700 20 run: stack=0x7f5ae8bf7c30 is done
2017-02-08 07:32:49.105780 7f5af3fff700 20 cr:s=0x7f5ae80fbdd0:op=0x7f5ae8bf48c0:18RGWDataSyncShardCR: operate()
2017-02-08 07:32:49.105781 7f5af3fff700 20 collect(): s=0x7f5ae80fbdd0 stack=0x7f5ae8bf7c30 is complete
2017-02-08 07:32:49.105783 7f5af3fff700 20 cr:s=0x7f5ae80fbdd0:op=0x7f5ae8bf48c0:18RGWDataSyncShardCR: operate()

Not sure if this is related to http://tracker.ceph.com/issues/17569.
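To triage which shards are affected, a small helper can pull the shard IDs out of the radosgw-admin sync status output and decode the negative return codes in the sync log. This is a hypothetical Python sketch, assuming the log lines keep exactly the format quoted above; missing_shards and failed_shards are illustrative names, not part of Ceph. It relies on Ceph's convention of returning negated errno values, so r=-22 corresponds to EINVAL ("Invalid argument").

```python
from __future__ import annotations

import errno
import re

# Hypothetical pattern for the "could not find remote sync shard status"
# lines printed by `radosgw-admin sync status` in this report.
MISSING_SHARD_RE = re.compile(
    r"could not find remote sync shard status for shard_id=(\d+)"
)

# Hypothetical pattern for the "incremental_sync: shard_id=<N> r=<code>" log line.
SHARD_RESULT_RE = re.compile(r"incremental_sync: shard_id=(\d+) r=(-?\d+)")


def missing_shards(text: str) -> list[int]:
    """Return sorted, de-duplicated shard IDs that have no remote sync status."""
    return sorted({int(m) for m in MISSING_SHARD_RE.findall(text)})


def failed_shards(text: str) -> dict[int, str]:
    """Map shard ID -> errno name for incremental_sync lines with r < 0."""
    result: dict[int, str] = {}
    for shard, code in SHARD_RESULT_RE.findall(text):
        rc = int(code)
        if rc < 0:
            # Ceph returns negated errno values; -22 is -EINVAL.
            result[int(shard)] = errno.errorcode.get(-rc, f"errno {-rc}")
    return result


if __name__ == "__main__":
    sample = (
        "2017-02-08 08:38:11.720842 7f79648d19c0 -1 ERROR: could not find "
        "remote sync shard status for shard_id=124\n"
        "2017-02-08 08:38:11.720843 7f79648d19c0 -1 ERROR: could not find "
        "remote sync shard status for shard_id=125\n"
        "2017-02-08 07:32:49.105363 7f5af3fff700 10 sync: incremental_sync: "
        "shard_id=124 r=-22\n"
    )
    print(missing_shards(sample))  # [124, 125]
    print(failed_shards(sample))   # {124: 'EINVAL'}
```

Cross-referencing the two results shows whether the shards missing from the status output are the same ones whose incremental sync is failing, which is the pattern seen here for shard_id=124.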
Hi Shilpa, can you provide the radosgw logs for all the nodes? Thanks.