Description of problem:
Rename the zonegroup on the master zone and commit a period update. The non-master zones end up with both the old and the new zonegroup names. However, the period gets updated correctly on all the zones.

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.5-22.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Configure 3-way multisite.
2. Change the zonegroup name on the master zone and update the changes on all zones:

radosgw-admin zonegroup rename --rgw-zonegroup=us --zonegroup-new-name=US --master --default --endpoints=http://magna039:8080
radosgw-admin period update --commit

    "period_map": {
        "id": "55c0334d-2fab-4eb5-b73a-532244717cb3",
        "zonegroups": [
            {
                "id": "7a07c13c-85ea-4660-9ae9-e70fa57ee2dc",
                "name": "US",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [
                    "http:\/\/magna039:8080"

Actual results:
After this, sync status errors out with:

# radosgw-admin sync status --debug-rgw=0
          realm 3d6b536c-0e74-446e-9f4e-08cd3ab01a6b (movies)
      zonegroup 7a07c13c-85ea-4660-9ae9-e70fa57ee2dc (us)
           zone 931420c9-f70c-4099-a843-b34c4ba3da7a (us-west)
  metadata sync syncing
                full sync: 0/64 shards
                metadata is caught up with master
                incremental sync: 64/64 shards
2017-02-17 08:59:22.141421 7f56eb6819c0 0 ERROR: failed to fetch datalog info
      data sync source: 94b94a1a-6aa1-4944-9064-a5ae68bf3811 (us-east)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: b4607000-b77f-48c5-bd70-85913a788035 (us-central)
                        failed to retrieve sync info: (5) Input/output error

All swift/S3 commands hang.

Additional info:
On the non-master zones we end up with both the old and the new zonegroup names:

{
    "default_info": "7a07c13c-85ea-4660-9ae9-e70fa57ee2dc",
    "zonegroups": [
        "us",
        "US",
        "default"
    ]
}
OK, I tried to reproduce it. What I don't see is swift/S3 commands hanging. But I do see that the non-master zones end up with both the old and the new zonegroup names.

On the master, after renaming the zonegroup from 'us' to 'US':

# radosgw-admin zonegroup list
{
    "default_info": "8eb889a4-9716-4bba-96db-231b260f6f61",
    "zonegroups": [
        "us",
        "default"
    ]
}

On non-master zones:

{
    "default_info": "8eb889a4-9716-4bba-96db-231b260f6f61",
    "zonegroups": [
        "us",
        "US",
        "default"
    ]
}

But the period shows only the new zonegroup name, which is correct.

    "period_map": {
        "id": "c2a1459d-eecc-4231-a340-0549f71b2d42",
        "zonegroups": [
            {
                "id": "8eb889a4-9716-4bba-96db-231b260f6f61",
                "name": "US",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [
                    "http:\/\/magna039:8080"
Okay, thanks Shilpa. So the issue is a leak of the old zonegroup name object on non-master zones. I can confirm that radosgw doesn't have any logic that tries to clean these up when it gets a new period.

As a workaround, the rados tool can remove the 'zonegroups_names.us' object from the .rgw.root pool on non-master zones:

$ rados -p .rgw.root rm zonegroups_names.us
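
The workaround above could be wrapped in a small helper for running on each non-master zone. This is only a sketch: the function names and the DRY_RUN and POOL variables are hypothetical, the object name simply follows the 'zonegroups_names.<name>' pattern quoted above, and it assumes the rados tool's `stat` and `rm` subcommands behave as usual on your release. It defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: helper names, POOL and DRY_RUN are hypothetical conveniences.
POOL="${POOL:-.rgw.root}"

# Build the name of the zonegroup name object for a given zonegroup name,
# following the 'zonegroups_names.<name>' pattern from the workaround.
zonegroup_name_object() {
    printf 'zonegroups_names.%s\n' "$1"
}

# Remove the leftover name object for an OLD zonegroup name.
# With DRY_RUN=1 (the default) the rados commands are printed, not executed.
remove_stale_zonegroup_name() {
    obj="$(zonegroup_name_object "$1")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "rados -p $POOL stat $obj && rados -p $POOL rm $obj"
        return 0
    fi
    # 'rados stat' fails when the object is absent, so 'rm' is skipped then.
    rados -p "$POOL" stat "$obj" && rados -p "$POOL" rm "$obj"
}
```

For example, `remove_stale_zonegroup_name us` prints the commands for the old name 'us'; running it with DRY_RUN=0 would execute them. Pass only the old name, never the new one, since the helper does not check which name the current period uses.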
Casey, will this be fixed in Ceph v10.2.7, or will we need more downstream patches on top of that release?
Ken, this has not been fixed upstream.
I have closed this issue because it has been inactive for some time now. If you feel this still deserves attention, feel free to reopen it.