Description of problem:
Switch the master and non-master zones, then check sync status on the current non-master zone.

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-27.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Modify the non-master zone with the '--master' flag:

   radosgw-admin zone modify --rgw-zonegroup=us --rgw-zone=us-2 --access_key=secret --secret=secret --endpoints=http://magna059:80 --default --master

2. Update and commit the period and restart the rgw gateway (an example command sequence is sketched after this report).

3. On the current non-master zone:

# radosgw-admin sync status --rgw-zone=us-1 --debug-rgw=0 --debug-ms=0
2016-07-25 10:14:33.479604 7f82e22b39c0  0 error in read_id for id  : (2) No such file or directory
          realm bee08496-1b97-4f04-8d3f-c682c08565a3 (earth)
      zonegroup 0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95 (us)
           zone f5717851-2682-475a-b24b-7bcdec728cbe (us-1)
  metadata sync syncing
                full sync: 0/64 shards
                master is on a different period: master_period= local_period=86c3c833-ee7e-454c-a54d-b451fd829755
                metadata is caught up with master
                incremental sync: 64/64 shards
      data sync source: a6dfdcd1-6d8a-4a34-8c47-4b6a02ac8105 (us-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

Both zones are actually on the same period, and it is completely different from the one shown in the output above:

# radosgw-admin period get-current --debug-rgw=0 --debug-ms=0
{
    "current_period": "21a40ae5-721f-4a67-8c3a-28baf3104d3f",
    "period_map": {
        "id": "21a40ae5-721f-4a67-8c3a-28baf3104d3f",
        "zonegroups": [
            {
                "id": "0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95",
                "name": "us",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [
                    "http:\/\/magna115:80"
                ],
                "hostnames": [],
                "hostnames_s3website": [],
                "master_zone": "a6dfdcd1-6d8a-4a34-8c47-4b6a02ac8105",
                "zones": [
                    {
                        "id": "a6dfdcd1-6d8a-4a34-8c47-4b6a02ac8105",
                        "name": "us-2",
                        "endpoints": [
                            "http:\/\/magna059:80"
                        ],
                        "log_meta": "true",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false"
                    },
                    {
                        "id": "f5717851-2682-475a-b24b-7bcdec728cbe",
                        "name": "us-1",
                        "endpoints": [
                            "http:\/\/magna115:80"
                        ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false"
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": []
                    }
                ],
                "default_placement": "default-placement",
                "realm_id": "bee08496-1b97-4f04-8d3f-c682c08565a3"
            }
        ],
        "short_zone_ids": [
            {
                "key": "a6dfdcd1-6d8a-4a34-8c47-4b6a02ac8105",
                "val": 4235791203
            },
            {
                "key": "f5717851-2682-475a-b24b-7bcdec728cbe",
                "val": 311909243
            }
        ]
    },
    "master_zonegroup": "0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95",
    "master_zone": "a6dfdcd1-6d8a-4a34-8c47-4b6a02ac8105",
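For reference, step 2 above typically maps to the following commands. This is only a sketch: the systemd unit name is an assumption and depends on how the gateway instance is named on each host.

# commit the staged period changes so the new master_zone takes effect across the realm
radosgw-admin period update --commit

# restart the gateway so the running daemon reloads the new period
# (the instance name below is hypothetical; substitute the local rgw instance name)
systemctl restart ceph-radosgw@rgw.magna059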
Can you provide the procedure you followed to switch masters? Can you also provide the rgw logs?
I've reproduced this "master is on a different period: master_period= " error. Here's what I've learned:

The "radosgw-admin zone modify --rgw-zonegroup=us --rgw-zone=us-2 --master" command correctly changes the zonegroup's "master_zone" field to point to us-2, but doesn't modify the zonegroup's "endpoints" field.

When the "radosgw-admin sync status" command sees that it's not the master zone, it tries to send a "get_metadata_log_info" request to the new master zone's gateway. It uses the RGWRados::rest_master_conn connection, which is initialized with the zonegroup's endpoints, to do this. So after switching the master zone, it's accidentally sending the request to the endpoint associated with the old master zone, us-1. us-1 knows it's not the master zone, so it returns the empty period id that's displayed as "master_period= ".

This issue is larger than just the "sync status" output, though. We also use rest_master_conn:
* for all metadata sync requests
* when forwarding bucket/user creation operations that need to be processed by the metadata master
* when committing periods or fetching periods that we're missing
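One way to confirm the mismatch described above on an affected cluster (a sketch, assuming the zonegroup name "us" and the hosts from this report):

# dump the zonegroup; after the promotion, "master_zone" points at us-2's id,
# while the zonegroup-level "endpoints" still list the old master's gateway (magna115)
radosgw-admin zonegroup get --rgw-zonegroup=us --debug-rgw=0 --debug-ms=0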
As a workaround, you can fix the zonegroup endpoints with:

$ radosgw-admin zonegroup modify --rgw-zonegroup=us --endpoints=http://magna059:80

I'll discuss this with the rest of the multisite team to see if we can do better.
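For completeness, a likely end-to-end workaround sequence, assuming the zonegroup endpoint change must also be committed to the period and picked up by the running gateways (as in step 2 of the reproduction); please verify against your deployment:

$ radosgw-admin zonegroup modify --rgw-zonegroup=us --endpoints=http://magna059:80
$ radosgw-admin period update --commit
(restart the rgw gateways, then re-check)
$ radosgw-admin sync status --rgw-zone=us-1 --debug-rgw=0 --debug-ms=0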
(In reply to Casey Bodley from comment #6)
> As a workaround, you can fix the zonegroup endpoints with:
>
> $ radosgw-admin zonegroup modify --rgw-zonegroup=us
> --endpoints=http://magna059:80
>
> I'll discuss this with the rest of the multisite team to see if we can do
> better.

The workaround helps. Thanks.
Shilpa, did you follow any document when running the initial "radosgw-admin zone modify" command that led to this bug?

Discussed in the QE/Dev sync today. Next steps: Casey and the RGW team will update the zone modification workflow so that the workaround in Comment #6 is not needed.
Verified in ceph-10.2.2-32
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html