Bug 1359696 - When the zones are switched, sync status complains of period mismatch
Summary: When the zones are switched, sync status complains of period mismatch
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Target Release: 2.0
Assignee: Casey Bodley
QA Contact: ceph-qe-bugs
Depends On:
Reported: 2016-07-25 10:17 UTC by shilpa
Modified: 2017-07-30 15:49 UTC
CC List: 9 users

Fixed In Version: RHEL: ceph-10.2.2-31.el7cp Ubuntu: ceph_10.2.2-24redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2016-08-23 19:45:13 UTC
Target Upstream Version:


Links:
Ceph Project Bug Tracker 16834 (last updated 2016-07-27 15:44:18 UTC)
Red Hat Product Errata RHBA-2016:1755: SHIPPED_LIVE, "Red Hat Ceph Storage 2.0 bug fix and enhancement update" (last updated 2016-08-23 23:23:52 UTC)

Description shilpa 2016-07-25 10:17:19 UTC
Description of problem:
Switch the master and non-master zones, then check the sync status on the current non-master zone: it incorrectly reports that the master is on a different period, even though both zones are on the same period.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Modify non-master zone with '--master' flag.
radosgw-admin zone modify --rgw-zonegroup=us --rgw-zone=us-2 --access_key=secret --secret=secret --endpoints=http://magna059:80 --default --master
2. Update and commit the period and restart rgw gateway
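Step 2 above can be performed with commands along these lines (a sketch; the exact systemd unit name varies by deployment):

```shell
# Commit the staged period change so it propagates to all zones:
radosgw-admin period update --commit

# Restart the gateway so it picks up the new period
# (unit name is deployment-specific):
systemctl restart ceph-radosgw.target
```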
3. On the current non-master zone:

# radosgw-admin sync status --rgw-zone=us-1 --debug-rgw=0 --debug-ms=0
2016-07-25 10:14:33.479604 7f82e22b39c0  0 error in read_id for id  : (2) No such file or directory
          realm bee08496-1b97-4f04-8d3f-c682c08565a3 (earth)
      zonegroup 0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95 (us)
           zone f5717851-2682-475a-b24b-7bcdec728cbe (us-1)
  metadata sync syncing
                full sync: 0/64 shards
                master is on a different period: master_period= local_period=86c3c833-ee7e-454c-a54d-b451fd829755
                metadata is caught up with master
                incremental sync: 64/64 shards
      data sync source: a6dfdcd1-6d8a-4a34-8c47-4b6a02ac8105 (us-2)
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

Both zones are actually on the same period, and it is entirely different from the one shown by the command above:

# radosgw-admin period get-current --debug-rgw=0 --debug-ms=0
{
    "current_period": "21a40ae5-721f-4a67-8c3a-28baf3104d3f"
}

Excerpt of the current period (elided fields shown as "..."):

    "period_map": {
        "id": "21a40ae5-721f-4a67-8c3a-28baf3104d3f",
        "zonegroups": [
            {
                "id": "0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95",
                "name": "us",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [ ... ],
                "hostnames": [],
                "hostnames_s3website": [],
                "master_zone": "a6dfdcd1-6d8a-4a34-8c47-4b6a02ac8105",
                "zones": [
                    {
                        "id": "a6dfdcd1-6d8a-4a34-8c47-4b6a02ac8105",
                        "name": "us-2",
                        "endpoints": [ ... ],
                        "log_meta": "true",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false"
                    },
                    {
                        "id": "f5717851-2682-475a-b24b-7bcdec728cbe",
                        "name": "us-1",
                        "endpoints": [ ... ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false"
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": []
                    }
                ],
                "default_placement": "default-placement",
                "realm_id": "bee08496-1b97-4f04-8d3f-c682c08565a3"
            }
        ],
        "short_zone_ids": [
            {
                "key": "a6dfdcd1-6d8a-4a34-8c47-4b6a02ac8105",
                "val": 4235791203
            },
            {
                "key": "f5717851-2682-475a-b24b-7bcdec728cbe",
                "val": 311909243
            }
        ]
    },
    "master_zonegroup": "0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95",
    "master_zone": "a6dfdcd1-6d8a-4a34-8c47-4b6a02ac8105",

Comment 2 Orit Wasserman 2016-07-25 18:49:15 UTC
Can you provide the procedure used to switch masters?
Can you also provide the RGW logs?

Comment 5 Casey Bodley 2016-07-26 21:09:00 UTC
I've reproduced this "master is on a different period: master_period= " error. Here's what I've learned:

The "radosgw-admin zone modify --rgw-zonegroup=us --rgw-zone=us-2 --master" command correctly changes the zonegroup's "master_zone" field to point to us-2, but doesn't modify the zonegroup's "endpoints" field.

When the "radosgw-admin sync status" command sees that it's not the master zone, it tries to send a "get_metadata_log_info" request to the new master zone's gateway. It uses the RGWRados::rest_master_conn connection, which is initialized with the zonegroup's endpoints, to do this. So after switching the master zone, it's accidentally sending the request to the endpoint associated with the old master zone, us-1. us-1 knows it's not the master zone, so it returns the empty period id that's displayed as "master_period= ".

This issue is larger than just the "sync status" output, though. We also use rest_master_conn:
* for all metadata sync requests
* when forwarding bucket/user creation operations that need to be processed by the metadata master
* when committing periods or fetching periods that we're missing
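A quick way to see the mismatch Casey describes is to compare the zonegroup's "master_zone" and "endpoints" fields after the switch (a sketch; zonegroup and zone names are taken from this report):

```shell
# Dump the zonegroup configuration. After "zone modify --master",
# "master_zone" points at us-2, but the zonegroup-level "endpoints"
# (which initialize rest_master_conn) still list the old master's
# gateway, so requests intended for the master still go to us-1:
radosgw-admin zonegroup get --rgw-zonegroup=us
```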

Comment 6 Casey Bodley 2016-07-26 22:32:49 UTC
As a workaround, you can fix the zonegroup endpoints with:

$ radosgw-admin zonegroup modify --rgw-zonegroup=us --endpoints=http://magna059:80

I'll discuss this with the rest of the multisite team to see if we can do better.
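After applying the workaround, something like the following should confirm it took effect (a sketch; zone name from this report):

```shell
# With the zonegroup endpoints corrected, the "master is on a
# different period" line should no longer appear:
radosgw-admin sync status --rgw-zone=us-1
```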

Comment 7 shilpa 2016-07-27 13:39:18 UTC
(In reply to Casey Bodley from comment #6)
> As a workaround, you can fix the zonegroup endpoints with:
> $ radosgw-admin zonegroup modify --rgw-zonegroup=us
> --endpoints=http://magna059:80
> I'll discuss this with the rest of the multisite team to see if we can do
> better.

The workaround helps. Thanks.

Comment 8 Ken Dreyer (Red Hat) 2016-07-27 14:02:39 UTC
Shilpa, did you follow any document when running the initial "radosgw-admin zone modify" command that led to this bug?

Discussed in the QE/Dev sync today. Next steps: Casey and the RGW team will update the zone modification workflow so that the workaround in Comment #6 is not needed.

Comment 16 shilpa 2016-08-01 05:46:35 UTC
Verified in ceph-10.2.2-32

Comment 18 errata-xmlrpc 2016-08-23 19:45:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

