Bug 1423402 - Three-way Multisite: Zonegroup rename ends up in incorrect state
Summary: Three-way Multisite: Zonegroup rename ends up in incorrect state
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW
Version: 2.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: 3.*
Assignee: Casey Bodley
QA Contact: shilpa
Docs Contact: Erin Donnelly
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 1412948 1437916 1494421
 
Reported: 2017-02-17 09:14 UTC by shilpa
Modified: 2019-01-30 15:03 UTC (History)

.Old zone group name is sometimes displayed alongside the new one

In a multi-site configuration, when a zone group is renamed, other zones can in some cases continue to display the old zone group name in the output of the `radosgw-admin zonegroup list` command.

To work around this issue:

. Verify that the new zone group name is present on each cluster.
. Remove the old zone group name:
+
----
$ rados -p .rgw.root rm zonegroups_names.<old-name>
----
//
Clone Of:
Last Closed: 2019-01-30 15:03:06 UTC



Description shilpa 2017-02-17 09:14:04 UTC
Description of problem:
Rename the zonegroup on the master zone and run a period update commit. The non-master zones end up with both the old and the new zonegroup names. However, the period gets updated correctly on all the zones.

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.5-22.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Configure 3-way multisite. 
2. Change zonegroup name on master zone and update the changes on all zones:

radosgw-admin zonegroup rename --rgw-zonegroup=us --zonegroup-new-name=US --master --default  --endpoints=http://magna039:8080

radosgw-admin period update --commit

  "period_map": {
        "id": "55c0334d-2fab-4eb5-b73a-532244717cb3",
        "zonegroups": [
            {
                "id": "7a07c13c-85ea-4660-9ae9-e70fa57ee2dc",
                "name": "US",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [
                    "http:\/\/magna039:8080"



Actual results:

 After this, sync status errors out with:
# radosgw-admin sync status --debug-rgw=0
          realm 3d6b536c-0e74-446e-9f4e-08cd3ab01a6b (movies)
      zonegroup 7a07c13c-85ea-4660-9ae9-e70fa57ee2dc (us)
           zone 931420c9-f70c-4099-a843-b34c4ba3da7a (us-west)
  metadata sync syncing
                full sync: 0/64 shards
                metadata is caught up with master
                incremental sync: 64/64 shards
2017-02-17 08:59:22.141421 7f56eb6819c0  0 ERROR: failed to fetch datalog info
      data sync source: 94b94a1a-6aa1-4944-9064-a5ae68bf3811 (us-east)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: b4607000-b77f-48c5-bd70-85913a788035 (us-central)
                        failed to retrieve sync info: (5) Input/output error


All swift/S3 commands hang.


Additional info:

On the non-master zones we end up with both the old and the new zonegroup names:
{
    "default_info": "7a07c13c-85ea-4660-9ae9-e70fa57ee2dc",
    "zonegroups": [
        "us",
        "US",
        "default"
    ]
}
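
The leftover name can be spotted by comparing the `radosgw-admin zonegroup list` output with the zonegroup names in the committed period. The following is a minimal sketch, not part of the original report: the helper name `stale_zonegroup_names` and its `ignore` parameter are hypothetical, and it assumes the period JSON carries `period_map.zonegroups[].name` as in the excerpts quoted here. The sample data is trimmed from the outputs above.

```python
import json


def stale_zonegroup_names(zonegroup_list, period, ignore=("default",)):
    """Return locally listed zonegroup names that the period does not contain.

    zonegroup_list: parsed JSON from `radosgw-admin zonegroup list`
    period:         parsed JSON containing a top-level "period_map" (assumed)
    ignore:         names to skip, e.g. a "default" zonegroup outside the realm
    """
    current = {zg["name"] for zg in period["period_map"]["zonegroups"]}
    return [
        name
        for name in zonegroup_list["zonegroups"]
        if name not in current and name not in ignore
    ]


# Sample data modelled on the outputs quoted in this report (period trimmed).
zonegroup_list = json.loads("""
{
    "default_info": "7a07c13c-85ea-4660-9ae9-e70fa57ee2dc",
    "zonegroups": ["us", "US", "default"]
}
""")
period = {
    "period_map": {
        "zonegroups": [
            {"id": "7a07c13c-85ea-4660-9ae9-e70fa57ee2dc", "name": "US"}
        ]
    }
}

print(stale_zonegroup_names(zonegroup_list, period))  # prints ['us']
```

Any name the helper reports is only a candidate for cleanup and should be reviewed by hand, since a zonegroup that legitimately exists outside the realm's period would be reported as well unless it is listed in `ignore`.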

Comment 6 shilpa 2017-02-23 05:55:46 UTC
OK, I tried to reproduce it. I don't see the swift/S3 commands hanging, but I do see that the non-master zones end up with both the old and the new zonegroup names.

On the master zone, after renaming the zonegroup from 'us' to 'US':

# radosgw-admin zonegroup list
{
    "default_info": "8eb889a4-9716-4bba-96db-231b260f6f61",
    "zonegroups": [
        "us",
        "default"
    ]
}

On non-master zones:

{
    "default_info": "8eb889a4-9716-4bba-96db-231b260f6f61",
    "zonegroups": [
        "us",
        "US",
        "default"
    ]
}


But the period shows only the new zonegroup name, which is correct.

"period_map": {
        "id": "c2a1459d-eecc-4231-a340-0549f71b2d42",
        "zonegroups": [
            {
                "id": "8eb889a4-9716-4bba-96db-231b260f6f61",
                "name": "US",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [
                    "http:\/\/magna039:8080"

Comment 7 Casey Bodley 2017-02-27 15:36:40 UTC
Okay, thanks Shilpa. So the issue is a leak of the old zonegroup name object on non-master zones. I can confirm that radosgw doesn't have any logic that tries to clean these up when it gets a new period.

As a workaround, the rados tool can remove the 'zonegroups_names.us' object from the .rgw.root pool on non-master zones:

$ rados -p .rgw.root rm zonegroups_names.us
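
The workaround can be guarded so that the old name object is removed only after the new name is confirmed on that cluster. The following is a minimal sketch under stated assumptions: the helper name `build_cleanup_command` is hypothetical, the input is the parsed output of `radosgw-admin zonegroup list` from a non-master zone, and the helper only builds the `rados` argument list quoted above without running it.

```python
def build_cleanup_command(zonegroup_list, old_name, new_name, pool=".rgw.root"):
    """Build the `rados rm` argument list for a leaked zonegroup name object.

    Returns None when the old name is not listed (nothing to clean up).
    Raises ValueError when the new name is missing, because removing the
    old name object in that state is not covered by the workaround.
    """
    names = zonegroup_list["zonegroups"]
    if new_name not in names:
        raise ValueError(
            "new zonegroup name %r not present; refusing to remove %r"
            % (new_name, old_name)
        )
    if old_name not in names:
        return None
    # Object naming follows the workaround: zonegroups_names.<old-name>
    return ["rados", "-p", pool, "rm", "zonegroups_names." + old_name]


# Listing as seen on a non-master zone in this report.
listing = {"zonegroups": ["us", "US", "default"]}
cmd = build_cleanup_command(listing, old_name="us", new_name="US")
print(" ".join(cmd))  # prints: rados -p .rgw.root rm zonegroups_names.us
```

Executing the returned list, for example with `subprocess.run(cmd, check=True)`, is left to the operator so that the command can be reviewed before it touches the pool.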

Comment 15 Ken Dreyer (Red Hat) 2017-04-06 23:06:44 UTC
Casey, will this be fixed in Ceph v10.2.7, or will we need more downstream patches on top of that release?

Comment 16 Casey Bodley 2017-04-17 19:10:38 UTC
Ken, this has not been fixed upstream.

Comment 27 Drew Harris 2019-01-30 15:03:06 UTC
I have closed this issue because it has been inactive for some time now. If you feel this still deserves attention, feel free to reopen it.

