Bug 1423402

Summary: Three-way Multisite: Zonegroup rename ends up in incorrect state
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: shilpa <smanjara>
Component: RGW Assignee: Casey Bodley <cbodley>
Status: CLOSED DEFERRED QA Contact: shilpa <smanjara>
Severity: medium Docs Contact: Erin Donnelly <edonnell>
Priority: urgent    
Version: 2.2 CC: anharris, cbodley, ceph-eng-bugs, edonnell, hnallurv, kbader, kdreyer, mbenjamin, smanjara, sweil, uboppana
Target Milestone: rc   
Target Release: 3.*   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
.Old zone group name is sometimes displayed alongside the new one

In a multi-site configuration, when a zone group is renamed, other zones can in some cases continue to display the old zone group name in the output of the `radosgw-admin zonegroup list` command.

To work around this issue:

. Verify that the new zone group name is present on each cluster.
. Remove the old zone group name:
+
----
$ rados -p .rgw.root rm zonegroups_names.<old-name>
----
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-01-30 15:03:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1412948, 1437916, 1494421    

Description shilpa 2017-02-17 09:14:04 UTC
Description of problem:
Rename the zonegroup on the master zone and run a period update commit. The non-master zones end up with both the old and the new zonegroup names. However, the period is updated correctly on all zones.

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.5-22.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Configure 3-way multisite. 
2. Change the zonegroup name on the master zone and propagate the change to all zones:

radosgw-admin zonegroup rename --rgw-zonegroup=us --zonegroup-new-name=US --master --default  --endpoints=http://magna039:8080

radosgw-admin period update --commit

  "period_map": {
        "id": "55c0334d-2fab-4eb5-b73a-532244717cb3",
        "zonegroups": [
            {
                "id": "7a07c13c-85ea-4660-9ae9-e70fa57ee2dc",
                "name": "US",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [
                    "http:\/\/magna039:8080"



Actual results:

After this, sync status errors out with:
# radosgw-admin sync status --debug-rgw=0
          realm 3d6b536c-0e74-446e-9f4e-08cd3ab01a6b (movies)
      zonegroup 7a07c13c-85ea-4660-9ae9-e70fa57ee2dc (us)
           zone 931420c9-f70c-4099-a843-b34c4ba3da7a (us-west)
  metadata sync syncing
                full sync: 0/64 shards
                metadata is caught up with master
                incremental sync: 64/64 shards
2017-02-17 08:59:22.141421 7f56eb6819c0  0 ERROR: failed to fetch datalog info
      data sync source: 94b94a1a-6aa1-4944-9064-a5ae68bf3811 (us-east)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: b4607000-b77f-48c5-bd70-85913a788035 (us-central)
                        failed to retrieve sync info: (5) Input/output error


All swift/S3 commands hang.


Additional info:

On the non-master zones we end up with both the old and the new zonegroup names:
{
    "default_info": "7a07c13c-85ea-4660-9ae9-e70fa57ee2dc",
    "zonegroups": [
        "us",
        "US",
        "default"
    ]
}
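
A minimal sketch of how the leftover name could be detected from the `radosgw-admin zonegroup list` output shown above. The function name `old_name_leaked` and its parameters are hypothetical, and the check assumes only the JSON shape seen in this report (a top-level "zonegroups" array of names).

```python
import json


def old_name_leaked(zonegroup_list_json, old_name, new_name):
    """Return True when a zone lists both the old and the new zonegroup name.

    zonegroup_list_json is the raw JSON text printed by
    `radosgw-admin zonegroup list`; only its "zonegroups" array is read.
    """
    names = json.loads(zonegroup_list_json)["zonegroups"]
    # The rename has reached this zone (new name present) but the old
    # name object was not cleaned up (old name still present).
    return new_name in names and old_name in names


if __name__ == "__main__":
    # Output reported on the non-master zones in this bug.
    non_master = """
    {
        "default_info": "7a07c13c-85ea-4660-9ae9-e70fa57ee2dc",
        "zonegroups": ["us", "US", "default"]
    }
    """
    print(old_name_leaked(non_master, "us", "US"))
```

The comparison is case-sensitive on purpose, since the rename in this report only changes the case of the name.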

Comment 6 shilpa 2017-02-23 05:55:46 UTC
OK, I tried to reproduce it. I don't see the Swift/S3 commands hanging, but I do see that the non-master zones end up with both the old and the new zonegroup names.

On the master zone, after renaming the zonegroup from 'us' to 'US':

# radosgw-admin zonegroup list
{
    "default_info": "8eb889a4-9716-4bba-96db-231b260f6f61",
    "zonegroups": [
        "us",
        "default"
    ]
}

On non-master zones:

{
    "default_info": "8eb889a4-9716-4bba-96db-231b260f6f61",
    "zonegroups": [
        "us",
        "US",
        "default"
    ]
}


But the period shows only the new zonegroup name, which is correct.

"period_map": {
        "id": "c2a1459d-eecc-4231-a340-0549f71b2d42",
        "zonegroups": [
            {
                "id": "8eb889a4-9716-4bba-96db-231b260f6f61",
                "name": "US",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [
                    "http:\/\/magna039:8080"

Comment 7 Casey Bodley 2017-02-27 15:36:40 UTC
Okay, thanks Shilpa. So the issue is a leak of the old zonegroup name object on non-master zones. I can confirm that radosgw doesn't have any logic that tries to clean these up when it gets a new period.

As a workaround, the rados tool can remove the 'zonegroups_names.us' object from the .rgw.root pool on non-master zones:

$ rados -p .rgw.root rm zonegroups_names.us
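
For several non-master zones, the workaround could be wrapped in a small guard so the old name object is removed only once the new name is visible. This is a sketch under assumptions, not a tested procedure: the helper names and the `dry_run` flag are hypothetical, and `listed_names` is assumed to be the "zonegroups" array from `radosgw-admin zonegroup list` on the zone being cleaned. The pool and object name come from the command above.

```python
import subprocess


def cleanup_command(old_name, pool=".rgw.root"):
    """Build the rados invocation that removes the old zonegroup name object."""
    return ["rados", "-p", pool, "rm", f"zonegroups_names.{old_name}"]


def remove_old_zonegroup_name(old_name, new_name, listed_names, dry_run=True):
    """Remove the leaked name object if, and only if, the rename has arrived.

    Returns the command that was (or would be) run, or None when the old
    name is not listed and there is nothing to clean up.
    """
    if new_name not in listed_names:
        # Refuse to delete anything before the new name is present.
        raise RuntimeError(f"new zonegroup name {new_name!r} not found")
    if old_name not in listed_names:
        return None
    cmd = cleanup_command(old_name)
    if not dry_run:
        # Raises CalledProcessError if rados exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run against the listing reported on the non-master zones.
    print(remove_old_zonegroup_name("us", "US", ["us", "US", "default"]))
```

Defaulting to a dry run keeps the sketch safe to try: it prints the command without touching the pool until `dry_run=False` is passed.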

Comment 15 Ken Dreyer (Red Hat) 2017-04-06 23:06:44 UTC
Casey, will this be fixed in Ceph v10.2.7, or will we need more downstream patches on top of that release?

Comment 16 Casey Bodley 2017-04-17 19:10:38 UTC
Ken, this has not been fixed upstream.

Comment 27 Drew Harris 2019-01-30 15:03:06 UTC
I have closed this issue because it has been inactive for some time now. If you feel this still deserves attention feel free to reopen it.