Bug 1734159
| Summary: | after converting single to multisite unable to access pre-existing containers | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | John Harrigan <jharriga> |
| Component: | RGW-Multisite | Assignee: | Matt Benjamin (redhat) <mbenjamin> |
| Status: | CLOSED NOTABUG | QA Contact: | Tejas <tchandra> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.2 | CC: | ceph-eng-bugs, ceph-qe-bugs, twilkins, vumrao |
| Target Milestone: | rc | ||
| Target Release: | 4.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-08-05 21:16:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1727980 | ||
|
Description
John Harrigan
2019-07-29 19:29:55 UTC
Hi John,

Thank you for reporting this issue. I looked into the details, and reproducing the issue with `debug_rgw=20` gives a clear picture of what is going on: when this master site was converted to multi-site, some misconfiguration happened, and from the logs it is clear the problem is the zonegroup configuration.

```
# swift stat mycontainers1 -A http://172.19.43.90:8080/auth/1.0 -U johndoe:swift -K AvY0ZEdywf6VffkZvBbKdoBDuK7DodzjHFV3ARTW
Container HEAD failed: http://172.19.43.90:8080/swift/v1/mycontainers1 301 Moved Permanently
Failed Transaction ID: tx00000000000000073b64f-005d3f5d27-166b80-site1
```

Log lines:

```
2019-07-29 20:55:03.025770 7fd64b38e700  0 NOTICE: request for data in a different zonegroup (23d95fab-b12e-44e4-b4d0-3426c5488433 != 32e0014b-0888-47a4-8b66-c306854477f9)
^^ important line
2019-07-29 20:55:03.025783 7fd64b38e700 20 op->ERRORHANDLER: err_no=-2024 new_err_no=-2024
2019-07-29 20:55:03.025806 7fd64b38e700  2 req 7583311:0.000192:swift:HEAD /swift/v1/mycontainers1:stat_bucket:op status=0
2019-07-29 20:55:03.025812 7fd64b38e700  2 req 7583311:0.000198:swift:HEAD /swift/v1/mycontainers1:stat_bucket:http status=301
2019-07-29 20:55:03.025814 7fd64b38e700  1 ====== req done req=0x7fd64b387f70 op status=0 http_status=301 ======
2019-07-29 20:55:03.025820 7fd64b38e700 20 process_request() returned -2024
```

And if we check this bucket's metadata:

```
# radosgw-admin bucket stats --bucket=mycontainers1
{
    "bucket": "mycontainers1",
    "zonegroup": "23d95fab-b12e-44e4-b4d0-3426c5488433",   <========================
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "a43f457e-f9fe-45b7-8f1a-f71fc6607818.174418.6",
    "marker": "a43f457e-f9fe-45b7-8f1a-f71fc6607818.174418.1",
    "index_type": "Normal",
    "owner": "johndoe",
    "ver": "0#321181,1#319752,2#320261,3#321075",
    "master_ver": "0#0,1#0,2#0,3#0",
    "mtime": "2019-06-10 15:51:36.475965",
    "max_marker": "0#,1#,2#,3#",
    "usage": {
        "rgw.main": {
            "size": 11256588652000,
            "size_actual": 11256971444224,
            "size_utilized": 11256588652000,
            "size_kb": 10992762356,
            "size_kb_actual": 10993136176,
            "size_kb_utilized": 10992762356,
            "num_objects": 211743
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": true,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}
```

The bucket carries the old zonegroup id, "23d95fab-b12e-44e4-b4d0-3426c5488433" (its name was probably "default"), while in the current configuration the zonegroup id is "32e0014b-0888-47a4-8b66-c306854477f9":

```
# radosgw-admin realm list
{
    "default_info": "b8494f5e-e2fc-4bf0-be91-16c879fc4cfe",
    "realms": [
        "scaleLTA"
    ]
}

# radosgw-admin zonegroup list
{
    "default_info": "32e0014b-0888-47a4-8b66-c306854477f9",
    "zonegroups": [
        "cloud07"
    ]
}

# radosgw-admin zonegroup get --rgw-zonegroup=cloud07
{
    "id": "32e0014b-0888-47a4-8b66-c306854477f9",
    "name": "cloud07",
    "api_name": "",
    "is_master": "true",
    "endpoints": [
        "http://172.19.43.90:8080",
        "http://172.19.43.218:8080",
        "http://172.19.43.67:8080",
        "http://172.19.43.86:8080"
    ],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "7aff7505-6f63-406e-a424-240fc5720028",
    "zones": [
        {
            "id": "7aff7505-6f63-406e-a424-240fc5720028",
            "name": "site1",
            "endpoints": [
                "http://172.19.43.90:8080",
                "http://172.19.43.218:8080",
                "http://172.19.43.67:8080",
                "http://172.19.43.86:8080"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": []
        },
        {
            "id": "9b6d2b94-7872-4328-b481-8b5bd7a58007",
            "name": "site2",
            "endpoints": [
                "http://172.119.43.70:8080",
                "http://172.119.43.245:8080",
                "http://172.119.43.129:8080",
                "http://172.119.43.178:8080"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": []
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "b8494f5e-e2fc-4bf0-be91-16c879fc4cfe"
}

# radosgw-admin zone list
{
    "default_info": "7aff7505-6f63-406e-a424-240fc5720028",
    "zones": [
        "site1"
    ]
}

# radosgw-admin zone get --rgw-zone=site1
{
    "id": "7aff7505-6f63-406e-a424-240fc5720028",
    "name": "site1",
    "domain_root": "default.rgw.meta:root",
    "control_pool": "default.rgw.control",
    "gc_pool": "default.rgw.log:gc",
    "lc_pool": "default.rgw.log:lc",
    "log_pool": "default.rgw.log",
    "intent_log_pool": "default.rgw.log:intent",
    "usage_log_pool": "default.rgw.log:usage",
    "reshard_pool": "default.rgw.log:reshard",
    "user_keys_pool": "default.rgw.meta:users.keys",
    "user_email_pool": "default.rgw.meta:users.email",
    "user_swift_pool": "default.rgw.meta:users.swift",
    "user_uid_pool": "default.rgw.meta:users.uid",
    "system_key": {
        "access_key": "xyz",
        "secret_key": "xyz"
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "default.rgw.buckets.index",
                "data_pool": "default.rgw.buckets.data",
                "data_extra_pool": "default.rgw.buckets.non-ec",
                "index_type": 0,
                "compression": ""
            }
        }
    ],
    "metadata_heap": "",
    "tier_config": [],
    "realm_id": "b8494f5e-e2fc-4bf0-be91-16c879fc4cfe"
}

# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    175TiB     101TiB     73.2TiB          41.95
POOLS:
    NAME                          ID      USED        %USED     MAX AVAIL     OBJECTS
    scbench                       37      11.0GiB      0.04       24.3TiB         2807
    default.rgw.users.keys        198          0B         0       24.3TiB            0
    default.rgw.data.root         199          0B         0       24.3TiB            0
    .rgw.root                     200     19.9KiB         0       24.3TiB           39
    default.rgw.control           201          0B         0       24.3TiB            8
    default.rgw.gc                202          0B         0       24.3TiB            0
    default.rgw.buckets.data      203     51.1TiB     49.59       52.0TiB     14207772
    default.rgw.buckets.index     204          0B         0       24.3TiB           26
    default.rgw.buckets.extra     205          0B         0       24.3TiB            0
    default.rgw.log               206     16.7KiB         0       24.3TiB         1319
    default.rgw.meta              207     3.67KiB         0       24.3TiB           19
    default.rgw.intent-log        208          0B         0       24.3TiB            0
    default.rgw.usage             209          0B         0       24.3TiB            0
    default.rgw.users             210          0B         0       24.3TiB            0
    default.rgw.users.email      211          0B         0       24.3TiB            0
    default.rgw.users.swift       212          0B         0       24.3TiB            0
    default.rgw.users.uid         213          0B         0       24.3TiB            0
    site1.rgw.log                 217          0B         0       24.3TiB            3
    site1.rgw.meta                218          0B         0       24.3TiB            0
```

Here is what I suspect happened. When converting from single-site to multi-site, the old default zonegroup "23d95fab-b12e-44e4-b4d0-3426c5488433", in which the old buckets/containers were created, was removed instead of renamed, and a new zonegroup was created with the new id "32e0014b-0888-47a4-8b66-c306854477f9". The zone, on the other hand, was renamed, because it still has the old pool names and so on. Can you confirm whether the zonegroup was renamed or newly created? We can fix the metadata of the older buckets to carry the new zonegroup id, and access should work again.

There is an official documented procedure for converting single-site to multi-site: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index#migrating-a-single-site-system-to-multi-site-rgw. Did we follow this procedure?

---

Tim did the work, ran the playbooks and followed the manual steps in the documentation. He is on PTO for the remainder of this week, but we can check with him when he returns.

What command sequences would be needed to remedy the misconfiguration?

thanks, John

---

Thanks, John. You can follow these steps to fix the old buckets.

1. Download the bucket instance:

```
radosgw-admin metadata get --metadata-key bucket.instance:<bucket name>:<bucket id> > <bucket name>.instance.json
```

2. Take a backup of the current metadata instance:

```
cp <bucket name>.instance.json <bucket name>.instance.backup.json
```

3. Edit `<bucket name>.instance.json`, replacing the zonegroup id with the new one: 32e0014b-0888-47a4-8b66-c306854477f9

4. Remove the old metadata key for this bucket instance:

```
radosgw-admin metadata rm --rgw-cache-enabled=false --metadata-key bucket.instance:<bucket name>:<bucket id>
```

5. Put back the updated `<bucket name>.instance.json`, which now has the new zonegroup id:

```
radosgw-admin metadata put --metadata-key bucket.instance:<bucket name>:<bucket id> < <bucket name>.instance.json
```

Hope the above steps help. If there are any issues, let me know.

---

(In reply to John Harrigan from comment #3)
> Tim did the work, ran the playbooks and followed the manual steps in the
> documentation

Once we realized the playbook created its own site1 pools even though our existing data to replicate was in the default pools, I deleted the 'site1' zone (including the site1.* pools) as well as its zonegroup and realm. Then I followed the 'migrate-single-to-multisite' docs [1] to turn the 'default' zone and zonegroup into the master zone/zonegroup of a multisite configuration. The commands executed are included below.

```
radosgw-admin zone delete --rgw-zone=site1
radosgw-admin zonegroup delete --rgw-zonegroup=cloud07
radosgw-admin realm delete --rgw-realm=scaleLTA
for i in control meta log ; do
    ceph osd pool delete site1.rgw.$i site1.rgw.$i \
        --yes-i-really-really-mean-it
done

radosgw-admin realm create --rgw-realm=scaleLTA --default
radosgw-admin zonegroup rename --rgw-zonegroup default --zonegroup-new-name=cloud07
radosgw-admin zone rename --rgw-zone default --zone-new-name site1 \
    --rgw-zonegroup=cloud07
radosgw-admin zonegroup modify --rgw-realm=scaleLTA --rgw-zonegroup=cloud07 \
    --endpoints http://f22-h01-000-6048r.rdu2.scalelab.redhat.com:8080,http://f22-h05-000-6048r.rdu2.scalelab.redhat.com:8080,http://f22-h09-000-6048r.rdu2.scalelab.redhat.com:8080,http://f22-h13-000-6048r.rdu2.scalelab.redhat.com:8080 \
    --master --default
radosgw-admin user create --uid="synchronization-user" --display-name="Synchronization User" \
    --system --rgw-realm=scaleLTA --rgw-zonegroup=cloud07 --rgw-zone=site1
radosgw-admin zone modify --rgw-realm=scaleLTA --rgw-zonegroup=cloud07 --rgw-zone=site1 \
    --endpoints http://f22-h01-000-6048r.rdu2.scalelab.redhat.com:8080,http://f22-h05-000-6048r.rdu2.scalelab.redhat.com:8080,http://f22-h09-000-6048r.rdu2.scalelab.redhat.com:8080,http://f22-h13-000-6048r.rdu2.scalelab.redhat.com:8080 \
    --access-key=5OG4L70N0TTDB8R89E78 --secret=3L6WIamMrwFggSJfrViY6RsPeADazxCYppw1cXfd \
    --master --default
radosgw-admin period update --commit
systemctl restart ceph-radosgw@rgw.`hostname -s`    # on each site1 RGW
```

[1] http://docs.ceph.com/docs/master/radosgw/multisite/#migrating-a-single-site-system-to-multi-site

---

Had a discussion with Casey. If we follow the document [1], the procedure is safe. We are not sure which part of the ansible run did this, as converting an RGW single site to multi-site is still not supported by ceph-ansible. Tim/John, maybe you want to create a feature request for that?

[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index#migrating-a-single-site-system-to-multi-site-rgw

As this issue was caused by misconfiguration, we are closing it as NOTABUG.
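For reference, the edit at the heart of the per-bucket fix (steps 2 and 3 above) is just a backup plus a one-field substitution in the instance JSON. Below is a minimal, hedged sketch of that edit using `sed`, run against a stand-in file rather than real `radosgw-admin metadata get` output (no cluster is assumed available); the ids are the ones from this report, and the stand-in JSON is a simplified, hypothetical stub:

```shell
#!/bin/sh
# Ids taken from the report above.
OLD_ZG="23d95fab-b12e-44e4-b4d0-3426c5488433"
NEW_ZG="32e0014b-0888-47a4-8b66-c306854477f9"
BUCKET="mycontainers1"

# Stand-in for the real step 1 output of:
#   radosgw-admin metadata get --metadata-key bucket.instance:<name>:<id>
# (hypothetical stub; the real file has many more fields)
cat > "$BUCKET.instance.json" <<EOF
{ "bucket": "$BUCKET", "zonegroup": "$OLD_ZG" }
EOF

# Step 2: keep an untouched backup before editing.
cp "$BUCKET.instance.json" "$BUCKET.instance.backup.json"

# Step 3: swap the stale zonegroup id for the current one (GNU sed -i).
sed -i "s/$OLD_ZG/$NEW_ZG/" "$BUCKET.instance.json"

# The edited file now references the new zonegroup; the backup keeps the old.
grep "zonegroup" "$BUCKET.instance.json"
```

The remaining steps (`radosgw-admin metadata rm` and `metadata put`) must still be run against the cluster as documented above; this sketch only covers the local file edit.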