Bug 1733612
| Summary: | multisite sync status incorrect after sync completion | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Tim Wilkinson <twilkins> |
| Component: | RGW-Multisite | Assignee: | shilpa <smanjara> |
| Status: | CLOSED NOTABUG | QA Contact: | Tejas <tchandra> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.2 | CC: | assingh, ceph-eng-bugs, ceph-qe-bugs, jharriga, kdreyer, mbenjamin, vumrao |
| Target Milestone: | rc | | |
| Target Release: | 4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-21 14:14:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1727980 | | |
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate. Regards, Giri

Level setting the severity of this defect to "High" with a bulk update. Please refine it to a closer value, as defined by the severity definitions in https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity

Had a discussion with Tim. The main concern in this bug was that Tim was watching `ceph df`, its output was not changing, and the sync was still reported as behind, as shown in the sync outputs provided in the report. Because we do not have bucket stats from the time of the issue, we do not know whether the buckets were still progressing, or whether `ceph df` and the bucket stats eventually changed once the sync caught up. Tim has not seen the problem again since the first report; if he sees it again he will update the bug. For now, changing the severity to medium.

(In reply to Tim Wilkinson from comment #0)
> # SECONDARY SITE
> # radosgw-admin sync status
>           realm b8494f5e-e2fc-4bf0-be91-16c879fc4cfe (scaleLTA)
>       zonegroup 32e0014b-0888-47a4-8b66-c306854477f9 (cloud07)
>            zone 9b6d2b94-7872-4328-b481-8b5bd7a58007 (site2)
>   metadata sync syncing
>                 full sync: 64/64 shards
>                 full sync: 12 entries to sync
>                 incremental sync: 0/64 shards
>                 metadata is behind on 64 shards
>                 behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]
>       data sync source: 7aff7505-6f63-406e-a424-240fc5720028 (site1)
>                         syncing
>                         full sync: 101/128 shards
>                         full sync: 0 buckets to sync
>                         incremental sync: 27/128 shards
>                         data is behind on 101 shards
>                         behind shards: [4,5,6,8,9,10,11,12,13,14,19,20,21,22,23,24,25,26,27,28,29,30,31,33,35,36,37,38,39,40,41,42,43,44,45,46,51,52,53,54,55,56,57,58,59,60,61,62,67,68,69,70,71,72,73,74,76,77,78,79,80,81,82,83,84,85,87,88,89,90,91,92,93,94,99,100,101,102,103,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126]
>
> # SECONDARY SITE
> # radosgw-admin bucket sync status --bucket=mycontainers5
>           realm b8494f5e-e2fc-4bf0-be91-16c879fc4cfe (scaleLTA)
>       zonegroup 32e0014b-0888-47a4-8b66-c306854477f9 (cloud07)
>            zone 9b6d2b94-7872-4328-b481-8b5bd7a58007 (site2)
>          bucket mycontainers5[a43f457e-f9fe-45b7-8f1a-f71fc6607818.174418.3]
>
>     source zone 7aff7505-6f63-406e-a424-240fc5720028 (site1)
>                 full sync: 0/4 shards
>                 incremental sync: 4/4 shards
>                 bucket is caught up with source

Also, when the global sync status is behind while the bucket sync status shows caught up (https://bugzilla.redhat.com/show_bug.cgi?id=1731554), that behavior could be because of this bug if the bucket was in fact not synced. However, we are not sure whether this particular bucket was synced, because we do not have bucket sync stats from the time of the issue.
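Since the missing piece here is a per-bucket view captured at the same moment the global status looked behind, a small monitoring loop on the secondary site could record both views together if the run is repeated. The following is a minimal sketch, not part of the original report: the bucket name, log directory, and five-minute interval are illustrative assumptions, and it presumes the node has admin credentials for the local cluster.

```
#!/bin/bash
# Minimal sketch (not from the report): periodically snapshot pool usage,
# global sync status, and per-bucket sync state on the secondary site so
# that all views exist from the same moment if the mismatch recurs.
# BUCKET, LOGDIR, and the 300-second interval are illustrative assumptions.
BUCKET=mycontainers5
LOGDIR=/var/log/rgw-sync-watch
mkdir -p "$LOGDIR"

while true; do
    ts=$(date +%Y%m%dT%H%M%S)
    {
        echo "===== $ts ceph df ====="
        ceph df
        echo "===== $ts radosgw-admin sync status ====="
        radosgw-admin sync status
        echo "===== $ts radosgw-admin bucket sync status --bucket=$BUCKET ====="
        radosgw-admin bucket sync status --bucket="$BUCKET"
        echo "===== $ts radosgw-admin bucket stats --bucket=$BUCKET ====="
        radosgw-admin bucket stats --bucket="$BUCKET"
    } >> "$LOGDIR/sync-watch.log" 2>&1
    sleep 300
done
```

With timestamped snapshots like these, a later comparison can show whether the bucket stats and pool object counts were still moving while the global status continued to list behind shards.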
Description of problem:
----------------------
After the sync from the master to the secondary site completes (i.e., all I/O stops and the `ceph df` results on the secondary site no longer change), both the global sync status and the individual bucket sync status report that the sync is not complete.

Version-Release number:
----------------------
7.6 (Maipo)
3.10.0-957.el7.x86_64
ceph-base.x86_64    2:12.2.8-128.el7cp

How reproducible:
----------------
consistent

Steps to Reproduce:
------------------
1. Configure the secondary site for multisite and confirm that the sync starts.
2. Watch the output of 'radosgw-admin sync status', 'ceph df', and the individual bucket sync statuses throughout the sync.
3. Check the output of 'radosgw-admin sync status' and the individual bucket sync statuses after the I/O stops and 'ceph df' no longer changes on the secondary site.

Actual results:
--------------
All I/O stops and ceph df reports no more activity.
'radosgw-admin sync status' reports an incomplete status (see Additional info).
'radosgw-admin bucket sync status <bucket>' reports an incomplete status (see Additional info).

Expected results:
----------------
All I/O stops and ceph df reports no more activity.
'radosgw-admin sync status' reports a completed status (metadata and data), with no behind shards.
'radosgw-admin bucket sync status <bucket>' reports a completed status (full sync, incremental sync, bucket is caught up with source).

Additional info:
---------------

# MASTER SITE
# ceph df |egrep 'OBJ|buckets.data'
    NAME                         ID      USED        %USED     MAX AVAIL     OBJECTS
    default.rgw.buckets.data     203     51.1TiB     49.59     52.0TiB       14207772

# SECONDARY SITE
# ceph df |egrep 'OBJ|buckets.data'
    NAME                         ID     USED        %USED     MAX AVAIL     OBJECTS
    default.rgw.buckets.data     10     0B          0         45.8TiB       0
    site2.rgw.buckets.data       39     51.1TiB     52.78     45.8TiB       14206768

# SECONDARY SITE
# radosgw-admin sync status
          realm b8494f5e-e2fc-4bf0-be91-16c879fc4cfe (scaleLTA)
      zonegroup 32e0014b-0888-47a4-8b66-c306854477f9 (cloud07)
           zone 9b6d2b94-7872-4328-b481-8b5bd7a58007 (site2)
  metadata sync syncing
                full sync: 64/64 shards
                full sync: 12 entries to sync
                incremental sync: 0/64 shards
                metadata is behind on 64 shards
                behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]
      data sync source: 7aff7505-6f63-406e-a424-240fc5720028 (site1)
                        syncing
                        full sync: 101/128 shards
                        full sync: 0 buckets to sync
                        incremental sync: 27/128 shards
                        data is behind on 101 shards
                        behind shards: [4,5,6,8,9,10,11,12,13,14,19,20,21,22,23,24,25,26,27,28,29,30,31,33,35,36,37,38,39,40,41,42,43,44,45,46,51,52,53,54,55,56,57,58,59,60,61,62,67,68,69,70,71,72,73,74,76,77,78,79,80,81,82,83,84,85,87,88,89,90,91,92,93,94,99,100,101,102,103,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126]

# SECONDARY SITE
# radosgw-admin bucket sync status --bucket=mycontainers5
          realm b8494f5e-e2fc-4bf0-be91-16c879fc4cfe (scaleLTA)
      zonegroup 32e0014b-0888-47a4-8b66-c306854477f9 (cloud07)
           zone 9b6d2b94-7872-4328-b481-8b5bd7a58007 (site2)
         bucket mycontainers5[a43f457e-f9fe-45b7-8f1a-f71fc6607818.174418.3]

    source zone 7aff7505-6f63-406e-a424-240fc5720028 (site1)
                full sync: 0/4 shards
                incremental sync: 4/4 shards
                bucket is caught up with source
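Pool-level `ceph df` object counts are only a coarse proxy for sync completion; a per-bucket comparison between the two sites makes the check more precise. The following is a hedged sketch, not part of the original report: it assumes `jq` is installed and that `radosgw-admin bucket stats` on this version exposes the count under usage."rgw.main".num_objects; run it once on each site and diff the two outputs.

```
#!/bin/bash
# Hedged sketch (not from the report): print "<bucket> <num_objects>" for every
# bucket known to the local site. Running this once on the master and once on
# the secondary, then diffing the two files, shows whether per-bucket object
# counts actually match once 'ceph df' stops changing.
# Assumptions: jq is installed; 'radosgw-admin bucket stats' on this version
# reports counts under usage."rgw.main".num_objects.
radosgw-admin bucket list | jq -r '.[]' | while read -r bucket; do
    count=$(radosgw-admin bucket stats --bucket="$bucket" \
            | jq -r '.usage["rgw.main"].num_objects // 0')
    echo "$bucket $count"
done | sort
```

Saving the output to a file on each site (e.g. master-counts.txt and secondary-counts.txt, names illustrative) and running `diff` on the pair would highlight any bucket whose counts diverge.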