Bug 1733612 - multisite sync status incorrect after sync completion
Summary: multisite sync status incorrect after sync completion
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW-Multisite
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 4.0
Assignee: shilpa
QA Contact: Tejas
URL:
Whiteboard:
Depends On:
Blocks: 1727980
 
Reported: 2019-07-26 18:46 UTC by Tim Wilkinson
Modified: 2019-10-21 14:14 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-21 14:14:22 UTC
Target Upstream Version:


Attachments

Description Tim Wilkinson 2019-07-26 18:46:33 UTC
Description of problem:
----------------------
After the sync from the master to the secondary site completes (i.e., all IO has stopped and the 'ceph df' results on the secondary site no longer change), both the overall sync status and the individual bucket sync status report that the sync is not complete.



Version-Release number:
----------------------
RHEL 7.6 (Maipo)   kernel 3.10.0-957.el7.x86_64
ceph-base.x86_64   2:12.2.8-128.el7cp



How reproducible:
----------------
consistent



Steps to Reproduce:
------------------
1.  Configure the secondary site for multisite and confirm the sync starts.
2.  Watch the output of 'radosgw-admin sync status', 'ceph df', and the
    individual bucket sync statuses throughout the sync (see the example
    monitoring commands below).
3.  Check the output of 'radosgw-admin sync status' and the individual bucket
    sync statuses after IO stops and 'ceph df' no longer changes on the
    secondary site.
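
For reference, a minimal monitoring loop along these lines can be run on the
secondary site (a sketch; the 10-second interval and the bucket name
'mycontainers5' are illustrative):

# Poll overall sync progress, pool usage, and one bucket's sync status.
while true; do
    date
    radosgw-admin sync status
    ceph df | egrep 'OBJ|buckets.data'
    radosgw-admin bucket sync status --bucket=mycontainers5
    sleep 10
done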



Actual results:
--------------
All IO stops and ceph df reports no more activity

'radosgw-admin sync status' reports incomplete status (see Additional info)

'radosgw-admin bucket sync status <bucket>' reports incomplete status (see Additional info)



Expected results:
----------------
All IO stops and ceph df reports no more activity

'radosgw-admin sync status' reports completion status (metadata and data), no behind shards 

'radosgw-admin bucket sync status <bucket>' reports completion status (full sync, incremental sync, bucket is caught up with source)



Additional info:
---------------

# MASTER SITE
# ceph df |egrep 'OBJ|buckets.data'
    NAME                          ID      USED        %USED     MAX AVAIL     OBJECTS  
    default.rgw.buckets.data      203     51.1TiB     49.59       52.0TiB     14207772 




# SECONDARY SITE
# ceph df |egrep 'OBJ|buckets.data'
    NAME                          ID     USED        %USED     MAX AVAIL     OBJECTS  
    default.rgw.buckets.data      10          0B         0       45.8TiB            0 
    site2.rgw.buckets.data        39     51.1TiB     52.78       45.8TiB     14206768 




# SECONDARY SITE
# radosgw-admin sync status
          realm b8494f5e-e2fc-4bf0-be91-16c879fc4cfe (scaleLTA)
      zonegroup 32e0014b-0888-47a4-8b66-c306854477f9 (cloud07)
           zone 9b6d2b94-7872-4328-b481-8b5bd7a58007 (site2)
  metadata sync syncing
                full sync: 64/64 shards
                full sync: 12 entries to sync
                incremental sync: 0/64 shards
                metadata is behind on 64 shards
                behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]
      data sync source: 7aff7505-6f63-406e-a424-240fc5720028 (site1)
                        syncing
                        full sync: 101/128 shards
                        full sync: 0 buckets to sync
                        incremental sync: 27/128 shards
                        data is behind on 101 shards
                        behind shards: [4,5,6,8,9,10,11,12,13,14,19,20,21,22,23,24,25,26,27,28,29,30,31,33,35,36,37,38,39,40,41,42,43,44,45,46,51,52,53,54,55,56,57,58,59,60,61,62,67,68,69,70,71,72,73,74,76,77,78,79,80,81,82,83,84,85,87,88,89,90,91,92,93,94,99,100,101,102,103,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126]




# SECONDARY SITE
# radosgw-admin bucket sync status --bucket=mycontainers5
          realm b8494f5e-e2fc-4bf0-be91-16c879fc4cfe (scaleLTA)
      zonegroup 32e0014b-0888-47a4-8b66-c306854477f9 (cloud07)
           zone 9b6d2b94-7872-4328-b481-8b5bd7a58007 (site2)
         bucket mycontainers5[a43f457e-f9fe-45b7-8f1a-f71fc6607818.174418.3]

    source zone 7aff7505-6f63-406e-a424-240fc5720028 (site1)
                full sync: 0/4 shards
                incremental sync: 4/4 shards
                bucket is caught up with source

Comment 1 Giridhar Ramaraju 2019-08-05 13:09:15 UTC
Updating the QA Contact to Hemant. Hemant will reroute these to the appropriate QE Associate. 

Regards,
Giri

Comment 2 Giridhar Ramaraju 2019-08-05 13:10:34 UTC
Updating the QA Contact to Hemant. Hemant will reroute these to the appropriate QE Associate. 

Regards,
Giri

Comment 3 Giridhar Ramaraju 2019-08-20 06:58:03 UTC
Setting the severity of this defect to "High" with a bulk update. Please refine it to a more accurate value, as defined by the severity definition in https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity

Comment 4 Vikhyat Umrao 2019-08-21 17:06:27 UTC
Had a discussion with Tim. The main concern in this bug was that Tim was watching `ceph df`, it had stopped changing, yet the sync was still behind, as the sync outputs above show. Because we do not have bucket stats from the time of the issue, we do not know whether the shards kept progressing afterwards, or whether `ceph df` and the bucket stats eventually changed once the sync caught up.

After the first report, Tim has not seen this again; if he sees it again he will update the bug. For now, changing the severity to medium.
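
Should this reproduce, snapshotting the per-bucket state at the time of the issue would fill that gap. A sketch (the output directory and the rough parsing of 'radosgw-admin bucket list' are illustrative):

# Snapshot sync and per-bucket state on the secondary site.
OUT=/tmp/rgw-sync-snapshot-$(date +%s)
mkdir -p "$OUT"
radosgw-admin sync status  > "$OUT/sync-status.txt"
ceph df                    > "$OUT/ceph-df.txt"
radosgw-admin bucket stats > "$OUT/bucket-stats.json"
# Per-bucket sync status for every bucket in the zone.
for b in $(radosgw-admin bucket list | sed -e 's/[]",[]//g' -e '/^[[:space:]]*$/d'); do
    radosgw-admin bucket sync status --bucket="$b" > "$OUT/bucket-sync-$b.txt"
done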

Comment 5 Vikhyat Umrao 2019-08-21 17:14:11 UTC
(In reply to Tim Wilkinson from comment #0)

> 
> 
> # SECONDARY SITE
> # radosgw-admin sync status
>           realm b8494f5e-e2fc-4bf0-be91-16c879fc4cfe (scaleLTA)
>       zonegroup 32e0014b-0888-47a4-8b66-c306854477f9 (cloud07)
>            zone 9b6d2b94-7872-4328-b481-8b5bd7a58007 (site2)
>   metadata sync syncing
>                 full sync: 64/64 shards
>                 full sync: 12 entries to sync
>                 incremental sync: 0/64 shards
>                 metadata is behind on 64 shards
>                 behind shards:
> [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,
> 28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,
> 53,54,55,56,57,58,59,60,61,62,63]
>       data sync source: 7aff7505-6f63-406e-a424-240fc5720028 (site1)
>                         syncing
>                         full sync: 101/128 shards
>                         full sync: 0 buckets to sync
>                         incremental sync: 27/128 shards
>                         data is behind on 101 shards
>                         behind shards:
> [4,5,6,8,9,10,11,12,13,14,19,20,21,22,23,24,25,26,27,28,29,30,31,33,35,36,37,
> 38,39,40,41,42,43,44,45,46,51,52,53,54,55,56,57,58,59,60,61,62,67,68,69,70,
> 71,72,73,74,76,77,78,79,80,81,82,83,84,85,87,88,89,90,91,92,93,94,99,100,101,
> 102,103,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,
> 122,123,124,125,126]
> 
> 
> 
> 
> # SECONDARY SITE
> # radosgw-admin bucket sync status --bucket=mycontainers5
>           realm b8494f5e-e2fc-4bf0-be91-16c879fc4cfe (scaleLTA)
>       zonegroup 32e0014b-0888-47a4-8b66-c306854477f9 (cloud07)
>            zone 9b6d2b94-7872-4328-b481-8b5bd7a58007 (site2)
>          bucket mycontainers5[a43f457e-f9fe-45b7-8f1a-f71fc6607818.174418.3]
> 
>     source zone 7aff7505-6f63-406e-a424-240fc5720028 (site1)
>                 full sync: 0/4 shards
>                 incremental sync: 4/4 shards
>                 bucket is caught up with source

Also, the case where the global sync status is behind while a bucket's status shows caught up is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1731554. That behavior could be due to this bug if the bucket was in fact not synced, but we cannot tell whether this particular bucket was synced or not because we do not have bucket sync stats from the time of the issue.
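
When this recurs, a quick cross-check of whether a "caught up" bucket actually has all of its objects is to compare its object count on both sites once IO has stopped (a sketch; the bucket name is illustrative):

# Run on each site and compare the num_objects values for the same bucket.
radosgw-admin bucket stats --bucket=mycontainers5 | grep num_objects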

