Bug 2181646

Summary: Multi-site replication not working after upgrade to RHCS 5.3 (both primary and secondary at 5.3); data sync stuck on one shard
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Brett Hull <bhull>
Component: RGW-Multisite
Assignee: shilpa <smanjara>
Status: CLOSED DUPLICATE
QA Contact: Madhavi Kasturi <mkasturi>
Severity: urgent
Priority: unspecified
Version: 5.3
CC: bhull, cbodley, ceph-eng-bugs, cephqe-warriors, ckulal, hklein, mbenjamin, roemerso, smanjara
Target Milestone: ---
Target Release: 6.1z1
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Type: Bug
Regression: ---
Last Closed: 2023-07-17 17:56:54 UTC

Description Brett Hull 2023-03-24 20:14:24 UTC
Description of problem:
Customer upgraded both the primary and secondary sites from RHCS 4 to RHCS 5.3z1.

Since the upgrade, data sync on the secondary site is not progressing: one shard (shard 19) is stuck in recovery and its marker never advances.

radosgw-admin data sync status --shard-id=19 --source-zone=s3r-tls
{
    "shard_id": 19,
    "marker": {
        "status": "incremental-sync",
        "marker": "1_1679675534.725362_139688646.1",
        "next_step_marker": "",
        "total_entries": 1,
        "pos": 0,
        "timestamp": "2023-03-24T16:32:14.725362Z"
    },
    "pending_buckets": [],
    "recovering_buckets": [
        "<80>\u0001\u0001Z\u0000\u0000\u0000\n\nG\u0000\u0000\u0000\u0006\u0000\u0000\u0000prod-t\u0000\u0000\u0000\u00000\u0000\u0000\u0000186ce0a8-0b74-494e-b1dd-aec059bd6eb2.166512982.5\u0000\u0000\u0000\u0000\u0000<FF><FF><FF><FF>\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"
    ]
}
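
The recovering_buckets entry above is a raw, encoded bucket sync status key; the readable fragments suggest the bucket name "prod-t" and the bucket instance 186ce0a8-0b74-494e-b1dd-aec059bd6eb2.166512982.5 from the s3r-tls zone. As a next diagnostic step (a sketch only; the bucket name is inferred from the raw blob and not confirmed), the per-bucket sync state and any recorded sync errors could be checked on the secondary with:

# run on the secondary (s3r-idf) site; "prod-t" is an assumption decoded from the blob above
radosgw-admin bucket sync status --bucket=prod-t --source-zone=s3r-tls
radosgw-admin sync error list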


Version-Release number of selected component (if applicable):
Multi-Site v5.3z1

How reproducible:
Unknown whether it can be reproduced, but the stuck state is persistent.

Additional info:
primary site   - tls
secondary site - idf

sync-status-tls
::::::::::::::
          realm 4bbafb70-8061-41b9-b177-0a44e54da08d (inra)
      zonegroup bd3982f3-93ae-466a-8a3b-99f06b0d1391 (georeplicated)
           zone 186ce0a8-0b74-494e-b1dd-aec059bd6eb2 (s3r-tls)
   current time 2023-03-24T16:07:43Z
zonegroup features enabled: 
                   disabled: resharding
  metadata sync no sync (zone is master)
      data sync source: 73d50a43-a638-4e33-a38d-dc45348040bc (s3r-idf)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

sync-status-idf
::::::::::::::
          realm 4bbafb70-8061-41b9-b177-0a44e54da08d (inra)
      zonegroup bd3982f3-93ae-466a-8a3b-99f06b0d1391 (georeplicated)
           zone 73d50a43-a638-4e33-a38d-dc45348040bc (s3r-idf)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 186ce0a8-0b74-494e-b1dd-aec059bd6eb2 (s3r-tls)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 10 shards
                        behind shards: [2,14,15,30,56,58,94,109,116,119]
                        oldest incremental change not applied: 2023-03-24T17:07:54.637197+0100 [15]
                        1 shards are recovering
                        recovering shards: [19]
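
For reference, a quick way to check whether the behind and recovering shards are advancing (a sketch run on the secondary, reusing the same command shown in the description; the shard list is taken from the output above):

for shard in 2 14 15 19 30 56 58 94 109 116 119; do
    echo "=== shard $shard ==="
    radosgw-admin data sync status --shard-id=$shard --source-zone=s3r-tls
done

Repeating this after a few minutes and comparing the marker values shows whether only shard 19 is stuck or whether the behind shards are stalled as well.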

Data on supportshell - 03467528

Comment 29 Matt Benjamin (redhat) 2023-07-17 17:56:54 UTC

*** This bug has been marked as a duplicate of bug 2188022 ***

Comment 30 Red Hat Bugzilla 2023-11-15 04:25:14 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days