Bug 1351137

Summary: Data stopped syncing from master to non-master while uploading objects to both zones simultaneously
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: shilpa <smanjara>
Component: RGW
Assignee: Yehuda Sadeh <yehuda>
Status: CLOSED ERRATA
QA Contact: shilpa <smanjara>
Severity: medium
Priority: unspecified
Version: 2.0
CC: cbodley, ceph-eng-bugs, ceph-qe-bugs, hnallurv, kbader, kdreyer, mbenjamin, owasserm, smanjara, sweil
Target Milestone: rc   
Target Release: 2.0   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: RHEL: ceph-10.2.2-15.el7cp Ubuntu: ceph_10.2.2-11redhat1xenial
Last Closed: 2016-08-23 19:43:08 UTC
Type: Bug
Attachments: boto script to create workload on magna115

Description shilpa 2016-06-29 10:34:57 UTC
Description of problem:
Create 100 buckets and upload an object into each bucket on both RGW zones in parallel. The objects sync successfully to the master zone, whereas on the non-master zone the buckets are synced from the master but the objects are not.

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-5.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run a boto script against each zone to create 100 buckets and upload a 1.5 GB object into each bucket (a minimal sketch follows the steps).
2. Observe that all buckets and objects sync to the master, but only the buckets sync to the non-master zone; no objects are created there.
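
A minimal sketch of such a workload script (boto 2 against one zone's S3 endpoint; hostnames, credentials, and the payload path are placeholders, not taken from the attached script):

import boto
import boto.s3.connection

def upload_workload(host, access_key, secret_key, count=100):
    # Connect to one RGW zone endpoint over plain HTTP.
    conn = boto.connect_s3(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        host=host,
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    for i in range(count):
        # One bucket per iteration, one 1.5 GB object per bucket.
        bucket = conn.create_bucket('bucket-%d' % i)
        key = bucket.new_key('object-%d' % i)
        key.set_contents_from_filename('/tmp/payload-1.5G.bin')

# Run one instance against each zone's endpoint in parallel, e.g.:
# upload_workload('rgw-us-1.example.com', 'KEY1', 'SECRET1')
# upload_workload('rgw-us-2.example.com', 'KEY2', 'SECRET2')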

Actual results:

On master:

# radosgw-admin sync status --rgw-zone=us-1 --debug-rgw=0 --debug-ms=0
          realm 1c60c863-689d-441f-b370-62390562e2aa (earth)
      zonegroup 540c9b3f-5eb7-4a67-a581-54bc704ce827 (us)
           zone 505a3a8e-19cf-4295-a43d-559e763891f6 (us-1)
  metadata sync no sync (zone is master)
      data sync source: d48cb942-a5fa-4597-89fd-0bab3bb9c5a3 (us-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

On non-master:

# radosgw-admin sync status --rgw-zone=us-2 --debug-rgw=0 --debug-ms=0
          realm 1c60c863-689d-441f-b370-62390562e2aa (earth)
      zonegroup 540c9b3f-5eb7-4a67-a581-54bc704ce827 (us)
           zone d48cb942-a5fa-4597-89fd-0bab3bb9c5a3 (us-2)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is behind on 1 shards
                oldest incremental change not applied: 2016-06-23 09:57:35.0.097857s
      data sync source: 505a3a8e-19cf-4295-a43d-559e763891f6 (us-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 8 shards
                        oldest incremental change not applied: 2016-06-29 07:34:15.0.194232s
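
To narrow down which data log shards are behind for a given source, the per-source detail can also be dumped (zone names as in the output above):

# radosgw-admin data sync status --source-zone=us-1 --rgw-zone=us-2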


Expected results:
Sync should work in both directions simultaneously.

Additional info:
Will provide the path to the logs.

Comment 4 shilpa 2016-06-29 15:41:06 UTC
Created attachment 1174005 [details]
boto script to create workload on magna115

Comment 5 Yehuda Sadeh 2016-06-29 18:40:47 UTC
It seems that data sync stopped following errors in requests from the non-master to the master (retrieval of all incremental data log shards failed, possibly timing out due to a lack of request-processing threads on the master).
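
If the master's gateway really is starved of request threads, one hypothetical mitigation (an assumption based on this comment, not the shipped fix) would be to raise the RGW thread pool on the master zone's gateway in ceph.conf:

[client.rgw.us-1]
# rgw_thread_pool_size defaults to 100 in Jewel; a larger pool gives the
# peer zone's sync requests more headroom. Section name is illustrative.
rgw_thread_pool_size = 512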

Comment 13 shilpa 2016-07-14 08:54:44 UTC
Tested and verified on 10.2.2-15. Issues related to segfaults in multisite operations are tracked in separate BZs.

Comment 15 errata-xmlrpc 2016-08-23 19:43:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html