Bug 1567192

Summary: Multisite data sync inconsistent when PUTs race with DELETEs
Product: Red Hat Ceph Storage Reporter: Casey Bodley <cbodley>
Component: RGW-Multisite Assignee: Matt Benjamin (redhat) <mbenjamin>
Status: CLOSED CURRENTRELEASE QA Contact: Tejas <tchandra>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.0 CC: ceph-eng-bugs, ceph-qe-bugs, owasserm, pasik
Target Milestone: rc   
Target Release: 3.*   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-14 14:58:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Casey Bodley 2018-04-13 14:20:46 UTC
Description of problem:

In a multisite configuration where PUTs and DELETEs on the same object race to complete on one zone, other zones are unable to resolve that race during data sync.

This happens when the DELETE loses the race (getting an ECANCELED error from the OSD): it still writes a successful completion entry to the bucket index log instead of a canceled entry.
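
To illustrate the failure mode, here is a toy model in Python. It is not RGW code: LogEntry, replay, and the "complete"/"canceled" state strings are hypothetical names made up for this sketch, and it assumes a peer zone simply applies completed log entries in order. Under those assumptions, a DELETE that lost the race but was logged as complete removes the object on the peer, while the same entry logged as canceled is skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical stand-in for one bucket index log entry."""
    op: str     # "put" or "delete"
    key: str    # object name
    state: str  # "complete" or "canceled"


def replay(log):
    """Apply completed entries in log order and return the resulting key set."""
    objects = set()
    for entry in log:
        if entry.state != "complete":
            continue  # canceled operations must not affect the peer zone
        if entry.op == "put":
            objects.add(entry.key)
        elif entry.op == "delete":
            objects.discard(entry.key)
    return objects


# On the source zone the PUT won the race, so the object exists there.
source_objects = {"obj"}

# Behavior described in this bug: the losing DELETE is logged as complete.
buggy_log = [
    LogEntry("put", "obj", "complete"),
    LogEntry("delete", "obj", "complete"),
]

# Expected behavior: the losing DELETE is logged as canceled.
fixed_log = [
    LogEntry("put", "obj", "complete"),
    LogEntry("delete", "obj", "canceled"),
]

buggy_result = replay(buggy_log)
fixed_result = replay(fixed_log)

print("buggy log matches source:", buggy_result == source_objects)
print("fixed log matches source:", fixed_result == source_objects)
```

In this model the peer cannot tell from the log alone that the DELETE never took effect on the source, which is why the entry's state has to reflect the ECANCELED result.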

How reproducible:

Fairly reproducible in workloads that PUT and DELETE the same object. The upstream tracker issue http://tracker.ceph.com/issues/22804 includes a cosbench workload that reproduces it.

Actual results:

Source and destination zones contain a different set of objects after data sync completes.

Expected results:

Source and destination zones contain the same set of objects after data sync.

Comment 3 Matt Benjamin (redhat) 2019-08-14 14:59:16 UTC
*** Bug 1567938 has been marked as a duplicate of this bug. ***