Bug 1527132

Summary: sync.error-log objects fill up with temporary EBUSY errors
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Casey Bodley <cbodley>
Component: RGW-Multisite Assignee: Casey Bodley <cbodley>
Status: CLOSED ERRATA QA Contact: Tejas <tchandra>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.0 CC: agunn, cbodley, ceph-eng-bugs, ceph-qe-bugs, hnallurv, kdreyer, mbenjamin
Target Milestone: rc   
Target Release: 2.5   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.10-9.el7cp Ubuntu: ceph_10.2.10-6redhat1xenial Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1530665 Environment:
Last Closed: 2018-02-21 19:47:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1530665    
Bug Blocks:    

Description Casey Bodley 2017-12-18 15:49:58 UTC
Description of problem:

Multisite sync encounters temporary EBUSY errors in normal operation and gracefully retries the affected operations until they succeed. These temporary errors are nevertheless written to the sync.error-log objects (visible via 'radosgw-admin sync error list').

The output of 'radosgw-admin sync error list' should contain only actual sync errors that could require admin intervention. Including temporary EBUSY errors only wastes space in RADOS and obscures the more serious sync errors.
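
The intended behavior can be sketched roughly as follows. This is an illustrative Python sketch, not the actual RGW code: the function name, the error_log list, the retry limit, and the negative-errno return convention are all assumptions made for the example.

```python
import errno

# Assumption: only EBUSY is treated as transient, since it is the only
# error code named in this report.
TRANSIENT_ERRNOS = {errno.EBUSY}

def sync_with_retry(operation, error_log, max_attempts=5):
    """Call operation() until it returns 0, logging only non-transient errors.

    operation is a hypothetical callable returning 0 on success or a
    negative errno value on failure. error_log is a plain list standing
    in for the sync.error-log objects.
    """
    ret = 0
    for _ in range(max_attempts):
        ret = operation()
        if ret == 0:
            return 0
        if -ret in TRANSIENT_ERRNOS:
            # Temporary failure: retry without writing to the error log.
            continue
        # Real failure: record it for the admin and stop retrying.
        error_log.append(ret)
        return ret
    # Still busy after max_attempts; returned to the caller, not logged.
    return ret
```

In this sketch an operation that returns -EBUSY twice and then succeeds leaves error_log empty, while an operation that fails with a different code is recorded once and returned immediately.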

Version-Release number of selected component (if applicable): RHCS 2.0 and later


How reproducible:

Easily reproducible, especially with multiple gateways per zone.

Steps to Reproduce:
1. Create a multisite configuration with two zones and two gateways each.
2. On the master zone, create a bucket and upload some objects.
3. On the secondary zone, wait a few minutes, then run 'radosgw-admin sync error list'.

Actual results:

The output of 'radosgw-admin sync error list' contains errors of the form:

"message": "failed to sync bucket instance: (16) Device or resource busy"
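
To gauge how much of the error list is EBUSY noise, the JSON output can be post-processed. The sketch below is a hypothetical helper, not part of radosgw-admin. It assumes only that the output is valid JSON containing "message" strings like the one above; because the surrounding structure is not shown in this report, it walks the document recursively instead of relying on a particular layout.

```python
import json

# Text of the transient error as it appears in the message above.
EBUSY_TEXT = "(16) Device or resource busy"

def collect_messages(node):
    """Recursively yield every string stored under a "message" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "message" and isinstance(value, str):
                yield value
            else:
                yield from collect_messages(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_messages(item)

def split_messages(raw_json):
    """Return (busy, other): messages with and without the EBUSY text."""
    messages = list(collect_messages(json.loads(raw_json)))
    busy = [m for m in messages if EBUSY_TEXT in m]
    other = [m for m in messages if EBUSY_TEXT not in m]
    return busy, other
```

Passing the captured output of 'radosgw-admin sync error list' to split_messages() gives the temporary EBUSY entries in the first list and the entries that may need admin attention in the second.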

Expected results:

The output of 'radosgw-admin sync error list' should only contain real sync failures that would require admin intervention.

Additional info:

Comment 5 Ken Dreyer (Red Hat) 2018-01-03 15:31:53 UTC
Would you please do the jewel and luminous backport PRs upstream as well so we don't have to carry this patch long-term?

Comment 6 Ken Dreyer (Red Hat) 2018-01-03 15:37:25 UTC
This bug is targeted for RHCEPH 2.5 and this fix is not in RHCEPH 3.

Would you please cherry-pick the change to ceph-3.0-rhel-patches (with the RHCEPH 3 clone ID number, "Resolves: rhbz#1530665") so customers do not experience a regression?

Comment 15 errata-xmlrpc 2018-02-21 19:47:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0340