Bug 1527132 - sync.error-log objects fill up with temporary EBUSY errors
Summary: sync.error-log objects fill up with temporary EBUSY errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW-Multisite
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 2.5
Assignee: Casey Bodley
QA Contact: Tejas
URL:
Whiteboard:
Depends On: 1530665
Blocks:
 
Reported: 2017-12-18 15:49 UTC by Casey Bodley
Modified: 2022-02-21 18:01 UTC
CC: 7 users

Fixed In Version: RHEL: ceph-10.2.10-9.el7cp Ubuntu: ceph_10.2.10-6redhat1xenial
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1530665 (view as bug list)
Environment:
Last Closed: 2018-02-21 19:47:28 UTC
Embargoed:




Links:
Ceph Project Bug Tracker 22473 (last updated 2017-12-18 15:49:57 UTC)
Red Hat Issue Tracker RHCEPH-1535 (last updated 2021-09-09 13:00:56 UTC)
Red Hat Product Errata RHBA-2018:0340, SHIPPED_LIVE: Red Hat Ceph Storage 2.5 bug fix and enhancement update (last updated 2018-02-22 00:50:32 UTC)

Description Casey Bodley 2017-12-18 15:49:58 UTC
Description of problem:

Multisite sync encounters temporary EBUSY errors in normal operation and gracefully retries the affected operations until they succeed. These temporary errors are nevertheless written to the sync.error-log objects (visible via 'radosgw-admin sync error list').

The 'radosgw-admin sync error list' output should contain only actual sync errors that may require admin intervention. Logging temporary EBUSY errors wastes space in RADOS and obscures the more serious sync errors.

Version-Release number of selected component (if applicable): RHCS 2.0 and later


How reproducible:

Easily reproducible, especially with multiple gateways per zone.

Steps to Reproduce:
1. Create a multisite configuration with two zones, each served by two gateways.
2. On master zone, create a bucket and upload some objects.
3. On secondary zone, wait a few minutes, then run 'radosgw-admin sync error list'.

Actual results:

The output of 'radosgw-admin sync error list' contains errors of the form:

"message": "failed to sync bucket instance: (16) Device or resource busy"

Expected results:

The output of 'radosgw-admin sync error list' should only contain real sync failures that would require admin intervention.
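Until the fix lands, an admin reviewing the error log can separate transient EBUSY noise from genuine failures by filtering on errno 16. The sketch below assumes an illustrative JSON shape for the 'radosgw-admin sync error list' output (a list of shards, each with an "entries" array whose items carry an "info" object with "error_code" and "message" fields); field names here are a simplified approximation, not the exact on-disk format.

```python
import json

EBUSY = 16  # errno for "Device or resource busy"

def split_errors(error_list_json):
    """Partition sync-error entries into transient EBUSY noise and real failures.

    Assumes an illustrative JSON shape: a list of shards, each with an
    "entries" array whose items carry an "info" object containing
    "error_code" and "message".
    """
    transient, real = [], []
    for shard in json.loads(error_list_json):
        for entry in shard.get("entries", []):
            info = entry.get("info", {})
            if info.get("error_code") == EBUSY:
                transient.append(entry)
            else:
                real.append(entry)
    return transient, real

# Illustrative sample mimicking the entries reported in this bug
sample = json.dumps([{
    "shard_id": 0,
    "entries": [
        {"id": "1_1513612800.0_1.1", "info": {
            "error_code": 16,
            "message": "failed to sync bucket instance: (16) Device or resource busy"}},
        {"id": "1_1513612801.0_2.1", "info": {
            "error_code": 5,
            "message": "failed to sync object: (5) Input/output error"}},
    ],
}])

transient, real = split_errors(sample)
print(len(transient), len(real))  # 1 transient EBUSY entry, 1 real failure
```

With the sample above, the EBUSY entry is set aside and only the I/O error remains for admin attention.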

Additional info:

Comment 5 Ken Dreyer (Red Hat) 2018-01-03 15:31:53 UTC
Would you please do the jewel and luminous backport PRs upstream as well so we don't have to carry this patch long-term?

Comment 6 Ken Dreyer (Red Hat) 2018-01-03 15:37:25 UTC
This bug is targeted for RHCEPH 2.5 and this fix is not in RHCEPH 3.

Would you please cherry-pick the change to ceph-3.0-rhel-patches (with the RHCEPH 3 clone ID number, "Resolves: rhbz#1530665") so customers do not experience a regression?

Comment 15 errata-xmlrpc 2018-02-21 19:47:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0340

