Bug 1763257

Summary:	rgw-multisite: log trimming does not make progress unless zones 'sync_from_all'
Product:	[Red Hat Storage] Red Hat Ceph Storage	Reporter:	Casey Bodley <cbodley>
Component:	RGW-Multisite	Assignee:	Casey Bodley <cbodley>
Status:	CLOSED ERRATA	QA Contact:	Vidushi Mishra <vimishra>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.0	CC:	ceph-eng-bugs, ceph-qe-bugs, tserlin, vimishra
Target Milestone:	rc
Target Release:	4.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	ceph-14.2.4-31.el8cp, ceph-14.2.4-2.el7cp	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-01-31 12:47:55 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Casey Bodley 2019-10-18 15:18:16 UTC

Description of problem:

The trimming process for data logs and bucket index logs queries the sync status of peer zones to determine how much of the local log is safe to trim. A log entry is only safe to trim if every peer zone reports a sync status marker that is larger than that entry.
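
The rule above can be sketched as taking the minimum marker over all peers. This is only an illustration and not the RGW implementation: the function names are hypothetical, integers stand in for the real marker strings, and None stands in for an empty marker.

```python
from typing import Iterable, List, Mapping, Optional


def safe_trim_marker(peer_markers: Mapping[str, Optional[int]]) -> Optional[int]:
    """Return the lowest marker reported by any peer, or None if trimming is unsafe.

    Assumption: markers are modeled as integers, and None models an empty
    marker returned by a peer.
    """
    lowest: Optional[int] = None
    for marker in peer_markers.values():
        if marker is None:
            # One peer with no marker blocks trimming entirely.
            return None
        if lowest is None or marker < lowest:
            lowest = marker
    # With no peers at all this returns None (a choice made for this sketch).
    return lowest


def trimmable_entries(entries: Iterable[int],
                      peer_markers: Mapping[str, Optional[int]]) -> List[int]:
    """Return the log entries that every peer has already synced past."""
    limit = safe_trim_marker(peer_markers)
    if limit is None:
        return []
    # Only entries strictly below every peer's marker are safe to trim.
    return [entry for entry in entries if entry < limit]
```

With peers reporting markers 7 and 4, only the entries below 4 would be trimmed. If any peer reports an empty marker, nothing is trimmed.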

The default zone configuration sets 'sync_from_all=true', meaning that the zone syncs data from each peer zone in its zonegroup. A zone can also be configured to only 'sync_from' a subset of the zonegroup. When such a zone does not sync from one of its peers, it returns empty markers when that peer requests its sync status. These empty markers prevent the peer zone from making progress in trimming its data logs and bucket index logs.
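
A minimal model of this failure mode, assuming a simplified Zone type that carries only the 'sync_from_all' and 'sync_from' settings. The class, the helper names, and the marker value are hypothetical and do not mirror the RGW source.

```python
from dataclasses import dataclass, field
from typing import List, Mapping


@dataclass
class Zone:
    """Simplified stand-in for a zone's sync configuration."""
    name: str
    sync_from_all: bool = True
    sync_from: List[str] = field(default_factory=list)


def syncs_from(zone: Zone, source_name: str) -> bool:
    """True if `zone` is configured to pull data from `source_name`."""
    if zone.name == source_name:
        return False
    return zone.sync_from_all or source_name in zone.sync_from


def reported_marker(zone: Zone, requester_name: str,
                    progress: Mapping[str, str]) -> str:
    """Marker that `zone` reports when `requester_name` asks for its sync status.

    Assumption: an empty string models the empty marker described above.
    """
    if not syncs_from(zone, requester_name):
        # The zone never syncs from the requester, so it has no position to report.
        return ""
    return progress.get(requester_name, "")


# Zone 'a' does not sync from 'b'; zone 'b' keeps the default.
zone_a = Zone("a", sync_from_all=False)
zone_b = Zone("b")
```

In this model, zone_a always answers zone 'b' with an empty marker, so 'b' never learns a position below which its logs are safe to trim.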

Version-Release number of selected component (if applicable):

How reproducible:

Whenever a zone is configured to -not- sync from one of its peer zones.

Steps to Reproduce:
1. Create a multisite configuration with two zones 'a' and 'b'.

2. On the primary cluster, modify zone 'a' to not sync from zone 'b':
$ radosgw-admin zone modify --rgw-zone a --sync-from-all=0
$ radosgw-admin period update --commit

3. On the primary cluster, create a bucket 'bucket' and upload some objects.

4. Verify that the objects sync to the secondary zone and that sync status catches up.

5. Wait for at least rgw_sync_log_trim_interval (default 20 minutes).

6. List the data log and bucket index log on each zone:
$ radosgw-admin datalog list
$ radosgw-admin bilog list --bucket bucket
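
The check in step 6 could be scripted along these lines. This sketch assumes that both list commands print a JSON array of entries on stdout; the helper names are hypothetical, and list_log() needs radosgw-admin on the PATH of the zone being checked.

```python
import json
import subprocess
from typing import List


def list_log(args: List[str]) -> str:
    """Run radosgw-admin with the given arguments and return its stdout."""
    result = subprocess.run(
        ["radosgw-admin", *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def log_is_empty(output: str) -> bool:
    """True if the listing contains no entries.

    Assumption: the listing is a JSON array; anything else is treated
    as an error rather than guessed at.
    """
    entries = json.loads(output)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of log entries")
    return len(entries) == 0
```

For example, log_is_empty(list_log(["datalog", "list"])) and log_is_empty(list_log(["bilog", "list", "--bucket", "bucket"])) could be run on each zone after the trim interval has passed.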

Actual results:

The logs on zone 'a' are empty, but the logs on zone 'b' are not.

Expected results:

The logs on both zones are empty.

Additional info:

Comment 1 RHEL Program Management 2019-10-18 15:18:22 UTC

Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 Casey Bodley 2019-10-18 15:19:00 UTC

This was fixed for 3.2z2 in https://bugzilla.redhat.com/show_bug.cgi?id=1699478

Comment 9 Yaniv Kaul 2020-01-08 13:53:08 UTC

Are you looking into the failure?

Comment 13 errata-xmlrpc 2020-01-31 12:47:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312