Bug 1763257 - rgw-multisite: log trimming does not make progress unless zones 'sync_from_all'
Summary: rgw-multisite: log trimming does not make progress unless zones 'sync_from_all'
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW-Multisite
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Target Release: 4.0
Assignee: Casey Bodley
QA Contact: Vidushi Mishra
Depends On:
Reported: 2019-10-18 15:18 UTC by Casey Bodley
Modified: 2020-10-12 04:39 UTC (History)

Fixed In Version: ceph-14.2.4-31.el8cp, ceph-14.2.4-2.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-01-31 12:47:55 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 39487 | 0 | None | None | None | 2019-10-18 15:18:16 UTC
Github | ceph ceph pull 27794 | 0 | None | closed | rgw: data/bilogs are trimmed when no peers are reading them | 2020-10-12 04:38:07 UTC
Red Hat Product Errata | RHBA-2020:0312 | 0 | None | None | None | 2020-01-31 12:48:02 UTC

Description Casey Bodley 2019-10-18 15:18:16 UTC
Description of problem:

The trimming process for data logs and bucket index logs queries the sync status of each peer zone to determine how much of the local log is safe to trim. A log entry is safe to trim only if every peer zone reports a sync status marker that is larger than that entry.

The default zone configuration sets 'sync_from_all=true', meaning that the zone syncs data from each peer zone in its zonegroup. A zone can also be configured to 'sync_from' only a subset of the zonegroup. When such a zone does not sync from one of its peers, it returns empty markers when that peer requests its sync status. This prevents the peer zone from making progress in trimming its data logs and bucket index logs.
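
The stall described above can be illustrated with a small Python sketch. This is not the RGW implementation: the function names, the zero-padded string markers, and the `readers` set are all hypothetical, chosen only so that markers compare in order. The second function reflects only the idea stated in the linked pull request title (trim when no peers are reading the log), not the actual patch.

```python
from typing import Dict, Iterable, List, Set


def trimmable(entries: Iterable[str], peer_markers: Dict[str, str]) -> List[str]:
    """Return the entries that every peer has already synced past."""
    entries = list(entries)
    if not peer_markers:
        # No peer reads this log, so nothing holds the entries back.
        return entries
    # An empty marker sorts below every entry, so a single peer that
    # reports "" pins the minimum and no entry qualifies for trimming.
    lowest = min(peer_markers.values())
    return [e for e in entries if e < lowest]


def trimmable_ignoring_non_readers(
    entries: Iterable[str], peer_markers: Dict[str, str], readers: Set[str]
) -> List[str]:
    """Same rule, but only consult peers that actually sync from this zone."""
    relevant = {zone: m for zone, m in peer_markers.items() if zone in readers}
    return trimmable(entries, relevant)


# Hypothetical log on zone 'b'; zone 'a' does not sync from 'b',
# so it reports an empty marker when 'b' asks for its sync status.
log_on_b = ["0001", "0002", "0003"]
stalled = trimmable(log_on_b, {"a": ""})
fixed = trimmable_ignoring_non_readers(log_on_b, {"a": ""}, readers=set())
```

In this sketch `stalled` holds no entries because the empty marker from zone 'a' blocks every trim, while `fixed` holds the whole log because zone 'a' is not counted as a reader.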

Version-Release number of selected component (if applicable):

How reproducible:

Whenever a zone is configured to -not- sync from one of its peer zones.

Steps to Reproduce:
1. Create a multisite configuration with two zones 'a' and 'b'.

2. On the primary cluster, modify zone 'a' to not sync from zone 'b':
$ radosgw-admin zone modify --rgw-zone a --sync-from-all=0
$ radosgw-admin period update --commit

3. On the primary cluster, create a bucket 'bucket' and upload some objects.

4. Verify that the objects sync to the secondary zone and that sync status catches up.

5. Wait for at least rgw_sync_log_trim_interval (default 20 minutes).

6. List the data log and bucket index log on each zone:
$ radosgw-admin datalog list
$ radosgw-admin bilog list --bucket bucket
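
The check in step 6 could be scripted roughly as follows. This is a sketch under assumptions: it assumes both list commands print a JSON array of entries, and the helper names (`run_admin`, `log_is_empty`, `check_trimmed`) are hypothetical. Only the two radosgw-admin invocations shown in step 6 are used.

```python
import json
import subprocess
from typing import Callable, Dict, List


def run_admin(args: List[str]) -> str:
    """Run radosgw-admin with the given arguments and return its stdout."""
    result = subprocess.run(
        ["radosgw-admin", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def log_is_empty(
    args: List[str], run: Callable[[List[str]], str] = run_admin
) -> bool:
    # Assumption: the command prints a JSON array of log entries.
    return len(json.loads(run(args))) == 0


def check_trimmed(
    bucket: str, run: Callable[[List[str]], str] = run_admin
) -> Dict[str, bool]:
    """Report whether the data log and the bucket index log are empty."""
    return {
        "datalog": log_is_empty(["datalog", "list"], run),
        "bilog": log_is_empty(["bilog", "list", "--bucket", bucket], run),
    }
```

Calling `check_trimmed("bucket")` on each zone after the trim interval has passed would return True for both logs once trimming has completed. The `run` parameter exists only so the parsing can be exercised without a cluster.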

Actual results:

The logs on zone 'a' are empty, but the logs on zone 'b' are not.

Expected results:

The logs on both zones are empty.

Additional info:

Comment 1 RHEL Program Management 2019-10-18 15:18:22 UTC
Please specify the severity of this bug. Severity is defined here:

Comment 2 Casey Bodley 2019-10-18 15:19:00 UTC
this was fixed for 3.2z2 in https://bugzilla.redhat.com/show_bug.cgi?id=1699478

Comment 9 Yaniv Kaul 2020-01-08 13:53:08 UTC
Are you looking into the failure?

Comment 13 errata-xmlrpc 2020-01-31 12:47:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

