Bug 1379835 - [RFE] [rbd-mirror] - optionally unregister "laggy" journal clients
Summary: [RFE] [rbd-mirror] - optionally unregister "laggy" journal clients
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RBD
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: 2.1
Assignee: Jason Dillaman
QA Contact: Rachana Patel
URL:
Whiteboard:
Depends On:
Blocks: 1365648
 
Reported: 2016-09-27 20:00 UTC by Jason Dillaman
Modified: 2017-07-30 15:32 UTC (History)
5 users (show)

Fixed In Version: RHEL: ceph-10.2.3-5.el7cp Ubuntu: ceph_10.2.3-6redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-22 19:31:11 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:2815 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage security, bug fix, and enhancement update 2017-03-22 02:06:33 UTC
Ceph Project Bug Tracker 14738 None None None 2016-09-27 20:00:49 UTC

Description Jason Dillaman 2016-09-27 20:00:49 UTC
Description of problem:
Support an optional configuration setting for the maximum number of object sets a journal client can be behind before it is automatically unregistered. This will protect the journal from growing to an infinite size.
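The option name `rbd_journal_max_concurrent_object_sets` is confirmed later in this bug (comment 4); the section and value below are illustrative only, as a sketch of how such a setting would typically be applied in ceph.conf:

```ini
# ceph.conf -- illustrative fragment.
# The option name comes from this RFE; the value 12 is an arbitrary
# example. 0 (the default) disables the limit, so the journal can
# grow without bound while a client lags.
[client]
rbd_journal_max_concurrent_object_sets = 12
```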

Comment 2 Jason Dillaman 2016-09-27 20:02:18 UTC
Upstream, master branch PR: https://github.com/ceph/ceph/pull/10378

Comment 3 Harish NV Rao 2016-10-03 10:11:28 UTC
(In reply to Jason Dillaman from comment #0)
> Description of problem:
> Support an optional configuration setting for the maximum number of object
> sets a journal client can be behind before it is automatically unregistered.
1) Where and how is this value set?
2) Is the value persistent across reboots/restarts?
3) Is there a default value?
4) What is the range (lower and upper limits)?
5) How can the condition be induced in which a journal client is 'laggy'?

Comment 4 Jason Dillaman 2016-10-03 13:30:20 UTC
1) This is controlled by the config option "rbd_journal_max_concurrent_object_sets". By default, it is set to zero (disabled). It indicates how many journal object sets (each object set comprises splay-width-many journal data objects) can exist before the laggy rbd-mirror client is disconnected (and the object sets are pruned).
2) It's a config option, so yes.
3) Yes, zero (disabled)
4) Zero to max uint64_t (effectively infinity)
5) Mirror an image to a secondary cluster, then disable its rbd-mirror daemon so that journal events aren't being consumed. Inject enough IO data into the primary image to overflow the "concurrent object set" limit. You can use "rbd journal status --image <image name>" to see the active and minimum object set numbers. After overflow, it should lazily start trimming old object sets, and the minimum object set number will start to converge with the active.
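The disconnect-and-trim behavior described in comment 4 can be sketched as a toy model. This is not Ceph code: the class and method names are invented for illustration, and only the concepts (active vs. minimum object set, a configurable maximum lag, lazy trimming after disconnect) come from this bug report.

```python
class JournalModel:
    """Toy model of a journal with a laggy-client disconnect limit."""

    def __init__(self, max_concurrent_object_sets=0):
        # 0 mirrors the default: feature disabled, journal grows unbounded
        self.max_sets = max_concurrent_object_sets
        self.minimum_set = 0          # oldest object set still retained
        self.active_set = 0           # object set currently being written
        self.client_disconnected = False

    def append_object_set(self):
        """Primary workload fills another object set."""
        self.active_set += 1
        self._maybe_disconnect_laggy_client()

    def _maybe_disconnect_laggy_client(self):
        behind = self.active_set - self.minimum_set
        if self.max_sets > 0 and behind > self.max_sets:
            # The laggy client is unregistered; its old object sets
            # become prunable, so the minimum converges to the active.
            self.client_disconnected = True
            self.minimum_set = self.active_set - self.max_sets


# Simulate a stopped rbd-mirror daemon: nothing consumes the journal
# while the primary keeps writing.
j = JournalModel(max_concurrent_object_sets=4)
for _ in range(10):
    j.append_object_set()
print(j.client_disconnected, j.active_set, j.minimum_set)  # True 10 6
```

With the limit disabled (the default of 0), the same loop would leave the client registered and `minimum_set` stuck at 0, i.e. the unbounded growth this RFE protects against.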

Comment 5 Jason Dillaman 2016-10-03 13:31:25 UTC
Upstream test case: https://github.com/ceph/ceph/blob/master/qa/workunits/rbd/rbd_mirror.sh#L298

Comment 11 errata-xmlrpc 2016-11-22 19:31:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html

