Bug 1379835

Summary: [RFE] [rbd-mirror] - optionally unregister "laggy" journal clients
Summary: [RFE] [rbd-mirror] - optionally unregister "laggy" journal clients
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: RBD
Version: 2.0
Target Release: 2.1
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: medium
Priority: high
Keywords: FutureFeature
Reporter: Jason Dillaman <jdillama>
Assignee: Jason Dillaman <jdillama>
QA Contact: Rachana Patel <racpatel>
CC: ceph-eng-bugs, hnallurv, jdillama, tserlin, uboppana
Fixed In Version: RHEL: ceph-10.2.3-5.el7cp; Ubuntu: ceph_10.2.3-6redhat1
Type: Bug
Last Closed: 2016-11-22 19:31:11 UTC
Bug Blocks: 1365648

Description Jason Dillaman 2016-09-27 20:00:49 UTC
Description of problem:
Support an optional configuration setting for the maximum number of object sets a journal client can fall behind before it is automatically unregistered. This protects the journal from growing without bound.

Comment 2 Jason Dillaman 2016-09-27 20:02:18 UTC
Upstream, master branch PR: https://github.com/ceph/ceph/pull/10378

Comment 3 Harish NV Rao 2016-10-03 10:11:28 UTC
(In reply to Jason Dillaman from comment #0)
> Description of problem:
> Support an optional configuration setting for the maximum number of object
> sets a journal client can be behind before it is automatically unregistered.
1) Where and how is this value set?
2) Is the value persistent across reboot/restarts?
3) Is there a default value?
4) What is the range (lower and upper limit)?
5) How can the condition in which a journal client is 'laggy' be induced?

Comment 4 Jason Dillaman 2016-10-03 13:30:20 UTC
1) This is controlled by the config option "rbd_journal_max_concurrent_object_sets". By default, it is set to zero (disabled). It specifies how many journal object sets (each object set comprises a splay width's worth of journal data objects) can exist before the laggy rbd-mirror client is disconnected and the object sets are pruned (see the ceph.conf sketch after this list).
2) It's a config option, so yes.
3) Yes, zero (disabled).
4) Zero to max uint64_t (effectively unlimited).
5) Mirror an image to a secondary cluster, then disable its rbd-mirror daemon so that journal events aren't being consumed. Inject enough IO into the primary image to overflow the "concurrent object set" limit. You can use "rbd journal status --image <image name>" to see the active and minimum object set numbers. After the overflow, the journal should lazily start trimming old object sets, and the minimum object set number will converge toward the active one (a shell sketch of these steps follows below).
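
Since this is an ordinary Ceph client-side config option, it can be set in ceph.conf; a minimal sketch, where the [client] section and the limit of 12 are illustrative values, not recommendations:

    [client]
        rbd journal max concurrent object sets = 12

For example, with a splay width of 4, a limit of 12 object sets caps the journal at 48 data objects before the laggy client is disconnected.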
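
A rough shell sketch of the reproduction steps above; the pool/image names, the systemd instance name, and the bench parameters are placeholders for illustration:

    # Secondary cluster: stop the rbd-mirror daemon so journal events pile up
    # (the systemd instance name varies by deployment)
    systemctl stop ceph-rbd-mirror@admin

    # Primary cluster: push enough write IO through the journaled image to
    # exceed rbd_journal_max_concurrent_object_sets
    rbd bench-write mypool/myimage --io-size 4096 --io-total 1073741824

    # Compare the active and minimum object set numbers; after the overflow,
    # the minimum should start converging toward the active
    rbd journal status --pool mypool --image myimage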

Comment 5 Jason Dillaman 2016-10-03 13:31:25 UTC
Upstream test case: https://github.com/ceph/ceph/blob/master/qa/workunits/rbd/rbd_mirror.sh#L298

Comment 11 errata-xmlrpc 2016-11-22 19:31:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html