Bug 1379835 - [RFE] [rbd-mirror] - optionally unregister "laggy" journal clients
Summary: [RFE] [rbd-mirror] - optionally unregister "laggy" journal clients
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RBD
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: 2.1
Assignee: Jason Dillaman
QA Contact: Rachana Patel
URL:
Whiteboard:
Depends On:
Blocks: 1365648
 
Reported: 2016-09-27 20:00 UTC by Jason Dillaman
Modified: 2017-07-30 15:32 UTC (History)
5 users (show)

Fixed In Version: RHEL: ceph-10.2.3-5.el7cp Ubuntu: ceph_10.2.3-6redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-22 19:31:11 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:2815 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage security, bug fix, and enhancement update 2017-03-22 02:06:33 UTC
Ceph Project Bug Tracker 14738 None None None 2016-09-27 20:00:49 UTC

Description Jason Dillaman 2016-09-27 20:00:49 UTC
Description of problem:
Support an optional configuration setting for the maximum number of object sets a journal client can be behind before it is automatically unregistered. This will protect the journal from growing to an infinite size.
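The option name `rbd_journal_max_concurrent_object_sets` is confirmed later in this bug (comment 4); the section and value below are illustrative only, as a sketch of how such a setting would typically be applied in ceph.conf:

```ini
# ceph.conf -- illustrative fragment.
# The option name comes from this RFE; the value 12 is an arbitrary
# example. 0 (the default) disables the limit, so the journal can
# grow without bound while a client lags.
[client]
rbd_journal_max_concurrent_object_sets = 12
```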

Comment 2 Jason Dillaman 2016-09-27 20:02:18 UTC
Upstream, master branch PR: https://github.com/ceph/ceph/pull/10378

Comment 3 Harish NV Rao 2016-10-03 10:11:28 UTC
(In reply to Jason Dillaman from comment #0)
> Description of problem:
> Support an optional configuration setting for the maximum number of object
> sets a journal client can be behind before it is automatically unregistered.
1) Where and how is this value set?
2) Is the value persistent across reboots/restarts?
3) Is there a default value?
4) What is the range (lower and upper limits)?
5) How can the condition be induced in which a journal client is 'laggy'?

Comment 4 Jason Dillaman 2016-10-03 13:30:20 UTC
1) This is controlled by the config option "rbd_journal_max_concurrent_object_sets". By default, it is set to zero (disabled). It indicates how many journal object sets (each object set comprises splay-width-many journal data objects) can exist before the laggy rbd-mirror client is disconnected (and the object sets are pruned).
2) It's a config option, so yes.
3) Yes, zero (disabled)
4) Zero to max uint64_t (effectively infinity)
5) Mirror an image to a secondary cluster, then disable its rbd-mirror daemon so that journal events aren't being consumed. Inject enough IO data into the primary image to overflow the "concurrent object set" limit. You can use "rbd journal status --image <image name>" to see the active and minimum object set numbers. After overflow, it should lazily start trimming old object sets, and the minimum object set number will start to converge with the active.
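The disconnect-and-trim behavior described in comment 4 can be sketched as a toy model. This is not Ceph code: the class and method names are invented for illustration, and only the concepts (active vs. minimum object set, a configurable maximum lag, lazy trimming after disconnect) come from this bug report.

```python
class JournalModel:
    """Toy model of a journal with a laggy-client disconnect limit."""

    def __init__(self, max_concurrent_object_sets=0):
        # 0 mirrors the default: feature disabled, journal grows unbounded
        self.max_sets = max_concurrent_object_sets
        self.minimum_set = 0          # oldest object set still retained
        self.active_set = 0           # object set currently being written
        self.client_disconnected = False

    def append_object_set(self):
        """Primary workload fills another object set."""
        self.active_set += 1
        self._maybe_disconnect_laggy_client()

    def _maybe_disconnect_laggy_client(self):
        behind = self.active_set - self.minimum_set
        if self.max_sets > 0 and behind > self.max_sets:
            # The laggy client is unregistered; its old object sets
            # become prunable, so the minimum converges to the active.
            self.client_disconnected = True
            self.minimum_set = self.active_set - self.max_sets


# Simulate a stopped rbd-mirror daemon: nothing consumes the journal
# while the primary keeps writing.
j = JournalModel(max_concurrent_object_sets=4)
for _ in range(10):
    j.append_object_set()
print(j.client_disconnected, j.active_set, j.minimum_set)  # True 10 6
```

With the limit disabled (the default of 0), the same loop would leave the client registered and `minimum_set` stuck at 0, i.e. the unbounded growth this RFE protects against.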

Comment 5 Jason Dillaman 2016-10-03 13:31:25 UTC
Upstream test case: https://github.com/ceph/ceph/blob/master/qa/workunits/rbd/rbd_mirror.sh#L298

Comment 11 errata-xmlrpc 2016-11-22 19:31:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html

