Bug 1608060 - Limit pg log length during recovery/backfill so that we don't run out of memory [NEEDINFO]
Summary: Limit pg log length during recovery/backfill so that we don't run out of memory
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: z1
Target Release: 3.2
Assignee: Neha Ojha
QA Contact: Parikshith
Docs Contact: Bara Ancincova
Duplicates: 1644409
Depends On:
Blocks: 1584264 1644409 1673654
Reported: 2018-07-24 21:31 UTC by Neha Ojha
Modified: 2019-02-07 17:14 UTC

Fixed In Version: RHEL: ceph-12.2.8-66.el7cp Ubuntu: ceph_12.2.8-52redhat1xenial
Doc Type: Bug Fix
Doc Text:
.PG log length is now limited
Previously, the `osd_max_pg_log_entries` option did not set a hard limit on the placement group (PG) log length. Consequently, during recovery and backfill, the log could grow significantly and consume a large amount of memory, in some cases all of it. With this update, a hard limit is enforced on the number of entries in the PG log, even during recovery and backfill. One corner case in which it might be hard to limit the PG log length is on erasure-coded pools, when the rollback information on some of the replicas is too old.
A flag called `pglog_hardlimit` has been introduced. It is off by default, and setting it enables the feature that limits the PG log length. Run `ceph osd set pglog_hardlimit` only after the upgrade is fully complete. Once all the OSDs have this flag set, the PG log length is capped by a hard limit. Do not unset this flag.
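To confirm that the flag took effect after running `ceph osd set pglog_hardlimit`, something like the following Python sketch could be used. It is not part of the fix. It assumes that the output of `ceph osd dump --format json` carries the cluster flags as a comma-separated string under a `flags` key; the helper name `has_osd_flag` and the sample JSON are hypothetical.

```python
import json

def has_osd_flag(osd_dump_json, flag="pglog_hardlimit"):
    """Return True if `flag` appears in the cluster-wide OSD flags."""
    osd_dump = json.loads(osd_dump_json)
    # Assumption: "flags" is a comma-separated string such as
    # "sortbitwise,recovery_deletes,pglog_hardlimit".
    flags = osd_dump.get("flags", "")
    return flag in [f.strip() for f in flags.split(",") if f.strip()]

# Hypothetical, heavily abbreviated sample of `ceph osd dump --format json`.
sample = '{"epoch": 42, "flags": "sortbitwise,recovery_deletes,pglog_hardlimit"}'
print(has_osd_flag(sample))
```

On a live cluster, the JSON string would come from the command output instead of the hard-coded sample.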
Clone Of:
Clones: 1644409 1673654
Last Closed: 2019-02-07 17:14:29 UTC
Target Upstream Version:
tserlin: needinfo? (tchandra)


System ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 23979 None None None 2018-07-24 21:31:11 UTC
Github ceph ceph pull 25949 'None' 'closed' 'luminous: osd/mon: pg log hard limit with upgrades fixed' 2019-11-13 12:04:13 UTC
Github https://github.com/ceph ceph pull 23211 None None None 2019-11-13 12:04:13 UTC

Description Neha Ojha 2018-07-24 21:31:12 UTC
Description of problem: osd_max_pg_log_entries is not a hard upper limit for the pg log length. During recovery/backfill the pg log may end up growing considerably, and using a lot of memory.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Comment 3 Neha Ojha 2018-07-24 23:51:51 UTC
For QA purposes:

The length of the pg log can be observed from the output of ceph pg dump.
It can also be viewed in the OSD logs as "approx pg log length = ".
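As a rough aid for this check, the Python sketch below lists the PGs whose log is longer than a chosen limit. It assumes that `ceph pg dump --format json` reports per-PG statistics in a `pg_stats` list (either at the top level or nested under `pg_map`, depending on the release) and that each entry has `pgid` and `log_size` fields. The helper name `pgs_over_limit`, the sample data, and the limit value are hypothetical; use the value configured for `osd_max_pg_log_entries` on the cluster under test.

```python
import json

def pgs_over_limit(pg_dump_json, limit):
    """Return sorted (pgid, log_size) pairs for PGs whose log exceeds `limit`."""
    dump = json.loads(pg_dump_json)
    # Assumption: stats live in "pg_stats", either at the top level
    # or nested under "pg_map".
    stats = dump.get("pg_stats")
    if stats is None:
        stats = dump.get("pg_map", {}).get("pg_stats", [])
    return sorted(
        (pg["pgid"], pg["log_size"])
        for pg in stats
        if pg.get("log_size", 0) > limit
    )

# Hypothetical, abbreviated sample of `ceph pg dump --format json`.
sample = json.dumps({
    "pg_stats": [
        {"pgid": "1.0", "log_size": 3000},
        {"pgid": "1.1", "log_size": 12000},
    ]
})
print(pgs_over_limit(sample, limit=10000))
```

With the hard limit in effect, this list would be expected to stay empty during recovery and backfill, apart from the erasure-coded corner case mentioned in the Doc Text.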

Comment 4 Neha Ojha 2018-09-14 00:57:15 UTC
Pushed changes to ceph-3.2-rhel-patches

Comment 14 Neha Ojha 2018-10-30 21:13:15 UTC
Hi Bara,

I have fixed a typo and added some information to the Doc Text.


Comment 17 Neha Ojha 2018-11-05 23:48:43 UTC
Moving this to z1 in light of http://tracker.ceph.com/issues/36686.

Comment 23 errata-xmlrpc 2019-01-03 19:01:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 24 Vikhyat Umrao 2019-01-08 00:32:27 UTC
*** Bug 1644409 has been marked as a duplicate of this bug. ***
