Bug 1608060

Summary: Limit pg log length during recovery/backfill so that we don't run out of memory
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Neha Ojha <nojha>
Component: RADOS
Assignee: Neha Ojha <nojha>
Status: CLOSED NEXTRELEASE
QA Contact: Parikshith <pbyregow>
Severity: high
Docs Contact: Bara Ancincova <bancinco>
Priority: high
Version: 3.0
CC: ceph-eng-bugs, ceph-qe-bugs, dzafman, hnallurv, jdurgin, jquinn, kchai, kdreyer, ngangadh, nojha, tchandra, tserlin, vumrao
Target Milestone: z1
Keywords: Reopened
Target Release: 3.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-12.2.8-66.el7cp Ubuntu: ceph_12.2.8-52redhat1xenial
Doc Type: Bug Fix
Doc Text:
.PG log length is now limited
Previously, the `osd_max_pg_log_entries` option did not set a hard limit for the placement group (PG) log length. Consequently, during recovery and backfill, the log could grow significantly and consume a lot of memory, in some cases all of it. With this update, a hard limit is set on the number of entries in the PG log, even during recovery and backfill. One corner case in which it might be hard to limit the PG log length is on erasure-coded pools, when the rollback information on some of the replicas is too old.

A flag called `pglog_hardlimit` has been introduced. It is off by default, and it enables the feature that limits the length of the PG log. Run `ceph osd set pglog_hardlimit` after the upgrade is complete. Once all the OSDs have this flag set, the length of the PG log is capped by a hard limit. Do not unset this flag.
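
As a rough illustration of checking whether the flag described in the Doc Text is set, the sketch below inspects JSON such as `ceph osd dump --format json` might produce. It assumes that output carries a top-level `flags` field holding a comma-separated string; that layout and the helper name `pglog_hardlimit_set` are assumptions made for this example and are not taken from this bug.

```python
import json


def pglog_hardlimit_set(osd_dump_json: str) -> bool:
    """Return True if 'pglog_hardlimit' appears in the OSD map flags.

    Assumption: the JSON has a top-level "flags" key whose value is a
    comma-separated string of flag names.
    """
    osd_map = json.loads(osd_dump_json)
    flags = osd_map.get("flags", "")
    # Split on commas and compare whole names, so a flag that merely
    # contains the text as a substring does not count as a match.
    return "pglog_hardlimit" in [name.strip() for name in flags.split(",")]


# Hypothetical input, hand-written for the example.
sample = '{"flags": "sortbitwise,recovery_deletes,pglog_hardlimit"}'
print(pglog_hardlimit_set(sample))
```

In practice the JSON string would come from the cluster rather than a literal, and the check would be made only after every OSD has been upgraded.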
Clone Of:
: 1644409 1673654
Last Closed: 2019-02-07 17:14:29 UTC
Type: Bug
Bug Blocks: 1584264, 1644409, 1673654

Description Neha Ojha 2018-07-24 21:31:12 UTC
Description of problem: osd_max_pg_log_entries is not a hard upper limit for the pg log length. During recovery/backfill the pg log may end up growing considerably, and using a lot of memory.


Comment 3 Neha Ojha 2018-07-24 23:51:51 UTC
For QA purposes:

The length of the pg log can be observed from the output of ceph pg dump.
It can also be viewed in the OSD logs as "approx pg log length = ".
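
A minimal sketch of how both observations might be automated during QA. The first helper pulls the number that follows "approx pg log length = " out of OSD log lines; the rest of the log line format is not shown in this bug, so the regular expression matches only that phrase. The second helper assumes that `ceph pg dump --format json` yields a list of per-PG records with `pgid` and `log_size` fields. Those field names, both helper names, and the sample data are assumptions for illustration.

```python
import re

# Matches the phrase quoted in this comment, followed by an integer.
PG_LOG_LENGTH = re.compile(r"approx pg log length\s*=\s*(\d+)")


def pg_log_lengths(lines):
    """Collect every reported pg log length found in the given log lines."""
    lengths = []
    for line in lines:
        match = PG_LOG_LENGTH.search(line)
        if match:
            lengths.append(int(match.group(1)))
    return lengths


def pgs_over_limit(pg_stats, limit):
    """Map pgid to log_size for each PG whose log is longer than `limit`.

    Assumption: each record is a dict with "pgid" and "log_size" keys.
    """
    return {
        pg["pgid"]: pg["log_size"]
        for pg in pg_stats
        if pg.get("log_size", 0) > limit
    }


# Hypothetical sample data, hand-written for the example.
log_lines = [
    "osd.0 approx pg log length = 3000",
    "some unrelated message",
]
stats = [{"pgid": "1.0", "log_size": 3000}, {"pgid": "1.1", "log_size": 12000}]
print(pg_log_lengths(log_lines))
print(pgs_over_limit(stats, 10000))
```

The limit of 10000 is an arbitrary example value, not the hard limit that the fix enforces.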

Comment 4 Neha Ojha 2018-09-14 00:57:15 UTC
Pushed changes to ceph-3.2-rhel-patches

Comment 14 Neha Ojha 2018-10-30 21:13:15 UTC
Hi Bara,

I have fixed a typo and added some information to the Doc Text.

Thanks,
Neha

Comment 17 Neha Ojha 2018-11-05 23:48:43 UTC
Moving this to z1 in light of http://tracker.ceph.com/issues/36686.

Comment 23 errata-xmlrpc 2019-01-03 19:01:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0020

Comment 24 Vikhyat Umrao 2019-01-08 00:32:27 UTC
*** Bug 1644409 has been marked as a duplicate of this bug. ***

Comment 32 Red Hat Bugzilla 2023-09-15 00:11:03 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days