Bug 1827856 - follower monitors can grow beyond memory target
Summary: follower monitors can grow beyond memory target
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.1
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: z2
Target Release: 4.1
Assignee: Sridhar Seshasayee
QA Contact: Pawan
Docs Contact: Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks: 1816167
Reported: 2020-04-24 23:59 UTC by Josh Durgin
Modified: 2020-09-30 17:25 UTC
CC List: 11 users

Fixed In Version: ceph-14.2.8-100.el8cp, ceph-14.2.8-100.el7cp
Doc Type: Bug Fix
Doc Text:
.Ceph Monitors can grow beyond the memory target
Previously, auto-tuning of the memory target was done only on the Ceph Monitor leader and not on the Ceph Monitors following the leader. This caused the follower Ceph Monitors to exceed the set memory target, which could make a Ceph Monitor crash once the system's memory was exhausted. With this release, the auto-tuning process applies the memory target to the Ceph Monitor leader and its followers, so memory is not exhausted on the system.
Clone Of:
Environment:
Last Closed: 2020-09-30 17:25:33 UTC
Embargoed:




Links
Ceph Project Bug Tracker 45266, last updated 2020-04-25 00:06:11 UTC
GitHub ceph/ceph pull 34916 (closed): nautilus: mon/OSDMonitor: Always tune priority cache manager memory on all mons, last updated 2020-09-29 11:35:47 UTC
Red Hat Product Errata RHBA-2020:4144, last updated 2020-09-30 17:25:55 UTC

Description Josh Durgin 2020-04-24 23:59:08 UTC
Description of problem:
The leader monitor periodically tells tcmalloc to release memory back to the OS, but follower monitors do not. This can result in follower monitors using more memory than their memory target and potentially being killed by the OOM killer.
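To make the mechanism concrete, here is a toy Python model, assuming a simplified allocator that keeps freed memory cached (and therefore resident) until it is explicitly told to release it. `ToyMonitor`, `tune_tick`, `simulate`, and the MB figures are hypothetical illustrations, not Ceph or tcmalloc code.

```python
from dataclasses import dataclass


@dataclass
class ToyMonitor:
    """Hypothetical model of one monitor's heap (sizes in MB); not Ceph code."""
    is_leader: bool
    in_use: int = 0       # memory held by live allocations
    cached_free: int = 0  # freed memory the allocator keeps instead of returning it

    @property
    def rss(self) -> int:
        # Freed-but-cached memory is still resident, so it counts toward RSS.
        return self.in_use + self.cached_free

    def allocate(self, n: int) -> None:
        # Serve from the allocator's cache first; only the remainder grows RSS.
        self.cached_free -= min(n, self.cached_free)
        self.in_use += n

    def free(self, n: int) -> None:
        # Freed memory goes back to the allocator cache, not to the OS.
        self.in_use -= n
        self.cached_free += n

    def release_free_memory(self) -> None:
        # Stand-in for asking tcmalloc to return its cached free memory to the OS.
        self.cached_free = 0


def tune_tick(mons, leader_only: bool) -> None:
    """One periodic tuning pass; leader_only=True mimics the reported behavior."""
    for mon in mons:
        if mon.is_leader or not leader_only:
            mon.release_free_memory()


def simulate(leader_only: bool):
    mons = [ToyMonitor(is_leader=True), ToyMonitor(False), ToyMonitor(False)]
    for mon in mons:
        mon.allocate(1500)  # hypothetical burst
        mon.free(1000)      # burst subsides; freed memory stays cached
    tune_tick(mons, leader_only)
    return [mon.rss for mon in mons]


print(simulate(leader_only=True))   # [500, 1500, 1500]: followers keep the peak
print(simulate(leader_only=False))  # [500, 500, 500]: every monitor releases
```

With the leader-only pass, follower RSS stays at the burst's high-water mark; running the release on every monitor brings all of them back down.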

A workaround is to reset the mon_memory_target config option, which will cause all monitors to ask tcmalloc to release its free memory.

Alternatively, mon_memory_autotune can be disabled.
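Both workarounds can be expressed as `ceph config set` invocations. The sketch below only builds the argument lists and, by default, prints them instead of running them. The helper names are hypothetical, the 2 GiB target is an example value only, and whether re-applying the current value is enough to trigger the release is an assumption not confirmed here.

```python
import shlex
import subprocess
from typing import List


def set_mon_memory_target_cmd(target_bytes: int) -> List[str]:
    """Hypothetical helper: argv that (re)sets mon_memory_target for all mons."""
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    return ["ceph", "config", "set", "mon", "mon_memory_target", str(target_bytes)]


def disable_mon_autotune_cmd() -> List[str]:
    """Hypothetical helper: argv for the alternative, disabling mon_memory_autotune."""
    return ["ceph", "config", "set", "mon", "mon_memory_autotune", "false"]


def run(argv: List[str], dry_run: bool = True) -> None:
    # Print the command by default; only execute it when dry_run is False.
    if dry_run:
        print(shlex.join(argv))
    else:
        subprocess.run(argv, check=True)


# Workaround 1 (example value, 2 GiB); prints:
# ceph config set mon mon_memory_target 2147483648
run(set_mon_memory_target_cmd(2 * 1024**3))
# Workaround 2 (alternative); prints:
# ceph config set mon mon_memory_autotune false
run(disable_mon_autotune_cmd())
```

The two commands are alternatives; keeping dry-run as the default lets an operator review the exact command before applying it to a live cluster.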

Version-Release number of selected component (if applicable):
4.0 and later.

How reproducible:
Deterministic, though the workload needed to reproduce it is unclear.

Steps to Reproduce:
Set up cluster as in https://bugzilla.redhat.com/show_bug.cgi?id=1825312

Actual results:
More than 1 GB of RSS memory used by follower monitors

Expected results:
About 1 GB of RSS memory used by all monitors
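One way to compare a monitor's actual RSS with its memory target is to read the `VmRSS` line from Linux's `/proc/<pid>/status`. This is a minimal sketch assuming a Linux host; the function names, the sample status text, and the 1 GiB target are hypothetical.

```python
import re
from pathlib import Path

# Matches the resident-set-size line of /proc/<pid>/status, e.g. "VmRSS:\t 1572864 kB".
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def rss_bytes(status_text: str) -> int:
    """Return VmRSS in bytes (the kernel's "kB" here means 1024 bytes)."""
    match = _VMRSS.search(status_text)
    if match is None:
        raise ValueError("no VmRSS line in status text")
    return int(match.group(1)) * 1024


def over_target(status_text: str, target_bytes: int) -> bool:
    """True if the process's RSS exceeds the given memory target."""
    return rss_bytes(status_text) > target_bytes


def read_status(pid: int) -> str:
    # Linux-only: read the status file of a running ceph-mon process.
    return Path(f"/proc/{pid}/status").read_text()


# Example with captured status text instead of a live process.
sample = "Name:\tceph-mon\nVmRSS:\t 1572864 kB\n"
print(rss_bytes(sample))             # 1610612736 (1.5 GiB)
print(over_target(sample, 1024**3))  # True
```

On a live host, `read_status(pid)` for each ceph-mon process would replace the captured sample.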

Comment 9 errata-xmlrpc 2020-09-30 17:25:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4144

