Description of problem:
The past_intervals structure can grow very large, consuming memory and ultimately making recovery difficult.

How reproducible:
It has happened several times with customers running unhealthy clusters.

Steps to Reproduce:
1. Make the cluster unhealthy.
2. Thrash OSDs.
3. OSD memory requirements increase, eventually beyond what the host has available.

The fix is now in upstream jewel: https://github.com/ceph/ceph/pull/17351
Backport that patch to downstream 2.y. Note that luminous (and thus 3.y) does not have this problem.
The fix is in upstream Ceph v10.2.10.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0340