Description of problem:
When a RADOS operation that would be a write returns an error, a pg log entry is added, but the pg log is not trimmed. If no successful writes complete against this pg, the pg log keeps growing with each new error. Since the pg log is kept in memory, this can lead to out-of-memory conditions and thus to crashes of the OSDs holding that pg.

Version-Release number of selected component (if applicable):
3.0

How reproducible:
Always

Steps to Reproduce:
1. Delete a non-existent object more than osd_max_pg_log_size times, e.g. 'rados -p test rm foo'
2. Find the pg that would hold object foo: 'ceph osd map test foo'
3. Check the length of the pg log: 'ceph pg $PGID query -f json | jq .info.stats.log_size'

Actual results:
More than osd_max_pg_log_size entries

Expected results:
Fewer than osd_max_pg_log_size entries

Additional info:
See https://github.com/ceph/ceph/blob/luminous/qa/standalone/osd/repro_long_log.sh for a simple test case, which is included in the rados suite via:
https://github.com/ceph/ceph/blob/luminous/qa/suites/rados/standalone/osd.yaml
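The reproduction steps can be sketched as a small shell script. This is only an illustration and not the upstream repro_long_log.sh test: the function names check_log_size and repro_long_log are hypothetical, the pool "test" and object "foo" are taken from the steps above, the limit must be passed in by the caller to match the cluster's osd_max_pg_log_size, and the script assumes that 'ceph osd map ... -f json' exposes a "pgid" field and that the pg query is issued as 'ceph pg $PGID query'.

```shell
# Hypothetical helper names; pool and object names follow the steps in this report.
POOL=${POOL:-test}
OBJ=${OBJ:-foo}

# Print "exceeded" if the pg log size is above the limit, otherwise "ok".
check_log_size() {
    if [ "$1" -gt "$2" ]; then
        echo exceeded
    else
        echo ok
    fi
}

# Usage: repro_long_log LIMIT
# LIMIT should equal the cluster's osd_max_pg_log_size (supplied by the caller).
repro_long_log() {
    limit=$1
    i=0
    # Step 1: every failed delete of a non-existent object adds an error entry.
    while [ "$i" -lt "$((limit + 100))" ]; do
        rados -p "$POOL" rm "$OBJ" 2>/dev/null || true
        i=$((i + 1))
    done
    # Step 2: find the pg that would hold the object (assumes a "pgid" JSON field).
    pgid=$(ceph osd map "$POOL" "$OBJ" -f json | jq -r .pgid)
    # Step 3: read the pg log length and compare it with the limit.
    size=$(ceph pg "$pgid" query -f json | jq .info.stats.log_size)
    echo "pg $pgid log_size=$size: $(check_log_size "$size" "$limit")"
}
```

On a test cluster, sourcing the script and running 'repro_long_log 3000' (with 3000 replaced by the configured osd_max_pg_log_size) would report "exceeded" on an affected build. Nothing touches the cluster until repro_long_log is called.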
Luminous backport: http://tracker.ceph.com/issues/23323 https://github.com/ceph/ceph/pull/20851
*** Bug 1551721 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1259
*** Bug 1616039 has been marked as a duplicate of this bug. ***