Description of problem:
When osd_scrub_auto_repair=true is set, inconsistent placement groups on erasure-coded pools should automatically be repaired. When testing this feature, every deep scrub triggers a repair even when no errors are reported on the PG. This is not ideal: it causes the cluster to go into a HEALTH_WARN state, which can throw alerts in customers' monitoring systems.

Version-Release number of selected component (if applicable):
RHCS 2.3

How reproducible:
Every time

Steps to Reproduce:
1. ceph tell osd.* injectargs '--osd_scrub_auto_repair=true'
2. Force a deep scrub (e.g. ceph pg deep-scrub <pgid>), or wait for the next scheduled deep scrub

Actual results:
The PG goes into the repair state.

Expected results:
The PG should only go into the repair state if there is an error.

Additional info:
It looks like the repair is cancelled when the number of errors is greater than osd_scrub_auto_repair_num_errors, but shouldn't it also be cancelled when the number of errors is zero? See the sketch below.

https://github.com/ceph/ceph/blob/jewel/src/osd/PG.cc#L4686
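For illustration, here is a minimal standalone sketch of the decision being questioned above. This is not the actual Ceph code: the function names are hypothetical, and it only models the kind of conditional at the linked PG.cc line; the threshold value assumes the default of osd_scrub_auto_repair_num_errors (5).

#include <iostream>

// Suspected current behavior: auto-repair is only cancelled when the error
// count exceeds osd_scrub_auto_repair_num_errors, so a clean deep scrub
// (zero errors) still leaves the PG in the repair state.
bool should_repair_current(unsigned num_errors, unsigned threshold) {
    return num_errors <= threshold;  // true even when num_errors == 0
}

// Behavior suggested by this report: additionally require at least one
// error before repairing.
bool should_repair_proposed(unsigned num_errors, unsigned threshold) {
    return num_errors > 0 && num_errors <= threshold;
}

int main() {
    const unsigned threshold = 5;  // assumed default osd_scrub_auto_repair_num_errors
    std::cout << std::boolalpha;
    for (unsigned errors : {0u, 3u, 10u}) {
        std::cout << "errors=" << errors
                  << " current="  << should_repair_current(errors, threshold)
                  << " proposed=" << should_repair_proposed(errors, threshold)
                  << '\n';
    }
    return 0;
}

With zero errors the current-style condition still returns true, matching the observed symptom; the proposed condition returns false, so a clean deep scrub would not push the cluster into a warn state.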
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911