Bug 1506782 - osd_scrub_auto_repair not working as expected
Summary: osd_scrub_auto_repair not working as expected
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 2.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: 3.2
Assignee: David Zafman
QA Contact: Manohar Murthy
Docs Contact: Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks: 1629656
 
Reported: 2017-10-26 17:56 UTC by tbrekke
Modified: 2022-03-13 14:40 UTC
CC List: 15 users

Fixed In Version: RHEL: ceph-12.2.8-109.el7cp Ubuntu: ceph_12.2.8-94redhat1xenial
Doc Type: Bug Fix
Doc Text:
.A PG repair no longer sets the storage cluster to a warning state
When doing a repair of a placement group (PG), the PG was considered damaged, which placed the storage cluster into a warning state. With this release, repairing a PG no longer places the storage cluster into a warning state.
Clone Of:
Environment:
Last Closed: 2019-04-30 15:56:43 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 38070 0 None None None 2019-01-28 22:27:53 UTC
Github ceph ceph pull 26178 0 None closed mon: A PG with PG_STATE_REPAIR doesn't mean damaged data, PG_STATE_IN… 2020-11-20 09:45:32 UTC
Red Hat Issue Tracker RHCEPH-3710 0 None None None 2022-03-13 14:40:58 UTC
Red Hat Product Errata RHSA-2019:0911 0 None None None 2019-04-30 15:57:00 UTC

Description tbrekke 2017-10-26 17:56:45 UTC
Description of problem:

When osd_scrub_auto_repair=true is set, inconsistent placement groups on erasure-coded pools should automatically be repaired. When testing this functionality, it seems every deep scrub triggers a repair even when no errors are reported on the PG. This is not ideal, and it causes the cluster to go into a warning state, which can trigger alerts in customers' monitoring systems.

Version-Release number of selected component (if applicable):

RHCS 2.3

How reproducible:

Every time

Steps to Reproduce:
1. ceph tell osd.* injectargs '--osd_scrub_auto_repair=true'
2. Force a deep scrub (for example, ceph pg deep-scrub <pgid>), or wait for the next scheduled deep scrub

Actual results:

The PG is placed in the repair state after every deep scrub.

Expected results:

The PG should only be placed in the repair state if the deep scrub finds errors.

Additional info:

It looks like the repair is cancelled when the number of errors is greater than osd_scrub_auto_repair_num_errors, but shouldn't it also be cancelled if the number of errors is zero?

https://github.com/ceph/ceph/blob/jewel/src/osd/PG.cc#L4686
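
For illustration, here is a minimal, self-contained C++ sketch of the check described above. This is not the actual PG.cc code; the function and variable names (should_auto_repair, num_errors, deep_scrub) are invented for the example. It shows the behavior reported here (repair only cancelled when the error count exceeds osd_scrub_auto_repair_num_errors) plus the additional zero-error check being suggested:

// Sketch of the auto-repair decision described in this report.
// Names are illustrative, not the real PG.cc identifiers.
#include <cstdint>
#include <iostream>

// Decide whether a finished deep scrub should flag the PG for auto-repair.
bool should_auto_repair(bool osd_scrub_auto_repair,   // config option
                        bool deep_scrub,              // was this a deep scrub?
                        uint64_t num_errors,          // inconsistencies found
                        uint64_t osd_scrub_auto_repair_num_errors)
{
  if (!osd_scrub_auto_repair || !deep_scrub)
    return false;
  // Existing check per the report: the repair is only cancelled when
  // *too many* errors were found.
  if (num_errors > osd_scrub_auto_repair_num_errors)
    return false;
  // Suggested additional check: also skip the repair when the scrub found
  // nothing wrong, so a clean PG never enters the repair state.
  if (num_errors == 0)
    return false;
  return true;
}

int main() {
  std::cout << should_auto_repair(true, true, 0, 5) << '\n';  // 0: clean scrub, no repair
  std::cout << should_auto_repair(true, true, 3, 5) << '\n';  // 1: few errors, auto-repair
  std::cout << should_auto_repair(true, true, 9, 5) << '\n';  // 0: too many errors, repair cancelled
}

With the extra zero-error check, a clean deep scrub would never flag the PG for repair, which is the behavior requested in this report.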

Comment 23 errata-xmlrpc 2019-04-30 15:56:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911

