Bug 1506782 - osd_scrub_auto_repair not working as expected
Summary: osd_scrub_auto_repair not working as expected
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 2.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: 3.2
Assignee: David Zafman
QA Contact: Manohar Murthy
Docs Contact: Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks: 1629656
 
Reported: 2017-10-26 17:56 UTC by tbrekke
Modified: 2022-03-13 14:40 UTC
CC List: 15 users

Fixed In Version: RHEL: ceph-12.2.8-109.el7cp Ubuntu: ceph_12.2.8-94redhat1xenial
Doc Type: Bug Fix
Doc Text:
.A PG repair no longer sets the storage cluster to a warning state
When doing a repair of a placement group (PG), the PG was considered damaged, which placed the storage cluster into a warning state. With this release, repairing a PG no longer places the storage cluster into a warning state.
Clone Of:
Environment:
Last Closed: 2019-04-30 15:56:43 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 38070 0 None None None 2019-01-28 22:27:53 UTC
Github ceph ceph pull 26178 0 None closed mon: A PG with PG_STATE_REPAIR doesn't mean damaged data, PG_STATE_IN… 2020-11-20 09:45:32 UTC
Red Hat Issue Tracker RHCEPH-3710 0 None None None 2022-03-13 14:40:58 UTC
Red Hat Product Errata RHSA-2019:0911 0 None None None 2019-04-30 15:57:00 UTC

Description tbrekke 2017-10-26 17:56:45 UTC
Description of problem:

When osd_scrub_auto_repair=true is set, inconsistent placement groups on erasure-coded pools should automatically be repaired. When testing this functionality, it seems every deep scrub triggers a repair even when no errors are reported on the PG. This is not ideal, and it causes the cluster to go into a warning state, which can trigger alerts in customers' monitoring systems.

Version-Release number of selected component (if applicable):

RHCS 2.3

How reproducible:

Every time

Steps to Reproduce:
1. ceph tell osd.* injectargs '--osd_scrub_auto_repair=true'
2. Force a deep scrub (for example, ceph pg deep-scrub <pgid>), or wait for the next scheduled deep scrub

Actual results:

The PG is placed in the repair state after every deep scrub.

Expected results:

The PG should only be placed in the repair state if the deep scrub finds errors.

Additional info:

It looks like the repair is cancelled when the number of errors is greater than osd_scrub_auto_repair_num_errors, but shouldn't it also be cancelled if the number of errors is zero?

https://github.com/ceph/ceph/blob/jewel/src/osd/PG.cc#L4686
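
For illustration, here is a minimal, self-contained C++ sketch of the check described above. This is not the actual PG.cc code; the function and variable names (should_auto_repair, num_errors, deep_scrub) are invented for the example. It shows the behavior reported here (repair only cancelled when the error count exceeds osd_scrub_auto_repair_num_errors) plus the additional zero-error check being suggested:

// Sketch of the auto-repair decision described in this report.
// Names are illustrative, not the real PG.cc identifiers.
#include <cstdint>
#include <iostream>

// Decide whether a finished deep scrub should flag the PG for auto-repair.
bool should_auto_repair(bool osd_scrub_auto_repair,   // config option
                        bool deep_scrub,              // was this a deep scrub?
                        uint64_t num_errors,          // inconsistencies found
                        uint64_t osd_scrub_auto_repair_num_errors)
{
  if (!osd_scrub_auto_repair || !deep_scrub)
    return false;
  // Existing check per the report: the repair is only cancelled when
  // *too many* errors were found.
  if (num_errors > osd_scrub_auto_repair_num_errors)
    return false;
  // Suggested additional check: also skip the repair when the scrub found
  // nothing wrong, so a clean PG never enters the repair state.
  if (num_errors == 0)
    return false;
  return true;
}

int main() {
  std::cout << should_auto_repair(true, true, 0, 5) << '\n';  // 0: clean scrub, no repair
  std::cout << should_auto_repair(true, true, 3, 5) << '\n';  // 1: few errors, auto-repair
  std::cout << should_auto_repair(true, true, 9, 5) << '\n';  // 0: too many errors, repair cancelled
}

With the extra zero-error check, a clean deep scrub would never flag the PG for repair, which is the behavior requested in this report.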

Comment 23 errata-xmlrpc 2019-04-30 15:56:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911

