Bug 1330035

Summary: Even after removing the troublesome OSD, still seeing inconsistent PGs
Product: Red Hat Ceph Storage
Reporter: Tanay Ganguly <tganguly>
Component: RADOS
Assignee: Kefu Chai <kchai>
Status: CLOSED WONTFIX
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: high
Docs Contact:
Priority: unspecified
Version: 2.0
CC: ceph-eng-bugs, dzafman, hnallurv, kchai, kdreyer, kurs
Target Milestone: rc
Target Release: 2.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-10 07:32:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Tanay Ganguly 2016-04-25 10:27:58 UTC
Description of problem:
A deep scrub is needed to clear the inconsistent PG state.

Version-Release number of selected component (if applicable):
10.1.1.1

How reproducible:
Always

Steps to Reproduce:
1. Had a cluster with 15 PGs marked as inconsistent.
2. Identified that the problem was with one particular disk, which had gone bad.
3. Removed that particular OSD from the CRUSH map; data rebalancing took place, but those 15 PGs were still showing as inconsistent.
4. And if I query the PG:
rados list-inconsistent-obj 6.59
[]error 2: (2) No such file or directory

I was seeing the error because that particular OSD was no longer there.

Actual results:
Needed to run a deep scrub on those inconsistent PGs to make the cluster clean.
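For reference, the manual workaround can be scripted: parse the inconsistent PG ids out of `ceph health detail` and issue a deep scrub for each. A minimal sketch in Python (the health output below is a made-up sample in the typical Jewel-era format, and the `ceph pg deep-scrub` invocations are only printed, not executed):

```python
import re

# Sample `ceph health detail` output (assumed format, for illustration only).
health_detail = """\
pg 6.59 is active+clean+inconsistent, acting [3,5,7]
pg 6.1a is active+clean+inconsistent, acting [2,4,6]
"""

def inconsistent_pgs(health_text):
    """Extract the PG ids of all PGs reported as inconsistent."""
    return re.findall(r"pg (\S+) is \S*inconsistent", health_text)

# Print the deep-scrub command that would be run for each inconsistent PG.
for pg in inconsistent_pgs(health_detail):
    print("ceph pg deep-scrub " + pg)
```

In a real cluster the printed commands would be executed against the running daemons; here they are echoed so the extraction logic can be inspected safely.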

Expected results:
It should have been taken care automatically.

Additional info:

Comment 3 Kefu Chai 2016-05-10 07:32:42 UTC
this problem is twofold:

still marked inconsistent after removing the bad OSD
====================================================

we report the current status to the monitor after scrubbing, but we don't clear the PG_STATE_INCONSISTENT flag after peering. Since we don't track why or who caused the inconsistency, we can't revert the flag once the bad actor is gone; doing so would be very tricky. So the simpler and safer approach is to keep that flag until it is reset by a deep scrub, which is what set it in the first place.
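The flag lifecycle described above can be modelled with a toy state machine (hypothetical Python, not actual Ceph code; the flag value is made up): only a deep scrub sets or clears the inconsistent flag, and peering after the bad OSD is removed deliberately leaves it untouched.

```python
PG_STATE_INCONSISTENT = 1 << 0  # toy flag value, not Ceph's real constant

class PG:
    """Toy model of the PG flag lifecycle described in the comment."""
    def __init__(self):
        self.state = 0

    def deep_scrub(self, found_inconsistency):
        # Deep scrub is the only operation that sets or clears the flag.
        if found_inconsistency:
            self.state |= PG_STATE_INCONSISTENT
        else:
            self.state &= ~PG_STATE_INCONSISTENT

    def peer(self):
        # Peering (e.g. after the bad OSD is removed from CRUSH) does NOT
        # touch the flag: nothing tracks who caused the inconsistency.
        pass

pg = PG()
pg.deep_scrub(found_inconsistency=True)   # bad disk found
pg.peer()                                 # OSD removed, data rebalanced
assert pg.state & PG_STATE_INCONSISTENT   # still flagged, as in the report
pg.deep_scrub(found_inconsistency=False)  # manual deep scrub
assert not pg.state & PG_STATE_INCONSISTENT
```

This matches the reported behaviour: rebalancing alone never cleared the 15 inconsistent PGs; only a fresh deep scrub did.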



rados list-inconsistent-obj
===========================

"rados list-inconsistent-obj" targets the primary osd for getting the latest scrub result

- after the peering, the interval changed, so the object for storing the result of last scrub is zapped. that's why we have empty return value.
- and since the command does not send the epoch # as should the scrub script. we can hardly check if this inconsistency is outdated or not.
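The second point can be illustrated with a hedged sketch: if the client sent along the epoch at which the scrub result was produced, the primary could reject results from an older interval. The function and parameter names below are illustrative, not Ceph's actual API:

```python
def scrub_result_is_current(result_epoch, same_interval_since):
    """A scrub result produced before the current interval began is stale.

    result_epoch:        epoch at which the scrub result was recorded
    same_interval_since: epoch at which the PG's current interval started
    (both names are hypothetical, modelling the comment's reasoning)
    """
    return result_epoch >= same_interval_since

# Scrub ran at epoch 100; peering later started a new interval at epoch 120,
# so the old result should be treated as outdated.
assert not scrub_result_is_current(100, 120)
# A result from epoch 125 belongs to the current interval and is still valid.
assert scrub_result_is_current(125, 120)
```

Without the epoch being sent, no such comparison is possible, which is why the command cannot tell a stale inconsistency from a live one.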

Not a blocker - recommend moving to 2.z

Comment 4 Ken Dreyer (Red Hat) 2016-05-10 13:14:36 UTC
Kefu can you please confirm that you meant to close this one as NOTABUG? The previous comment says "recommend moving to 2.z", so I wanted to double-check this.

Comment 5 Kefu Chai 2016-05-11 06:29:34 UTC
Ken, yes, I confirm.

sorry for the confusion. I forgot to remove that line after editing the reasons to close this bug as NOTABUG.

Comment 6 Tanay Ganguly 2016-05-13 09:21:05 UTC
Hi Kefu,

I think it's a bug, but behaving as designed.
Should we mark it as NOTABUG?

Thanks,
Tanay

Comment 7 Kefu Chai 2016-05-25 08:13:17 UTC
Tanay, sorry for the delay. Makes sense. Changing it to WONTFIX.