Bug 1457097

Summary: PG repair does not repair the objects whose attributes are corrupted.
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Parikshith <pbyregow>
Component: RADOS Assignee: David Zafman <dzafman>
Status: CLOSED ERRATA QA Contact: Parikshith <pbyregow>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 2.3 CC: ceph-eng-bugs, dzafman, hnallurv, icolle, kchai, shmohan, tserlin
Target Milestone: rc   
Target Release: 2.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.7-24.el7cp Ubuntu: ceph_10.2.7-26redhat1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-06-19 13:33:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Parikshith 2017-05-31 06:35:14 UTC
Description of problem:

PG repair does not repair the objects whose attributes are corrupted.

Version-Release number of selected component (if applicable): ceph version 10.2.7-21.el7cp 


How reproducible:

Steps to Reproduce:
1. Created an EC pool (5+2) and wrote some data.
2. Created a snapshot of this pool.
3. Picked one of the shards and corrupted its user.ceph.snapset xattr.
4. After running a scrub on the primary, the status reports an inconsistent object:

ceph -s:
     health HEALTH_ERR
            1 pgs inconsistent
            1 scrub errors
     monmap e3: 3 mons at {aircobra=10.70.39.1:6789/0,cornell=10.70.39.6:6789/0,corsair=10.70.39.7:6789/0}
            election epoch 24, quorum 0,1,2 aircobra,cornell,corsair
     osdmap e388: 9 osds: 8 up, 8 in
            flags sortbitwise,require_jewel_osds
      pgmap v11682: 300 pgs, 2 pools, 288 GB data, 73910 objects
            673 GB used, 8224 GB / 8898 GB avail
                 299 active+clean
                   1 active+clean+inconsistent


5. Ran pg repair on the affected PG (3.11).
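
For anyone re-running these steps, the sequence can be sketched roughly as the script below. It is an illustration under assumptions, not the reporter's exact commands: the pool, profile, snapshot and object names, the pg_num value and the source file path are all hypothetical, and the report does not say which scrub command was used (a deep scrub is assumed here). Step 3 (corrupting the xattr on one shard) depends on the OSD backend and is deliberately left out. By default the script only prints the commands.

```python
import shlex
import subprocess


def repro_commands(pool="ecpool", profile="ec52", pg_num=128, pgid="3.11",
                   snap="snap1", obj="obj1", src="/tmp/data.bin"):
    """Return argv lists for steps 1, 2, 4 and 5 of the report.

    Every default except pgid is a hypothetical placeholder.
    """
    return [
        # Step 1: EC profile with k=5 data and m=2 coding chunks, pool, data.
        ["ceph", "osd", "erasure-code-profile", "set", profile, "k=5", "m=2"],
        ["ceph", "osd", "pool", "create", pool, str(pg_num), str(pg_num),
         "erasure", profile],
        ["rados", "-p", pool, "put", obj, src],
        # Step 2: pool snapshot.
        ["rados", "-p", pool, "mksnap", snap],
        # Step 3 (xattr corruption on one shard) is done by hand, not here.
        # Step 4: scrub, then list what the scrub flagged. Scrubs run
        # asynchronously, so wait for completion before listing.
        ["ceph", "pg", "deep-scrub", pgid],
        ["rados", "list-inconsistent-obj", pgid, "--format=json-pretty"],
        # Step 5: ask the primary to repair the PG.
        ["ceph", "pg", "repair", pgid],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("+ " + " ".join(shlex.quote(a) for a in argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(repro_commands())
```

Keeping the dry run as the default makes it possible to review the command list on a machine without cluster access before running anything against a real cluster.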


Actual results:

The corrupted object does not get repaired (PG 3.11).

ceph -w: 
2017-05-30 16:16:27.264217 mon.2 [INF] from='client.? 10.70.39.2:0/1474454893' entity='client.admin' cmd=[{"prefix": "pg repair", "pgid": "3.11"}]: dispatch
2017-05-30 16:16:31.773155 mon.0 [INF] pgmap v11683: 300 pgs: 1 active+clean+inconsistent, 299 active+clean; 288 GB data, 673 GB used, 8224 GB / 8898 GB avail
2017-05-30 16:16:32.796121 mon.0 [INF] pgmap v11684: 300 pgs: 1 active+clean+inconsistent, 299 active+clean; 288 GB data, 673 GB used, 8224 GB / 8898 GB avail
2017-05-30 16:16:29.614098 osd.6 [INF] 3.11 repair starts
2017-05-30 16:16:32.158385 osd.6 [ERR] 3.11 repair 1 errors, 0 fixed
2017-05-30 16:16:34.853660 mon.0 [INF] pgmap v11685: 300 pgs: 1 active+clean+inconsistent, 299 active+clean; 288 GB data, 673 GB used, 8224 GB / 8898 GB avail
2017-05-30 16:16:35.893849 mon.0 [INF] pgmap v11686: 300 pgs: 1 active+clean+inconsistent, 299 active+clean; 288 GB data, 673 GB used, 8224 GB / 8898 GB avail
2017-05-30 16:16:36.929105 mon.0 [INF] pgmap v11687: 300 pgs: 1 active+clean+inconsistent, 299 active+clean; 288 GB data, 673 GB used, 8224 GB / 8898 GB avail
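
The line that shows the failure is the OSD summary "3.11 repair 1 errors, 0 fixed". A minimal sketch for pulling that summary out of captured ceph -w output when verifying a fix is shown below. The regular expression is an assumption based only on the line format seen in this report; other Ceph versions, or a repair that finds no errors, may log a different message, in which case the function returns None.

```python
import re

# Matches lines shaped like the one in this report, e.g.
# "2017-05-30 16:16:32.158385 osd.6 [ERR] 3.11 repair 1 errors, 0 fixed"
REPAIR_SUMMARY = re.compile(
    r"^\S+ \S+ (?P<osd>osd\.\d+) \[(?P<level>\w+)\] "
    r"(?P<pgid>\d+\.[0-9a-f]+) repair (?P<errors>\d+) errors, "
    r"(?P<fixed>\d+) fixed$"
)


def parse_repair_summary(line):
    """Return a dict describing a repair summary line, or None if no match."""
    match = REPAIR_SUMMARY.match(line.strip())
    if match is None:
        return None
    errors = int(match.group("errors"))
    fixed = int(match.group("fixed"))
    return {
        "osd": match.group("osd"),
        "level": match.group("level"),
        "pgid": match.group("pgid"),
        "errors": errors,
        "fixed": fixed,
        "unrepaired": errors - fixed,
    }


if __name__ == "__main__":
    sample = ("2017-05-30 16:16:32.158385 osd.6 [ERR] "
              "3.11 repair 1 errors, 0 fixed")
    print(parse_repair_summary(sample))
```

For the log in this report the parsed summary gives one error and zero fixed, so one error remains unrepaired, which matches the PG staying in active+clean+inconsistent.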


Expected results:


Additional info:

Comment 2 John Poelstra 2017-05-31 15:38:23 UTC
Based on initial review, this appears to be a blocker. Development is doing root cause analysis now.

Comment 14 errata-xmlrpc 2017-06-19 13:33:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1497