Bug 1318055 - collection split causes orphaned files in filestore causing inconsistent scrubs and crashes on pg removal
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 2.0
Assignee: Samuel Just
QA Contact: shylesh
URL:
Whiteboard:
Depends On:
Blocks: 1334534
Reported: 2016-03-15 21:52 UTC by Samuel Just
Modified: 2017-07-30 15:14 UTC

Fixed In Version: ceph-10.2.0-1.el7cp
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1334534
Environment:
Last Closed: 2016-08-23 19:34:56 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 14766 | 0 | None | None | None | 2016-03-15 21:52:15 UTC
Red Hat Product Errata | RHBA-2016:1755 | 0 | normal | SHIPPED_LIVE | Red Hat Ceph Storage 2.0 bug fix and enhancement update | 2016-08-23 23:23:52 UTC

Description Samuel Just 2016-03-15 21:52:16 UTC
Description of problem:

When a collection split occurs during a collection_move_rename on an EC (erasure-coded) object, the old link can be left orphaned in the parent directory (see the commits for details).

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a new cluster.
2. Create an EC pool with a single PG.
3. Fill the pool by alternately creating a new object and overwriting an existing object. All object names should be longer than 300 characters.
4. When the PG collection splits, an orphaned file should be left in the <pgid>_head/ directory for that PG on each OSD hosting it.
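
The workload in step 3 could be driven by something like the Python sketch below. It is only an illustration under stated assumptions, not the script used to find this bug: the helper names (make_name, workload, rados_put_cmd, run) are hypothetical, it assumes the rados CLI is installed and the EC pool from step 2 already exists, and it takes "overwriting" to mean running rados put again on an existing object name.

```python
import subprocess

MIN_NAME_LEN = 301  # step 3 asks for names longer than 300 characters


def make_name(index, min_len=MIN_NAME_LEN):
    """Return a unique object name that is at least `min_len` characters long."""
    prefix = "obj-{:08d}-".format(index)
    return prefix + "x" * max(0, min_len - len(prefix))


def workload(num_creates):
    """Yield (action, name) pairs, alternating a create with an overwrite."""
    for i in range(num_creates):
        name = make_name(i)
        yield ("create", name)
        # Assumption: overwriting the object just created is enough to
        # count as "overwriting an existing object".
        yield ("overwrite", name)


def rados_put_cmd(pool, name, payload_path):
    """Build the `rados put` command line for one object write."""
    return ["rados", "-p", pool, "put", name, payload_path]


def run(pool, payload_path, num_creates):
    """Issue the writes against a live cluster (not called automatically)."""
    for _action, name in workload(num_creates):
        subprocess.run(rados_put_cmd(pool, name, payload_path), check=True)
```

For example, run("ecpool", "/tmp/payload", 1000) would issue 2000 writes against a pool named ecpool (a placeholder name). How many objects are needed before the collection splits depends on the filestore split settings of the cluster.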

Actual results:

The orphaned file from step 4 is proof that the bug occurred. The following symptoms may also appear, depending on the exact objects created:

Scrubbing the PG may incorrectly report it as inconsistent. Removing the orphaned file and re-scrubbing should clear the inconsistency.

Changing the PG mapping may trigger the ENOTEMPTY crash described in the linked bug. In that case too, the file left behind in the collection directory is the orphaned file.
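
To look for such leftovers on an OSD, a rough heuristic can be sketched in Python as below. It rests on an assumption that is not stated in this report: that after a filestore collection split every object file is moved into a DIR_* subdirectory, so a regular file still sitting beside DIR_* subdirectories is a candidate orphan. The function name orphan_candidates is hypothetical, and any hit should be checked by hand before a file is removed.

```python
import os


def orphan_candidates(collection_dir):
    """Return files that sit directly next to DIR_* subdirectories.

    Assumption: once a directory has split, its object files live only in
    the DIR_* subdirectories, so anything left beside them is suspect.
    """
    candidates = []
    for dirpath, dirnames, filenames in os.walk(collection_dir):
        has_split = any(d.startswith("DIR_") for d in dirnames)
        if has_split:
            candidates.extend(
                os.path.join(dirpath, name) for name in sorted(filenames)
            )
    return candidates
```

The function only lists paths and never deletes anything. It would be pointed at the <pgid>_head/ directory of the affected PG under the OSD's data directory.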

Expected results:

No orphaned files, crashes, or PG inconsistencies.


Additional info:

Comment 1 Samuel Just 2016-03-15 21:52:51 UTC
https://github.com/ceph/ceph/pull/8136 -- merged.  Not backported to hammer yet.

Comment 2 Samuel Just 2016-04-08 14:41:56 UTC
Still needs backport to hammer.

Comment 3 Ken Dreyer (Red Hat) 2016-04-26 20:30:12 UTC
Looks like that PR in comment 1 already made it into v10.2.0 upstream?

If that's the case, would you please re-target this bz to 1.3.3?

Comment 4 Samuel Just 2016-05-09 20:42:22 UTC
Pending backport to hammer.

Comment 5 Samuel Just 2016-05-09 20:48:12 UTC
Oops, I guess this stays in the 2.0 release tag so I should leave it as POST?

Comment 6 Samuel Just 2016-05-09 20:48:26 UTC
It's in 10.2.0

Comment 7 Samuel Just 2016-05-09 21:02:16 UTC
It's in 10.2.0, so it's MODIFIED I guess

Comment 12 errata-xmlrpc 2016-08-23 19:34:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html

