Bug 679076

Summary: fsck.gfs2: segfault in pass1b
Product: Red Hat Enterprise Linux 5
Reporter: Robert Peterson <rpeterso>
Component: gfs2-utils
Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Docs Contact:
Priority: medium
Version: 5.7
CC: adas, djansa, edamato, fnadge
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: gfs2-utils-0.1.62-30.el5
Doc Type: Bug Fix
Doc Text:
Cause
    A user does fsck.gfs2 on a GFS2 file system. The file system damage is such that two inodes point to the same metadata block, thus qualifying it as having a "duplicate block reference". However, both dinodes have other unrecoverable damage, so both get thrown away.
Consequence
    In pass1b, fsck.gfs2 segfaults and does not complete. The problem was due to all duplicate references having been removed because of other damage. It tries to evaluate the "remaining reference," but since there are no remaining references, it tries to access something that doesn't exist.
Fix
    A new check has been added to see if the duplicate reference list is empty.
Result
    As a result of the new check, pass1b completes normally and the fsck.gfs2 finishes normally.
Last Closed: 2011-07-21 11:10:14 UTC
Bug Depends On:    
Bug Blocks: 679080    
Attachments:
Description Flags
Patch to fix the problem none

Description Robert Peterson 2011-02-21 14:12:20 UTC
Description of problem:
While recently analyzing a customer's gfs2 metadata, I ran
fsck.gfs2 and it segfaulted in pass1b.  I tracked down the
problem, and this bug is to track the problem and its fix.

Version-Release number of selected component (if applicable):
RHEL57

How reproducible:
Unknown

Steps to Reproduce:
1. Restore customer metadata
2. fsck.gfs2 -y /dev/device
  
Actual results:
Segfault in pass1b

Expected results:
fsck.gfs2 should run to completion.

Additional info:
Patch available

Comment 1 Robert Peterson 2011-02-21 14:15:25 UTC
Created attachment 479925
Patch to fix the problem

The problem occurs when a dinode has duplicate block
references, but every reference in the duplicate list is
eventually deleted due to other corruption.
The fix is an additional check for whether the list is empty.
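
For illustration only, the kind of empty-list check described above might look like the sketch below. This is not the attached patch and does not use the real gfs2-utils data structures: the types ref_node and dup_block, the small list helpers, and the functions remaining_ref and demo_owner are all hypothetical names invented for this example. It assumes the duplicate references are kept on a circular list with a sentinel head, so that an empty list has no real entry to evaluate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the gfs2-utils types. */
struct ref_node {
    struct ref_node *next, *prev;
    unsigned long long inode_addr;   /* dinode holding the reference */
};

struct dup_block {
    unsigned long long block_addr;   /* the multiply-referenced block */
    struct ref_node refs;            /* sentinel head of a circular list */
};

static void list_init(struct ref_node *head)
{
    head->next = head->prev = head;
}

static int list_empty(const struct ref_node *head)
{
    return head->next == head;
}

/* Insert n directly after the sentinel head. */
static void list_add(struct ref_node *head, struct ref_node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n from whatever list it is on. */
static void list_del(struct ref_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

/*
 * Return the first remaining reference, or NULL when every reference
 * has already been dropped. Without the empty check, the caller would
 * be handed the sentinel itself, which is not a real reference.
 */
static struct ref_node *remaining_ref(struct dup_block *d)
{
    if (list_empty(&d->refs))
        return NULL;
    return d->refs.next;
}

/*
 * Build a block with two references, optionally drop each one, and
 * return the inode address of the remaining owner (0 if none remain).
 */
unsigned long long demo_owner(int drop_a, int drop_b)
{
    struct dup_block d = { .block_addr = 0x1234 };
    struct ref_node a = { .inode_addr = 100 };
    struct ref_node b = { .inode_addr = 200 };
    struct ref_node *r;

    list_init(&d.refs);
    list_add(&d.refs, &a);
    list_add(&d.refs, &b);           /* list order is now b, a */
    if (drop_a)
        list_del(&a);
    if (drop_b)
        list_del(&b);

    r = remaining_ref(&d);
    assert(r != &d.refs);            /* never hand back the sentinel */
    return r ? r->inode_addr : 0;
}
```

Calling demo_owner(1, 1) models the case in this bug: both referencing dinodes were thrown away, so the caller has to handle "no remaining reference" rather than evaluate one.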

Comment 2 Robert Peterson 2011-02-21 14:16:28 UTC
Requesting ack flags for 5.7.

Comment 3 Robert Peterson 2011-02-22 22:52:07 UTC
The patch was pushed to the RHEL57 branch of the cluster.git
tree for inclusion into 5.7.  It was also pushed to the
master branch of the gfs2-utils git tree.  Crosswrite bug
#679080 was created for the RHEL6 work.  Changing status to
POST until we get this into a build.

Comment 6 Florian Nadge 2011-05-24 14:18:56 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
This field is the basis of the errata or release note for this bug. It can also be used for change logs.

The Technical Note template, known as CCFR, is as follows:

Cause
    What actions or circumstances cause this bug to present.
Consequence
    What happens when the bug presents.
Fix
    What was done to fix the bug.
Result
    What now happens when the actions or circumstances above occur.
    Note: this is not the same as the bug doesn’t present anymore.

Comment 7 Robert Peterson 2011-05-24 17:51:29 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,13 +1,20 @@
-This field is the basis of the errata or release note for this bug. It can also be used for change logs.
-
-The Technical Note template, known as CCFR, is as follows:
-
 Cause
-    What actions or circumstances cause this bug to present.
+A user does fsck.gfs2 on a GFS2 file system.  The file system damage
+is such that two inodes point to the same metadata block, thus
+qualifying it as having a "duplicate block reference".  However, both
+dinodes have other unrecoverable damage, so both get thrown away.
+
 Consequence
-    What happens when the bug presents.
+In pass1b, fsck.gfs2 segfaults and does not complete.  The problem
+was due to all duplicate references having been removed because of
+other damage.  It tries to evaluate the "remaining reference," but
+since there are no remaining references, it tries to access something
+that doesn't exist.
+
 Fix
-    What was done to fix the bug.
+A new check has been added to see if the duplicate reference
+list is empty.
+
 Result
-    What now happens when the actions or circumstances above occur.
+As a result of the new check, pass1b completes normally and the
-    Note: this is not the same as the bug doesn’t present anymore.
+fsck.gfs2 finishes normally.

Comment 8 Nate Straz 2011-05-31 16:00:50 UTC
Bob looked for the customer metadata with which he originally hit this bug.  After searching for a while, he remembered that the corruption was caused by leftover metadata on disk when he restored the customer data.  He has since tried to reproduce the conditions by hand, but without success.

I verified that the patch is included and no new regressions were found during regression testing.

Comment 9 errata-xmlrpc 2011-07-21 11:10:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1042.html