Description of problem:
While recently analyzing a customer's gfs2 metadata, I ran fsck.gfs2 and it segfaulted in pass1b. I tracked down the problem, and this bug is to track the problem and its fix.

Version-Release number of selected component (if applicable):
RHEL57

How reproducible:
Unknown

Steps to Reproduce:
1. Restore customer metadata
2. fsck.gfs2 -y /dev/device

Actual results:
Segfault in pass1b

Expected results:
fsck.gfs2 should run to completion.

Additional info:
Patch available
Created attachment 479925
Patch to fix the problem

The problem occurred when there were duplicate block references in a dinode, but all references in the duplicate list were eventually deleted due to other corruption. The fix is an additional check for whether the list is empty.
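For readers without access to the attachment, the shape of the fix can be sketched roughly as below. This is only an illustration under stated assumptions: the `dup_ref` and `dup_block` structures and both function names are hypothetical stand-ins, not the actual gfs2-utils data structures or the code in the attached patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real gfs2-utils types. */
struct dup_ref {
    unsigned long long dinode_addr;   /* dinode that references the block */
    struct dup_ref *next;
};

struct dup_block {
    unsigned long long block_addr;    /* the multiply-referenced block */
    struct dup_ref *refs;             /* remaining references; NULL when empty */
};

/* Unlink every reference held by a dinode that was thrown away.
 * The caller owns the storage of the unlinked nodes. */
static void remove_refs_from(struct dup_block *db,
                             unsigned long long dinode_addr)
{
    struct dup_ref **pp;

    assert(db != NULL);
    pp = &db->refs;
    while (*pp != NULL) {
        if ((*pp)->dinode_addr == dinode_addr)
            *pp = (*pp)->next;        /* skip over the matching node */
        else
            pp = &(*pp)->next;
    }
}

/* Report the first remaining reference, if there is one.
 * Returns 1 and fills *out on success, 0 when the list is empty. */
static int last_remaining_ref(const struct dup_block *db,
                              unsigned long long *out)
{
    assert(db != NULL && out != NULL);
    if (db->refs == NULL)             /* the added check: nothing is left */
        return 0;
    *out = db->refs->dinode_addr;     /* safe: the list is non-empty here */
    return 1;
}
```

Without the empty-list test in `last_remaining_ref`, the dereference of `db->refs` is what would fault once every referencing dinode had been removed for other damage.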
Requesting ack flags for 5.7.
The patch was pushed to the RHEL57 branch of the cluster.git tree for inclusion into 5.7. It was also pushed to the master branch of the gfs2-utils git tree. Crosswrite bug #679080 was created for the RHEL6 work. Changing status to POST until we get this into a build.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
This field is the basis of the errata or release note for this bug. It can also be used for change logs.

The Technical Note template, known as CCFR, is as follows:

Cause
    What actions or circumstances cause this bug to present.
Consequence
    What happens when the bug presents.
Fix
    What was done to fix the bug.
Result
    What now happens when the actions or circumstances above occur.
    Note: this is not the same as the bug doesn’t present anymore.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,13 +1,20 @@
-This field is the basis of the errata or release note for this bug. It can also be used for change logs.
-
-The Technical Note template, known as CCFR, is as follows:
-
 Cause
-    What actions or circumstances cause this bug to present.
+A user does fsck.gfs2 on a GFS2 file system. The file system damage
+is such that two inodes point to the same metadata block, thus
+qualifying it as having a "duplicate block reference". However, both
+dinodes have other unrecoverable damage, so both get thrown away.
+
 Consequence
-    What happens when the bug presents.
+In pass1b, fsck.gfs2 segfaults and does not complete. The problem
+was due to all duplicate references having been removed because of
+other damage. It tries to evaluate the "remaining reference," but
+since there are no remaining references, it tries to access something
+that doesn't exist.
+
 Fix
-    What was done to fix the bug.
+A new check has been added to see if the duplicate reference
+list is empty.
+
 Result
-    What now happens when the actions or circumstances above occur.
+As a result of the new check, pass1b completes normally and the
-    Note: this is not the same as the bug doesn’t present anymore.
+fsck.gfs2 finishes normally.
Bob looked for the customer metadata with which he originally hit this problem. After searching for a while, he remembered that the corruption had been caused by leftover metadata on disk when he restored the customer data. He has since tried to reproduce the conditions by hand, but without success. I verified that the patch is included, and no new regressions were found during regression testing.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1042.html