Bug 1251036 - fsck.gfs2: Segfault with corrupt rindex
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: gfs2-utils
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Robert Peterson
QA Contact: cluster-qe@redhat.com
Keywords: Patch
Depends On: 1271674
Blocks: 1203710 1313485 1295577
Reported: 2015-08-06 08:53 EDT by Robert Peterson
Modified: 2016-11-04 02:29 EDT

See Also:
Fixed In Version: gfs2-utils-3.1.9-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 02:29:59 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Tool to rebuild rindex from printsavedmeta plus file system (34.12 KB, text/plain)
2015-08-14 16:37 EDT, Robert Peterson
Patch #1 - fsck.gfs2: Read jindex before making rindex repairs (15.07 KB, patch)
2015-09-09 10:43 EDT, Robert Peterson
Patch #2 - fsck.gfs2: Detect multiple rgrp grow segments (13.64 KB, patch)
2015-09-09 10:44 EDT, Robert Peterson


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 1475183 None None None Never

Description Robert Peterson 2015-08-06 08:53:52 EDT
Description of problem:
I recently received a set of customer metadata which had a
corrupt rindex file. When I ran fsck.gfs2 on it, it segfaulted.
I ran my little rindex size fix-up program, rindex_set_di_size.c
which fixed di_size properly. When I ran fsck.gfs2, it still
segfaulted. Using gdb, I got the following call trace:

(gdb) run -y /dev/clariion_lun10/scratch &> /tmp/fsck.out
Starting program: /home/bob/gfs2-utils/gfs2/fsck/./fsck.gfs2 -y /dev/clariion_lun10/scratch &> /tmp/fsck.out

Program received signal SIGSEGV, Segmentation fault.
0x0000000000422a0b in gfs2_rgrp_free (rgrp_tree=rgrp_tree@entry=0x7fffffffdbf0) at rgrp.c:247
247                     if (rgd->bits[0].bi_bh) { /* if a buffer exists */
Missing separate debuginfos, use: debuginfo-install glibc-2.17-79.el7.x86_64
(gdb) bt
#0  0x0000000000422a0b in gfs2_rgrp_free (rgrp_tree=rgrp_tree@entry=0x7fffffffdbf0) at rgrp.c:247
#1  0x000000000041da87 in rg_repair (sdp=sdp@entry=0x7fffffffd860, trust_lvl=trust_lvl@entry=2, rg_count=rg_count@entry=0x7fffffffd508, sane=sane@entry=0x7fffffffd50c) at rgrepair.c:874
#2  0x00000000004046aa in fetch_rgrps (sdp=sdp@entry=0x7fffffffd860) at initialize.c:638
#3  0x000000000040618a in initialize (sdp=sdp@entry=0x7fffffffd860, force_check=0, preen=0, all_clean=all_clean@entry=0x7fffffffd77c) at initialize.c:1742
#4  0x0000000000401fe8 in main (argc=3, argv=0x7fffffffdda8) at main.c:355
(gdb) 

My philosophy is: The fsck.gfs2 program should NEVER segfault
no matter what kind of hideous rubbish you throw at it.

Version-Release number of selected component (if applicable):
RHEL7.2

How reproducible:
Always

Steps to Reproduce:
1. gfs2_edit restoremeta savemeta.mda.china.01445381 <dev>
2. fsck.gfs2 <dev>

Actual results:
Segmentation fault

Expected results:
fsck.gfs2 should never segfault.

Additional info:
WARNING: This is a huge 15GB set of metadata. It requires a huge
device (I used 25TB because 10TB was not enough). Restoring it
takes many hours, probably 8 or more, depending on the hardware.
Comment 1 Robert Peterson 2015-08-14 16:37:19 EDT
Created attachment 1063170
Tool to rebuild rindex from printsavedmeta plus file system

After much struggle and strife, I wrote and debugged this program.
It rebuilds a corrupt rindex from scratch, based on the output of
gfs2_edit printsavedmeta. It also uses the existing file system to
tell where the journal blocks are, so it can avoid rgrps that lie
within them. It might still have bugs, but it successfully rebuilt
this particular file system's rindex, and it's a very complex case.
Comment 2 Robert Peterson 2015-09-09 10:33:27 EDT
Setting this to assigned and requesting some flags. I've got
a couple patches for this already, which I'll attach shortly.
Comment 3 Robert Peterson 2015-09-09 10:43:16 EDT
Created attachment 1071807
Patch #1 - fsck.gfs2: Read jindex before making rindex repairs

In most cases, the rindex needs to be read into memory first, in case
the journals or jindex are corrupt and need repairs. In some rare
cases, however, the rindex itself needs repairs, and the rindex repair
code must read in the jindex and journals so that it can filter out
rgrp records that appear inside the journals. This prevents those
records, which are false positives, from being treated as real rgrps.
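
The filtering step described above can be sketched roughly as follows.
This is not the code from the patch: the structure and function names
(journal_extent_sketch, block_in_journal_sketch,
filter_rgrp_candidates_sketch) are hypothetical, and journals are
assumed to be describable as a flat list of block extents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one contiguous run of blocks belonging to a journal. */
struct journal_extent_sketch {
    uint64_t start;  /* first block of the extent */
    uint64_t len;    /* number of blocks in the extent */
};

/* Return 1 if block lies inside any journal extent, else 0. */
int block_in_journal_sketch(uint64_t block,
                            const struct journal_extent_sketch *ext,
                            size_t n_ext)
{
    for (size_t i = 0; i < n_ext; i++) {
        /* Subtract first so start + len cannot overflow. */
        if (block >= ext[i].start && block - ext[i].start < ext[i].len)
            return 1;
    }
    return 0;
}

/*
 * Compact candidates[] in place, dropping every candidate rgrp address
 * that falls inside a journal (a false positive). Returns the number
 * of candidates kept.
 */
size_t filter_rgrp_candidates_sketch(uint64_t *candidates, size_t n,
                                     const struct journal_extent_sketch *ext,
                                     size_t n_ext)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (!block_in_journal_sketch(candidates[i], ext, n_ext))
            candidates[kept++] = candidates[i];
    }
    return kept;
}
```

The point of the sketch is only the ordering constraint: the journal
extents have to be known before candidate rgrps are judged, which is
why the jindex must be read before the rindex repairs begin.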

This patch also fixes a segfault in the rgrp code for cases of
extremely corrupt rindex files where the rgrp has no buffers.
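
For illustration, the kind of guard this implies at the crash site
(rgrp.c:247 dereferences rgd->bits[0] unconditionally) might look like
the sketch below. It is an assumption, not the actual patch: the
simplified structures rgrp_sketch and bitmap_sketch and the function
rgrp_release_sketch are hypothetical stand-ins for the libgfs2 types,
and it assumes the crash came from a bits array that was never
allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the libgfs2 structures. */
struct bitmap_sketch {
    void *bi_bh;                /* buffer backing this bitmap, or NULL */
};

struct rgrp_sketch {
    struct bitmap_sketch *bits; /* NULL if no bitmaps were ever allocated */
    unsigned int nbits;         /* number of entries in bits */
};

/*
 * Release whatever buffers an rgrp holds and return how many were
 * released. Safe to call on an rgrp whose bits array was never set up,
 * which is the assumed state after reading a badly corrupt rindex.
 */
unsigned int rgrp_release_sketch(struct rgrp_sketch *rgd)
{
    unsigned int released = 0;

    /* Check the array itself before touching bits[0]. */
    if (rgd == NULL || rgd->bits == NULL)
        return 0;

    for (unsigned int i = 0; i < rgd->nbits; i++) {
        if (rgd->bits[i].bi_bh != NULL) { /* if a buffer exists */
            free(rgd->bits[i].bi_bh);
            rgd->bits[i].bi_bh = NULL;
            released++;
        }
    }
    free(rgd->bits);
    rgd->bits = NULL;
    rgd->nbits = 0;
    return released;
}
```

Calling rgrp_release_sketch() on an rgrp with bits == NULL simply
returns 0 instead of dereferencing a null pointer.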
Comment 4 Robert Peterson 2015-09-09 10:44:19 EDT
Created attachment 1071808
Patch #2 - fsck.gfs2: Detect multiple rgrp grow segments

This patch gives fsck.gfs2's rgrepair code the capability to detect
multiple gfs2_grow segments and repair them accordingly.
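
As a rough illustration of what "detecting segments" could mean, the
sketch below counts runs of evenly spaced rgrp addresses in a sorted
list, starting a new run whenever the spacing changes. This is a
deliberately simplified heuristic assumed for the example, not the
algorithm in the patch; count_rgrp_segments_sketch is a hypothetical
name, and real rgrp spacing has more irregularities than this models.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count runs of uniformly spaced addresses in addr[], which must be
 * sorted in ascending order. A change in spacing ends the current run;
 * the next run's spacing is learned from its first two addresses.
 */
size_t count_rgrp_segments_sketch(const uint64_t *addr, size_t n)
{
    size_t segments;
    uint64_t stride = 0;
    int have_stride = 0;

    if (n == 0)
        return 0;

    segments = 1;
    for (size_t i = 1; i < n; i++) {
        uint64_t gap = addr[i] - addr[i - 1];

        if (!have_stride) {
            /* Second address of a run: learn its spacing. */
            stride = gap;
            have_stride = 1;
        } else if (gap != stride) {
            /* Spacing changed: addr[i] starts a new segment. */
            segments++;
            have_stride = 0;
        }
    }
    return segments;
}
```

For example, addresses spaced 100 blocks apart followed by addresses
spaced 500 blocks apart would be reported as two segments.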
Comment 6 Robert Peterson 2016-02-25 13:41:14 EST
Today I posted 11 patches to upstream cluster-devel related to this.
Comment 7 Robert Peterson 2016-03-24 09:19:04 EDT
Yesterday I pushed the upstream patches to the gfs2-utils
git tree. It should be an easy port to rhel7.
Comment 8 Mike McCune 2016-03-28 19:30:47 EDT
This bug was accidentally moved from POST to MODIFIED by an error in automation. Please contact mmccune@redhat.com with any questions.
Comment 12 Justin Payne 2016-09-08 21:11:36 EDT
Verified in gfs2-utils-3.1.9-3.el7:

[root@dash-02 ~]# rpm -q gfs2-utils
gfs2-utils-3.1.9-3.el7.x86_64
[root@dash-02 ~]# gfs2_edit restoremeta savemeta.mda.china.01445381 /dev/mapper/mpatha1
[root@dash-02 ~]# ./rindex_set_di_size /dev/mapper/mpatha1
[root@dash-02 ~]# gfs2_edit printsavedmeta savemeta.mda.china.01445381 > printsavedmeta.mda.china.01445381.out
[root@dash-02 ~]# ./rindex_from_printsavedmeta  printsavedmeta.mda.china.01445381.out /dev/mapper/mpatha1
Checking for rgrps in journal0.
Checking for rgrps in journal1.
Checking for rgrps in journal2.
Checking for rgrps in journal3.
First rg length: 0x5
 42735 resource groups written.here.
[root@dash-02 ~]# fsck.gfs2 -y /dev/mapper/mpatha1 &> fsck.out
[root@dash-02 ~]# tail fsck.out 
dinodes: 78652666 (0x4b024fa)

Calculated statfs values:
blocks:  2594517996 (0x9aa533ec)
free:    166405146 (0x9eb241a)
dinodes: 78639364 (0x4aff104)
The statfs file was fixed.
check_statfs completed in 0.002s
Writing changes to disk
gfs2_fsck complete
Comment 14 errata-xmlrpc 2016-11-04 02:29:59 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2438.html
