Bug 1251036
| Summary: | fsck.gfs2: Segfault with corrupt rindex | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Robert Peterson <rpeterso> |
| Component: | gfs2-utils | Assignee: | Robert Peterson <rpeterso> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.2 | CC: | dwysocha, gfs2-maint, jpayne, sbradley, swhiteho, wili |
| Target Milestone: | rc | Keywords: | Patch |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | gfs2-utils-3.1.9-1.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-04 06:29:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1271674 | | |
| Bug Blocks: | 1203710, 1295577, 1313485 | | |
| Attachments: | | | |
Created attachment 1063170 [details]
Tool to rebuild rindex from printsavedmeta plus file system
After much struggle and strife, I wrote and debugged this program.
It rebuilds a corrupt rindex from scratch, based on the contents
of a printsavedmeta dump. It also uses the existing file system to
tell where the journal blocks are, so it can avoid rgrps that lie
within them. It might still have bugs, but it successfully rebuilt
this particular file system's rindex, and it's a very complex case.
Setting this to assigned and requesting some flags. I've got a couple patches for this already, which I'll attach shortly.

Created attachment 1071807 [details]
Patch #1 - fsck.gfs2: Read jindex before making rindex repairs
In most cases, the rindex needs to be read into memory in case the
journals or jindex are corrupt and need repairs. However, in some
rare cases the rindex itself needs repairs, and the rindex repair
code needs to read in the jindex and journals in order to filter out
rgrp records that appear in the journals. This prevents rgrp records
inside the journals from being treated as real rgrps when they are
actually false positives.
This patch also fixes a segfault in the rgrp code for cases of
extremely corrupt rindex files where the rgrp has no buffers.
Created attachment 1071808 [details]
Patch #2 - fsck.gfs2: Detect multiple rgrp grow segments
This patch gives fsck.gfs2's rgrepair code the capability to detect
multiple gfs2_grow segments and repair them accordingly.
Today I posted 11 patches to upstream cluster-devel related to this.

Yesterday I pushed the upstream patches to the gfs2-utils git tree. It should be an easy port to rhel7.

This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.

Verified in gfs2-utils-3.1.9-3.el7:

```
[root@dash-02 ~]# rpm -q gfs2-utils
gfs2-utils-3.1.9-3.el7.x86_64
[root@dash-02 ~]# gfs2_edit restoremeta savemeta.mda.china.01445381 /dev/mapper/mpatha1
[root@dash-02 ~]# ./rindex_set_di_size /dev/mapper/mpatha1
[root@dash-02 ~]# gfs2_edit printsavedmeta savemeta.mda.china.01445381 > printsavedmeta.mda.china.01445381.out
[root@dash-02 ~]# ./rindex_from_printsavedmeta printsavedmeta.mda.china.01445381.out /dev/mapper/mpatha1
Checking for rgrps in journal0.
Checking for rgrps in journal1.
Checking for rgrps in journal2.
Checking for rgrps in journal3.
First rg length: 0x5
42735 resource groups written.
[root@dash-02 ~]# fsck.gfs2 -y /dev/mapper/mpatha1 &> fsck.out
[root@dash-02 ~]# tail fsck.out
dinodes: 78652666 (0x4b024fa)
Calculated statfs values:
blocks: 2594517996 (0x9aa533ec)
free: 166405146 (0x9eb241a)
dinodes: 78639364 (0x4aff104)
The statfs file was fixed.
check_statfs completed in 0.002s
Writing changes to disk
gfs2_fsck complete
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2438.html
Description of problem:
I recently received a set of customer metadata which had a corrupt rindex file. When I ran fsck.gfs2 on it, it segfaulted. I ran my little rindex size fix-up program, rindex_set_di_size.c, which fixed di_size properly. When I ran fsck.gfs2 again, it still segfaulted. Using gdb, I got the following call trace:

```
(gdb) run -y /dev/clariion_lun10/scratch &> /tmp/fsck.out
Starting program: /home/bob/gfs2-utils/gfs2/fsck/./fsck.gfs2 -y /dev/clariion_lun10/scratch &> /tmp/fsck.out

Program received signal SIGSEGV, Segmentation fault.
0x0000000000422a0b in gfs2_rgrp_free (rgrp_tree=rgrp_tree@entry=0x7fffffffdbf0) at rgrp.c:247
247             if (rgd->bits[0].bi_bh) { /* if a buffer exists */
Missing separate debuginfos, use: debuginfo-install glibc-2.17-79.el7.x86_64
(gdb) bt
#0  0x0000000000422a0b in gfs2_rgrp_free (rgrp_tree=rgrp_tree@entry=0x7fffffffdbf0) at rgrp.c:247
#1  0x000000000041da87 in rg_repair (sdp=sdp@entry=0x7fffffffd860, trust_lvl=trust_lvl@entry=2, rg_count=rg_count@entry=0x7fffffffd508, sane=sane@entry=0x7fffffffd50c) at rgrepair.c:874
#2  0x00000000004046aa in fetch_rgrps (sdp=sdp@entry=0x7fffffffd860) at initialize.c:638
#3  0x000000000040618a in initialize (sdp=sdp@entry=0x7fffffffd860, force_check=0, preen=0, all_clean=all_clean@entry=0x7fffffffd77c) at initialize.c:1742
#4  0x0000000000401fe8 in main (argc=3, argv=0x7fffffffdda8) at main.c:355
(gdb)
```

My philosophy is: the fsck.gfs2 program should NEVER segfault, no matter what kind of hideous rubbish you throw at it.

Version-Release number of selected component (if applicable):
RHEL7.2

How reproducible:
Always

Steps to Reproduce:
1. gfs2_edit restoremeta savemeta.mda.china.01445381 <dev>
2. fsck.gfs2 <dev>

Actual results:
Segmentation fault

Expected results:
fsck.gfs2 should never segfault.

Additional info:
WARNING: This is a huge 15GB set of metadata. It requires a huge device (I used 25TB because 10TB was not enough). It takes many hours to restore, probably more than 8, depending on the hardware.