Description of problem:

While working on a fix for bug #291551 I discovered a problem with gfs2_fsck. Basically, if gfs2_fsck makes changes to any of the system inodes, the changes get reverted later. The problem is that the code doesn't recognize system inodes when it makes changes to them; instead, it treats them as regular inodes. Later, when the program ends, it rewrites the system inodes from entries in memory which do NOT contain the changes made earlier. That means that gfs2_fsck can't fix inode problems with the system inodes.

Version-Release number of selected component (if applicable):

RHEL5, RHEL51

How reproducible:

Always

Steps to Reproduce:
1. Mount a gfs2 file system.
2. Start some gfs2 IO process that creates a bunch of files and/or directories in the root inode (a system inode). I used QE's d_io.
3. Power-fence the system while it's running.
4. When the system comes back up, run gfs2_fsck TWICE.

Actual results:

[root@roth-01 ../cluster/gfs2/fsck]# ./gfs2_fsck /dev/sdb1
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Recovering journals (this may take a while).
Journal recovery complete.
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
root inode 22 (0x16): Entries is 2 - should be 3
Fix entries for root inode 22 (0x16)? (y/n) y
Entries updated
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Link count inconsistent for inode 22 (0x16) has 2 but fsck found 3.
Update link count for inode 22 (0x16) ? (y/n) y
Link count updated for inode 22 (0x16)
Pass4 complete
Starting pass5
Pass5 complete
Writing changes to disk
gfs2_fsck complete

[root@roth-01 ../cluster/gfs2/fsck]# ./gfs2_fsck /dev/sdb1
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Recovering journals (this may take a while).
Journal recovery complete.
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
root inode 22 (0x16): Entries is 2 - should be 3
Fix entries for root inode 22 (0x16)? (y/n) y
Entries updated
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Link count inconsistent for inode 22 (0x16) has 2 but fsck found 3.
Update link count for inode 22 (0x16) ? (y/n) y
Link count updated for inode 22 (0x16)
Pass4 complete
Starting pass5
Pass5 complete
Writing changes to disk
gfs2_fsck complete

In this case, the first invocation of gfs2_fsck fixed the problem with the root inode, but the fixes were later reverted. The second invocation found the same problem, as would all future invocations.

Expected results:

The second gfs2_fsck run shouldn't find any errors with inode 22 (the root inode) because the first invocation should have fixed it.

Additional info:

Serious problem. Relatively easy to fix.
Requesting ack flags for inclusion in 5.2.
It's been a long time, so I thought I'd write an update on this problem.

First, the good news: by dummying up a file system with gfs2_edit, I was easily able to recreate this problem. I also debugged the problem, found the issue, and wrote a working prototype patch that fixes it.

Now the bad news: the code impacts all the places where fsck references dinodes, so it's pretty pervasive. In order to test the various code paths, I tried to run my tests from bug #382581. That's when I discovered that, unfortunately, the current version of gfs2_edit does not save the rindex system file, which means I couldn't recreate the proper conditions for the test again. (That doesn't invalidate the testing I did for bug #382581, however, because I used the same rindex file over and over and everything worked.)

So first, I had to make a few more changes to gfs2_edit so that it can save the rindex file properly. Second, I had to rewrite the test conditions. Rather than creating six file systems with varying states of corruption, I decided it made more sense to have two: (1) a clean file system with a bunch of EAs, and (2) a file system that has leukemia, with a wide range of corruption throughout the system; basically one occurrence of each possible type of corruption. So I set up my good file system, saved it with the new, improved gfs2_edit, then set about corrupting it in multiple ways to test code paths.

The test was a success. In fact, it was too much of a success. Not only did it find bugs in my patch to fix this problem, it also uncovered more bugs that are out there lurking in the depths, including bugs that likely exist in gfs1's gfs_fsck. For example, if you have an inode with an extended attribute that is really a duplicate of another inode's data block, fsck crashes. It seems to be a condition that neither gfs1's nor gfs2's fsck can handle properly: the duplicate-resolving code doesn't look at EAs.
I will likely open new bugzilla records to track the gfs2_edit fixes and these new bugs separately from this bug record. Unfortunately, I can't ship the code and mark this bug as modified until those code paths are tested, and I can't really test the code paths of my current fix until those issues are resolved.
Created attachment 271861 [details]
Patch to fix the problem

This patch fixes the problem for this bug record and I'm planning to ship it this way.

This isn't as bad as I previously thought. The only serious bug I found is the one mentioned in my previous comment, namely, that there are problems when gfs2_fsck encounters a block that is duplicated between a dinode and an extended attribute. I'm going to have to open that as a separate bug record, and I believe it exists in gfs1 as well. However, the odds of hitting this are very low, so I recommend deferring that fix to 5.3 or even later. For now, I've pulled that particular piece of corruption out of my test case, and gfs2_fsck seems to handle all the other corruption properly.

One notable exception is when an EA block is out of range, in which case the code doesn't know how to repair it, so the corruption is never fixed and gfs2_fsck will complain about it each time it runs. That could technically be opened as another bug record, but I don't have a good solution or a good thought on how to fix it. Maybe throw away the whole thing? At any rate, the code flags the error but doesn't crash. Again, that's probably a problem with gfs1 as well.

The attached patch allows gfs2_fsck to use either new buffers for dinodes or existing buffers where the dinode happens to be for a system file. In debugging the fix, this had the effect of keeping my buffer management code "honest". In other words, it uncovered several places where gfs2_fsck was mismanaging the buffers but getting away with it because the buffers were discarded. Since the buffers for system dinodes can't be discarded, I had to fix all these problems and manage the buffers properly.

In doing my testing, I did encounter another very serious and ugly bug that has a very simple one-line fix: basically, if duplicates are encountered at all, whole directories can be completely destroyed.
The fix is at the end of pass1.c:

-	b = osi_list_entry(tmp, struct blocks, list);
+	b = osi_list_entry(dup_list.next, struct blocks, list);

This fix is included in this patch, but it almost warrants its own bug. I guess I'll call that a management decision and get some opinions on it.

This patch has been extensively tested by virtue of my new test case, which puts gfs2_fsck through its paces (as evidenced by the number of bugs I shook out of it).
Created attachment 271881 [details]
Saved metadata for "control" fsck test

To run this test, copy the attached bz2 file into /tmp, then:

cd /tmp/
bunzip2 fscktest1.meta.bz2
gfs2_edit restoremeta /tmp/fscktest1.meta /dev/roth_vg/roth_lv
gfs2_fsck /dev/roth_vg/roth_lv

Your destination device (e.g. /dev/roth_vg/roth_lv) should be 72GB or bigger.
Created attachment 271901 [details]
Saved metadata for "leukemia" test

To run this test, copy the attached bz2 file into /tmp, then:

cd /tmp/
bunzip2 fscktest3.meta.bz2
gfs2_edit restoremeta /tmp/fscktest3.meta /dev/roth_vg/roth_lv
gfs2_fsck /dev/roth_vg/roth_lv

Your destination device (e.g. /dev/roth_vg/roth_lv) should be 72GB or bigger.
Correction: toward the end of comment #4, I mentioned a serious and ugly bug that has a simple one-line fix. The code actually appears at the end of pass1b.c (not pass1.c, as stated). That fix already exists in gfs1's gfs_fsck (shipped as part of bug #216203). Apparently it was cross-written to gfs2 incorrectly by yours truly when I did the fix for gfs2 bug #223506 back in January.
I tested on roth-01 using my newly designed "leukemia" test as described above and attached below. I committed this fix to the HEAD and RHEL5 branches of CVS for inclusion in RHEL5.2, so I'm changing the status to modified.

Other notes:
1. I opened bug #404611 to address the problem of blocks that are duplicated between data and extended attributes.
2. After getting management input, I'm shipping the one-line fix mentioned in comment #4 and comment #7 with this patch (as attached).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0350.html