Bug 506550

Summary: gfs2_fsck does not check EA dinodes properly, wrong ea_type cannot be repaired if EA<block_size
Product: Red Hat Enterprise Linux 5
Reporter: Jaroslav Kortus <jkortus>
Component: gfs2-utils
Assignee: Robert Peterson <rpeterso>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: high
Version: 5.4
CC: edamato
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-07-15 22:49:01 UTC
Attachments:
  Description: metadata of GFS2 filesystem with wrong ea_type field
  Flags: none

Description Jaroslav Kortus 2009-06-17 17:54:57 UTC
Created attachment 348308
metadata of GFS2 filesystem with wrong ea_type field

Description of problem:
gfs2_fsck (and gfs_fsck too) fails to check EA dinodes if they are smaller than a block (no indirect addressing needed). It works correctly (detects and removes) for large (>block_size) EA dinodes.

It seems to me that it fails to locate EAs of this size (see the output below).

When gfs2_fsck is run on a filesystem with a small EA node and the ea_type field is invalid (0x99 for example), it fails to detect this error and exits with 0.

If this filesystem is mounted and such a file is accessed, the node is withdrawn from the cluster and the filesystem cannot be accessed. This can be a bit confusing, as gfs2_fsck still thinks it's all OK.
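
For illustration, the range check that seems to be skipped for small EAs can be sketched in shell. This is only a sketch and not the fsck code: the helper name ea_type_valid is hypothetical, and the upper bound of 3 is an assumption taken from GFS2_EATYPE_LAST in the kernel's gfs2_ondisk.h (GFS1 may use a different bound).

```shell
# Assumption: highest valid ea_type, mirroring GFS2_EATYPE_LAST in gfs2_ondisk.h.
GFS2_EATYPE_LAST=3

# Hypothetical helper: succeeds when the given ea_type value
# (decimal or 0x-prefixed hex) lies in the valid range.
ea_type_valid() {
    ea_type=$(( $1 ))
    [ "$ea_type" -ge 0 ] && [ "$ea_type" -le "$GFS2_EATYPE_LAST" ]
}

# The corrupted value used in this report is rejected.
if ! ea_type_valid 0x99; then
    echo "invalid ea_type 0x99"
fi
```

A check of this kind would flag the 0x99 value regardless of whether the EA fits in a single block.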

Metadata example of corrupted ea_type is attached.

Tested on x86_64.

Version-Release number of selected component (if applicable):
gfs-utils-0.1.19-3.el5
gfs2-utils-0.1.58-1.el5
kmod-gfs-0.1.33-2.el5
GFS fsck 0.1.19 (built May  4 2009 19:34:42)
GFS2 fsck 0.1.58 (built May 29 2009 15:43:58)


How reproducible:
always

Steps to Reproduce:
1. create a fresh FS (gfs2 or gfs1)
2. mount it with "-o acl" and create a file on it (file-01)
3. force some xattr population:
        for i in `seq 50000 50100`; do 
                setfacl -m u:$i:rw mountedFS/file-01 ;
        done
4. unmount and change the ea_type field in the EA dinode of the file to 0x99
5. run gfs2_fsck on the filesystem; it exits 0. Notice in the "-v -v" output that no EA dinode is found
6. mount the filesystem and try to stat the file
  
Actual results:
Filesystem error is not detected and run-time filesystem panic can occur.

Expected results:
Filesystem error is detected (as it is for "large" EA dinodes) and repaired.

Additional info:

Output snip from fsck on FS containing one small EA dinode:
Pass1b complete      
Starting pass1c 
Looking for inodes containing ea blocks...
Pass1c complete

The same for larger EA:
Looking for inodes containing ea blocks...
EA in inode 4656 (0x1230)
(pass1c.c:266)  Found eattr at 7105 (0x1bc1)
(metawalk.c:674)        Extended attributes exist for inode #4656 (0x1230).
(metawalk.c:609)        Checking EA indirect block #7105 (0x1bc1).
(metawalk.c:571)        Checking EA leaf block #4657 (0x1231).
(pass1c.c:202)    Pointers Required: 2
  Pointers Reported: 2
(metawalk.c:571)        Checking EA leaf block #7106 (0x1bc2).
(metawalk.c:571)        Checking EA leaf block #7107 (0x1bc3).
Pass1c complete
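
The difference between the two outputs above can be checked mechanically. The following is a rough sketch: the helper name fsck_log_found_ea is hypothetical, and it only matches the two message formats quoted in this report ("EA in inode ..." and "Found eattr at ..."), which may differ in other fsck versions.

```shell
# Hypothetical helper: reads "-v -v" fsck output on stdin and succeeds
# only if pass1c reported at least one extended attribute.
fsck_log_found_ea() {
    grep -Eq 'EA in inode [0-9]+|Found eattr at [0-9]+'
}

# Demo with the small-EA output quoted above: no EA line is present.
printf '%s\n' 'Starting pass1c ' \
              'Looking for inodes containing ea blocks...' \
              'Pass1c complete' \
    | fsck_log_found_ea || echo "no EA found by pass1c"
```

In practice the saved "-v -v" output of the fsck run would be piped into the helper instead of the printf demo.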


/var/log/messages on file access:
Jun 17 13:20:56 dell-pe1855-02 kernel: GFS: fsid=a3cluster:a3gfs2.0: fatal: filesystem consistency error
Jun 17 13:20:56 dell-pe1855-02 kernel: GFS: fsid=a3cluster:a3gfs2.0:   inode = 24/24
Jun 17 13:20:56 dell-pe1855-02 kernel: GFS: fsid=a3cluster:a3gfs2.0:   function = ea_foreach_i
Jun 17 13:20:56 dell-pe1855-02 kernel: GFS: fsid=a3cluster:a3gfs2.0:   file = /builddir/build/BUILD/gfs-kmod-0.1.33/_kmod_build_/src/gfs/eattr.c, line = 134
Jun 17 13:20:56 dell-pe1855-02 kernel: GFS: fsid=a3cluster:a3gfs2.0:   time = 1245259256
Jun 17 13:20:56 dell-pe1855-02 kernel: GFS: fsid=a3cluster:a3gfs2.0: about to withdraw from the cluster
Jun 17 13:20:56 dell-pe1855-02 kernel: GFS: fsid=a3cluster:a3gfs2.0: telling LM to withdraw
Jun 17 13:20:56 dell-pe1855-02 kernel: GFS: fsid=a3cluster:a3gfs2.0: withdrawn
Jun 17 13:20:56 dell-pe1855-02 kernel: 
Jun 17 13:20:56 dell-pe1855-02 kernel: Call Trace:
Jun 17 13:20:56 dell-pe1855-02 kernel:  [<ffffffff88607fcc>] :gfs:gfs_lm_withdraw+0xc4/0xd3
Jun 17 13:20:56 dell-pe1855-02 kernel:  [<ffffffff800bd580>] delayacct_end+0x5d/0x86
Jun 17 13:20:56 dell-pe1855-02 kernel:  [<ffffffff80064a18>] __wait_on_bit+0x60/0x6e
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff80015a20>] sync_buffer+0x0/0x3f
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff80064a92>] out_of_line_wait_on_bit+0x6c/0x78
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff8861fb87>] :gfs:gfs_consist_inode_i+0x3d/0x42
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff885f4b62>] :gfs:gfs_dreread+0x87/0xc7
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff885f96e2>] :gfs:ea_foreach_i+0x108/0x118
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff885f9751>] :gfs:ea_foreach+0x5f/0x178
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff885fae25>] :gfs:ea_find_i+0x0/0x6b
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff885f98a3>] :gfs:gfs_ea_find+0x39/0x46
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff885fb0e7>] :gfs:gfs_ea_get_i+0x22/0x88
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff885f9efa>] :gfs:gfs_ea_get+0x70/0x87
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff800640dd>] wait_for_completion+0x1f/0xa2
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff88614e86>] :gfs:gfs_getxattr+0x93/0xa4
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff8012a2d0>] inode_doinit_with_dentry+0x176/0x47c
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff800312e8>] d_splice_alias+0xd4/0xfb
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff886152d7>] :gfs:gfs_lookup+0x3e2/0x41a
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff885fe04d>] :gfs:lock_on_glock+0x66/0x6d
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff80128917>] avc_has_perm+0x43/0x55
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff8000d57d>] do_lookup+0xe5/0x1e6
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff8000a8db>] __link_path_walk+0xa01/0xf42
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff8000f043>] link_path_walk+0x42/0xb2
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff8000d31d>] do_path_lookup+0x270/0x2e7
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff80012e0b>] getname+0x15b/0x1c2
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff80023cdf>] __user_walk_fd+0x37/0x4c
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff8003f4de>] vfs_lstat_fd+0x18/0x47
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff80025c6d>] filldir+0x0/0xb7
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff8002af75>] sys_newlstat+0x19/0x31
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff8005e229>] tracesys+0x71/0xe0
Jun 17 13:20:57 dell-pe1855-02 kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
Jun 17 13:20:57 dell-pe1855-02 kernel: 
Jun 17 13:20:57 dell-pe1855-02 kernel: inode_doinit_with_dentry:  getxattr returned 5 for dev=dm-1 ino=24
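
To spot this condition in a log, one option is to pull out the fsid of every filesystem that logged a completed withdraw. This is a minimal sketch: the helper name withdrawn_fsids is hypothetical, and it assumes the "fsid=...: withdrawn" message format shown above.

```shell
# Hypothetical helper: reads syslog lines on stdin and prints the fsid of
# every filesystem that logged a completed withdraw, one per line.
withdrawn_fsids() {
    sed -n 's/.*fsid=\([^ ]*\): withdrawn$/\1/p' | sort -u
}

# Demo with two lines from the log above; only the second one matches.
printf '%s\n' \
    'Jun 17 13:20:56 dell-pe1855-02 kernel: GFS: fsid=a3cluster:a3gfs2.0: telling LM to withdraw' \
    'Jun 17 13:20:56 dell-pe1855-02 kernel: GFS: fsid=a3cluster:a3gfs2.0: withdrawn' \
    | withdrawn_fsids
```

Against a real system the input would be the messages file, for example withdrawn_fsids < /var/log/messages.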

Comment 1 Robert Peterson 2009-06-23 14:35:32 UTC
I verified that this is indeed a bug, although this kind of
corruption should be rare (if even possible) in the field.

I haven't debugged it yet, but I verified it's not fixed by my
latest extensive changes to fsck.gfs2 for bug #500483.  It should
(hopefully) take less than a day to debug this and write a fix.

I recommend we fix it in RHEL5.5.  Changing status to assigned and
requesting ack flags.

Comment 2 Robert Peterson 2009-07-15 22:49:01 UTC
I figured out the problem.  This is actually a regression from this
commit from June 2006:

http://git.fedoraproject.org/git/?p=cluster.git;a=commitdiff;h=b7a4317df0f9493d30aba84fd3451de61e506b89

However, the extended attribute processing code is so intertwined
with my work for bug #500483 that I'm just going to roll the fix
into that one.  My latest patch for bug #500483 repairs the damage,
so I'm going to close this bug as a duplicate.  Good catch!

*** This bug has been marked as a duplicate of bug 500483 ***