Bug 146284

Summary: e2fsck corrupts root filesystem after removing xattr
Product: [Fedora] Fedora
Reporter: Denis Charland <denis.charland>
Component: e2fsprogs
Assignee: Stephen Tweedie <sct>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 3
CC: barryn, sct, twoerner, tytso
Hardware: i686
OS: Linux
Fixed In Version: RHBA-2005-298
Doc Type: Bug Fix
Last Closed: 2005-06-09 12:20:07 UTC
Attachments (none have flags set):
  Patch committed to e2fsprogs BK repository to fix problem
  To be installed as tests/f_clear_xattr/image.gz after applying patch found in attachment #110311
  Additional patch needed to fix corner case on big-endian systems.
  gzipped output of "dumpe2fs" on the raw filesystem image (which crashes attachment #110311)

Description Denis Charland 2005-01-26 18:23:29 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

Description of problem:
e2fsck corrupts the root filesystem after removing the ext_attr 
feature using debugfs.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
After booting in single-user mode and unmounting / (it remains mounted, 
but read-only) so that I can run e2fsck, I run the following commands:

sh-3.00# debugfs -w /dev/sda2 -R "features ^ext_attr"

sh-3.00# e2fsck -y -f /dev/sda2

.
.
.
.

/: ***** FILE SYSTEM WAS MODIFIED *****
/: ***** REBOOT LINUX *****
/: 133033/1310720 files(0.5% non-contiguous...

sh-3.00# reboot
sh: reboot: command not found
sh-3.00# ls
ls: error loading shared libraries: librt.so.1: cannot open shared object file: no such file or directory
sh-3.00# init 0

After a few INIT messages, system hangs

Then I poweroff and poweron the system:

After the kernel is uncompressed and booted, I get the following 
messages:

exec of init (/sbin/init) failed!!!: 2
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!

The system hangs.

Note1: This procedure works correctly on /boot.

Note2: Same result when booting from rescue CD.

    

Additional info:

Comment 1 Stephen Tweedie 2005-01-26 18:35:17 UTC
Error 2 is ENOENT; has /sbin/init really disappeared?

Trying to reproduce.
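
For reference, the numeric code in the "exec of init (/sbin/init) failed!!!: 2" message can be mapped to its symbolic name with a short Python sketch. This assumes a Linux errno table; the variable names are only for illustration.

```python
import errno
import os

# The trailing "2" in the boot message is an errno value.
code = 2
name = errno.errorcode[code]   # symbolic name for the numeric code
message = os.strerror(code)    # human-readable text from the C library

print(name, "-", message)
```

Note that execve(2) also returns ENOENT when the ELF interpreter named by the binary does not exist, so this error does not by itself prove that /sbin/init is gone.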


Comment 2 Denis Charland 2005-01-26 20:50:13 UTC
I booted in rescue mode with FC3 installation CD. /sbin/init is still 
there. All the files on / are there but the date for a lot of files 
is Jan 1 1970. It seems that these files have their attributes 
corrupted.

Comment 3 Stephen Tweedie 2005-01-26 20:56:34 UTC
OK, I've reproduced this (using Xen, it made it much easier!)

The problem is that when e2fsck is removing xattrs, if it finds a symlink, it
removes the inode entirely.  Will investigate further.  I haven't observed any
timestamp problems so far.

No other files appear to have been damaged in the process, but all symlinks have
been destroyed entirely.  Curiously, "debugfs" can see the directory entries:

debugfs:  ls bin/
 49153  (12) .    2  (12) ..    0  (24) dnsdomainname    49154  (16) mktemp
 49155  (12) bash    0  (12) sh    49157  (12) sed    0  (12) awk
...

dnsdomainname, sh and awk here were all symlinks before, but they have all now
got an inode number of 0 (hence they are effectively empty dirents now.)
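
To illustrate why a dirent with inode number 0 is effectively empty, here is a minimal Python sketch of walking a directory block. It assumes the classic little-endian ext2 on-disk entry layout (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file type, then the name). The block is synthetic, rebuilt from the first four entries of the listing above, and the helper names pack_dirent and live_entries are hypothetical.

```python
import struct

DIRENT_HEADER = "<IHBB"  # inode, rec_len, name_len, file_type (little-endian)

def pack_dirent(inode, rec_len, name, file_type=0):
    """Build one synthetic directory entry, zero-padded to rec_len bytes."""
    raw = struct.pack(DIRENT_HEADER, inode, rec_len, len(name), file_type) + name
    return raw.ljust(rec_len, b"\0")

def live_entries(block):
    """Return the names of entries whose inode number is non-zero."""
    names = []
    offset = 0
    while offset + 8 <= len(block):
        inode, rec_len, name_len, _ftype = struct.unpack_from(DIRENT_HEADER, block, offset)
        if rec_len < 8:          # malformed entry, stop rather than loop forever
            break
        if inode != 0:           # inode 0 marks an unused slot
            names.append(block[offset + 8:offset + 8 + name_len].decode())
        offset += rec_len
    return names

# Synthetic block using the inode numbers and record lengths shown by debugfs.
block = b"".join([
    pack_dirent(49153, 12, b"."),
    pack_dirent(2, 12, b".."),
    pack_dirent(0, 24, b"dnsdomainname"),   # former symlink, now inode 0
    pack_dirent(49154, 16, b"mktemp"),
])

print(live_entries(block))
```

The name bytes of the cleared entry are still in the block, which is why debugfs can print them, but a reader that honours the inode field skips the slot.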


Comment 4 Stephen Tweedie 2005-01-26 21:34:27 UTC
I just verified this with the e2fsck from this morning's bitkeeper tree.  The
problem is still there.

I suspect that the problem is due to us failing to identify fast symlinks
properly when we are clearing out i_file_acl fields when the feature flag is
missing.  And indeed, the fsck log contains a lot of:

Symlink /usr/bin/rdistd (inode #70910) is invalid.

etc.
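
The suspected failure mode can be sketched in Python. This is not the code from e2fsck or from any patch; it only assumes that i_blocks counts 512-byte units, that an attached xattr block is included in that count, and that the filesystem uses 4 KiB blocks. All function names and the sample values are illustrative.

```python
SECTOR_SIZE = 512  # i_blocks is counted in 512-byte units

def xattr_sectors(i_file_acl, block_size):
    """Sectors charged to the inode for its xattr block, if it has one."""
    return block_size // SECTOR_SIZE if i_file_acl != 0 else 0

def looks_fast_naive(i_blocks):
    """Treat a symlink as 'fast' only when it owns no blocks at all."""
    return i_blocks == 0

def looks_fast_adjusted(i_blocks, i_file_acl, block_size):
    """Same test, but ignoring the sectors used by the xattr block."""
    return i_blocks - xattr_sectors(i_file_acl, block_size) == 0

# Illustrative symlink: short target stored in the inode, plus one xattr block.
i_blocks, i_file_acl, block_size = 8, 295752, 4096

print(looks_fast_naive(i_blocks))                             # False
print(looks_fast_adjusted(i_blocks, i_file_acl, block_size))  # True
```

Under these assumptions the naive test misclassifies the inode as a slow symlink, so its in-inode target bytes would be read as block numbers and the symlink would look invalid.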


Comment 5 Stephen Tweedie 2005-01-26 23:26:16 UTC
Hmm --- for most of the inodes on the system, the e2fsck run produces output like:

# grep -w 72550 /tmp/e2fsck-remove-xattrs.log
Inode 72550, i_blocks is 80, should be 72.  Fix? yes
i_file_acl for inode 72550 (/usr/bin/shred) is 295752, should be zero.

But for symlinks we get:

# grep -w 49156 /tmp/e2fsck-remove-xattrs.log
Inode 49156, i_blocks is 8, should be 0.  Fix? yes
Symlink /bin/sh (inode #49156) is invalid.

So the "i_file_acl ... should be zero" line is missing for symlinks.  Looks like
that part of the cleanup is not being applied to symlinks.  And indeed, we don't
seem to be clearing that field:

debugfs:  stat <49156>
Inode: 49156   Type: symlink    Mode:  0777   Flags: 0x0   Generation: 2875386425
User:     0   Group:     0   Size: 4
File ACL: 295752    Directory ACL: 0
...
BLOCKS:
(0):1752392034
TOTAL: 1

so debugfs too is incapable of recognising this inode as a fast symlink, despite
it having 4 blocks and an ACL.
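
The numbers above can be cross-checked with a small Python sketch. It assumes 4 KiB filesystem blocks (which would explain why both logged i_blocks corrections are exactly 8) and little-endian on-disk fields, as on the i686 machine in this report. The variable names are illustrative.

```python
import struct

block_size_guess = 4096                      # assumption: 4 KiB blocks
sectors_per_block = block_size_guess // 512  # i_blocks uses 512-byte units

# e2fsck lowered i_blocks by the same amount in both logged cases.
delta_regular = 80 - 72   # inode 72550 (/usr/bin/shred)
delta_symlink = 8 - 0     # inode 49156 (/bin/sh)
print(delta_regular, delta_symlink, sectors_per_block)

# The "block number" debugfs shows for the symlink, read back as raw bytes.
i_block0 = 1752392034
i_size = 4
target = struct.pack("<I", i_block0)[:i_size]
print(target)  # b'bash'
```

So the value listed under BLOCKS is not a block number at all: it is the four bytes of the symlink target stored directly in the inode, which is what one would expect for a fast symlink.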


Comment 6 Stephen Tweedie 2005-01-27 12:39:36 UTC
I've got a preliminary fix --- working out the details with Ted Ts'o.


Comment 7 Denis Charland 2005-01-27 15:42:40 UTC
Re Comment #2

File dates in /usr also appear as Jan 1 1970 on a fresh FC3 
installation when booting in rescue mode. The file date attributes 
are not corrupted by e2fsck as I initially thought. It's probably the 
way the root filesystem is mounted in rescue mode. /usr appears as a 
symbolic link to /mnt/runtime/usr.

Comment 8 Theodore Tso 2005-01-27 19:35:45 UTC
Created attachment 110311
Patch committed to e2fsprogs BK repository to fix problem

See also additional file for tests/f_clear_xattr/image.gz (which is a binary
file, so it's not in the context diff).

Comment 9 Theodore Tso 2005-01-27 19:37:07 UTC
Created attachment 110312
To be installed as tests/f_clear_xattr/image.gz after applying patch found in attachment #110311

Comment 10 Theodore Tso 2005-01-28 17:30:50 UTC
Created attachment 110362
Additional patch needed to fix corner case on big-endian systems.

Note that there are some other test cases that need to be updated in order for
the regression test suite to pass completely on big and little endian systems,
but they are just test case updates.  This one is an actual bug that needs to
be fixed.  Please see the e2fsprogs bk repository for the other test case
updates.

Comment 11 Denis Charland 2005-01-31 19:17:13 UTC
I downloaded e2fsprogs-1.36-rc5, compiled and installed the new 
utilities. I rebooted FC3 in single-user mode, umounted all 
filesystems, ran debugfs and e2fsck on all filesystems to remove both 
resize_inode and ext_attr features. Everything worked properly.

As far as I'm concerned, the bug has been fixed.

Thanks.

Comment 12 Barry K. Nathan 2005-05-23 05:39:51 UTC
Attachment 110311 (the main patch to fix this bug) is causing an e2fsck segfault
on a corrupted filesystem of mine. e2fsck 1.36-rc4 works fine, but 1.36-rc4 +
attachment 110311 segfaults. Any newer version of e2fsck (up to 1.38-WIP-0509,
which is the latest as of this writing AFAIK) segfaults too.

I'm not sure if I can hand over the raw e2image of the filesystem that I'm using
to reproduce the segfault (the filenames might violate the privacy of several
dozen people). However, here are the last few lines of output when I run e2fsck
1.36-rc4 + attachment 110311 inside gdb, as well as a backtrace:

---begin quote---
Inode 870493 ref count is 3332, should be 1.  Fix? yes

i_file_acl for inode 870522 (...) is 393216, should be zero.
Clear? yes


Program received signal SIGSEGV, Segmentation fault.
ext2fs_unmark_generic_bitmap (bitmap=0x0, bitno=870522) at gen_bitmap.c:43
43              if ((bitno < bitmap->start) || (bitno > bitmap->end)) {
(gdb) bt
#0  ext2fs_unmark_generic_bitmap (bitmap=0x0, bitno=870522) at gen_bitmap.c:43
#1  0x08052412 in e2fsck_process_bad_inode (ctx=0x80dc2a0, dir=0, ino=870522,
    buf=0x8102740 "") at bitops.h:529
#2  0x08055695 in e2fsck_pass4 (ctx=0x80dc2a0) at pass4.c:138
#3  0x0804b46b in e2fsck_run (ctx=0x80dc2a0) at e2fsck.c:193
#4  0x08049e48 in main (argc=6, argv=0xbf96d554) at unix.c:1105
(gdb)                                                                     
---end quote---

Is there anything else I can do to help fix this problem?

Comment 13 Theodore Tso 2005-05-23 16:57:14 UTC
Can you send the dumpe2fs output for your corrupted filesystem?   

Can you also send the output of running the debugfs command "stat <870522>"?

Also, with newer versions of e2fsprogs e2image has a new -s option which will
scramble the directory listings.  This causes problems if HTREE is enabled, and
if this is a fast symlink problem it might hide the issue or cause it to change.
But it could still be quite useful.



Comment 14 Barry K. Nathan 2005-05-23 22:03:35 UTC
> Also, with newer versions of e2fsprogs e2image has a new -s option which will
> scramble the directory listings.  This causes problems if HTREE is enabled, and
> if this is a fast symlink problem it might hide the issue or cause it to change.
> But it could still be quite useful.

Oops, I should have read the e2fsprogs 1.36 release notes more closely, so that
I would've known about that.

The filesystem isn't using HTREE.

I'll respond to the rest of your comment later today (maybe in a few minutes,
maybe in a few hours, I'm not sure).

Comment 15 Barry K. Nathan 2005-05-23 22:25:05 UTC
> Can you also send the output of running the debugfs command "stat <870522>"?

Here it is:

debugfs 1.38-WIP (09-May-2005)
debugfs:  stat <870522>
Inode: 870522   Type: regular    Mode:  0264   Flags: 0x0   Generation: 165618740
User:   541   Group:   541   Size: 318903
File ACL: 393216    Directory ACL: 0
Links: 2560   Blockcount: 72
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x4274eb3e -- Sun May  1 07:44:14 2005
atime: 0x440ccfef -- Mon Mar  6 16:12:31 2006
mtime: 0x4213d0ff -- Wed Feb 16 15:02:23 2005
BLOCKS:
(0):55756363, (3):55754824, (4-5):55756366-55756367, (6):55690576, (7-8):55756369-55756370, (9):59292, (10):55756372
TOTAL: 9
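
As a sanity check on the stat output, the hexadecimal timestamps can be decoded with a short Python sketch. It assumes they are plain seconds since the Unix epoch and prints them in UTC, whereas debugfs printed local time, so the hours differ. The names used here are illustrative.

```python
from datetime import datetime, timezone

# Raw values copied from the debugfs "stat <870522>" output.
stamps = {
    "ctime": 0x4274eb3e,
    "atime": 0x440ccfef,
    "mtime": 0x4213d0ff,
}

decoded = {
    name: datetime.fromtimestamp(value, tz=timezone.utc)
    for name, value in stamps.items()
}

for name, when in decoded.items():
    print(name, when.isoformat())

# Date of the comment that contains the stat output.
comment_time = datetime(2005, 5, 23, 22, 25, 5, tzinfo=timezone.utc)
print("atime after comment date:", decoded["atime"] > comment_time)
```

The atime falls in March 2006, well after the date of this comment, which may be one more symptom of the inode being corrupted.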


Comment 16 Barry K. Nathan 2005-05-23 22:34:27 UTC
Created attachment 114754
gzipped output of "dumpe2fs" on the raw filesystem image (which crashes attachment #110311)

I just ran "dumpe2fs" (1.38-WIP-0509) with the image's filename and no other
options. Hopefully this file won't be too big to attach to this bug...

Comment 17 Barry K. Nathan 2005-05-24 07:44:20 UTC
I have posted an ext2 filesystem containing a file with a raw filename-scrambled
filesystem image, here (this will make more sense after you read the following
instructions):

http://barryn.ps.uci.edu/e2crash/

Instructions:

1. Download "loopwrap-s.gz" or "loopwrap-s.bz2" from the above site. The bz2
file is about 1-2MB smaller than the gz file, but download whichever one is
better for you. If you don't care, then download the bz2 file to conserve my
bandwidth.

2. Decompress loopwrap-s.* to "loopwrap-s". The decompressed file will be 2.3GB
or so. Make sure not to lose the original compressed file! (Or at least make a
backup copy of the decompressed file before performing steps 3 and 4.)

3. mount -o loop,noatime loopwrap-s /mnt/whereever
(or whatever set of mount options you want)

4. e2fsck -C 0 -f -y /mnt/whereever/superimg-corrupt2-s
(or whatever e2fsck options, etc. you want for reproducing the bug;
superimg-corrupt2-s is the actual corrupted filesystem image)


Basically, what I'm doing here is using an ext2 filesystem ("loopwrap-s")
instead of a tar archive, because tar's -S option causes the file to get
truncated for me (I need to report that somehow at some point, too).

Comment 18 Tim Powers 2005-06-09 12:20:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-298.html