Bug 146284
Description
Denis Charland
2005-01-26 18:23:29 UTC
Error 2 is ENOENT; has /sbin/init really disappeared? Trying to reproduce.

I booted in rescue mode with FC3 installation CD. /sbin/init is still there. All the files on / are there but the date for a lot of files is Jan 1 1970. It seems that these files have their attributes corrupted.

OK, I've reproduced this (using Xen, it made it much easier!) The problem is that when e2fsck is removing xattrs, if it finds a symlink, it removes the inode entirely. Will investigate further. I haven't observed any timestamp problems so far.

No other files appear to have been damaged in the process, but all symlinks have been destroyed entirely. Curiously, "debugfs" can see the directory entries:
debugfs: ls bin/
49153 (12) . 2 (12) .. 0 (24) dnsdomainname 49154 (16) mktemp
49155 (12) bash 0 (12) sh 49157 (12) sed 0 (12) awk
...
dnsdomainname, sh and awk here were all symlinks before, but they have all now got an inode number of 0 (hence they are effectively empty dirents now.)

I just verified this with the e2fsck from this morning's bitkeeper tree. The problem is still there. I suspect that the problem is due to us failing to identify fast symlinks properly when we are clearing out i_file_acl fields when the feature flag is missing. And indeed, the fsck log contains a lot of:
Symlink /usr/bin/rdistd (inode #70910) is invalid.
etc.

Hmm --- for most of the inodes on the system, the e2fsck run produces output like:
# grep -w 72550 /tmp/e2fsck-remove-xattrs.log
Inode 72550, i_blocks is 80, should be 72. Fix? yes
i_file_acl for inode 72550 (/usr/bin/shred) is 295752, should be zero.
But for symlinks we get:
# grep -w 49156 /tmp/e2fsck-remove-xattrs.log
Inode 49156, i_blocks is 8, should be 0. Fix? yes
Symlink /bin/sh (inode #49156) is invalid.
So the "i_file_acl ... should be zero" line is missing for symlinks. Looks like that part of the cleanup is not being applied to symlinks.
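The numbers in the log above are enough to show why a fast-symlink test can go wrong once an xattr block is attached. The following is only a sketch in Python, not the e2fsprogs code, and the function names are hypothetical. It assumes that i_blocks counts 512-byte sectors and that the filesystem uses 4096-byte blocks; the block size is inferred from the log, where releasing one xattr block takes i_blocks from 80 to 72.

```python
SECTOR_SIZE = 512  # i_blocks is counted in 512-byte units


def naive_is_fast_symlink(i_blocks):
    """Hypothetical naive test: a fast symlink owns no blocks at all."""
    return i_blocks == 0


def data_sectors(i_blocks, i_file_acl, block_size):
    """Sectors used for file data, excluding the xattr block if one is set."""
    if i_file_acl != 0:
        return i_blocks - block_size // SECTOR_SIZE
    return i_blocks


def is_fast_symlink(i_blocks, i_file_acl, block_size):
    """A fast symlink has no data blocks once the xattr block is discounted."""
    return data_sectors(i_blocks, i_file_acl, block_size) == 0


# Values reported for /bin/sh (inode 49156); block size 4096 is an assumption.
print(naive_is_fast_symlink(8))          # False
print(is_fast_symlink(8, 295752, 4096))  # True
```

With the naive test, inode 49156 looks like a slow symlink whose block pointers are garbage, which matches the "Symlink /bin/sh (inode #49156) is invalid" message.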
And indeed, we don't seem to be clearing that field:
debugfs: stat <49156>
Inode: 49156 Type: symlink Mode: 0777 Flags: 0x0 Generation: 2875386425
User: 0 Group: 0 Size: 4
File ACL: 295752 Directory ACL: 0
...
BLOCKS:
(0):1752392034
TOTAL: 1
so debugfs too is incapable of recognising this inode as a fast symlink, despite it having 4 blocks and an ACL.

I've got a preliminary fix --- working out the details with Ted Ts'o.

Re Comment #2: File dates in /usr also appear as Jan 1 1970 on a fresh FC3 installation when booting in rescue mode. The file date attributes are not corrupted by e2fsck as I initially thought. It's probably the way the root filesystem is mounted in rescue mode. /usr appears as a symbolic link to /mnt/runtime/usr.

Created attachment 110311 [details]
Patch committed to e2fsprogs BK repository to fix problem
See also additional file for tests/f_clear_xattr/image.gz (which is a binary
file, so it's not in the context diff).
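
As an aside on the "stat <49156>" output quoted earlier: a fast symlink keeps its target string in the inode's block array instead of in a data block, so the single "block number" 1752392034 that debugfs prints is really the target text. The sketch below is a hypothetical Python helper, not part of e2fsprogs, and assumes the block array is stored as little-endian 32-bit words.

```python
import struct


def decode_fast_symlink(i_block_words, size):
    """Reassemble a fast symlink target from the inode's i_block words.

    i_block_words: the 32-bit values debugfs prints as block numbers.
    size: the inode's i_size, i.e. the length of the target string.
    """
    raw = b"".join(struct.pack("<I", word) for word in i_block_words)
    return raw[:size].decode("ascii")


# Inode 49156 (/bin/sh): Size 4, BLOCKS (0):1752392034
print(decode_fast_symlink([1752392034], 4))  # bash
```

The result agrees with the directory listing, where bin/ contains both bash and the sh symlink.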
Created attachment 110312 [details]
To be installed as tests/f_clear_xattr/image.gz after applying patch found in attachment #110311 [details]

Created attachment 110362 [details]
Additional patch needed to fix corner case on big-endian systems.
Note that there are some other test cases that need to be updated in order for
the regression test suite to pass completely on big and little endian systems,
but they are just test case updates. This one is an actual bug that needs to
be fixed. Please see the e2fsprogs bk repository for the other test case
updates.
I downloaded e2fsprogs-1.36-rc5, compiled and installed the new utilities. I rebooted FC3 in single-user mode, umounted all filesystems, ran debugfs and e2fsck on all filesystems to remove both resize_inode and ext_attr features. Everything worked properly. As far as I'm concerned, the bug has been fixed. Thanks.

Attachment 110311 [details] (the main patch to fix this bug) is causing an e2fsck segfault on a corrupted filesystem of mine. e2fsck 1.36-rc4 works fine, but 1.36-rc4 + attachment 110311 [details] segfaults. Any newer version of e2fsck (up to 1.38-WIP-0509, which is the latest as of this writing AFAIK) segfaults too.
I'm not sure if I can hand over the raw e2image of the filesystem that I'm using to reproduce the segfault (the filenames might violate the privacy of several dozen people). However, here's the last few lines of output when I run e2fsck 1.36-rc4 + attachment 110311 [details] inside gdb, as well as a backtrace:
---begin quote---
Inode 870493 ref count is 3332, should be 1. Fix? yes
i_file_acl for inode 870522 (...) is 393216, should be zero. Clear? yes
Program received signal SIGSEGV, Segmentation fault.
ext2fs_unmark_generic_bitmap (bitmap=0x0, bitno=870522) at gen_bitmap.c:43
43 if ((bitno < bitmap->start) || (bitno > bitmap->end)) {
(gdb) bt
#0 ext2fs_unmark_generic_bitmap (bitmap=0x0, bitno=870522) at gen_bitmap.c:43
#1 0x08052412 in e2fsck_process_bad_inode (ctx=0x80dc2a0, dir=0, ino=870522, buf=0x8102740 "") at bitops.h:529
#2 0x08055695 in e2fsck_pass4 (ctx=0x80dc2a0) at pass4.c:138
#3 0x0804b46b in e2fsck_run (ctx=0x80dc2a0) at e2fsck.c:193
#4 0x08049e48 in main (argc=6, argv=0xbf96d554) at unix.c:1105
(gdb)
---end quote---
Is there anything else I can do to help fix this problem?

Can you send the dumpe2fs output for your corrupted filesystem? Can you also send the output of running the debugfs command "stat <870522>"? Also, with newer versions of e2fsprogs e2image has a new -s option which will scramble the directory listings. This causes problems if HTREE is enabled, and if this is a fast symlink problem it might hide the issue or cause it to change. But, it could still be quite useful.

> Also, with newer versions of e2fsprogs e2image has a new -s option which will
> scramble the directory listings. This causes problems if HTREE is enabled, and
> if this is a fast symlink problem it might hide the issue or cause it to change.
> But, it could still be quite useful.
Oops, I should have read the e2fsprogs 1.36 release notes more closely, so that
I would've known about that.
The filesystem isn't using HTREE.
I'll respond to the rest of your comment later today (maybe in a few minutes,
maybe in a few hours, I'm not sure).
> Can you also send the output of running the debugfs command "stat <870522>"?
Here it is:
debugfs 1.38-WIP (09-May-2005)
debugfs: stat <870522>
Inode: 870522 Type: regular Mode: 0264 Flags: 0x0 Generation: 165618740
User: 541 Group: 541 Size: 318903
File ACL: 393216 Directory ACL: 0
Links: 2560 Blockcount: 72
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4274eb3e -- Sun May 1 07:44:14 2005
atime: 0x440ccfef -- Mon Mar 6 16:12:31 2006
mtime: 0x4213d0ff -- Wed Feb 16 15:02:23 2005
BLOCKS:
(0):55756363, (3):55754824, (4-5):55756366-55756367, (6):55690576,
(7-8):55756369-55756370, (9):59292, (10):55756372
TOTAL: 9
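
For readers following the backtrace earlier in this report: frame #0 shows the range check at gen_bitmap.c:43 being reached with bitmap=0x0 while e2fsck handles inode 870522, so the crash is a NULL dereference rather than an out-of-range bit. The sketch below is a Python model of that check with a hypothetical guard added. It is not the e2fsprogs code and not necessarily how the problem was fixed upstream; the class and function names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GenericBitmap:
    """Toy stand-in for a bitmap covering bit numbers start..end inclusive."""
    start: int
    end: int
    bits: set = field(default_factory=set)


def unmark_bit(bitmap, bitno):
    """Clear bitno in bitmap; return True if it was handled, else False."""
    if bitmap is None:
        # Hypothetical guard: the bitmap was never allocated or loaded, so
        # report failure instead of touching bitmap.start (the C crash site).
        return False
    if bitno < bitmap.start or bitno > bitmap.end:
        # Same range test as the line shown at gen_bitmap.c:43.
        return False
    bitmap.bits.discard(bitno)
    return True


print(unmark_bit(None, 870522))  # False, instead of a crash
```

Whether the right remedy is a guard like this or making sure the bitmap exists before pass 4 uses it is a design question this sketch does not settle.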
Created attachment 114754 [details]
gzipped output of "dumpe2fs" on the raw filesystem image (which crashes attachment #110311 [details])
I just ran "dumpe2fs" (1.38-WIP-0509) with the image's filename and no other options. Hopefully this file won't be too big to attach to this bug...

I have posted an ext2 filesystem containing a file with a raw filename-scrambled filesystem image, here (this will make more sense after you read the following instructions):
http://barryn.ps.uci.edu/e2crash/
Instructions:
1. Download "loopwrap-s.gz" or "loopwrap-s.bz2" from the above site. The bz2 file is about 1-2MB smaller than the gz file, but download whichever one is better for you. If you don't care, then download the bz2 file to conserve my bandwidth.
2. Decompress loopwrap-s.* to "loopwrap-s". The decompressed file will be 2.3GB or so. Make sure not to lose the original compressed file! (Or at least make a backup copy of the decompressed file before performing steps 3 and 4.)
3. mount -o loop,noatime loopwrap-s /mnt/whereever (or whatever set of mount options you want)
4. e2fsck -C 0 -f -y /mnt/whereever/superimg-corrupt2-s (or whatever e2fsck options, etc. you want for reproducing the bug; superimg-corrupt2-s is the actual corrupted filesystem image)
Basically, what I'm doing here is using an ext2 filesystem ("loopwrap-s") instead of a tar archive, because tar's -S option causes the file to get truncated for me (I need to report that somehow at some point, too).

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2005-298.html