Bug 232663
Summary: | resize2fs corrupts symlinks with xattrs on big-endian platforms | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Bryn M. Reeves <bmr> |
Component: | e2fsprogs | Assignee: | Eric Sandeen <esandeen> |
Status: | CLOSED ERRATA | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 5.0 | CC: | sct |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHBA-2007-0571 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-11-07 17:14:31 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 234654 | ||
Attachments: |
Created attachment 150239 [details]
correct handling of byteswapped i_blocks field in ext2_swap_inode_full
replace unconditional use of t ("to" conversion) with:
(struct ext2_inode *)(hostorder) ? f : t)
I.e., use the "from" version if the data is in host order and the "to" version
if the data is not in host order.
Created attachment 150246 [details]
reproducer script
Reproducer script, run as resize2fs.sh <nr blocks> <bsize>, e.g.:
./resize_test2.sh 32769 4096
Sample output from script in comment #3 [root@z03 ~]# ./resize_test2.sh 32769 4096 *** starting (5372) *** creating filesystem mke2fs 1.39 (29-May-2006) *** populating filesystem *** grabbing start data e2image 1.39 (29-May-2006) *** pre-checking fileystem e2fsck 1.39 (29-May-2006) resize2fs 1.39 (29-May-2006) *** grabbing end data e2image 1.39 (29-May-2006) e2fsck 1.39 (29-May-2006) *** post-checking fileystem *** Done: fsck PASSED *** statting symlink debugfs 1.39 (29-May-2006) debugfs 1.39 (29-May-2006) Inode: 13 Type: symlink Mode: 0777 Flags: 0x0 Generation: 2566520203 User: 0 Group: 0 Size: 4 File ACL: 4635 Directory ACL: 0 Links: 1 Blockcount: 8 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x45fabdda -- Fri Mar 16 11:55:06 2007 atime: 0x45fabdda -- Fri Mar 16 11:55:06 2007 mtime: 0x45fabdda -- Fri Mar 16 11:55:06 2007 Fast_link_dest: elif <------- should be "file" Btw, this corruption cannot be automatically corrected by e2fsck. The data is still there but it detects these byteswapped symlinks as invalid and attempts to reset the type to 0: Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Symlink /test/symlink-1 (inode #14) is invalid. Clear? no Entry 'symlink-1' in /test (12) has an incorrect filetype (was 7, should be 0). Fix? no Symlink /test/symlink-2 (inode #16) is invalid. Clear? no Entry 'symlink-2' in /test (12) has an incorrect filetype (was 7, should be 0). Fix? no Symlink /test/symlink-3 (inode #18) is invalid. Clear? no Entry 'symlink-3' in /test (12) has an incorrect filetype (was 7, should be 0). Fix? no It should be straightforward to create a special purpose recovery tool if required since the data is still present on disk. The patch in comment #2 breaks the f_clear_xattr check in the e2fsprogs testsuite. Created attachment 150286 [details]
revised patch to correct symlink byteswapping
The first patch breaks e2fsck, causing it to byteswap i_blocks[0] on the long-symlink test in f_clear_xattr during pass2. This then causes the inode to be dealocated as invalid since i_blocks[0] contains an illegal block: i_file_acl for inode 15 (/long-symlink) is 23, should be zero. Clear? yes + Symlink /long-symlink (inode #15) is invalid. + Clear? yes + Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information This happens because of the following code in e2fsck/pass2.c:process_bad_inode(): #ifdef EXT2FS_ENABLE_SWAPFS /* * This is a special kludge to deal with long * symlinks on big endian systems. i_blocks * had already been decremented earlier in * pass 1, but since i_file_acl hadn't yet * been cleared, ext2fs_read_inode() assumed * that the file was short symlink and would * not have byte swapped i_block[0]. Hence, * we have to byte-swap it here. */ if (LINUX_S_ISLNK(inode.i_mode) && (fs->flags & EXT2_FLAG_SWAP_BYTES) && (inode.i_blocks == fs->blocksize >> 9)) inode.i_block[0] = ext2fs_swab32(inode.i_block[0]); #endif This looks like a workaround for the problem fixed by the patch in comment #2. In pass1, when i_blocks is adjusted ext2fs_write_inode would also assume the inode is a short symlink and not byteswap, except that because of the bug in ext2fs_swap_inode_full it does end up swapping on write on big-endian systems. This then needed to be reversed by the code above. With the patch in comment #6 the test suite passes on ppc, zSeries and x86_64. Also tested moving images created with this patch between these systems. I don't think the code in comment #8 is not a workaround for this problem: it was intended for something else entirely. It affects filesystems where we are clearing xattr/acl blocks, and where we have removed the xattr block from the file's i_blocks but have not yet cleared i_file_acl. We use the combination of the i_blocks and i_file_acl fields to determine whether the symlink is fast or not: if i_blocks is zero AFTER accounting for the xattr block (as determined by i_file_acl), THEN the symlink is fast (see ext2fs_inode_data_blocks()). Of course, the two issues may be interacting here. The code in comment #8 does need to be removed after this patch, and it turns out it is a workaround for the bug, but in a somewhat unexpected way. It also turns out the comment in pass2.c wasn't entirely correct. The problem is that when we are clearing the extended attribute, after pass #1, for a long symlink, i_blocks is blocksize/512, while i_file_acl is still non-zero. This looks like a short symlink, and so write_inode doesn't byte swap the inode. But on a bigendian system, that means the inode's i_blocks[0] field just got written out in the wrong byte order! Similarly on read, we don't byte swap i_blocks[0], so when we read it back in, i_blocks[0] doesn't need to be byteswapped. Both before and after this patch, though, we do have the unfortunate situation that if you ^C e2fsck after the pass #1 i_blocks change, but before the pass #2 i_file_acl zero-out, the long symlink will look like a short symlink with a garbage destination. this is bad, but the right fix may be to move the code that zeros out i_file_acl to pass #1. The downside of doign this is we won't be able to print the pathname of the file whose EA block we are removing, but that might be an acceptable tradeoff. That's not related to this core bug, however. Note: I'm still waiting for a DCO (i.e., Developer's Certification of Origin Signed-off-by:) before I commit into the e2fsprogs source repository. I sent a request for this on Saturday; maybe Bryn is on Easter holiday? I'm actually not sure the patch is correct. The code in swapfs.c allows the byteswapping to happen in place (i.e. f == t) but the patch in comment #7 will still do the wrong thing here. Consider the case where t == f and hostorder == 1 on a big endian system. [...] t->i_blocks = ext2fs_swab32(f->i_blocks); [...] if (!islnk || ext2fs_inode_data_blocks(fs, (struct ext2_inode *)(hostorder) ? f : t)) { for (i = 0; i < EXT2_N_BLOCKS; i++) t->i_block[i] = ext2fs_swab32(f->i_block[i]); } else if (t != f) { Since t and f are pointers to the same piece of memory it doesn't matter which pointer we pass to ext2fs_inode_data_blocks - by the time we get there ext2fs_swab32 will have byteswapped the i_blocks field and we end up with the same problem; a fast symlink with an xattr will be incorrectly byteswapped if hostorder is true. I've not seen problems with this during testing, but from a quick look there seems to be only one use like this in e2fsprogs right now (in lib/ext2fs/inode.c:ext2fs_read_inode_full): #ifdef EXT2FS_ENABLE_SWAPFS if ((fs->flags & EXT2_FLAG_SWAP_BYTES) || (fs->flags & EXT2_FLAG_SWAP_BYTES_READ)) ext2fs_swap_inode_full(fs, (struct ext2_inode_large *) inode, (struct ext2_inode_large *) inode, 0, length); #endif This doesn't trigger a noticable problem here because we are in the read path (hostorder == 0). The inode will have been swapped in-place but we want to call ext2fs_inode_data_blocks with the "to" version anyway: ext2fs_inode_data_blocks(fs, (struct ext2_inode *)(hostorder) ? f : t)) If this was in the write path then this would suffer the same problem. Created attachment 152294 [details]
more fun with byteswapping
Treat the i_blocks field in the same way as we treat i_mode; test hostorder and
figure out the number of data blocks either before/after the byte order
conversion as required.
Created attachment 152295 [details]
Correct byteswapping for fast symlinks with xattrs.
Add missing Signed-off-by header.
Created attachment 152296 [details]
Test case
Simple test case using libext2fs.
Run as:
rwtest <fs> <inum>
This just does an ext2fs_read_inode_full/ext2fs_write_inode on the inode number
given. On a fast symlink with an xattr on a big endian system this will end up
byteswapping the symlink destination.
I'll try to put together a reasonably small image that demonstrates the problem
with resize2fs for the test suite.
Patch applied to development sources of e2fsprogs: http://thunk.org/hg/e2fsprogs/?rev/aa8d65921c89 This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. Changes committed to e2fsprogs-1.39-9.el5 An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0571.html |
Description of problem: Using resize2fs to shrink a filesystem that has xattrs and >1 block group causes fast symlink targets to be byteswapped on big-endian platforms. For e.g., a symlink 'crond.8.gz' -> 'cron.8.gz' gets mangled into 'norcg.8.'. This seems to be down to the following line in lib/ext2fs/swapfs.c:ext2fs_swap_inode_full(): if (!islnk || ext2fs_inode_data_blocks(fs, (struct ext2_inode *)t)) { for (i = 0; i < EXT2_N_BLOCKS; i++) t->i_block[i] = ext2fs_swab32(f->i_block[i]); } else if (t != f) { for (i = 0; i < EXT2_N_BLOCKS; i++) t->i_block[i] = f->i_block[i]; } This unconditionally uses t ("to") as the source of the inode_data_block count. In the case of a big-endian system, this is correct for a read operation (where hostorder = 0) as it causes us to use the value we just byteswapped. For a symlink this number is normally 0 meaning that we execute the second for loop and do a simple byte-by-byte copy. For a write operation (where hostorder = 1) the reverse is true. We use the "to" value, which has been byteswapped to on-disk order (little-endian). In the case of a typical symlink with xattrs, i_blocks will have the value 8. Reading this byteswapped gives us 134217728, causing us to execute the first for loop and byteswap the symlink target for a fast symlink. None of this makes any difference on little-endian, since the swap_* operations are nops. Version-Release number of selected component (if applicable): e2fsprogs-1.39-8.el5 How reproducible: 100% with the right filesystem. The fs needs to contain >1 block group, so for 1k block file systems we need >=8411 blocks, for 4k block file systems >=32769 blocks. We also need SELinux enabled to make sure everything has an xattr to ensure that resize2fs will re-write the inodes. Finally, there needs to be at least one symlink whose inode will be rewritten during the resize operation. The problem is much easier to reproduce if there is also at least one directory other than ./ on the file system. I have some scripts that reliably recreate this that I will attach. Steps to Reproduce: 1. mkfs -b 4096 <dev> 32769 2. mount <dev> 3. mkdir /mnt/dir 4. dd if=/dev/zero of=/mnt/file bs=2048 count=1 5. ln -s file symlink 6. umount <dev> 7. e2fsck -fy <dev> 8. resize2fs <dev> 16384 Actual results: e2fsck, debugfs or mount now show symlink -> 'elif' Expected results: The symlink remains pointing at 'file' Additional info: Making the choice of using 't' or 'f' for the inode_data_blocks call depend on hostorder has fixed this problem in all my test cases.