Bug 232663

Summary: resize2fs corrupts symlinks with xattrs on big-endian platforms
Product: Red Hat Enterprise Linux 5 Reporter: Bryn M. Reeves <bmr>
Component: e2fsprogsAssignee: Eric Sandeen <esandeen>
Status: CLOSED ERRATA QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 5.0CC: sct
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0571 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-11-07 17:14:31 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 234654    
Attachments:
Description Flags
correct handling of byteswapped i_blocks field in ext2_swap_inode_full
none
reproducer script
none
revised patch to correct symlink byteswapping
none
more fun with byteswapping
none
Correct byteswapping for fast symlinks with xattrs.
none
Test case none

Description Bryn M. Reeves 2007-03-16 15:25:05 UTC
Description of problem:
Using resize2fs to shrink a filesystem that has xattrs and >1 block group causes
 fast symlink targets to be byteswapped on big-endian platforms. For e.g., a
symlink 'crond.8.gz' -> 'cron.8.gz' gets mangled into 'norcg.8.'.

This seems to be down to the following line in
lib/ext2fs/swapfs.c:ext2fs_swap_inode_full():


        if (!islnk || ext2fs_inode_data_blocks(fs, (struct ext2_inode *)t)) {
                for (i = 0; i < EXT2_N_BLOCKS; i++)
                        t->i_block[i] = ext2fs_swab32(f->i_block[i]);
        } else if (t != f) {
                for (i = 0; i < EXT2_N_BLOCKS; i++)
                        t->i_block[i] = f->i_block[i];
        }

This unconditionally uses t ("to") as the source of the inode_data_block count.

In the case of a big-endian system, this is correct for a read operation (where
hostorder = 0) as it causes us to use the value we just byteswapped. For a
symlink this number is normally 0 meaning that we execute the second for loop
and do a simple byte-by-byte copy.

For a write operation (where hostorder = 1) the reverse is true. We use the "to"
value, which has been byteswapped to on-disk order (little-endian). In the case
of a typical symlink with xattrs, i_blocks will have the value 8. Reading this
byteswapped gives us 134217728, causing us to execute the first for loop and
byteswap the symlink target for a fast symlink.

None of this makes any difference on little-endian, since the swap_* operations
are nops.


Version-Release number of selected component (if applicable):
e2fsprogs-1.39-8.el5

How reproducible:
100% with the right filesystem.

The fs needs to contain >1 block group, so for 1k block file systems we need
>=8411 blocks, for 4k block file systems >=32769 blocks.

We also need SELinux enabled to make sure everything has an xattr to ensure that
resize2fs will re-write the inodes.

Finally, there needs to be at least one symlink whose inode will be rewritten
during the resize operation. The problem is much easier to reproduce if there is
also at least one directory other than ./ on the file system.

I have some scripts that reliably recreate this that I will attach.

Steps to Reproduce:
1. mkfs -b 4096 <dev> 32769
2. mount <dev>
3. mkdir /mnt/dir
4. dd if=/dev/zero of=/mnt/file bs=2048 count=1
5. ln -s file symlink
6. umount <dev>
7. e2fsck -fy <dev>
8. resize2fs <dev> 16384

Actual results:
e2fsck, debugfs or mount now show symlink -> 'elif'

Expected results:
The symlink remains pointing at 'file'

Additional info:
Making the choice of using 't' or 'f' for the inode_data_blocks call depend on
hostorder has fixed this problem in all my test cases.

Comment 2 Bryn M. Reeves 2007-03-16 15:30:07 UTC
Created attachment 150239 [details]
correct handling of byteswapped i_blocks field in ext2_swap_inode_full

replace unconditional use of t ("to" conversion) with:

(struct ext2_inode *)(hostorder) ? f : t)

I.e., use the "from" version if the data is in host order and the "to" version
if the data is not in host order.

Comment 3 Bryn M. Reeves 2007-03-16 15:59:15 UTC
Created attachment 150246 [details]
reproducer script

Reproducer script, run as resize2fs.sh <nr blocks> <bsize>, e.g.:

./resize_test2.sh 32769 4096

Comment 4 Bryn M. Reeves 2007-03-16 16:01:26 UTC
Sample output from script in comment #3

[root@z03 ~]# ./resize_test2.sh 32769 4096
*** starting (5372)
*** creating filesystem
mke2fs 1.39 (29-May-2006)
*** populating filesystem
*** grabbing start data
e2image 1.39 (29-May-2006)
*** pre-checking fileystem
e2fsck 1.39 (29-May-2006)
resize2fs 1.39 (29-May-2006)
*** grabbing end data
e2image 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
*** post-checking fileystem
*** Done: fsck PASSED
*** statting symlink
debugfs 1.39 (29-May-2006)
debugfs 1.39 (29-May-2006)
Inode: 13   Type: symlink    Mode:  0777   Flags: 0x0   Generation: 2566520203
User:     0   Group:     0   Size: 4
File ACL: 4635    Directory ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x45fabdda -- Fri Mar 16 11:55:06 2007
atime: 0x45fabdda -- Fri Mar 16 11:55:06 2007
mtime: 0x45fabdda -- Fri Mar 16 11:55:06 2007
Fast_link_dest: elif <------- should be "file"



Comment 5 Bryn M. Reeves 2007-03-16 16:45:20 UTC
Btw, this corruption cannot be automatically corrected by e2fsck. The data is
still there but it detects these byteswapped symlinks as invalid and attempts to
reset the type to 0:

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Symlink /test/symlink-1 (inode #14) is invalid.
Clear? no

Entry 'symlink-1' in /test (12) has an incorrect filetype (was 7, should be 0).
Fix? no

Symlink /test/symlink-2 (inode #16) is invalid.
Clear? no

Entry 'symlink-2' in /test (12) has an incorrect filetype (was 7, should be 0).
Fix? no

Symlink /test/symlink-3 (inode #18) is invalid.
Clear? no

Entry 'symlink-3' in /test (12) has an incorrect filetype (was 7, should be 0).
Fix? no


It should be straightforward to create a special purpose recovery tool if
required since the data is still present on disk.


Comment 6 Bryn M. Reeves 2007-03-16 19:55:47 UTC
The patch in comment #2 breaks the f_clear_xattr check in the e2fsprogs testsuite. 

Comment 7 Bryn M. Reeves 2007-03-16 22:45:50 UTC
Created attachment 150286 [details]
revised patch to correct symlink byteswapping

Comment 8 Bryn M. Reeves 2007-03-16 23:10:57 UTC
The first patch breaks e2fsck, causing it to byteswap i_blocks[0] on the
long-symlink test in f_clear_xattr during pass2. This then causes the inode to
be dealocated as invalid since i_blocks[0] contains an illegal block:

  i_file_acl for inode 15 (/long-symlink) is 23, should be zero.
  Clear? yes

+ Symlink /long-symlink (inode #15) is invalid.
+ Clear? yes
+
  Pass 3: Checking directory connectivity
  Pass 4: Checking reference counts
  Pass 5: Checking group summary information

This happens because of the following code in e2fsck/pass2.c:process_bad_inode():

#ifdef EXT2FS_ENABLE_SWAPFS
                        /*
                         * This is a special kludge to deal with long
                         * symlinks on big endian systems.  i_blocks
                         * had already been decremented earlier in
                         * pass 1, but since i_file_acl hadn't yet
                         * been cleared, ext2fs_read_inode() assumed
                         * that the file was short symlink and would
                         * not have byte swapped i_block[0].  Hence,
                         * we have to byte-swap it here.
                         */
                        if (LINUX_S_ISLNK(inode.i_mode) &&
                            (fs->flags & EXT2_FLAG_SWAP_BYTES) &&
                            (inode.i_blocks == fs->blocksize >> 9))
                                inode.i_block[0] = ext2fs_swab32(inode.i_block[0]);
#endif

This looks like a workaround for the problem fixed by the patch in comment #2.

In pass1, when i_blocks is adjusted ext2fs_write_inode would also assume the
inode is a short symlink and not byteswap, except that because of the bug in
ext2fs_swap_inode_full it does end up swapping on write on big-endian systems.
This then needed to be reversed by the code above.

With the patch in comment #6 the test suite passes on ppc, zSeries and x86_64.
Also tested moving images created with this patch between these systems.


Comment 10 Stephen Tweedie 2007-03-19 13:53:03 UTC
I don't think the code in comment #8 is not a workaround for this problem: it
was intended for something else entirely.  It affects filesystems where we are
clearing xattr/acl blocks, and where we have removed the xattr block from the
file's i_blocks but have not yet cleared i_file_acl.  We use the combination of
the i_blocks and i_file_acl fields to determine whether the symlink is fast or
not: if i_blocks is zero AFTER accounting for the xattr block (as determined by
i_file_acl), THEN the symlink is fast (see ext2fs_inode_data_blocks()).

Of course, the two issues may be interacting here.


Comment 12 Theodore Tso 2007-04-07 13:32:19 UTC
The code in comment #8 does need to be removed after this patch, and it turns
out it is a workaround for the bug, but in a somewhat unexpected way.  It also
turns out the comment in pass2.c wasn't entirely correct.   

The problem is that when we are clearing the extended attribute, after pass #1,
for a long symlink, i_blocks is blocksize/512, while i_file_acl is still
non-zero.   This looks like a short symlink, and so write_inode doesn't byte
swap the inode.  But on a bigendian system, that means the inode's i_blocks[0]
field just got written out in the wrong byte order!  Similarly on read, we don't
byte swap i_blocks[0], so when we read it back in, i_blocks[0] doesn't need to
be byteswapped.

Both before and after this patch, though, we do have the unfortunate situation
that if you ^C e2fsck after the pass #1 i_blocks change, but before the pass #2
i_file_acl zero-out, the long symlink will look like a short symlink with a
garbage destination.  this is bad, but the right fix may be to move the code
that zeros out i_file_acl to pass #1.  The downside of doign this is we won't be
able to print the pathname of the file whose EA block we are removing, but that
might be an acceptable tradeoff.  That's not related to this core bug, however.

Comment 13 Theodore Tso 2007-04-11 16:26:19 UTC
Note: I'm still waiting for a DCO (i.e., Developer's Certification of Origin
Signed-off-by:) before I commit into the e2fsprogs source repository.   I sent a
request for this on Saturday; maybe Bryn is on Easter holiday?

Comment 14 Bryn M. Reeves 2007-04-11 17:23:08 UTC
I'm actually not sure the patch is correct. The code in swapfs.c allows the
byteswapping to happen in place (i.e. f == t) but the patch in comment #7 will
still do the wrong thing here.

Consider the case where t == f and hostorder == 1 on a big endian system.

[...]
        t->i_blocks = ext2fs_swab32(f->i_blocks);
[...]
        if (!islnk || ext2fs_inode_data_blocks(fs,
                        (struct ext2_inode *)(hostorder) ? f : t)) {
                for (i = 0; i < EXT2_N_BLOCKS; i++)
                        t->i_block[i] = ext2fs_swab32(f->i_block[i]);
        } else if (t != f) {

Since t and f are pointers to the same piece of memory it doesn't matter which
pointer we pass to ext2fs_inode_data_blocks - by the time we get there
ext2fs_swab32 will have byteswapped the i_blocks field and we end up with the
same problem; a fast symlink with an xattr will be incorrectly byteswapped if
hostorder is true.

I've not seen problems with this during testing, but from a quick look there
seems to be only one use like this in e2fsprogs right now (in
lib/ext2fs/inode.c:ext2fs_read_inode_full):

#ifdef EXT2FS_ENABLE_SWAPFS
        if ((fs->flags & EXT2_FLAG_SWAP_BYTES) ||
            (fs->flags & EXT2_FLAG_SWAP_BYTES_READ))
                ext2fs_swap_inode_full(fs, (struct ext2_inode_large *) inode,
                                       (struct ext2_inode_large *) inode,
                                       0, length);
#endif

This doesn't trigger a noticable problem here because we are in the read path
(hostorder == 0). The inode will have been swapped in-place but we want to call
ext2fs_inode_data_blocks with the "to" version anyway:

ext2fs_inode_data_blocks(fs, (struct ext2_inode *)(hostorder) ? f : t))

If this was in the write path then this would suffer the same problem.


Comment 15 Bryn M. Reeves 2007-04-11 17:29:41 UTC
Created attachment 152294 [details]
more fun with byteswapping

Treat the i_blocks field in the same way as we treat i_mode; test hostorder and
figure out the number of data blocks either before/after the byte order
conversion as required.

Comment 16 Bryn M. Reeves 2007-04-11 17:34:57 UTC
Created attachment 152295 [details]
Correct byteswapping for fast symlinks with xattrs.

Add missing Signed-off-by header.

Comment 17 Bryn M. Reeves 2007-04-11 17:44:42 UTC
Created attachment 152296 [details]
Test case

Simple test case using libext2fs.

Run as:

rwtest <fs> <inum>

This just does an ext2fs_read_inode_full/ext2fs_write_inode on the inode number
given. On a fast symlink with an xattr on a big endian system this will end up
byteswapping the symlink destination.

I'll try to put together a reasonably small image that demonstrates the problem
with resize2fs for the test suite.

Comment 18 Theodore Tso 2007-04-14 18:02:53 UTC
Patch applied to development sources of e2fsprogs:

http://thunk.org/hg/e2fsprogs/?rev/aa8d65921c89


Comment 21 RHEL Program Management 2007-04-25 20:40:49 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 22 Eric Sandeen 2007-06-22 21:02:15 UTC
Changes committed to e2fsprogs-1.39-9.el5

Comment 28 errata-xmlrpc 2007-11-07 17:14:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0571.html