Bug 498860

Summary: xfs_fsr fails to complete
Product: [Fedora] Fedora
Component: kernel
Version: 11
Hardware: i686
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: low
Reporter: Andrew Potter <agpotter>
Assignee: Eric Sandeen <esandeen>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: itamar, kernel-maint, quintela
Doc Type: Bug Fix
Last Closed: 2009-06-30 20:59:14 UTC
Attachments: xfs_metadump of the afflicted filesystem (flags: none)

Description Andrew Potter 2009-05-04 04:44:12 UTC
Description of problem:
I switched to Fedora recently and brought along a couple of XFS partitions. When I try to run xfs_fsr on a fragmented file, it fails and the extent map is unchanged:

# xfs_bmap foo.bar
foo.bar:
0: [0..24751]: 447546640..447571391
1: [24752..39167]: 447588800..447603215

# xfs_fsr -v foo.bar
XFS_IOC_SWAPEXT failed: foo.bar: Invalid argument

# xfs_bmap foo.bar
foo.bar: 
0: [0..24751]: 447546640..447571391
1: [24752..39167]: 447588800..447603215


Alternatively,

# xfs_fsr -v
START: pass=0 ino=0 /dev/md127 /mnt/Backup
/mnt/Backup start inode=0
ino=152
XFS_IOC_SWAPEXT failed: ino=152: Invalid argument
ino=158
^Cxfs_fsr startpass 0, endpass 0, time 12 seconds




This may be related to kernel bug #12538?

Version-Release number of selected component (if applicable):
Linux localhost.localdomain 2.6.29.1-111.fc11.i686.PAE #1 SMP Fri Apr 24 10:56:23 EDT 2009 i686 i686 i386 GNU/Linux

xfsdump version 3.0.0, arch i586, release 2.fc11
xfsprogs version 3.0.0, arch i586, release 2.fc11

How reproducible:
Always

Steps to Reproduce:
1. Run xfs_fsr on a fragmented file or a filesystem with fragmentation
2. Observe failure.
  
Actual results:
xfs_fsr does not defragment files.

Expected results:
xfs_fsr performs file defragmentation.


Additional info:
When running on an exceptionally fragmented file, xfs_fsr will churn away for some minutes before returning an error message.

Comment 1 Eric Sandeen 2009-05-04 17:28:11 UTC
hrm, thanks for the report; the kernel.org bugzilla you reference *should* be fixed in .29.1 already.

I'll test quickly here on an x86_64 box; if there are no problems, I'll retest on x86.

As a quick dedicated testcase, does something like this always fail for you?

# for I in `seq 10 -1 0`; do
 dd if=/dev/zero of=fragfile bs=4k count=1 seek=$I conv=notrunc oflag=sync
done

# xfs_fsr fragfile

Thanks,
-Eric

Comment 2 Andrew Potter 2009-05-04 17:59:20 UTC
Oh, the testcase you gave does work:

# xfs_bmap  fragfile 
fragfile:
        0: [0..7]: 25843688..25843695
        1: [8..15]: 25843680..25843687
        2: [16..23]: 25843392..25843399
        3: [24..31]: 25843384..25843391
        4: [32..39]: 25841808..25841815
        5: [40..47]: 25841584..25841591
        6: [48..55]: 25841048..25841055
        7: [56..63]: 25839680..25839687
        8: [64..71]: 25837648..25837655
        9: [72..79]: 25837384..25837391
        10: [80..87]: 25868416..25868423
# xfs_fsr  fragfile 
# xfs_bmap  fragfile 
fragfile:
        0: [0..87]: 25868424..25868511
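
For anyone comparing before/after extent maps like the ones above, here is a minimal sketch of a helper that counts allocated extents in xfs_bmap output. The function name count_extents is hypothetical, not part of xfsprogs; it only assumes the "N: [start..end]: blocks" line format shown in this report and skips lines whose block range is "hole".

```shell
# Count the allocated extents in `xfs_bmap` output read from stdin.
# Extent lines look like "0: [0..24751]: 447546640..447571391".
# Lines ending in "hole" are skipped because they occupy no disk blocks.
count_extents() {
    awk '/^[ \t]*[0-9]+: \[[0-9]+\.\.[0-9]+\]: / && $NF != "hole" { n++ }
         END { print n + 0 }'
}

# Hypothetical usage on a real file:
#   xfs_bmap fragfile | count_extents
```

Running it before and after xfs_fsr gives a quick check of whether the defragmentation actually changed the layout.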

Comment 3 Eric Sandeen 2009-05-04 18:06:20 UTC
hrm.  Ok... well, we need to find out where that EINVAL is coming from ... would you be willing to run a debug xfs module?

Or maybe if this still fails:

# xfs_fsr -v
START: pass=0 ino=0 /dev/md127 /mnt/Backup
/mnt/Backup start inode=0
ino=152
XFS_IOC_SWAPEXT failed: ino=152: Invalid argument
ino=158
^Cxfs_fsr startpass 0, endpass 0, time 12 seconds

you could do find /mnt/Backup -inum 152

and then point xfs_fsr directly at the file, see if it's consistent there?

Thanks,
-Eric
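
The "find the file behind the failing inode" step can be sketched as a small filter. This is an illustration only: failed_inodes is a hypothetical helper name, and it assumes the failure lines look exactly like the "XFS_IOC_SWAPEXT failed: ino=152: Invalid argument" line in the verbose log above.

```shell
# Print the inode numbers that `xfs_fsr -v` reported as SWAPEXT failures.
# Reads the verbose log from stdin and matches lines such as:
#   XFS_IOC_SWAPEXT failed: ino=152: Invalid argument
failed_inodes() {
    sed -n 's/^XFS_IOC_SWAPEXT failed: ino=\([0-9][0-9]*\):.*/\1/p'
}

# Hypothetical usage (mount point taken from this report):
#   xfs_fsr -v 2>&1 | failed_inodes | while read -r ino; do
#       find /mnt/Backup -xdev -inum "$ino"
#   done
```

Each path printed by find can then be handed to xfs_fsr directly to see whether the failure is consistent for that file.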

Comment 4 Andrew Potter 2009-05-04 18:30:52 UTC
Pointing xfs_fsr at the file at inum 152 yields the same error, and no change in xfs_bmap output. 

I can run a debug xfs module, but not until tonight.

Comment 5 Eric Sandeen 2009-05-04 18:46:22 UTC
Ok, thanks.   Could you try one more thing?

# xfs_db /dev/md127
xfs_db> inode 152
xfs_db> p

Thanks,
-Eric

Comment 6 Andrew Potter 2009-05-04 18:53:01 UTC
Here's the xfs_info, FWIW:
# xfs_info /dev/md127
meta-data=/dev/md127             isize=256    agcount=16, agsize=7231006 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=115696096, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0


And here's the xfs_db output:
# umount /mnt/Backup
# xfs_db /dev/md127
xfs_db> inode 152
xfs_db> p
core.magic = 0x494e
core.mode = 0100644
core.version = 1
core.format = 3 (btree)
core.nlinkv1 = 1
core.uid = 501
core.gid = 501
core.flushiter = 8
core.atime.sec = Tue Dec  4 09:29:31 2007
core.atime.nsec = 976335330
core.mtime.sec = Tue Dec  4 09:29:32 2007
core.mtime.nsec = 476576221
core.ctime.sec = Sat Apr 25 21:43:25 2009
core.ctime.nsec = 811156232
core.size = 190611456
core.nblocks = 46540
core.extsize = 0
core.nextents = 691
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 1
next_unlinked = null
u.bmbt.level = 1
u.bmbt.numrecs = 4
u.bmbt.keys[1-4] = [startoff] 1:[0] 2:[12379] 3:[18927] 4:[38558]
u.bmbt.ptrs[1-4] = 1:81683175 2:86159167 3:81724650 4:86424632

Comment 7 Eric Sandeen 2009-05-04 19:29:51 UTC
One other thing you could do if you're willing is to provide an xfs_metadump image of /dev/md127; that way I'd have your exact geometry & metadata layout and I could reproduce it quite easily, I expect.   No file data will be in the dump, but some filenames will remain un-obfuscated, though, in case you have security/privacy concerns.

Comment 8 Andrew Potter 2009-05-04 20:40:32 UTC
Created attachment 342377 [details]
xfs_metadump of the afflicted filesystem

42 MB compressed, ~300MB uncompressed metadata dump.

Comment 9 Eric Sandeen 2009-05-04 21:32:39 UTC
perfect!  Got it, thanks, and reproducing:

[root@inode mnt]# find . -inum 152 -exec xfs_fsr -v {} \;
%';-&/K6eX9m6oS\Ued: ./Qnimd/lRJ9c#2|/TD7XFFF
                    oi~p1_nx'W: Invalid argument

(how's that for obfuscation? ;)

Comment 10 Eric Sandeen 2009-05-04 22:16:29 UTC
Ok, it's failing here in xfs_swap_extents():

        /*
         * If the target has extended attributes, the tmp file
         * must also in order to ensure the correct data fork
         * format.
         */
        if ( XFS_IFORK_Q(ip) != XFS_IFORK_Q(tip) ) {
                error = XFS_ERROR(EINVAL);
                goto error0;
        }

But when fsr sets up the temp file to swap extents with, it just makes a very trivial xattr on the file:

        /* Setup extended attributes */
        if (statp->bs_xflags & XFS_XFLAG_HASATTR) {
                if (fsetxattr(tfd, "user.X", "X", 1, XATTR_CREATE) != 0) {
                        fsrprintf(_("could not set ATTR on tmp: %s:\n"), tname);
                        close(tfd);
                        return -1;
                }
                if (dflag)
                        fsrprintf(_("%s set temp attr\n"), tname);
        }

... hm but in this case it's actually not even doing that.  So this will take a little digging but I can reproduce it, so we'll get it fixed.

Note to self, could be a regression in the bulkstat ioctl handling improperly copying out bs_xflags ...

Thanks,
-Eric

Comment 11 Eric Sandeen 2009-05-04 22:28:53 UTC
further note to self... original ip had di_forkoff 0; tip has forkoff 13
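
To make the forkoff observation concrete, here is a rough sketch of the same comparison done from userspace on xfs_db inode dumps. It assumes that XFS_IFORK_Q() boils down to "di_forkoff is nonzero" (which is how I read the macro in kernels of this era); forkoff_of and attr_forks_match are hypothetical helper names, not xfsprogs commands.

```shell
# Extract the core.forkoff value from an `xfs_db` inode dump on stdin,
# i.e. from a line such as "core.forkoff = 0".
forkoff_of() {
    awk '$1 == "core.forkoff" && $2 == "=" { print $3; exit }'
}

# Mirror the check quoted in comment 10 (assumption: XFS_IFORK_Q is
# "forkoff != 0"). Succeeds only if both inodes agree on whether an
# attribute fork exists.
#   $1 = forkoff of the fragmented file, $2 = forkoff of the temp file
attr_forks_match() {
    a=0; b=0
    [ "$1" -ne 0 ] && a=1
    [ "$2" -ne 0 ] && b=1
    [ "$a" -eq "$b" ]
}

# With the values from this bug (0 for the original, 13 for the temp file):
#   attr_forks_match 0 13 || echo "swapext would fail with EINVAL"
```

With 0 on one side and 13 on the other the two inodes disagree, which matches the EINVAL seen here.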

Comment 12 Eric Sandeen 2009-05-05 01:43:41 UTC
Ugh, I know what this is, and it's not really quite a bug per se.  

The way xfs_fsr works is it creates a "donor" file with better layout, copies the data into it, switches the extents between the fragmented file and this new file, then throws away the "donor" file after the fragmented extents have been switched to it.

However in the process, it makes sure that the inode format is the same, including extended attribute layout.

But selinux adds security xattrs to each file, and your filesystem came from an OS without selinux.  So, your new donor file created during fsr got extended attributes from selinux, while the old file has none.  Hence it fails the check in comment #10.

This will demonstrate it by creating a fragmented file on a filesystem w/ no selinux xattrs, then remounting it in a way that new files get selinux labels, and attempting an fsr:

# mkfs.xfs -dfile,name=fsfile,size=32m
# mkdir test
# mount -o loop,context="unconfined_u:object_r:user_tmp_t:s0" fsfile test
# for I in `seq 10 -1 0`; do  dd if=/dev/zero of=test/fragfile bs=4k count=1 seek=$I conv=notrunc oflag=sync; done
# umount test
# mount -o loop fsfile test
# xfs_fsr test/fragfile 
XFS_IOC_SWAPEXT failed: fragfile: Invalid argument

So you have a few ways around this - 

a) run restorecon on the backup mountpoint to give everything labels
b) mount the filesystem with a fs-wide selinux context
c) disable selinux

I'll have to think about whether there is a clean way to handle this in xfs; it may be the sort of thing we just need to caveat....

-Eric

Comment 13 Andrew Potter 2009-05-05 05:09:11 UTC
# restorecon -r /mnt/Backup
Didn't fix it.

# restorecon -Fr /mnt/Backup
Didn't fix it.

# touch /.autorelabel; reboot
Went through a relabeling process while /mnt/Backup was mounted, but didn't solve the problem.

# mount -o context=system_u:object_r:mnt_t:s0 -t xfs /dev/md127 /mnt/Backup
Worked.

Incidentally, I had a second xfs partition (/home). Fedora seemed happy with it after I had done the .autorelabel trick when I initially installed; in fact, xfs_fsr works without changing the context in the mount options.

Thanks for getting to the bottom of this! :)

Comment 14 Eric Sandeen 2009-05-05 12:53:21 UTC
Hm, not sure why restorecon & relabel didn't fix it, I'll have to look into that.  (I don't claim to be the foremost selinux expert).  Mounting with an fs-wide context may not be the most ideal solution.  Anyway, glad you're a bit more fragmentation-free now ;)  Thanks for providing all the debugging info.

-Eric

Comment 15 Eric Sandeen 2009-05-05 15:37:30 UTC
Ok, I guess restorecon won't do the trick because there is no policy for this random mountpoint.

<eparis> there was just a choice made not to relabel things under /mnt.

you could do chcon -R instead:

chcon -R -t mnt_t /mnt/Backup

More info than that, and I'm well out of my selinux league, sorry.

-Eric

Comment 16 Andrew Potter 2009-05-05 16:08:36 UTC
chcon worked, thanks for all your help.

Comment 17 Bug Zapper 2009-06-09 15:05:47 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 18 Eric Sandeen 2009-06-30 20:59:14 UTC
I'm going to close this notabug, because really it's just a strange interaction w/ selinux.

although a man page update that adds the selinux caveat may be in order ...