Bug 1225651
| Summary: | kernel: XFS (dm-5): xfs_swap_extents: inode 0x2b06cdabeb format is incompatible for exchanging | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Orion Poplawski <orion> |
| Component: | xfsprogs | Assignee: | Eric Sandeen <esandeen> |
| Status: | CLOSED ERRATA | QA Contact: | Zorro Lang <zlang> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.6 | CC: | dchinner, ddouwsma, eguan, esandeen, espionage724, orion, pasteur, redhat, rhbugs, swhiteho, tlavigne, zlang |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | xfsprogs-3.1.1-20.el6 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-21 11:55:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1324930 | | |
| Attachments: | | | |
Description
Orion Poplawski
2015-05-27 21:55:28 UTC
As with that other bug, please try:

# mount -t debugfs none /sys/kernel/debug
# echo 1 > /sys/kernel/debug/tracing/tracing_enabled
# echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_swap_extent_before/enable
# echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_swap_extent_after/enable
<run your failing fsr run>
# cat /sys/kernel/debug/tracing/trace

and add that as an attachment.

Created attachment 1031327 [details]
/sys/kernel/debug/tracing/trace
Here you go. Thanks.
ino 0x2b06cdabeb (target), btree format, num_extents 1319, Max in-fork extents 6, broot size 120, fork offset 104
ino 0x2d20b1861b (temp), extent format, num_extents 3, Max in-fork extents 6, broot size 0, fork offset 96
Looks like it's a case where the btree root won't fit into the data fork of the temp inode. It's been a while since I looked at this code in xfs_fsr, so I'm not sure whether it is supposed to handle this case or not...
-Dave.
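Dave's reading of the trace can be turned into a rough compatibility check. The C sketch below is a simplified model, not the kernel's actual xfs_swap_extents() code, and it rests on assumptions not stated in this thread: non-CRC (v4) XFS, where the in-core long-format btree block header is 24 bytes and the on-disk root header is 4 bytes, and 16 bytes per in-inode extent record. It also ignores the kernel's extra extent-count checks for btree-format inodes. The names fork_state, ondisk_broot_bytes(), max_inline_extents() and can_swap() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed v4 (non-CRC) sizes; hypothetical names, not kernel macros. */
#define INCORE_LBLOCK_HDR 24 /* in-core long-format btree block header */
#define ONDISK_BMDR_HDR    4 /* on-disk root header: level + numrecs */
#define EXTENT_REC_LEN    16 /* one in-inode extent record */

/* Convert the in-core "broot size" printed by the tracepoint to the
 * number of bytes the root needs inside an inode's data fork. */
int ondisk_broot_bytes(int incore_broot_size)
{
    assert(incore_broot_size >= INCORE_LBLOCK_HDR);
    return incore_broot_size - INCORE_LBLOCK_HDR + ONDISK_BMDR_HDR;
}

/* Number of extent records that fit in a data fork of forkoff_bytes. */
int max_inline_extents(int forkoff_bytes)
{
    return forkoff_bytes / EXTENT_REC_LEN;
}

/* Simplified view of one inode's data fork, as shown in the trace. */
struct fork_state {
    bool btree;     /* true: btree root in the inode; false: extent list */
    int  nextents;  /* extent count (used for extent format) */
    int  broot_size;/* in-core btree root size (used for btree format) */
    int  forkoff;   /* the trace's "fork offset": data fork size in bytes */
};

/* Would the mapping of `from` fit into the data fork of `into`? */
static bool fits_into(const struct fork_state *from,
                      const struct fork_state *into)
{
    if (from->btree)
        return ondisk_broot_bytes(from->broot_size) <= into->forkoff;
    return from->nextents <= max_inline_extents(into->forkoff);
}

/* A swap needs each inode's mapping to fit in the other's data fork. */
bool can_swap(const struct fork_state *target, const struct fork_state *temp)
{
    return fits_into(target, temp) && fits_into(temp, target);
}
```

With the values from the trace (target: btree, broot size 120, fork offset 104; temp: 3 extents, fork offset 96), can_swap() returns false because the 100-byte on-disk root does not fit into 96 bytes; raising the temp fork offset to 104 makes it return true. max_inline_extents(104) and max_inline_extents(96) both give 6, matching "Max in-fork extents 6".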
This is probably fixed by the xfsprogs commit:
commit 1adfe5c6296d3ea6c182f31a6728fc94af9146f7
Author: Eric Sandeen <sandeen>
Date: Fri Oct 18 22:30:18 2013 +0000
xfs_fsr: fix SWAPEXT failures under selinux
If we run xfs_fsr on a system which creates selinux extended
attributes, the temp file created by xfs_fsr may have a
large-ish local extended attribute as soon as it is created.
If the target file has NON-local extended attributes, it may
have a fork offset larger than the temp file, because i.e.
FMT_EXTENTS attributes take up less space. We currently
have no mechanism to grow the temp file's fork offset.
So in this case, the SWAPEXT ioctl will fail.
(With systems using selinux and lots of xattrs, this becomes
fairly common in the real world.)
After testing the target file for a non-local extent, and
checking to see if the temp forkoff needs to be grown on the
first pass, we can add a large attr to knock all attributes on
the temp file out of local format, and grow the fork offset for
this particular case.
This passes xfstest 227, and also resolves issues seen on
a metadata image provided by Gabriel.
Reported-by: Gabriel VLASIU <gabriel>
Signed-off-by: Eric Sandeen <sandeen>
Reviewed-by: Christoph Hellwig <hch>
Signed-off-by: Rich Johnston <rjohnston>
Orion, are you still seeing these errors? Perhaps you can test with this patch.
Thanks,
-Eric
I'm still seeing this. xfsprogs-3.1.1-16.el6 appears to already contain that patch.

Whoops, I'm sorry, I should have known that. :) Too quick on the draw... OK, I'm not sure at this point what might be going wrong. Getting full information about one of the inodes that fails to defrag:

# xfs_db -c "inode $XYZ" -c print /dev/whatever
# xfs_bmap -v /path/to/file

might be enough to sort it out...
Thanks,
-Eric

I'm assuming this would translate to something like:

Nov 2 01:01:23 csdisk1 kernel: XFS (dm-5): xfs_swap_extents: inode 0x36000382b6 format is incompatible for exchanging.
Nov 2 01:01:23 csdisk1 fsr[7255]: XFS_IOC_SWAPEXT failed: ino=231928464054: Invalid argument
Nov 2 01:01:23 csdisk1 fsr[7255]: xfs_fsr startpass 0, endpass 2, time 7282 seconds
# xfs_db -c "inode 231928464054" -c print /dev/dm-5
xfs_db: /dev/dm-5 contains a mounted filesystem
fatal error -- couldn't initialize XFS library

This will need to wait for some downtime then.

Thanks. Alternatively, if you wanted to provide an xfs_metadump on a side channel, that might help too.
Thanks,
-Eric

You could try xfs_db -r to open the device read-only; hopefully things are stable enough that the information will be correct even while mounted.
Thanks,
-Eric

Created attachment 1089893 [details]
xfs command output
xfs_db -c "inode 231928464054" -c print -r /dev/dm-5 > xfs/xfs_db.out
xfs_bmap -v /export/cora3/schecter/vortex_data2/data2/schecter/ystone/mvi2_sdsc/1v/dry/s400nm/27-50/cm1out_000047.nc > xfs/xfs_bmap.out
How's that?
Comment on attachment 1089893 [details]
xfs command output
Thanks, I'll see if I have any luck reproducing it with that info.
\o/ this does the trick:

#!/bin/bash
rm -f testfile
chcon -t file_t .
for I in $(seq 1319 -1 0); do
    OFF=$((I*262144))
    fallocate -o $OFF -l 262144 testfile
done
chcon -t samba_share_t .
sync
xfs_fsr testfile

Is there anything similarly odd going on with your selinux contexts? I needed those tricks to get different fork offsets on the target vs. the temp file. For the file (inode) that failed, what is the selinux context of its directory vs. the file itself? The test fails on upstream bits too, FWIW.

I guess I don't follow what is odd about the selinux contexts. Both file and directory have the same context:

drwx------. schecter 520 system_u:object_r:samba_share_t:s0 .
drwx------. schecter 520 system_u:object_r:samba_share_t:s0 ..
-rw-------. schecter 520 system_u:object_r:samba_share_t:s0 cm1out_000047.nc

In fact I think everything on this file system should be that way.

Interesting; OK. Well, hopefully the failure I hit is the same as the failure you hit, and the fix will work in both places... I get the same "before" status from tracing, at any rate:

xfs_swap_extent_before: dev 8:33 ino 0xc393c625 (target), btree format, num_extents 1320, Max in-fork extents 6, broot size 120, fork offset 104
xfs_swap_extent_before: dev 8:33 ino 0xc393c62d (temp), extent format, num_extents 1, Max in-fork extents 6, broot size 0, fork offset 96

Something caused those fork offsets to differ; in my case it was different-sized selinux attributes...
-Eric

OK, Orion, would you be up for testing a patch?
In fsr_setup_attr_fork() -
Index: xfsprogs-3.1.1/fsr/xfs_fsr.c
===================================================================
--- xfsprogs-3.1.1.orig/fsr/xfs_fsr.c
+++ xfsprogs-3.1.1/fsr/xfs_fsr.c
@@ -1084,7 +1084,7 @@ fsr_setup_attr_fork(
* the temp file's forkoffset when the attr moves out
* of the inode)
*/
- if (diff < 0 && fsx.fsx_nextents > 0) {
+ if (diff < 0) {
char val[2048];
memset(val, 'X', 2048);
if (fsetxattr(tfd, name, val, 2048, 0)) {
There's a bit more to it, but that's the effective change I'm testing here.
To be on the safe side, you could copy the fragmented file first so you have a backup.
Can you add xfs_info for this filesystem as well? I want to double-check what size your on-disk inodes are.

# xfs_info /dev/mapper/vg_data-cora3
meta-data=/dev/mapper/vg_data-cora3 isize=256 agcount=56, agsize=167772144 blks
= sectsz=512 attr=2, projid32bit=0
data = bsize=4096 blocks=9392880640, imaxpct=5
= sunit=16 swidth=160 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
I can test the patch.
If you simply touch a new file in that same directory, does it get the same context as all the other files in that dir? There seems to be some weird selinux interaction going on here; my patch will probably fix it, but I'd like to know how we got into this situation.

# cd /export/cora3/schecter/vortex_data2/data2/schecter/ystone/mvi2_sdsc/1v/dry/s400nm/27-50/
# touch newfile
# ls -Z newfile

Thanks,
-Eric

Yes, as would be expected, no?

-rw-r--r--. root root unconfined_u:object_r:samba_share_t:s0 newfile

When initially populating the disk (perhaps copying from elsewhere), the contexts may have been different and then changed with chcon -t samba_share_t.

Actually, it seems they're not the same; the dir was:

drwx------. schecter 520 system_u:object_r:samba_share_t:s0 .

and the target file was:

-rw-------. schecter 520 system_u:object_r:samba_share_t:s0 cm1out_000047.nc

and the new file is (and the fsr temp file would be):

-rw-r--r--. root root unconfined_u:object_r:samba_share_t:s0 newfile

unconfined_u vs. system_u. So, modulo selinux behavior, mystery solved, and confidence in the patch goes up; good deal. :) (The above means that the temp file will have a larger attribute, and hence a smaller data region, which means that your target file's bmap root can't be swapped into the smaller space available in the temp file...)
Thanks,
-Eric

Ah, yeah, so when files are created as root via an ssh connection they are unconfined_u, and system_u when created, say, via NFS. The patch seems to check out fine when testing with cm1out_000047.nc.

Great, thanks for testing it; I'll get it upstream and into RHEL 6.

Starting a run on the full filesystem and still getting some failures:

XFS_IOC_SWAPEXT failed: ino=60649343235: Invalid argument
XFS_IOC_SWAPEXT failed: ino=60649343236: Invalid argument
XFS_IOC_SWAPEXT failed: ino=60649343237: Invalid argument
XFS_IOC_SWAPEXT failed: ino=60649343238: Invalid argument
.....

noooo!
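Eric's explanation (a longer selinux context means a larger attribute, and hence a smaller data region) can be sanity-checked with a back-of-the-envelope calculation. The C sketch below is a rough model, not kernel code, and rests on assumptions that go beyond this thread: 256-byte version-2 inodes (isize=256 per the xfs_info output, leaving a 156-byte literal area after the 100-byte core); a shortform attribute fork holding only security.selinux, stored under the name "selinux" with a NUL-terminated value; a 4-byte shortform header plus 3 bytes of per-entry overhead; and a fork offset computed as (literal area minus attr bytes), rounded down to 8 bytes and capped at the default maximum. The helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Assumed layout: 256-byte inode, 100-byte version-2 inode core. */
#define INODE_SIZE         256
#define DINODE_CORE_SIZE   100
#define LITINO             (INODE_SIZE - DINODE_CORE_SIZE) /* 156 bytes */
#define SF_HDR_SIZE        4  /* shortform header: totsize, count, pad */
#define SF_ENTRY_OVERHEAD  3  /* per entry: namelen, valuelen, flags */
#define BMDR_SPACE_CALC(n) (4 + (n) * (8 + 8)) /* root hdr + n key/ptr pairs */
#define MINABTPTRS         2

/* Bytes of a shortform attr fork holding only security.selinux
 * (assumed: name stored as "selinux", value includes the trailing NUL). */
int selinux_sf_bytes(const char *context)
{
    int namelen = (int)strlen("selinux");
    int valuelen = (int)strlen(context) + 1;
    return SF_HDR_SIZE + SF_ENTRY_OVERHEAD + namelen + valuelen;
}

/* Rough estimate of the fork offset (in bytes) a new, empty file gets:
 * literal area minus attr bytes, rounded down to 8-byte units, capped
 * at the default maximum. The lower bound on the data fork is ignored
 * because it is assumed not to matter for an empty file. */
int estimated_forkoff_bytes(const char *context)
{
    int offset = (LITINO - selinux_sf_bytes(context)) >> 3;
    int maxforkoff = (LITINO - BMDR_SPACE_CALC(MINABTPTRS)) >> 3;

    assert(offset > 0);
    if (offset > maxforkoff)
        offset = maxforkoff;
    return offset << 3;
}
```

Under these assumptions the system_u context takes 49 attribute bytes and yields a 104-byte fork offset, while the unconfined_u context takes 53 bytes and yields 96, the same pair of offsets seen in the traces.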
OK, let's do the same dance, with the benefit of experience:
* For the inode, print it out with xfs_db
* Enable the tracing, try defragmenting that file, and see what tracing says
* Show me the selinux context of the target file and its parent dir
* Create a new file in the same dir and see what context it gets by default

Okay, here's one that seems to be sticking (others appear to have disappeared after multiple passes):
XFS_IOC_SWAPEXT failed: ino=125040536861: Invalid argument
# find /export/cora3 -inum 125040536861
/export/cora3/lund/GW/Tides/HonLi_tide_data/UVTOMZ3_20S_50S_0002-01-21.nc
# xfs_fsr /export/cora3/lund/GW/Tides/HonLi_tide_data/UVTOMZ3_20S_50S_0002-01-21.nc
XFS_IOC_SWAPEXT failed: /export/cora3/lund/GW/Tides/HonLi_tide_data/UVTOMZ3_20S_50S_0002-01-21.nc: Invalid argument
# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
xfs_fsr-9184 [004] 278124.725337: xfs_swap_extent_before: dev 253:5 ino 0x1d1cff2d1d (target), btree format, num_extents 8, Max in-fork extents 6, broot size 40, fork offset 104
xfs_fsr-9184 [004] 278124.725339: xfs_swap_extent_before: dev 253:5 ino 0x1d222eb247 (temp), extent format, num_extents 7, Max in-fork extents 7, broot size 0, fork offset 120
drwxr-xr-x. lund nwra system_u:object_r:samba_share_t:s0 /export/cora3/lund/GW/Tides/HonLi_tide_data
-rw-r--r--. lund nwra system_u:object_r:samba_share_t:s0 /export/cora3/lund/GW/Tides/HonLi_tide_data/UVTOMZ3_20S_50S_0002-01-21.nc
Created attachment 1093451 [details]
xfs command output for 125040536861
Hm, in that case you're only going to get from 8 extents to 7 in any case. ;) Is the filesystem mostly full, or is this an extremely large file?

But in this case, the target file only has room for 6 extents in the inode, and the "temp" inode has 7, so it can't swap. We can only manipulate the temp inode to make things fit... let me look a little to see if there's anything that can be done in this case, but I don't think there is. If the temp file had been created with fewer extents, the defrag would have worked.

Just as a note, whole-filesystem defragmentation isn't generally recommended; it can cause free-space fragmentation, which might then leave fsr unable to find contiguous free space to defragment into when you *really* need to, and cause new allocations to be fragmented as well.

There is at least one more set of patches upstream to address this class of problems, so setting the bug for the next release.

Sorry for the absence. The file is not particularly big and I think we have plenty of free space (at least in absolute terms):
# ls -lh /export/cora3/lund/GW/Tides/HonLi_tide_data/UVTOMZ3_20S_50S_0002-01-21.nc
-rw-r--r--. 1 lund nwra 221M Aug 1 2015 /export/cora3/lund/GW/Tides/HonLi_tide_data/UVTOMZ3_20S_50S_0002-01-21.nc
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_data-cora3
35T 34T 1.1T 98% /export/cora3
I had come across something on the internet suggesting that regular defrag could be a good thing, but perhaps not.
I also just ran into a situation where the fsr run *may* have triggered the system to go out to lunch:
Feb 7 23:48:17 csdisk1 fsr[25764]: XFS_IOC_SWAPEXT failed: ino=21926454334: Invalid argument
Feb 7 23:48:46 csdisk1 kernel: XFS (dm-2): xfs_swap_extents: inode 0x25010ffb0e format is incompatible for exchanging.
Feb 7 23:48:46 csdisk1 fsr[25764]: XFS_IOC_SWAPEXT failed: ino=158931614478: Invalid argument
Feb 7 23:49:01 csdisk1 fsr[25824]: /export/cora3 start inode=0
That's the last message in /var/log/messages. The console had some messages about hung tasks, but those didn't make it into the log. I was able to reboot the system fine. But maybe I'll just drop the regular defrag.
Even though this bug has been fixed, someone may still hit xfs_swap_extents failures that look like this bug but have a different cause. Let's take a look at the attachment from the reporter (I picked one entry as an example):

-------------------------------
xfs_swap_extent_before: dev 253:2 ino 0x156d3 (target), btree format, num_extents 1317, Max in-fork extents 6, broot size 120, fork offset 104
xfs_swap_extent_before: dev 253:2 ino 0x1b1b2c8335 (temp), extent format, num_extents 5, Max in-fork extents 6, broot size 0, fork offset 96

-------------------------------

The target's data fork is in btree format, so the real (on-disk) broot size is 120 - 24 (XFS_BTREE_LBLOCK_LEN) + 4 (sizeof(xfs_bmdr_block_t)) = 100. The forkoff is 104, and it cannot be less than 100 now.

After the selinux context grows, the temp file's attr becomes bigger and its forkoff becomes smaller; as above, the forkoff drops to 96. Then xfs_fsr tries to change the attr format from LOCAL to EXTENTS (by forcibly writing a 2k attr), and the kernel uses xfs_bmap_forkoff_reset() to change the forkoff. xfs_bmap_forkoff_reset() calculates a default forkoff (generally 104 if the inode size is 256) and compares it with the current forkoff (96): if the default forkoff is greater than the current one, it sets the forkoff to the default; otherwise nothing changes. So after the fix for this bug, the temp file's forkoff can be changed to 104, equal to the target's forkoff, and the swap will succeed.

But as I mentioned, xfs_bmap_forkoff_reset() only resets the forkoff to a default value (which it considers good enough), or changes nothing. So, for example, if the target's forkoff is 120 and the temp's forkoff is 96, xfs_bmap_forkoff_reset() resets it to 104; 120 is still bigger than 104, so xfs_swap_extents will fail. That's the policy XFS uses to reset the forkoff; unless we change it, this failure is a known issue.
Thanks,
Zorro

Zorro, if you have another specific testcase which fails, can you please attach it so I can take a look?
We may want to handle it in a separate bug, depending on the details.
Thanks,
-Eric

(In reply to Eric Sandeen from comment #44)
> Zorro, if you have another specific testcase which fails, can you please
> attach it so I can take a look? We may want to handle it in a separate bug,
> depending on the details.
>
> Thanks,
> -Eric

Hi Eric, this is the case I sent to xfstests:

[PATCH] fstests: xfs_fsr SWAPEXT fails when temp forkoff smaller than target

It can sometimes reproduce failures on upstream xfsprogs and kernel. And as I said, I don't know if it's a bug. xfs_fsr tries to move di_forkoff in the attr direction by writing a 2k attr, to change the attr format from LOCAL to EXTENTS:

if (diff < 0) {
	char val[2048];
	memset(val, 'X', 2048);
	if (fsetxattr(tfd, name, val, 2048, 0)) {
		fsrprintf(_("big ATTR set failed\n"));
		return -1;
	}

But in fact, changing the attr from LOCAL to EXTENTS may not move di_forkoff at all, or may not move it far enough, because the kernel uses xfs_bmap_forkoff_reset() to decide how to move di_forkoff:

STATIC void
xfs_bmap_forkoff_reset(
	xfs_inode_t	*ip,
	int		whichfork)
{
	if (whichfork == XFS_ATTR_FORK &&
	    ip->i_d.di_format != XFS_DINODE_FMT_DEV &&
	    ip->i_d.di_format != XFS_DINODE_FMT_UUID &&
	    ip->i_d.di_format != XFS_DINODE_FMT_BTREE) {
		uint	dfl_forkoff = xfs_default_attroffset(ip) >> 3;

		if (dfl_forkoff > ip->i_d.di_forkoff)
			ip->i_d.di_forkoff = dfl_forkoff;
	}
}

You can see that di_forkoff depends on dfl_forkoff: the biggest forkoff it can be reset to is dfl_forkoff = xfs_default_attroffset(ip) >> 3.
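The reset rule quoted above, combined with the xfs_default_attroffset() formula Zorro walks through next, can be modeled in a few lines. This is a hedged C sketch, not kernel code: it assumes version-2 inodes (100-byte core, no CRCs), works in the 8-byte units that di_forkoff uses, and models only the attr-fork branch of the reset. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed: version-2 inode core is 100 bytes (non-CRC filesystem). */
#define DINODE_V2_CORE     100
#define MINABTPTRS         2
#define BMDR_SPACE_CALC(n) (4 + (n) * (8 + 8)) /* root hdr + n key/ptr pairs */

/* Hypothetical stand-ins for the XFS_DINODE_FMT_* data fork formats. */
enum data_format { FMT_DEV, FMT_LOCAL, FMT_EXTENTS, FMT_BTREE, FMT_UUID };

/* Default attr fork offset in bytes, following the
 * xfs_default_attroffset() logic quoted in this thread. */
int default_attroffset(int inodesize)
{
    assert(inodesize >= 256 && inodesize <= 2048);
    if (inodesize == 256)
        return (inodesize - DINODE_V2_CORE) - BMDR_SPACE_CALC(MINABTPTRS);
    return BMDR_SPACE_CALC(6 * MINABTPTRS);
}

/* Attr-fork branch of xfs_bmap_forkoff_reset(): the forkoff (in 8-byte
 * units) is raised to the default if it is smaller, and never beyond. */
int forkoff_after_reset(enum data_format fmt, int forkoff, int inodesize)
{
    if (fmt != FMT_DEV && fmt != FMT_UUID && fmt != FMT_BTREE) {
        int dfl_forkoff = default_attroffset(inodesize) >> 3;
        if (dfl_forkoff > forkoff)
            forkoff = dfl_forkoff;
    }
    return forkoff;
}

/* Can the LOCAL->EXTENTS attr conversion lift the temp forkoff to at
 * least the target's forkoff? Both values are in 8-byte units. */
bool reset_can_reach(enum data_format temp_fmt, int temp_forkoff,
                     int target_forkoff, int inodesize)
{
    return forkoff_after_reset(temp_fmt, temp_forkoff, inodesize)
           >= target_forkoff;
}
```

For 256-byte inodes the model lifts a temp forkoff to at most 15 units (120 bytes), so a target at 16 units stays out of reach; for 512-byte inodes the ceiling is 24 units, matching the two cases Zorro describes.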
Then take a look at xfs_default_attroffset():

uint
xfs_default_attroffset(
	struct xfs_inode	*ip)
{
	struct xfs_mount	*mp = ip->i_mount;
	uint			offset;

	if (mp->m_sb.sb_inodesize == 256) {
		offset = XFS_LITINO(mp, ip->i_d.di_version) -
				XFS_BMDR_SPACE_CALC(MINABTPTRS);
	} else {
		offset = XFS_BMDR_SPACE_CALC(6 * MINABTPTRS);
	}

	ASSERT(offset < XFS_LITINO(mp, ip->i_d.di_version));
	return offset;
}

If the inode size is 256 bytes, the default forkoff will be (di_version != 3):

XFS_LITINO(mp, ip->i_d.di_version) - XFS_BMDR_SPACE_CALC(MINABTPTRS) = (256 - 100) - (16 * 2 + 4) = 120

(Sorry, I was wrong above when I said it's 104...)

dfl_forkoff = 120 >> 3 = 15

So if the target's forkoff is 16 and the temp's forkoff is 14 (both in 8-byte units), we can't move it to 16 (or bigger) by changing the attr format from LOCAL to EXTENTS.

If the inode size is 512 bytes, the default forkoff will be (di_version != 3):

XFS_BMDR_SPACE_CALC(6 * MINABTPTRS) = 4 + (6 * 2) * (8 + 8) = 196
dfl_forkoff = 196 >> 3 = 24

So if the target's forkoff is 25 and the temp's forkoff is 20, we can't move it beyond 24.

That's a kernel limit; I don't think we need to change that?
Thanks,
Zorro

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0791.html

I ran into this message today on Fedora 36 when doing some maintenance on my NAS drive with XFS:

[22209.007497] XFS (sdb1): xfs_swap_extents: inode 0x14d30400 format is incompatible for exchanging.

I've used XFS primarily since Fedora 26 and haven't seen that message before. The last command I ran on the drive before checking dmesg was "xfs_fsr", and before that an "xfs_repair". Before that I ran some general permission commands (chmod, chown, restorecon).
That inode points to a data file from the game "DJMAX Respect V", which is a single 38GB file:

xfs_ncheck -i 349373440 /dev/sdb1
349373440 espionage724/Games/PC/Steam Backups/DJMAX RESPECT V/common/DJMAX RESPECT V/DJMAX RESPECT V_Data/StreamingAssets/Packs/77d437bd47480d01be5b09064f7e9f76

You're commenting on a 7-year-old RHEL6 bug that is closed ;) This isn't really a bug; defrag is best effort, and in this case it failed. If you want to provide a metadump image for analysis, someone could look at it when they have time, but please do so in a new bug, for Fedora.