Bug 212201 - Cannot build system with XFS file system.
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: xfsprogs
Version: 6
Hardware: x86_64 Linux
Priority: medium
Severity: urgent
Assigned To: Russell Cattelan
Duplicates: 211086 212260 214658 231054 236315
 
Reported: 2006-10-25 12:59 EDT by Glenn Rottingen
Modified: 2007-11-30 17:11 EST

Last Closed: 2007-02-09 12:46:23 EST


Attachments
attr2 corruption fix (4.94 KB, patch)
2006-11-21 20:16 EST, Russell Cattelan

Description Glenn Rottingen 2006-10-25 12:59:28 EDT
Description of problem:
System fails to read the / directory after install.

Tried to use an XFS file system for / and ext3 for /boot.
Tried this with both LVM and native partitions.  All files are on sdb.

The system installs, but access to the / mount point fails on boot from the hard disk.
Reinstalled and made sure xfsprogs was selected.

On a rescue boot I have no problem accessing all the files.
Installed Mandriva 2007 on the same Fedora file system with no problems.

Version-Release number of selected component (if applicable):

FC6, 10/24.  X86.
How reproducible:
Fails every time.

Additional info:  Will try the i386 version of FC6 next and follow up with a
comment on the results when complete.
Comment 1 Glenn Rottingen 2006-10-25 14:55:26 EDT
Tried i386 release of FC6.  Same problem.  Error message -

I/O Error in file system ("dm-0") metadata dev dm-0 block 0x62200000000
s-trans-read-block error 5 buf cont 4096.

Hope this helps.

Also:  In rescue mode if I chroot to sysimage I cannot run nano. A library is
not found.
If I do not chroot, nano works fine.

I believe XFS support is not in the kernel.
Comment 2 John Holmstadt 2006-10-27 10:47:01 EDT
(It appears that bug 212260 is a dupe of this one)

I get a similar error using XFS on /. Here's my dmesg output...

attempt to access beyond end of device
dm-0: rw=0, want=3350879797256, limit=14352384
I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x30c30000000      
("xfs_trans_read_buf") error 5 buf count 4096
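
The two numbers in this dmesg output are related, which can be checked with a few lines of Python. This is a minimal sketch, assuming that `want` and `limit` are counted in 512-byte sectors and that the reported block number uses the same unit (the thread does not state this; the arithmetic is what suggests it).

```python
SECTOR = 512                  # assumed unit for want, limit and the block number

block = 0x30c30000000         # from the xfs_trans_read_buf line
buf_count = 4096              # bytes, from "buf count 4096"
want = 3350879797256          # from "attempt to access beyond end of device"
limit = 14352384              # size of dm-0 as reported by the kernel

# Last sector touched by a read of buf_count bytes starting at `block`
end_sector = block + buf_count // SECTOR

print("end of read equals want:", end_sector == want)
print("read ends beyond the device:", want > limit)
print("device size in GiB: %.2f" % (limit * SECTOR / 2**30))
```

If the assumption holds, the failed read starts far beyond the end of dm-0, which suggests a bad block pointer in the filesystem metadata rather than a failing disk.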
Comment 3 Michael Cronenworth 2006-11-04 03:24:12 EST
I get similar results as well. I had to use a FC5 DVD install to create a
successful XFS partition then use the FC6 DVD and upgrade FC5. Talk about a 2
hour run around. At least I have XFS now.
Comment 4 Marcin Zajaczkowski 2006-11-11 11:55:42 EST
(In reply to comment #1)
> I believe XFS support is not in the kernel.

That's not quite the way I see it.

After the first installation my Fedora booted up with error messages like the ones John posted.
I used SystemRescueCD to check the system logs. mount detected XFS and the files were
readable (at least those I checked). xfs_info looked normal:

root@sysresccd /root/tmp % xfs_info /dev/sda6
meta-data=/mnt/temp1             isize=256    agcount=16, agsize=187633 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=3002128, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0


but xfs_check returned:

root@sysresccd /root % xfs_check /dev/sda6
bad sb version # 0xb094 in ag 0
bad sb version # 0xb084 in ag 1
can't seek in filesystem at bb 24593432576
can't read btree block 16384/0
extent count for ino 4194435 data fork too low (0) for file format
bad nblocks 21 for inode 4194435, counted 1
bad nextents 19 for inode 4194435, counted 0
no . entry for directory 4194435
no .. entry for directory 4194435
/bin/xfs_check: line 62:  4674 Segmentation fault      xfs_db$ISFILE -i -p
xfs_check -c "check$OPTS" $1

I made a second try with the same result.
The third time I formatted the partition using the tools from SystemRescueCD and only
told Anaconda which partitions to use. This allowed me to install Fedora.

It seems the installer formats XFS partitions in some different way.
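
The xfs_info geometry and the xfs_check seek failure above can be compared numerically. A minimal sketch, assuming that "bb" in the xfs_check message means a 512-byte basic block (an assumption, not stated in this thread); all other numbers are copied from the output above.

```python
# Geometry reported by xfs_info in this comment
AGCOUNT = 16
AGSIZE = 187633                # filesystem blocks per allocation group
BLOCKS = 3002128               # total filesystem blocks
BSIZE = 4096                   # bytes per filesystem block
BB_SIZE = 512                  # assumed size of one "bb" in bytes

failed_seek_bb = 24593432576   # from "can't seek in filesystem at bb ..."

# The allocation groups should add up to the total block count
geometry_consistent = AGCOUNT * AGSIZE == BLOCKS

# Filesystem size expressed in basic blocks
fs_size_bb = BLOCKS * BSIZE // BB_SIZE
seek_in_range = failed_seek_bb < fs_size_bb

print("agcount * agsize == blocks:", geometry_consistent)
print("filesystem size in basic blocks:", fs_size_bb)
print("failed seek is inside the filesystem:", seek_in_range)
```

Under that assumption the geometry itself is self-consistent, while the address xfs_check tried to seek to lies well outside the filesystem.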
Comment 5 John Holmstadt 2006-11-13 11:07:14 EST
I just noticed something odd today when trying to troubleshoot this issue. It
may be a possible workaround.

If you start the install with "linux selinux=0 xfs", the install goes fine (as
usual), and boots up fine as well! I was able to run yum update (something I
wasn't able to do before), and was even able to turn selinux back on. After a
reboot, everything is still working great.

Could this possibly be a symptom of poor application of selinux policies?
Maybe re-applying the policies on a broken system would fix this?
Comment 6 Eric Sandeen 2006-11-15 18:22:32 EST
*** Bug 212260 has been marked as a duplicate of this bug. ***
Comment 7 Eric Sandeen 2006-11-15 18:29:14 EST
Bleah.

We added "-i attr=2" to the mkfs commandline, that seems to have caused the
issue.  It was supposed to be more efficient for selinux.  I should have tested
it.  It used to work.  :(

There should be 2 choices for a workaround:

1) do mkfs.xfs from the shell, don't let anaconda make the filesystem (and don't
use -i attr=2)
or
2) use selinux=0 so that selinux attrs aren't written

either one should get things going... 1) may be inefficient w/ selinux; larger
inode size may help.  2) -seems- to be safe even if you turn selinux back on
later, but this should be verified.

I will investigate what is going on, it seems that we have a corrupt filesystem
when attr2 and selinux are used.

I just did a test w/ an ext3 root fc6 install, made an xfs filesystem with -i
attr=2, copied /usr onto it, selinux attrs and all, and it all seems fine.  Very
odd... will keep looking.

Thanks (and sorry about that!)

-Eric
Comment 8 Eric Sandeen 2006-11-15 23:02:01 EST
By the way, for the same reason that pre-mkfs'ing xfs partitions without the
attr=2 option works around the bug, upgrades from previous versions of Fedora
over pre-existing xfs filesystems should also be safe.
Comment 9 Eric Sandeen 2006-11-15 23:27:54 EST
*** Bug 214658 has been marked as a duplicate of this bug. ***
Comment 10 Eric Sandeen 2006-11-16 11:35:56 EST
This looks like an alignment issue on x86_64, with the on-disk superblock
struct, and the sb_features2 field at the end.

If I look at the on-disk structs for an attr2 filesystem on x86 vs. x86_64, the
features2 flag field (which holds the attr2 flag) is at a different location:

x86:
0c0: 00000000 00000000 00000008 00000000 00000000 00000000 00000000 00000000
                       ^^^^^^^^
vs on x86_64:
0c0: 00000000 00000000 00000000 00000008 00000000 00000000 00000000 00000000
                                ^^^^^^^^

the size of xfs_sb_t comes out differently too:
x86_64:
(gdb) print sizeof(xfs_sb_t)
$1 = 208
x86:
(gdb) print sizeof(xfs_sb_t)
$1 = 204

Ok, at least I see the problem.  Let me sort out the right way to fix this.
Comment 11 Eric Sandeen 2006-11-16 13:33:11 EST
further, this is coming from the superblock translation routines, because
padding at the end of the xfs_sb_struct makes the last field look like 8 bytes
to that routine, when it should be 4.
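
The effect described in comments 10 and 11 can be reproduced in a few lines. This is only a sketch, not the XFS translation code: it assumes the flag is stored big-endian at offset 0xc8 (the third word of the 0xc0 row in the dumps), and the names `ROW_START`, `FIELD_OFFSET` and `dump_row` are made up for illustration. The 204-byte layout at the end is a toy stand-in for xfs_sb_t, not its real member list.

```python
import struct

ROW_START = 0xC0       # first offset shown in the dumps in comment 10
FIELD_OFFSET = 0xC8    # assumed offset of the features2 field
ATTR2_FLAG = 0x8       # flag value visible in both dumps

def dump_row(field_fmt):
    """Return the 32-byte row at 0xc0 as eight hex words, with the flag
    written big-endian at FIELD_OFFSET using the given struct format."""
    row = bytearray(32)
    encoded = struct.pack(field_fmt, ATTR2_FLAG)
    start = FIELD_OFFSET - ROW_START
    row[start:start + len(encoded)] = encoded
    return " ".join(row[i:i + 4].hex() for i in range(0, 32, 4))

as_4_bytes = dump_row(">I")   # field written as 4 bytes
as_8_bytes = dump_row(">Q")   # field written as 8 bytes (tail padding counted in)

print("4-byte field:", as_4_bytes)
print("8-byte field:", as_8_bytes)

# Toy struct with 204 bytes of members: size without tail padding, and
# size when the end is rounded up to an 8-byte boundary
unpadded = struct.calcsize("=25QI")
padded = (unpadded + 7) // 8 * 8
print("sizes:", unpadded, padded)
```

Writing the same value as an 8-byte quantity moves the set bits one word to the right, which matches the difference between the two dumps, and rounding 204 up to an 8-byte boundary gives the 208 that gdb reports on x86_64.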
Comment 12 Eric Sandeen 2006-11-16 16:48:08 EST
Did anyone who hit this bug see it on x86, or were you all on x86_64?
Comment 13 Eric Sandeen 2006-11-16 17:10:10 EST
ok tested it on x86, it's broken there too.  so the padding/alignment is an
issue for cross-arch mounts but something else is going on here.
Comment 14 Chris Weyl 2006-11-16 17:38:50 EST
(In reply to comment #12)
> Did anyone who hit this bug see it on x86, or were you all on x86_64?

A little belated, but I hit it on x86.
Comment 15 Russell Cattelan 2006-11-20 10:59:32 EST
Just a quick update:
We have tracked down the problem to a bug in the attr2 code in XFS.

When an inode is converted from one form to another (local -> extents -> btree)
there is a location that points to how much space can be used to store the data
metadata and how much space can be used to store the attribute metadata.

The problem only appears to happen with a specific sequence of directory filling
and adding attributes.

We are working on solving the problem along with the team at SGI; hopefully we
will have a fix soon.
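
To make the idea of that "location" concrete, here is a rough sketch of how a single offset can split the variable part of an inode between data and attribute metadata. It is an illustration under stated assumptions, not the XFS code: `CORE_SIZE` (a 96-byte inode core plus a 4-byte pointer) and the 8-byte unit of the offset are assumptions, `split_literal_area` is a hypothetical helper, and only the 256-byte inode size comes from the xfs_info output in comment 4.

```python
INODE_SIZE = 256       # isize from the xfs_info output in comment 4
CORE_SIZE = 100        # assumed: fixed part of the inode before the forks
LITERAL_AREA = INODE_SIZE - CORE_SIZE

def split_literal_area(forkoff):
    """Return (data_bytes, attr_bytes) for a fork offset given in 8-byte
    units. An offset of 0 is taken to mean "no attribute fork"."""
    if forkoff == 0:
        return LITERAL_AREA, 0
    data_bytes = forkoff * 8
    if data_bytes > LITERAL_AREA:
        raise ValueError("fork offset points past the end of the inode")
    return data_bytes, LITERAL_AREA - data_bytes

for forkoff in (0, 10, 15):
    data_bytes, attr_bytes = split_literal_area(forkoff)
    print(forkoff, data_bytes, attr_bytes)
```

In a scheme like this, moving the offset without also moving or shrinking the data on either side of it would let one fork overwrite the other, which is the kind of corruption this comment describes in general terms.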
Comment 16 Russell Cattelan 2006-11-20 12:04:44 EST
*** Bug 211086 has been marked as a duplicate of this bug. ***
Comment 17 Russell Cattelan 2006-11-21 20:16:55 EST
Created attachment 141859 [details]
attr2 corruption fix

This patch is not completely optimal for attr2 but it does keep
things from corrupting.
Comment 18 Marcin Zajaczkowski 2006-11-26 06:39:11 EST
Maybe a little OT, but my first two installations (with XFS made by anaconda) took
about *1.5 h* each (Amilo Pro V8010 with Intel Pentium M 1.7 GHz). The third (with
the file system made by mkfs.xfs from SystemRescueCD with default options) took over
*two* hours even though it used the same (or at least very similar) options.
What is more, I have the impression that there is something wrong with my hard disk
performance. This is especially visible when I update several packages with pup:
the update stage (packages already downloaded) of even a not very big package can
take a while (minutes).

Is it possible that this is caused by the manual disk formatting (possibly with
options that are not optimal for SELinux and Fedora itself)?
Comment 19 Eric Sandeen 2006-11-26 09:19:22 EST
re: comment #18, was this an initial install or an upgrade?

I have seen the rpm databases get -extremely- fragmented during an upgrade, and
this made the install take -forever-

If an upgrade, you could check /var/lib/rpm/* with xfs_bmap, see how many
extents, and use xfs_fsr to defrag if necessary.
Comment 20 Marcin Zajaczkowski 2006-11-26 15:36:04 EST
It was a fresh installation.

I have checked those files. The biggest one, Packages, has 133 extents; Basenames
has 93 and Filemd5s 71. The others have fewer extents. I have nothing to compare
with, but these probably aren't large values.

I wanted to defragment, but (maybe it's a silly problem) I wasn't able
to find xfs_fsr. It's absent from the xfsprogs package. Even "yum search xfs_fsr"
returns only libattr. Am I missing something?


Sample update operation (RPMs from "Fedora Updates" - 14MB):
[QUOTE]
[root@bolger rpm]# time rpm -Uhv gaim-2.0.0-0.22.beta5.fc6.i386.rpm
ImageMagick-6.2.8.0-3.fc6.1.i386.rpm rhythmbox-0.9.5-7.fc6.i386.rpm
cyrus-sasl-md5-2.1.22-4.i386.rpm
Preparing...                ########################################### [100%]
   1:cyrus-sasl-md5         ########################################### [ 25%]
   2:gaim                   ########################################### [ 50%]
   3:ImageMagick            ########################################### [ 75%]
   4:rhythmbox              ########################################### [100%]

real    2m37.997s
user    0m8.165s
sys     0m3.722s
[/QUOTE]

I'll try to run a similar operation on a corresponding machine with FC5, but it
can take a few days. Do you think that operation should take 2.5 minutes?
Comment 21 Eric Sandeen 2006-12-21 11:33:35 EST
With some help from Russell & myself, the xfs guys have committed a patch to
address this issue, see http://oss.sgi.com/archives/xfs/2006-12/msg00210.html

We'll try to get that into an FC6 update kernel soon.

Once the fix gets into FC6, the proper way to get xfs up & running with selinux
and other extended attributes would be to install with selinux=0, and do not add
any extended attributes (beagle, selinux, etc) to your xfs filesystems until you
have upgraded to this fixed kernel.  Post-upgrade, you should be able to use
xattrs again to your heart's content.

Thanks,

-Eric
Comment 22 Marcin Zajaczkowski 2006-12-22 14:31:13 EST
(In reply to comment #20)
(...)
> Sample update operation (RPMs from "Fedora Updates" - 14MB):
> [QUOTE]
> [root@bolger rpm]# time rpm -Uhv gaim-2.0.0-0.22.beta5.fc6.i386.rpm
> ImageMagick-6.2.8.0-3.fc6.1.i386.rpm rhythmbox-0.9.5-7.fc6.i386.rpm
> cyrus-sasl-md5-2.1.22-4.i386.rpm
(...)
> real    2m37.997s
> user    0m8.165s
> sys     0m3.722s
> [/QUOTE]
> 
> I'll try to run a similar operation on a corresponding machine with FC5, but it
> can take a few days. Do you think that operation should take 2.5 minutes?

Promised benchmark (the same machine with the reinstalled system):

Fresh system (just after installation):

[root@bolger rpm]# time rpm -Uhv cyrus-sasl-md5-2.1.22-4.i386.rpm
ImageMagick-6.2.8.0-3.fc6.1.i386.rpm rhythmbox-0.9.5-7.fc6.i386.rpm
gaim-2.0.0-0.22.beta5.fc6.i386.rpm 
warning: cyrus-sasl-md5-2.1.22-4.i386.rpm: Header V3 DSA signature: NOKEY, key
ID 4f2a6fd2
Preparing...                ########################################### [100%]
   1:ImageMagick            ########################################### [ 25%]
   2:cyrus-sasl-md5         ########################################### [ 50%]
   3:gaim                   ########################################### [ 75%]
   4:rhythmbox              ########################################### [100%]

real    0m41.841s
user    0m4.900s
sys     0m1.472s


After a few months of intensive use.

[root@bolger rpm]# time rpm -Uhv ImageMagick-6.2.8.0-3.fc6.1.i386.rpm
rhythmbox-0.9.5-7.fc6.i386.rpm gaim-2.0.0-0.22.beta5.fc6.i386.rpm 
Preparing...                ########################################### [100%]
   1:gaim                   ########################################### [ 33%]
   2:ImageMagick            ########################################### [ 67%]
   3:rhythmbox              ########################################### [100%]

real    1m1.898s
user    0m5.082s
sys     0m1.696s


Significantly faster than the 2:37 on a fresh system last time.

Due to the very bad performance I reinstalled my already configured system again
and chose selinux=0 in the installer. Installation took *45 minutes* (compared
to 1.5 and over 2 hours).
The system runs very smoothly. Part of the speedup may be due to disabled selinux,
but there must have been something wrong with my filesystem (made by SystemRescueCD
for Fedora with selinux enabled). Maybe it is because of the old xfsprogs in
SystemRescueCD.
Comment 23 Eric Sandeen 2006-12-22 15:42:27 EST
Marcin, can you open a new bug if you feel there is some performance problem?
This one was originally opened for an attr2 corruption problem and the perf
stuff is a bit of a tangent.

No guarantees on when it will get addressed though - you might have better luck
on the oss.sgi.com bugzilla.

Thanks,

-Eric
Comment 24 Glenn Rottingen 2007-01-16 10:20:09 EST
I have built a new i386 distribution of FC6 (pungi, 1/15/2007) that has all the
most recent code. It still has the same problem with XFS.  I need to start the
install with selinux=0 to get XFS up and running.  Setting selinux to enforcing
after first boot causes no problems.

This problem has not been resolved.  Please contact me for additional information.
Comment 25 Russell Cattelan 2007-01-25 16:12:21 EST
Sorry about the delay in fixing this; still trying to track down
why the patch has not been applied to the kernel tree.
Comment 26 Glenn Rottingen 2007-02-03 13:29:31 EST
Created a new Build 2/1/2007.
Linux localhost.localdomain 2.6.19-1.2895.fc6xen #1 SMP Wed Jan 10 19:09:13 EST
2007 x86_64 x86_64 x86_64 GNU/Linux

This kernel tree does not have the fix yet.

When the next new kernel is available I will try the build again.

selinux=0 is still the workaround. 
Comment 27 Russell Cattelan 2007-02-09 12:46:23 EST
The patch has been committed to the fedora devel kernel.

Closing as fixed at this point.
Comment 28 Chris Weyl 2007-02-09 19:36:05 EST
So, just to be pedantic :) , we'll see the fix in the next FC-6 kernel...?  The
close status seems to indicate that the fix will only be in F7+.
Comment 29 Glenn Rottingen 2007-02-15 11:07:32 EST
Fixed.

New Pungi build 2.14.2007.

Linux localhost.localdomain 2.6.19-1.2911.fc6xen #1 SMP Sat Feb 10 15:34:39 EST
2007 x86_64 x86_64 x86_64 GNU/Linux

selinux=0 is no longer needed.
Comment 30 kelsey hudson 2007-03-12 17:27:44 EDT
*** Bug 231054 has been marked as a duplicate of this bug. ***
Comment 31 Eric Sandeen 2007-05-01 10:57:29 EDT
*** Bug 236315 has been marked as a duplicate of this bug. ***
