Bug 211086

| Summary: | XFS_WANT_CORRUPTED_GOTO at line 4528 of file fs/xfs/xfs_bmap.c | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Carl-Johan Kjellander <carljohan> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED DUPLICATE | QA Contact: | Brian Brock <bbrock> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6 | CC: | esandeen, wtogami |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2006-11-20 17:04:18 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description  Carl-Johan Kjellander  2006-10-17 09:31:01 UTC
The problem persists with kernel-xen-2.6.18-1.2798.fc6.

Is this repeatable on a fresh install? I retrieved the filesystem images you reported on the xfs list, and xfs_repair does find that it is corrupted:

    entry "Packages" at block 0 offset 144 in directory inode 1048704 references free inode 1048708
    clearing inode number in entry at offset 144...

So, you have filesystem corruption; it's not clear what the cause may be. It is possible that you are bumping into a 4k stacks problem, although xfs is much better in this area these days than it used to be... If this is repeatable on a fresh install, does the situation change if you don't use lvm? Also, you mention xen; is this corruption on the host system, or?

Well, there are two sides to the problem.

1. The filesystem got corrupted. That install was done with the standard kernel. I have not tried reinstalling with any XFS filesystems, and I have migrated the old XFS partitions to ext3 for now. I'll see if I can get around to doing another reinstall on some spare hard drive space. Could I get swraid and LVM inside a guest xen domain?
2. Trying to access the corrupted file triggers an internal XFS error. This happens with both the standard and the xen kernel, in the host. I haven't tried the guest, but I would imagine the problem is there as well.

I'll see what I can do about testing another install.

re: 1) Yep, this looks like a real bug (barring hardware problems...). I'm not a xen expert, but I think you can set up storage the same way...?

re: 2) I wouldn't call this a bug; xfs is performing as designed when it encounters this type of corruption.

Thanks,
-Eric

I can confirm that it's 100% reproducible. I got a new hard disk yesterday anyway, so I thought it would be good to test this one while it was still empty.

Steps to reproduce:

1. Start the install of FC6test4 with 'linux xfs'.
2. Put / on a separate partition with ext3.
3. Create two partitions, >= 12 GB each, on separate drives.
4. RAID them together with RAID1 and create an LVM volume on top for the filesystem.
5. Put 8 GB ext3 as /usr and 4 GB xfs as /var.
6. Do not customize at all.
7. Let the install run.
8. After reboot, run: cat </var/lib/rpm/Packages >/dev/zero

The read fails:

    cat: -: Input/output error
    XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4528 of file fs/xfs/xfs_bmap.c.  Caller 0xeebce6ba
     [<c040571a>] dump_trace+0x69/0x1af
     [<c0405878>] show_trace_log_lvl+0x18/0x2c
     [<c0405e18>] show_trace+0xf/0x11
     [<c0405e47>] dump_stack+0x15/0x17
     [<eebb128a>] xfs_bmap_read_extents+0x448/0x462 [xfs]
     [<eebce6ba>] xfs_iread_extents+0xa0/0xbb [xfs]
     [<eebae692>] xfs_bmapi+0x23a/0x1f83 [xfs]
     [<eebd0e1d>] xfs_iomap+0x2e1/0x78d [xfs]
     [<eebec52e>] __xfs_get_blocks+0x72/0x237 [xfs]
     [<eebec748>] xfs_get_blocks+0x28/0x2d [xfs]
     [<c0484fd9>] do_mpage_readpage+0x282/0x5e2
     [<c048584c>] mpage_readpages+0xac/0x114
     [<c044d06b>] __do_page_cache_readahead+0x124/0x1c8
     [<c044d15b>] blockable_page_cache_readahead+0x4c/0x9f
     [<c044d306>] page_cache_readahead+0xbf/0x196
     [<c0447967>] do_generic_mapping_read+0x13d/0x49b
     [<c044859f>] __generic_file_aio_read+0x18c/0x1d1
     [<eebf3b3c>] xfs_read+0x294/0x2fc [xfs]
     [<eebf07b7>] xfs_file_aio_read+0x70/0x78 [xfs]
     [<c0465c6a>] do_sync_read+0xc1/0xfb
     [<c04665ec>] vfs_read+0xa6/0x157
     [<c0466a5b>] sys_read+0x41/0x67
     [<c0404ea7>] syscall_call+0x7/0xb
    DWARF2 unwinder stuck at syscall_call+0x7/0xb

I could compress this one as well if you want to compare.

I actually think 2) is a bug as well. It shouldn't say internal error, it should say that the filesystem is corrupted, but I think it actually might be an internal error. It just happens to do the right thing: remount read-only and give an I/O error.

My system is an SMP dual AthlonMP 1800+, but is the installer kernel SMP? It shouldn't matter, right? And if something were wrong with the hard drives, it would be caught by the RAID1, right? Anyway, something else should then show up in dmesg, and I haven't had any problems with any of these disks. One of them is brand new, and the partitions landed on different physical blocks in try 1 and try 2.

/cjk

Thanks for doing that test. smp/non-smp should not matter, no. And you're right, at this point it doesn't look likely that it is a hardware problem. I'm leaning towards this being a stack corruption issue; xfs over several layers of IO subsystems can lead to trouble with 4KSTACKS. Towards the end of the install, can you go to the console and run dmesg to see if there are any warnings about the stack? If you have the time, perhaps you could also try a simpler storage geometry to confirm that it works OK there.

The kernel truly is encountering on-disk corruption and shutting down; perhaps the message isn't the clearest, but the real bug occurred -before- you got that message, when somehow bad bits got onto the disk. This spot in the code is sanity checking the bmap for magic numbers, etc., and finds it to be wrong.

Thanks,
-Eric

I agree that the real bug is the filesystem corruption, not the weird message. I'll see if I can reproduce the bug inside a xen guest domain and do more tests, since this is my main computer and I really need to use it. The reason I was using xfs in the first place is that it is so handy when layered on lvm and raid: you get extra safety, and you can run xfs_growfs when you need more space for /var/html stuff, for instance. And now we have the option of encrypted filesystems as well. That's another layer to play with. :)

Thanks for doing the tests; I can try to find a box to do some tests on too. In the absence of other data, my best guess is still stack problems with this many layers. And encryption adds another one... FWIW, you can grow an ext3 filesystem, too :)

Yeah. But with ext3 I have to do 'init 1' to grow /var; XFS is sooo handy with lvm. I did run dmesg, but I didn't see anything. Then again, I didn't scroll very far, and I'm kicking myself that I didn't save the dmesg output somewhere for later. I should give the link to the compressed corrupt filesystem here as well and not only on linux-xfs: http://razor.csbnet.se/varfucked.bz2 (unpack with bzip2 onto a 4.00 GB LVM volume).

I looked at the file system image and confirmed this is the attr2 bug we are looking at.

*** This bug has been marked as a duplicate of 212201 ***
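For readers who have not seen this error before: XFS_WANT_CORRUPTED_GOTO is one of XFS's metadata sanity-check macros. When a consistency test on data read from disk fails (here, while xfs_bmap_read_extents() walks the extent list of the damaged inode), the macro reports the file and line, sets a "filesystem corrupted" error, and jumps to the function's error label, after which XFS shuts the filesystem down and the read returns the I/O error seen above. The sketch below is a minimal, self-contained illustration of that check-report-and-bail pattern; the macro body, the extent_rec structure, the read_extents() helper, and the EFSCORRUPTED value are illustrative assumptions for this example, not the kernel's actual source.

```c
#include <stdio.h>

/* Illustrative stand-in for the kernel's "filesystem corrupted" error value. */
#define EFSCORRUPTED 990

/*
 * Simplified stand-in for XFS_WANT_CORRUPTED_GOTO: if the sanity check
 * `x` fails, report where it tripped, set `error`, and jump to the
 * caller-supplied error label.  This paraphrases the behaviour described
 * in the comments above; it is not the kernel's exact macro.
 */
#define WANT_CORRUPTED_GOTO(x, l)                                      \
    do {                                                               \
        if (!(x)) {                                                    \
            fprintf(stderr,                                            \
                    "internal error WANT_CORRUPTED_GOTO "              \
                    "at line %d of file %s\n",                         \
                    __LINE__, __FILE__);                               \
            error = EFSCORRUPTED;                                      \
            goto l;                                                    \
        }                                                              \
    } while (0)

/* Hypothetical on-disk extent record, used only for this illustration. */
struct extent_rec {
    unsigned long long startoff;
    unsigned long long startblock;
    unsigned long long blockcount;
};

/* Mimics a read path that validates each extent record as it loads it. */
static int read_extents(const struct extent_rec *recs, int nrecs)
{
    int error = 0;
    int i;

    for (i = 0; i < nrecs; i++) {
        /* A nonsensical record (zero-length extent) trips the check. */
        WANT_CORRUPTED_GOTO(recs[i].blockcount != 0, error0);
    }
    return 0;

error0:
    /* The real code shuts the filesystem down; here we just return. */
    return error;
}

int main(void)
{
    struct extent_rec recs[] = {
        { 0, 100, 8 },   /* sane extent */
        { 8,   0, 0 },   /* "corrupt": blockcount == 0 */
    };

    printf("read_extents returned %d\n", read_extents(recs, 2));
    return 0;
}
```

This also matches Eric's point above: the macro is only the messenger, and the interesting bug is whatever put the bad bits on disk in the first place, which is why a plain read of /var/lib/rpm/Packages is enough to trigger the shutdown.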