150427 – XFS internal error xfs_alloc_read_agf

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	3
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-03-06 11:40 UTC by Daniel Tschan
Modified:	2015-01-04 22:17 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-07-30 00:56:21 UTC
Type:	---
Embargoed:
Dependent Products:

Description Daniel Tschan 2005-03-06 11:40:05 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050224 Firefox/1.0.1 Fedora/1.0.1-1.3.1

Description of problem:
I encountered the following error while working with an XFS filesystem:

Mar  3 00:40:52 tschan1 kernel: 0x0: 58 41 47 46 00 00 00 01 00 00 00 03 00 2e 4d 21
Mar  3 00:40:52 tschan1 kernel: Filesystem "md1": XFS internal error xfs_alloc_read_agf at line 2195 of file fs/xfs/xfs_alloc.c.  Caller 0xf91ca8b3
Mar  3 00:40:52 tschan1 kernel:  [<f91caef8>] xfs_alloc_read_agf+0x135/0x1d8 [xfs]
Mar  3 00:40:52 tschan1 kernel:  [<f91ca8b3>] xfs_alloc_fix_freelist+0x36/0x339 [xfs]
Mar  3 00:40:52 tschan1 last message repeated 2 times
Mar  3 00:40:52 tschan1 kernel:  [<f9202686>] xlog_assign_tail_lsn+0xc/0x113 [xfs]
Mar  3 00:40:52 tschan1 kernel:  [<f92056ce>] xlog_state_release_iclog+0x18/0x179 [xfs]
Mar  3 00:40:52 tschan1 kernel:  [<f9201b74>] xfs_log_release_iclog+0xe/0x35 [xfs]
Mar  3 00:40:52 tschan1 kernel:  [<f9204e8b>] xlog_regrant_write_log_space+0x324/0x6ee [xfs]
Mar  3 00:40:52 tschan1 kernel:  [<f91cb496>] xfs_free_extent+0xab/0xef [xfs]
Mar  3 00:40:52 tschan1 kernel:  [<f91da4a7>] xfs_bmap_finish+0xdd/0x14e [xfs]
Mar  3 00:40:52 tschan1 kernel:  [<f9217d9d>] xfs_rmdir+0x30a/0x3e2 [xfs]
Mar  3 00:40:52 tschan1 kernel:  [<f92205e1>] linvfs_rmdir+0x13/0x2f [xfs]
Mar  3 00:40:52 tschan1 kernel:  [<c016e437>] vfs_rmdir+0x18d/0x1d9
Mar  3 00:40:52 tschan1 kernel:  [<c016e51a>] sys_rmdir+0x97/0xe9
Mar  3 00:40:52 tschan1 kernel:  [<c015d5cc>] filp_close+0x59/0x5f
Mar  3 00:40:52 tschan1 kernel:  [<c0103337>] syscall_call+0x7/0xb
Mar  3 00:40:52 tschan1 kernel: xfs_force_shutdown(md1,0x8) called from line 4073 of file fs/xfs/xfs_bmap.c.  Return address = 0xf9222cdb
Mar  3 00:40:52 tschan1 kernel: Filesystem "md1": Corruption of in-memory data detected.  Shutting down filesystem: md1
Mar  3 00:40:52 tschan1 kernel: Please umount the filesystem, and rectify the problem(s)
Mar  3 00:41:23 tschan1 kernel: xfs_force_shutdown(md1,0x1) called from line 353 of file fs/xfs/xfs_rw.c.  Return address = 0xf9222cdb

Setup:
2x Seagate Barracuda ST3200822AS connected through 
Adaptec ASH-1205SA with Silicon Image SiI 3112 (driver sata_sil)
2 software RAID 1 arrays, /dev/md0 100 MB XFS, /dev/md1 180 GB XFS

Special kernel modules:
nvidia 1.0-6629
fuse 2.2

At the time the corruption was detected the large filesystem was nearly full (about 300 MB free).

The following behaviour is always reproducible, at least on FC3 rescue, kernel-2.6.10-1.737_FC3 and kernel-2.6.10-1.770_FC3 (the latest at this time), with or without the aforementioned kernel modules:

xfs_repair is unable to repair the filesystem and aborts with the following output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
entry "/ost+found" at block 0 offset 776 in directory inode 128 references invalid inode 18374686479671623679
        clearing inode number in entry at offset 776...
entry at block 0 offset 776 in directory inode 128 has illegal name "/ost+found":         - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
imap claims a free inode 506020455 is in use, correcting imap and clearing inode
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
LEAFN node level is 1 inode 813219798 bno = 8388608
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
inode 0x35c bmap block 0x6e4d20 claimed, state is 5
        - agno = 1
data fork in regular inode 89335619 claims used block 53366048
xfs_repair: dinode.c:2436: process_dinode_int: Assertion `err == 0' failed.
Aborted
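
As a side note on the output above, the "invalid inode" number is easier to judge in hexadecimal. The sketch below only converts the inode numbers printed by xfs_repair and compares their bit widths; the helper name describe_inode is hypothetical, and no filesystem geometry is assumed.

```python
# Inode numbers copied from the xfs_repair output above.
INVALID_INODE = 18374686479671623679
OTHER_INODES = [506020455, 813219798, 89335619]


def describe_inode(ino):
    """Return the hex form and bit width of an inode number."""
    return {"hex": f"0x{ino:016x}", "bits": ino.bit_length()}


if __name__ == "__main__":
    print("invalid:", describe_inode(INVALID_INODE))
    for ino in OTHER_INODES:
        print(ino, describe_inode(ino))
```

The invalid value is 0xfeffffffffffffff, a full 64-bit pattern, while the other inode numbers in the same run fit in 30 bits or fewer. That suggests, without proving it, that the inode field of the directory entry was overwritten rather than being off by a small amount.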

After a mount, umount, mount sequence the filesystem is fully readable (with or without running xfs_repair first), but after a few write operations (creating or deleting a file, or writing into a file) the same internal error reappears. atime updates, however, do not trigger this error.

I'll keep the damaged filesystem for a few days. Please tell me if I can provide further information.


Version-Release number of selected component (if applicable):
kernel-2.6.10-1.737_FC3

How reproducible:
Didn't try


Additional info:

Comment 1 Dave Jones 2005-03-09 21:37:08 UTC

You're better off reporting this one to the maintainers at SGI
(linux-xfs.com).

Comment 2 Daniel Tschan 2005-03-17 23:20:56 UTC

I've reported the bug upstream as suggested. According to the XFS developers at
SGI, some of the xfs_repair problems are related to the fact that xfsprogs is
compiled with debug mode enabled.

See also bug 151438.

Comment 3 Dan Pritts 2005-04-28 14:10:40 UTC

I'm having similar problems on RHEL 4 (but alas I don't pay for support). Did
you open a Bugzilla bug with the SGI XFS folks? I am not finding it with
searches of their Bugzilla.

Comment 4 Daniel Tschan 2005-04-28 14:56:54 UTC

No, I sent a mail to linux-xfs.com:
http://oss.sgi.com/archives/linux-xfs/2005-03/msg00036.html

Comment 5 Dave Jones 2005-07-15 18:07:29 UTC

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem. Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 6 Daniel Tschan 2005-07-18 13:40:24 UTC

Thanks for the information. I used XFS on only one desktop machine, to see
whether it was ready for production on our servers. Meanwhile I have switched
the machine back to ext3, so I have no XFS filesystem left for testing. Sorry.
