Bug 136768

Summary: Kernel oops while unlinking files on an xfs filesystem
Product: [Fedora] Fedora
Reporter: Matthias Saou <matthias>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED NEXTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 2
CC: pfrields, wtogami
Hardware: i686   
OS: Linux   
Last Closed: 2005-04-16 04:45:43 UTC

Description Matthias Saou 2004-10-22 09:15:55 UTC
Description of problem:
I grew an XFS-on-top-of-LVM partition like this:
# lvm lvextend -L+5G /dev/data/bck
# xfs_growfs /dev/data/bck
Both commands completed without errors. To recover more free space, I
then decided to delete some old mail backups (Maildir, plenty of small
files, many hardlinked across directories), but the rm command
segfaulted. I ran it again; that second rm is now defunct, and dmesg
shows this:

Unable to handle kernel NULL pointer dereference at virtual address 00000008
 printing eip:
82c6c6ba
*pde = 00003001
Oops: 0000 [#1]
SMP
Modules linked in: nfsd exportfs lockd md5 ipv6 sunrpc tg3 ip_conntrack_ftp ipt_limit ipt_state ip_conntrack ipt_multiport iptable_filter ip_tables floppy sg microcode xfs button battery asus_acpi ac ext3 jbd dm_mod megaraid sd_mod scsi_mod
CPU:    2
EIP:    0060:[<82c6c6ba>]    Not tainted
EFLAGS: 00010206   (2.6.8-1.521smp)
EIP is at validate_fields+0x1a/0x8c [xfs]
eax: 00000000   ebx: 00000000   ecx: 00000080   edx: 0f8c6eb8
esi: 6e9ef0a4   edi: 6e9ef0a4   ebp: 2200ea64   esp: 0f8c6eb8
ds: 007b   es: 007b   ss: 0068
Process rm (pid: 8120, threadinfo=0f8c6000 task=809862b0)
Stack: 000020c0 00000002 000201c0 00000064 00000065 0e9c9082 00000000 0017a000
       00000000 00010000 4178cc16 0969e318 4178cc17 0b7d8b50 4178cc17 0b7d8b50
       00008180 00000000 00000288 00000000 00000000 ffffffff ffffffff 7e73732c
Call Trace:
 [<82c6cbeb>] linvfs_unlink+0x27/0x2f [xfs]
 [<0216aa80>] vfs_unlink+0x182/0x1c1
 [<0216ab6e>] sys_unlink+0xaf/0x131
 [<02158bb5>] put_user_size+0x29/0x2d
 [<0216d651>] sys_getdents64+0xa0/0xaa
Code: 8b 58 08 6a 00 ff 53 18 5a 85 c0 75 5d 0f b7 44 24 0a 89 46
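
The register dump and the Code: bytes above look consistent with the reported fault address. The following is a minimal sketch, not a disassembler: it assumes the Code: line starts at the faulting EIP (this oops format does not mark the faulting byte), and the helper name decode_mov_load is hypothetical and handles only the one addressing form seen here.

```python
# 32-bit x86 general-purpose registers, indexed by their 3-bit encoding.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load(code):
    """Decode `8b /r disp8`, i.e. mov r32, [r32 + disp8] (no SIB byte)."""
    if len(code) < 3 or code[0] != 0x8B:
        raise ValueError("expected opcode 0x8b (mov r32, r/m32)")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [reg + disp8] form without SIB is handled")
    # disp8 is a signed byte.
    disp = code[2] - 256 if code[2] >= 128 else code[2]
    return REG32[reg], REG32[rm], disp


# Values copied from the oops: first three Code: bytes and eax.
regs = {"eax": 0x00000000}
dest, base, disp = decode_mov_load(bytes.fromhex("8b5808"))
fault_address = (regs[base] + disp) & 0xFFFFFFFF
print(f"mov {disp:#x}(%{base}),%{dest} -> faults at {fault_address:08x}")
# prints: mov 0x8(%eax),%ebx -> faults at 00000008
```

Under that assumption, validate_fields read a field at offset 8 through a NULL pointer held in eax, which matches "virtual address 00000008". Which structure the pointer should have referenced cannot be determined from the oops alone.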


Version-Release number of selected component (if applicable):
2.6.8-1.521smp

How reproducible:
Unknown. This is a 2TB LVM volume heavily accessed through NFS.
Rebooting will be a problem...

Additional info:
I've already grown that very partition at least 2 or 3 times, as well
as other XFS-formatted volumes on the same machine, without any
problems up to now.

Comment 1 Matthias Saou 2004-10-22 09:18:51 UTC
I just ran dmesg again and it now shows this appended to the above:

 <4>xfs_inotobp: xfs_imap()  returned an error 22 on dm-1.  Returning error.
xfs_iunlink_remove: xfs_inotobp()  returned an error 22 on dm-1.  Returning error.
xfs_inactive:   xfs_ifree() returned an error = 22 on dm-1
xfs_force_shutdown(dm-1,0x1) called from line 1759 of file fs/xfs/xfs_vnodeops.c.  Return address = 0x82c6f03e
Filesystem "dm-1": I/O Error Detected.  Shutting down filesystem: dm-1
Please umount the filesystem, and rectify the problem(s)
------------[ cut here ]------------
kernel BUG at fs/inode.c:1122!
invalid operand: 0000 [#2]
SMP
Modules linked in: nfsd exportfs lockd md5 ipv6 sunrpc tg3 ip_conntrack_ftp ipt_limit ipt_state ip_conntrack ipt_multiport iptable_filter ip_tables floppy sg microcode xfs button battery asus_acpi ac ext3 jbd dm_mod megaraid sd_mod scsi_mod
CPU:    0
EIP:    0060:[<0217554b>]    Not tainted
EFLAGS: 00010246   (2.6.8-1.521smp)
EIP is at iput+0x19/0x61
eax: 82c83300   ebx: 370799a4   ecx: 370799b4   edx: 37079900
esi: 42619f24   edi: 370799a4   ebp: 00000020   esp: 7d2d8bec
ds: 007b   es: 007b   ss: 0068
Process nfsd (pid: 13808, threadinfo=7d2d8000 task=7d1ad270)
Stack: 42619f2c 02171cf7 00000000 00000000 7d2d8000 00000000 00000001 00000000
       08b140a4 00000000 0217243d 022db178 00000000 00000000 2c11402c 08b140a4
       00000000 08b140a4 11270000 02172704 7d2d8c60 82c60a49 2c11402c 82c82e40
Call Trace:
 [<02171cf7>] prune_dcache+0x1ff/0x2db
 [<0217243d>] d_alloc+0xa2/0x25e
 [<02172704>] d_alloc_anon+0x2c/0x1cd
 [<82c60a49>] xfs_vget+0x95/0x9b [xfs]
 [<82c6ec1a>] linvfs_get_dentry+0x57/0x6c [xfs]
 [<82a5e02d>] find_exported_dentry+0x2d/0x7fe [exportfs]
 [<0228d335>] qdisc_restart+0x13/0x230
 [<82b923da>] ip_refrag+0x1a/0x58 [ip_conntrack]
 [<02158ba8>] put_user_size+0x1c/0x2d
 [<0227e9dc>] memcpy_toiovec+0x27/0x49
 [<0227ef36>] skb_copy_datagram_iovec+0x4f/0x1e1
 [<0227c921>] release_sock+0xa5/0xab
 [<022a0819>] tcp_recvmsg+0x63b/0x676
 [<0227ca0d>] sock_common_recvmsg+0x30/0x46
 [<02279796>] sock_recvmsg+0xae/0xcb
 [<0211b20d>] recalc_task_prio+0x128/0x133
 [<0211b29e>] activate_task+0x86/0x93
 [<02115e55>] smp_send_reschedule+0x1a/0x1b
 [<0211d0bc>] __wake_up_common+0x36/0x5b
 [<0211d130>] __wake_up+0x4f/0x7f
 [<82cc8df9>] svc_sock_enqueue+0x255/0x25d [sunrpc]
 [<82cc9f5c>] svc_tcp_recvfrom+0x2fb/0x36d [sunrpc]
 [<0212eae4>] set_current_groups+0xb2/0xba
 [<82a5eac9>] export_decode_fh+0x50/0x56 [exportfs]
 [<82d64bcc>] nfsd_acceptable+0x0/0x11a [nfsd]
 [<82a5ea79>] export_decode_fh+0x0/0x56 [exportfs]
 [<82d65026>] fh_verify+0x340/0x4b3 [nfsd]
 [<82d64bcc>] nfsd_acceptable+0x0/0x11a [nfsd]
 [<02128cd6>] process_timeout+0x0/0x5
 [<82d6c53c>] nfsd3_proc_getattr+0x6d/0x76 [nfsd]
 [<82d6db17>] nfs3svc_decode_fhandle+0x0/0x6b [nfsd]
 [<82d6377d>] nfsd_dispatch+0xbf/0x162 [nfsd]
 [<82cc894e>] svc_process+0x323/0x55f [sunrpc]
 [<82d634dd>] nfsd+0x275/0x456 [nfsd]
 [<82d63268>] nfsd+0x0/0x456 [nfsd]
 [<82d63268>] nfsd+0x0/0x456 [nfsd]
 [<021041f1>] kernel_thread_helper+0x5/0xb
Code: 0f 0b 62 04 50 38 2f 02 85 c0 74 0b 8b 50 14 85 d2 74 04 89

It seems like the filesystem and the kernel really didn't like the
unlink problem at all. Any ideas about what the cause could have been?
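
For reference, the "error 22" in the XFS messages above can be mapped to a symbolic name. This is a small sketch that assumes XFS reports standard Linux errno values here (an assumption, not something the log states); describe_errno is a hypothetical helper name.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


name, message = describe_errno(22)
print(f"error 22 = {name} ({message})")
```

On Linux, 22 is EINVAL. That would suggest xfs_imap() rejected the inode number it was asked to map rather than hitting a device-level read error, even though the shutdown message is worded as "I/O Error Detected".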

Comment 2 Matthias Saou 2005-01-25 04:58:08 UTC
This same server, which was running 2.6.9-1.6_FC2smp at the time, has
now frozen. The messages below were found in /var/log/messages after
the reboot (none of the system partitions are XFS). It is now running
2.6.10-1.9_FC2smp. This problem is probably unrelated, but just in case...

[...]
Jan 25 04:26:21 filer02 kernel: nfsd: page allocation failure. order:4, mode:0x50
Jan 25 04:26:21 filer02 kernel:  [<02140445>] __alloc_pages+0x2a4/0x2be
Jan 25 04:26:21 filer02 kernel:  [<02140477>] __get_free_pages+0x18/0x24
Jan 25 04:26:21 filer02 kernel:  [<02143518>] kmem_getpages+0x1c/0xbf
Jan 25 04:26:21 filer02 kernel:  [<02144188>] cache_grow+0xff/0x1e4
Jan 25 04:26:21 filer02 kernel:  [<0213ed67>] mempool_alloc+0x79/0x18e
Jan 25 04:26:21 filer02 kernel:  [<0214441d>] cache_alloc_refill+0x1b0/0x1ec
Jan 25 04:26:21 filer02 kernel:  [<021448f8>] __kmalloc+0x76/0x88
Jan 25 04:26:21 filer02 kernel:  [<82c66ea5>] kmem_alloc+0x49/0x97 [xfs]
Jan 25 04:26:21 filer02 kernel:  [<82c66f6b>] kmem_realloc+0x17/0x52 [xfs]
Jan 25 04:26:21 filer02 kernel:  [<82c4b00a>] xfs_iext_realloc+0xc8/0xdb [xfs]
Jan 25 04:26:21 filer02 kernel:  [<82c288df>] xfs_bmap_insert_exlist+0x22/0x75 [xfs]
Jan 25 04:26:21 filer02 kernel:  [<82c25d36>] xfs_bmap_add_extent_hole_delay+0x42f/0x485 [xfs]
Jan 25 04:26:21 filer02 kernel:  [<02219e6b>] __elv_add_request+0x35/0x6a
Jan 25 04:26:21 filer02 kernel:  [<0221cb23>] __make_request+0x479/0x4e7
Jan 25 04:26:21 filer02 kernel:  [<82c23757>] xfs_bmap_add_extent+0x152/0x3a5 [xfs]
Jan 25 04:26:21 filer02 kernel:  [<82c2a786>] xfs_bmapi+0x96c/0x1073 [xfs]
Jan 25 04:26:21 filer02 kernel:  [<82c28f41>] xfs_bmap_search_extents+0x53/0x5a [xfs]
do_IRQ: stack overflow: 456
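
To put the "order:4" in the first log line into numbers: an order-n request asks the page allocator for 2^n physically contiguous pages. This is a minimal sketch assuming the usual 4 KiB page size on i686 (an assumption; the log does not state it). order_to_bytes is a hypothetical helper name.

```python
PAGE_SIZE = 4096  # assumed i686 page size in bytes


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size of a buddy-allocator request of the given order, in bytes."""
    return (1 << order) * page_size


pages = 1 << 4
size = order_to_bytes(4)
print(f"order:4 -> {pages} contiguous pages, {size // 1024} KiB")
# prints: order:4 -> 16 contiguous pages, 64 KiB
```

A 64 KiB contiguous allocation can fail on a fragmented, long-running machine even when plenty of memory is free in total. The trace shows the request coming from xfs_iext_realloc growing an inode's extent list.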

Comment 3 Dave Jones 2005-04-16 04:45:43 UTC
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.