From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7b)
Description of problem:
While running load tests of Subversion with the repository on an NFS
mounted filesystem, we're getting reliable crashes in every Redhat 9 -
through Fedora Core 1 kernel. I've attached the oops and will attach
the ksymoops output shortly. The hang does not seem to occur when we
use a repository mounted on local disk. I don't believe that it has
anything to do with Subversion, but whatever load svn is generating is
tickling a kernel bug.
The hardware is dual Xeon 3.0GHz, running hyperthreading, kernel
2.4.22-1.2179.nptlsmp. The mount options in use are:
The NFS server is a NetApp. Both NFS client and server are running at
100Mb switched ethernet.
In the 2.4.26 kernel's Changelog
(http://kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.26) I saw
mention of a refile_inode bug fixed by Trond, which made me think
perhaps this is what is affecting us, but I don't know.
A few minutes before the machine crashes, the virtual memory system
seems to deteriorate rapidly, with large amounts of 'si' and
especially 'so' traffic. I will also attach 'vmstat 30' output for the
30 or so minutes preceding the system crash.
The bug doesn't seem to affect us on a RH 7.2-based system running a
vanilla 2.4.21 kernel that includes Trond's NFS-ALL patch cluster.
Unable to handle kernel NULL pointer dereference at virtual address
*pde = 00000000
nfs lockd sunrpc iptable_filter ip_tables autofs tg3 keybdev mousedev
hid input usb-ohci usbcore ext3 jbd cciss sd_mod scsi_mod
EIP: 0060:[<c01690b7>] Not tainted
EIP is at refile_inode [kernel] 0x47 (2.4.22-1.2179.nptlsmp)
eax: 00000000 ebx: dc141b80 ecx: 00000000 edx: dc141b88
esi: c0375ea8 edi: c0374e58 ebp: 00023354 esp: e76a5dd4
ds: 0068 es: 0068 ss: 0068
Process svnlook (pid: 2038, stackpage=e76a5000)
Stack: c17de430 dc141c44 c013c5e2 dc141b80 c17de430 00000000 c17de430
c17de430 000001d2 e76a4000 00000a57 000001d2 00000019 00000020
c0374e58 c0374e58 c01463ba e76a5e40 000001d2 0000003c 00000020
Call Trace: [<c013c5e2>] __remove_inode_page [kernel] 0x82 (0xe76a5ddc)
[<c01460ca>] shrink_cache [kernel] 0x30a (0xe76a5df0)
[<c01463ba>] shrink_caches [kernel] 0x4a (0xe76a5e1c)
[<c0146432>] try_to_free_pages_zone [kernel] 0x62 (0xe76a5e30)
[<f885827b>] ext3_do_update_inode [ext3] 0x19b (0xe76a5e38)
[<c0147012>] balance_classzone [kernel] 0x52 (0xe76a5e54)
[<c0147348>] __alloc_pages [kernel] 0x188 (0xe76a5e70)
[<c013df51>] do_generic_file_read [kernel] 0x401 (0xe76a5eb0)
[<c013e3b0>] file_read_actor [kernel] 0x0 (0xe76a5ee0)
[<c013e575>] generic_file_new_read [kernel] 0xc5 (0xe76a5f00)
[<c013e3b0>] file_read_actor [kernel] 0x0 (0xe76a5f10)
[<c0163131>] do_select [kernel] 0x151 (0xe76a5f24)
[<c013e69f>] generic_file_read [kernel] 0x2f (0xe76a5f4c)
[<f89fd608>] nfs_file_read [nfs] 0x98 (0xe76a5f64)
[<c01504ba>] sys_pread [kernel] 0xca (0xe76a5f8c)
[<c0109b27>] system_call [kernel] 0x33 (0xe76a5fc0)
Code: 89 01 c7 43 08 00 00 00 00 89 48 04 8b 06 89 50 04 89 43 08
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Right now we can reproduce this using our Subversion load testing with
Silk Performer. We are working on reproducing this with commonly
available command-line tools.
Actual Results: Test completes.
Expected Results: Kernel oops.
Created attachment 99698 [details]
Created attachment 99699 [details]
'vmstat 30' output for period preceding crash
Created attachment 99700 [details]
SysRq+T output from oopsed state
I submitted this to the linux-nfs mailing list, and according to
Trond, this is a VM bug which should be fixed in FC1 kernels:
That it showed up on tests where we were using an NFS-mounted
filesystem is, apparently, just coincidental.
Subject: Re: [NFS] oops in FC1 update kernel, in refile_inode
From: Trond Myklebust <trond.myklebust () fys ! uio ! no>
Date: 2004-04-26 21:56:32
That is indeed a fix for a generic VFS/mm race. It has pretty much
nothing to do with NFS itself but just happened to trigger on an NFS
partition for someone.
As far as I can see, that patch hasn't yet been applied to the latest
errata kernel (linux-2.4.22-1.2188.nptl). Have you tried it out to see
if it fixes your Oops?
Steve, could you make sure that patch makes it into any future errata
--- linux-2.4.26-up/fs/inode.c.orig 2004-03-19 17:12:46.000000000 -0500
+++ linux-2.4.26-up/fs/inode.c 2004-03-26 13:01:23.000000000 -0500
@@ -319,7 +319,8 @@ void refile_inode(struct inode *inode)
+ if (!(inode->i_state & I_LOCK))
With the above patch applied to the FC1.2179 kernel, we have not seen
the oops in 2 days of constant testing. For reference, we used to see
this oops after 2-8 hours of stress testing.
patch is in cvs, and will be in the next update.
Can this be the same issue as in bug 123332? I've posted there 2
stacktraces from kerlen panics, captured with a digital camera.
BTW, forgot to notice, we're having those kernel panics on Fedora
kernel 2.4.22-1.2188.nptlsmp, about once every 2 weeks. This is a
production system, so unfortunately we cannot afford to stress-test it
to reproduce this artificially.
We cannot also connect a serial console, as the machine has only 1
serial port that has to be connected to a UPS.
But the stacktraces captured with digital camera look exactly the same
as the one reported here.
We were suspecting this to be a hardware issue with 3Ware controller
that runs our RAID5 array, but in the light of this bug it seems more
probable to be a kernel bug, right?
there should be a 2190 kernel in updates-testing, which should have
Out system just crashed again;
I've installed the 2.4.22-1.2190.nptlsmp kernel package from
2004-05-26 - I'll let you know if it remedies the issue, but testing
period will be long since this crash occurs about twice a month on
this particular system.
Does this issue affect Fedora 2's 2.6 kernel?
no. refile_inode doesn't exist there.
Another panic in refile_inode occured just today on
The problem has not been resolved, or the problem is separate (in that
case, bug 123332 is not a dupe of this one).
BTW, looking at /usr/src/linux-2.4/fs/inode.c (from
kernel-source-2.4.22-1.2190.nptl RPM) the fix from comment #3 is
But the panics still happen.
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
Problem was found and fixed in RHEL3 U3.