Bug 148840

Summary: Kernel Oops in rpc.mountd on NFS servers with a large number of NFS mounts
Product: [Fedora] Fedora
Reporter: Stuart Anderson <anderson>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 3
CC: pfrields, wtogami
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Whiteboard:
Fixed In Version: 2.6.12-1.1372_FC3
Doc Type: Bug Fix
Last Closed: 2005-07-29 00:37:06 UTC
Attachments:
Description     Flags
sunrpc patch    none

Description Stuart Anderson 2005-02-16 02:06:23 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041020

Description of problem:
In a cluster of 290 dual-Xeon FC3 (2.6.10-1.760) machines, we
were experiencing several kernel crashes (Oops) a day until
Neil Brown identified a critical (for us) patch to the RPC
auth cache.

The details of this discussion and a proof-of-principle patch can
be found in the thread starting at:
http://sourceforge.net/mailarchive/forum.php?thread_id=6514912&forum_id=4930

The proof-of-principle patch has now been running on our 290-node
cluster for over 6 days without a single crash.

The question here is how to get the cleaned-up version of the patch
integrated into FC3, so that we can avoid patching all of our systems.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.10-1.760_FC3

How reproducible:
Sometimes

Steps to Reproduce:
1. Install FC3 on 290 Linux machines.
2. Have every node NFS-mount one filesystem from each of the other nodes.
3. Run data-intensive analysis jobs that read from all 290 filesystems
on every node.
4. Wait a few hours for a kernel Oops.
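
For anyone trying to approximate the cross-mount layout in step 2, the
following is only a rough sketch and not the configuration actually used on
the cluster. The hostnames (node001 ... node290), the export path /data, and
the mount root /mnt/nfs are all hypothetical placeholders. It generates the
fstab entries one node would need in order to mount one filesystem from
every other node; with 290 nodes that is 289 mounts per node, or
290 x 289 = 83810 NFS mounts across the cluster.

```python
def node_names(count, prefix="node"):
    """Return hypothetical hostnames such as node001 ... node290."""
    return [f"{prefix}{i:03d}" for i in range(1, count + 1)]


def cross_mount_fstab(local, nodes, export="/data", mount_root="/mnt/nfs"):
    """Build fstab lines that mount `export` from every node except `local`.

    The export path and mount root are assumptions for illustration only.
    """
    lines = []
    for host in nodes:
        if host == local:
            continue  # a node does not NFS-mount its own export
        lines.append(f"{host}:{export} {mount_root}/{host} nfs defaults 0 0")
    return lines


if __name__ == "__main__":
    nodes = node_names(290)
    entries = cross_mount_fstab("node001", nodes)
    # Mounts needed on one node, and the total across the whole cluster.
    print(len(entries), "mounts on node001")
    print(len(entries) * len(nodes), "mounts cluster-wide")
    print(entries[0])
```

Each generated line would still have to be appended to /etc/fstab on the
corresponding node, and the mount points created, before the mounts exist.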
    

Additional info:

Comment 1 Dave Jones 2005-02-16 02:12:08 UTC
If you can attach the patch to this bugzilla, I'll take a look at
including it until it gets merged upstream.

Thanks.


Comment 2 Stuart Anderson 2005-02-18 00:49:18 UTC
Created attachment 111190
sunrpc patch

Comment 3 Stuart Anderson 2005-03-24 18:17:42 UTC
This patch is now in 2.6.12-rc1. Any idea when it might be merged into an FC3
kernel?

Comment 4 Dave Jones 2005-07-15 18:05:37 UTC
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem. Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 5 Stuart Anderson 2005-07-17 02:48:55 UTC
Unfortunately, the SMP version of 2.6.12-1.1372_FC3 will not boot on these
computers; see:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=162859

Comment 6 Stuart Anderson 2005-07-29 00:37:06 UTC
I am now able to boot kernel-smp-2.6.12-1.1372_FC3 since
mkinitrd-4.1.18.1-1.i386.rpm was released today in test/update