From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041020

Description of problem:
In a cluster of 290 dual-Xeon FC3 (2.6.10-1.760) machines we were experiencing several kernel crashes (Oops) a day until Neil Brown identified a patch to the RPC auth cache that is critical for our setup. The details of this discussion and a proof-of-principle patch may be found starting with http://sourceforge.net/mailarchive/forum.php?thread_id=6514912&forum_id=4930

The proof-of-principle patch has now been running on our 290-node cluster for over 6 days without a single crash. The question here is how to get the cleaned-up version of the patch integrated into FC3 so that we do not have to patch all of our systems ourselves.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.10-1.760_FC3

How reproducible:
Sometimes

Steps to Reproduce:
1. Install FC3 on 290 Linux machines.
2. Have each node cross-mount one filesystem from every other node.
3. Run data-intensive analysis jobs that read from all 290 filesystems on every node.
4. Wait a few hours for a kernel Oops.

Additional info:
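Step 2 of the reproduction recipe implies a full mesh: each of the 290 nodes mounts the other 289 exports, which works out to 290 × 289 = 83,810 NFS client mounts across the cluster. The Python sketch below generates such fstab entries for one node. The hostnames (node001 to node290), the export path /data, the mount root /mnt/nodes and the "defaults" mount options are all hypothetical placeholders, not the reporter's actual configuration.

```python
# Hypothetical sketch: generate NFS fstab entries for a full cross-mount mesh.
# Hostnames ("node001".."node290"), the export path "/data" and the mount
# root "/mnt/nodes" are illustrative assumptions, not taken from the report.

NUM_NODES = 290
EXPORT_PATH = "/data"       # assumed exported directory on every node
MOUNT_ROOT = "/mnt/nodes"   # assumed local directory holding the mounts


def hostname(i: int) -> str:
    """Return the hypothetical hostname for node number i (1-based)."""
    return f"node{i:03d}"


def fstab_entries(self_index: int, num_nodes: int = NUM_NODES) -> list:
    """Build one NFS fstab line for every node except the local one."""
    entries = []
    for i in range(1, num_nodes + 1):
        if i == self_index:
            continue  # a node does not NFS-mount its own filesystem
        host = hostname(i)
        entries.append(f"{host}:{EXPORT_PATH}  {MOUNT_ROOT}/{host}  nfs  defaults  0 0")
    return entries


if __name__ == "__main__":
    lines = fstab_entries(self_index=1)
    print(len(lines))              # NFS mounts per node
    print(NUM_NODES * len(lines))  # NFS client mounts across the cluster
    print(lines[0])                # first generated fstab line
```

Run on node 1, this reports 289 mounts per node and 83,810 cluster-wide, which gives a sense of how many concurrent NFS clients each server is handling in this setup.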
If you can attach the patch to this bugzilla, I'll take a look at including it until it gets merged upstream. Thanks.
Created attachment 111190: sunrpc patch
This patch is now in 2.6.12-rc1. Any idea when it might be merged into an FC3 kernel?
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
Unfortunately, the SMP version of 2.6.12-1.1372_FC3 will not boot on these computers; see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=162859
I am now able to boot kernel-smp-2.6.12-1.1372_FC3, since mkinitrd-4.1.18.1-1.i386.rpm was released today in test/update.