Bug 68636 - 2.4.18-5smp panics with NFS
Summary: 2.4.18-5smp panics with NFS
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-07-11 22:06 UTC by Chance Reschke
Modified: 2007-04-18 16:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-08-11 11:11:37 UTC
Embargoed:


Description Chance Reschke 2002-07-11 22:06:28 UTC
Description of Problem: 

Crashes apparently due to an SMP kernel scheduling problem with nfsd.  The 
system is a Dell 6450 (quad PIII Xeon, ServerWorks HE chipset) with a 
Perc3/QC (LSI Megaraid 1600) disk controller.  Complete RH 7.3 install 
with all RPM updates through July 1 applied.  Runs the 2.4.18-5smp 
kernel from the RH i686 RPM.


Version-Release number of selected component (if applicable):
Red Hat Linux 7.3, kernel 2.4.18-5smp

How Reproducible:

After running for several days as the NFS server for a cluster of 42 diskless 
dual Athlon systems, the system crashes.


Steps to Reproduce:
1. Boot the server
2. Initiate NFS requests from several NFS clients - mixed reads & writes
3. Run for two to five days
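
The client-side load in step 2 could be approximated with something like the sketch below. It is not the workload that was actually running on the cluster; the function name mixed_io, the file naming, the block size and the 50/50 read/write split are all assumptions made for illustration. Point it at a directory on the NFS mount and run it from several clients at once.

```python
import os
import random
import tempfile


def mixed_io(directory, operations=100, block_size=4096, seed=0):
    """Perform a random mix of file writes and reads under `directory`.

    Hypothetical helper: returns a (writes, reads) tuple counting the
    operations that were performed.
    """
    rng = random.Random(seed)
    paths = []
    writes = reads = 0
    for i in range(operations):
        # Write when there is nothing to read yet, otherwise flip a coin.
        if not paths or rng.random() < 0.5:
            path = os.path.join(directory, f"load_{i}.dat")
            with open(path, "wb") as fh:
                fh.write(os.urandom(block_size))
                fh.flush()
                os.fsync(fh.fileno())  # ask the OS to push the data out
            paths.append(path)
            writes += 1
        else:
            # Read back a randomly chosen file written earlier.
            with open(rng.choice(paths), "rb") as fh:
                fh.read()
            reads += 1
    return writes, reads


if __name__ == "__main__":
    # Assumption: replace the temporary directory with the NFS mount
    # point under test; a temp dir is used here only so the sketch runs.
    with tempfile.TemporaryDirectory() as target:
        print(mixed_io(target, operations=20))
```

Since the crash takes two to five days to appear, the call would need to be wrapped in a long-running loop, with old files removed periodically so the export does not fill up.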

Actual Results:

Kernel panic and crash.  Screen shows:

CPU: 3
EIP: 0010:[<cd9f3de0>] Not tainted
EFLAGS: 00010086

EIP is at ___strtok_Rsmp_29805c13 [] 0xd607f48 (2.4.18-5smp)
eax: 00000003 ebx: cd9f2000 ecx: f7acdd8 edx: 0000006a
esi: dec42000 edi: 00000000 ebp: dec43f4c esp: dec43f1c
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid 1307, stackpage=dec43000)
Stack: c0118b8c f6483da0 dec42000 c0118b61 f41c3a0c dec42000 f6483da0 dec42000
       dec42000 dec43f58 00d26aef dec43f98 f41c3da8 c012515c dec43f58 f18d7f58
       c60a9f58 00d26aef dec42000 c01250d0 ec3495e0 00000246 f41c3a00 f41c3a00
Call Trace: [<c0118b8c>] schedule [kernel] 0x37c
[<c0118b61>] schedule [kernel] 0x351
[<c012515c>] schedule_timeout [kernel] 0x7c
[<c01250d0>] process_timeout [kernel] 0x0
[<f8afdb61>] svc_recv_Rsmp_e7d2e7df [sunrpc] 0x221
[<f8b1b334>] nfsd [nfsd] 0x144
[<f8b1b1f0>] nfsd [nfsd] 0x0
[<c0107286>] kernel_thread [kernel] 0x26
[<f8b1b1f0>] nfsd [nfsd] 0x0

code: 01 00 00 00 49 78 10 c0 01 00 00 00 00 20 9f cd a0 1c fc f7
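
When comparing panics from several machines, it can help to turn the call trace into structured data. The sketch below is one way to do that, assuming each frame looks like the ones above (address in `[<...>]`, symbol, module in brackets, hex offset). The names FRAME_RE and parse_call_trace are made up for this example, and the sample text is copied from three frames of this report.

```python
import re

# One frame: [<address>] symbol [module] 0xoffset
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{8})>\]\s+(?P<symbol>\S+)\s+"
    r"\[(?P<module>[^\]]+)\]\s+(?P<offset>0x[0-9a-f]+)"
)


def parse_call_trace(text):
    """Return a list of dicts, one per frame found in `text`."""
    frames = []
    for match in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(match.group("addr"), 16),
            "symbol": match.group("symbol"),
            "module": match.group("module"),
            "offset": int(match.group("offset"), 16),
        })
    return frames


# Three frames taken from the trace in this report.
SAMPLE = """\
Call Trace: [<c0118b8c>] schedule [kernel] 0x37c
[<f8afdb61>] svc_recv_Rsmp_e7d2e7df [sunrpc] 0x221
[<f8b1b334>] nfsd [nfsd] 0x144
"""

if __name__ == "__main__":
    for frame in parse_call_trace(SAMPLE):
        print(f"{frame['module']:>8}  {frame['symbol']}+{frame['offset']:#x}")
```

Frames that were mistyped during transcription from the photograph (for example a wrong-length address) will simply be skipped by the pattern, so the result should be checked against the original image.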



Expected Results: No crashes!


Additional Information:
	
More information scrolls off the screen when the system crashes.  The 
system is located off site.  Users at the server location photographed the 
screen of the crashed system and sent me the digital image.

Comment 1 Chance Reschke 2002-08-01 23:32:24 UTC
We have reproduced this problem on four additional machines, all quad PIII Xeons based 
on the ServerWorks HE chipset (Dell 6450, Supermicro SC860).  The kernel panic has 
occurred when scheduling nfsd as well as ssh.  

The crashes are frequent enough to be very disruptive.



Comment 2 Steve Dickson 2004-08-11 11:11:37 UTC
This appears to be fixed in later kernels.

