Description of Problem:
Crashes apparently due to an SMP kernel scheduling problem with nfsd. The system is a Dell 6450 (quad PIII Xeon, Serverworks HE chipset) with a Perc3/QC (LSI Megaraid 1600) disk controller. Complete RH 7.3 install with all RPM updates through July 1 applied. Runs the 2.4.18-5smp kernel from the RH i686 RPM.

Version-Release number of selected component (if applicable):
Red Hat 7.3, kernel 2.4.18-5smp

How Reproducible:
The system crashes after running for several days as the server for a cluster of 42 diskless dual Athlon systems.

Steps to Reproduce:
1. Boot server
2. Initiate NFS requests from several NFS clients - mixed reads & writes
3. Run for two to five days

Actual Results:
Kernel panic and crash. Screen shows:

CPU:    3
EIP:    0010:[<cd9f3de0>]    Not tainted
EFLAGS: 00010086
EIP is at ___strtok_Rsmp_29805c13 [] 0xd607f48 (2.4.18-5smp)
eax: 00000003   ebx: cd9f2000   ecx: f7acdd8   edx: 0000006a
esi: dec42000   edi: 00000000   ebp: dec43f4c   esp: dec43f1c
ds: 0018   es: 0018   ss: 0018
Process nfsd (pid 1307, stackpage=dec43000)
Stack: c0118b8c f6483da0 dec42000 c0118b61 f41c3a0c dec42000 f6483da0 dec42000
       dec42000 dec43f58 00d26aef dec43f98 f41c3da8 c012515c dec43f58 f18d7f58
       c60a9f58 00d26aef dec42000 c01250d0 ec3495e0 00000246 f41c3a00 f41c3a00
Call Trace: [<c0118b8c>] schedule [kernel] 0x37c
[<c0118b61>] schedule [kernel] 0x351
[<c012515c>] schedule_timeout [kernel] 0x7c
[<c01250d0>] process_timeout [kernel] 0x0
[<f8afdb61>] svc_recv_Rsmp_e7d2e7df [sunrpc] 0x221
[<f8b1b334>] nfsd [nfsd] 0x144
[<f8b1b1f0>] nfsd [nfsd] 0x0
[<c0107286>] kernel_thread [kernel] 0x26
[<f8b1b1f0>] nfsd [nfsd] 0x0

code: 01 00 00 00 49 78 10 c0 01 00 00 00 00 20 9f cd a0 1c fc f7

Expected Results:
No crashes!

Additional Information:
More information scrolls off the screen when the system crashes. The system is located off site. Users at the server location photographed the screen of the crashed system and sent me the digital image; the output above was transcribed from that image.
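For anyone trying to reproduce this, the "mixed reads & writes" step could be approximated by running something like the following on each client against the NFS-mounted export. This is only a rough sketch and not the workload the cluster actually runs: the function name `mixed_io`, the file naming, the 50/50 read/write split and the block size are all assumptions made for illustration, and client-side caching may keep some reads from ever reaching the server.

```python
import os
import random


def mixed_io(root, operations=1000, file_count=16, block_size=4096, seed=None):
    """Run a random mix of block writes and whole-file reads under `root`.

    `root` is assumed to be a directory on the NFS mount (hypothetical,
    e.g. /mnt/nfs/stress). Returns a (reads, writes) tuple of counts.
    """
    rng = random.Random(seed)
    paths = [os.path.join(root, "stress_%d.dat" % i) for i in range(file_count)]
    reads = writes = 0
    for _ in range(operations):
        path = rng.choice(paths)
        # Write if the coin says so, or if the file does not exist yet.
        if rng.random() < 0.5 or not os.path.exists(path):
            with open(path, "ab") as fh:
                fh.write(os.urandom(block_size))
                fh.flush()
                # fsync so the data is pushed to the server, not just cached.
                os.fsync(fh.fileno())
            writes += 1
        else:
            with open(path, "rb") as fh:
                # Read the file back one block at a time.
                while fh.read(block_size):
                    pass
            reads += 1
    return reads, writes
```

Calling `mixed_io("/mnt/nfs/stress", operations=100000)` in a loop on several clients at once (the path is a placeholder) would give a sustained mixed load; whether that is enough to trigger the panic within the two to five days seen here is untested.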
We have reproduced this problem on four additional machines, all quad PIII Xeons based on the Serverworks HE chipset (Dell 6450, Supermicro SC860). The kernel panic has occurred when scheduling nfsd as well as ssh. The crashes are frequent enough to be very disruptive.
This appears to be fixed in later kernels.