From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts-MyWay; .NET CLR 1.1.4322)
Description of problem: I am encountering a kernel problem on my Red Hat Linux server. The server runs Red Hat Linux with VMware GSX Server Version 2.5.0 Build 3986 (8 environments, Windows 2000 and Windows 2003).
Version-Release number of selected component (if applicable): kernel 2.4.9-e.12 smp
How reproducible: Sometimes
Steps to Reproduce:
1. None. The problem reproduces itself without anything being done on the server.
2.
3.
Actual Results: The server reboots.
Additional info:
  Processor swapper (pid '0, stackpage C545F000)
  Stack C545E000 C545E000 C545E000 C01139eF F77eBee8 C0105400 C024556A
  Call trace
  [<C0139eF>] SMP_Call_function_interrupt
  [<C0105400>] default_idle [KERNEL]0x0 [Kernel]0x2F
  [<C024556A>] call_call_function_interrupt [KERNEL]0x5
  [<C0105400>] default_idle [KERNEL]0x0
  [<C010542e>] default_idle [KERNEL]0x0
  [<C0105492>] default_idle [KERNEL]0x2e
  [<C011C5e6>] call_console_drivers[KERNEL]0x46
  [<C011C756>] call_console_drivers[KERNEL]0xeb
  Code 8b 3c 1e 89 04 1e 8b 42 20 89 3C 81 5b 5e 5F C3
  <0> KERNEL PANIC: NOT CONTINUING 8d b4 26 00
Can you provide the lsmod output?
Closing due to inactivity on the requested information.
Arjan, I have seen another case of this. These are the panic log details:
  Oops: 0000
  Kernel 2.4.9-e.37enterprise
  CPU: 1
  EIP: 0010:[<c0138652>] Tainted: PF
  EFLAGS: 00013002
  EIP is at do_ccupdate_local [kernel] 0x22
  eax: 00000000 ebx: 00000004 ecx: f7f65efc edx: ce838000
  esi: 00000074 edi: ce838000 ebp: ce839f00 esp: ce839e90
  ds: 0018 es: 0018 ss: 0018
  Process vmware (pid: 28276, stackpage=ce839000)
  Stack: ce838000 f7f84000 ce838000 c0113bef f7f65ef8 c0357120 c024724e c0357120
         f4e465c0 00000001 f7f84000 ce838000 ce839f00 ffffe000 f7f80018 f7f80018
         fffffffa c0119a8d 00000010 00003206 ce839f0c 00003202 f4e465c0 00000000
  Call Trace:
  [<c0113bef>] smp_call_function_interrupt [kernel] 0x2f (0xce839e9c)
  [<c024724e>] call_call_function_interrupt [kernel] 0x5 (0xce839ea8)
  [<c0119a8d>] schedule [kernel] 0x3ad (0xce839ed4)
  [<c0125704>] schedule_timeout [kernel] 0x84 (0xce839f04)
  [<c0125670>] process_timeout [kernel] 0x0 (0xce839f1c)
  [<c01574fe>] do_select [kernel] 0x20e (0xce839f34)
  [<c01578a9>] sys_select [kernel] 0x339 (0xce839f6c)
  [<c0146bd6>] sys_read [kernel] 0x96 (0xce839f7c)
  [<c01073e3>] system_call [kernel] 0x33 (0xce839fc0)
Can you verify whether this is the same problem, please?
Created attachment 102917: patch to fix race in smp_call_function
After speaking with VMware, I believe that they have identified a race condition that may arise under heavy load. smp_call_function in the 2.4.9 kernel has a race both in how its call data is assigned and in how the atomic fields that gate its execution are used. This appears to be corrected both in upstream 2.6 kernels and in the RHEL3 2.4.21 kernel series. This patch is a variant of the patch VMware provided for the problem, and it solves the issue. It has been tested by both VMware and myself, and appears to work well.
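For readers unfamiliar with this kind of cross-CPU handshake, here is a minimal user-space sketch of the general pattern. It is not the attached patch and not kernel code: threads stand in for the other CPUs' interrupt handlers, and every name in it (`struct call_data`, `ipi_handler`, `run_cross_call`, `count_hit`, `MAX_WORKERS`) is hypothetical. It assumes a descriptor with a function pointer plus `started` and `finished` atomic counters, and it illustrates one ordering rule such a handshake depends on: a handler must copy what it needs out of the shared descriptor before it bumps `started`, because the initiator may reuse the descriptor once every handler has checked in.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_WORKERS 8   /* hypothetical limit for this sketch */

/* Hypothetical stand-in for a per-call descriptor shared with other CPUs. */
struct call_data {
    void (*func)(void *info);
    void *info;
    int wait;               /* initiator waits for func to complete */
    atomic_int started;     /* handlers that have picked up the call */
    atomic_int finished;    /* handlers that have completed func */
};

/* Example payload: count how many handlers ran. */
static void count_hit(void *info)
{
    atomic_fetch_add((atomic_int *)info, 1);
}

/* Thread body standing in for the cross-call interrupt handler. */
static void *ipi_handler(void *arg)
{
    struct call_data *data = arg;

    /* Copy everything out BEFORE announcing that we have started. */
    void (*func)(void *) = data->func;
    void *info = data->info;
    int wait = data->wait;

    assert(func != NULL);
    atomic_fetch_add(&data->started, 1);

    /* Past this point 'data' is only touched again if wait was set. */
    func(info);
    if (wait)
        atomic_fetch_add(&data->finished, 1);
    return NULL;
}

/* Initiator: run count_hit on nworkers "CPUs"; returns how many ran it. */
int run_cross_call(int nworkers, int wait)
{
    pthread_t tid[MAX_WORKERS];
    atomic_int hits;
    struct call_data data = { .func = count_hit, .info = &hits, .wait = wait };

    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    atomic_init(&hits, 0);
    atomic_init(&data.started, 0);
    atomic_init(&data.finished, 0);

    for (int i = 0; i < nworkers; i++) {
        int rc = pthread_create(&tid[i], NULL, ipi_handler, &data);
        assert(rc == 0);
    }

    /* Gate 1: every handler has copied the descriptor. */
    while (atomic_load(&data.started) != nworkers)
        sched_yield();

    /* Gate 2 (optional): every handler has finished running func. */
    if (wait)
        while (atomic_load(&data.finished) != nworkers)
            sched_yield();

    /* Joining is only cleanup for this sketch; a kernel has no equivalent. */
    for (int i = 0; i < nworkers; i++)
        pthread_join(tid[i], NULL);

    return atomic_load(&hits);
}
```

Calling `run_cross_call(4, 1)` exercises both gates, while `run_cross_call(4, 0)` exercises only the `started` gate. On older toolchains the sketch may need to be built with `-pthread`.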
Created attachment 102939: final copy of patch to fix race in smp_call_function
Here's the final copy of the patch that got acked after various and sundry clean-ups were performed.
This patch looks OK and was reviewed on rhkernel-list, so it should be included in Pensacola-U6. Larry
This is already in U6; changing status to MODIFIED.
When will the kernel with this patch be released? I checked the latest available e-49 and it is not included there yet. As I am hit by this problem too on our VMware RH AS 2.1 server, I would like to know which kernel version will include it.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-505.html