Description of problem:
The server kernel panicked with the message in the summary as the error message.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-6.16.EL

Additional info:
This is the third time this same machine has panicked since we installed RHEL 4 on it. It's had a different error every time, so separate bugs have been filed for each incident. The previous bugs are bug 150044 and bug 150743.

The panic log did NOT get written to the logfile. The following is what was visible on the console prior to rebooting:

RBP: 00000100065cb440 R08: 00000101f8f1cff0 R09: 00000101f8f1c088
R10: 00000101f8f1cfa8 R11: 0000000000000001 R12: 0000010000012780
R13: 0000000000000000 R14: 0000000000000000 r15: 000001022fde7e48
FS: 0000000000000000(0000) GS:ffffffff804c0c00(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000070 CR3: 000000000e3f4000 CR4: 00000000000006e0
Process kswapd0 (pid: 66, threadinfo 000001022fde6000, task 000001022fd947f0)
Stack: 00000101f8f1cff0 ffffffff80174ca5 00000000000000d0 0000000000000000
       0000000000000001 00000100065cb440 0000010000012780 ffffffff8015ebdd
       0000001700000001 ffffffff00000000
Call Trace:<ffffffff80174ca5>{try_to_free_buffers+67} <ffffffff8015ebdd>{shrink_zone+3369}
       <ffffffff8015f3a8>{balance_pgdat+506} <ffffffff8015f5f2>{kswapd+252}
       <ffffffff80133686>{autoremove_wake_function+0} <ffffffff80130bd5>{finish_task_switch+55}
       <ffffffff80133686>{autoremove_wake_function+0} <ffffffff80130c24>{schedule_tail+11}
       <ffffffff80110c87>{child_rip+8} <ffffffff8015f4f6>{kswapd+0}
       <ffffffff80110c7f>{child_rip+0}

Code: f0 0f ba 68 70 10 8b 11 8b 41 18 83 e2 06 09 d0 75 51 48 8b
RIP <ffffffff80174a20>{drop_buffers+39} RSP <000001022fde7b28>
CR2: 0000000000000070
<0>Kernel panic - not syncing: Oops
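Note on the trace above: the first bytes of the Code: line can be decoded by hand, and they line up with the CR2 value. The sketch below is only a worked check, not a disassembler. It assumes the byte dump starts at the faulting RIP (drop_buffers+39), and the helper name decode_lock_bts is made up for this illustration. Under that assumption the instruction is a locked bit-set at offset 0x70 from RAX, and a fault address of 0x70 would mean RAX held 0, i.e. a NULL pointer dereference.

```python
# Hand-decode the first instruction from the "Code:" line of the oops.
# Assumption: the byte dump starts at the faulting RIP (drop_buffers+39).
code = bytes.fromhex("f0 0f ba 68 70 10")
cr2 = 0x70  # faulting address reported in the oops


def decode_lock_bts(insn):
    """Decode 'lock bts $imm8, disp8(%reg)'; handles only this one encoding."""
    assert insn[0] == 0xF0             # LOCK prefix
    assert insn[1:3] == b"\x0f\xba"    # two-byte opcode for the bit-test group
    modrm = insn[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 1 and reg == 5       # mod=01: 8-bit displacement, /5: BTS
    assert rm != 4                     # rm=4 would need a SIB byte (not handled)
    regs = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
    return regs[rm], insn[4], insn[5]  # base register, displacement, bit number


base_reg, disp, bit = decode_lock_bts(code)
base_value = cr2 - disp  # value the base register must have held at the fault

print(f"lock bts ${bit:#x}, {disp:#x}(%{base_reg})")
print(f"%{base_reg} at fault time = {base_value:#x}")
```

If the assumption holds, the oops is a write through a NULL structure pointer rather than a wild address, which may help when comparing it with the other panics on this machine.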
The machine panicked again today with the same error. I got a screenshot this time and will attach it shortly.
Created attachment 112238: screenshot of console after panic
Hi Dave, I really can't make any sense out of these stack traces yet. There appear to be multiple crashes here that don't look related. Do you think there is some memory corruption taking place? Can you attach a serial console to capture a complete oops/panic message? Can you get a dump? Thanks, Larry
This machine has been stable for a few months now (and there have been additional kernel upgrades since then). I would guess that these were flukes, or the problem has been fixed in one of the more recent kernels.
Created attachment 119170: Kernel panic screenshot from 2.6.9-11.ELsmp
We are having the same problem, except on 2.6.9-11.ELsmp. The behaviour can be duplicated easily. I have attached the screenshot above (sorry I couldn't get a text output, I'm unable to analyse the diskdump file at the moment because the kernel-debuginfo packages don't seem to exist anymore).
The bug in comment #5 looks very similar to bug 156854. It has the same mpol_free_shared_policy+53. The weird thing is that bug 156854 was supposed to be fixed already. It's not 100% clear to me that the bug in comment #5 is the same as the original bug that was reported in this incident. How are you able to reproduce the bug? It says it was executing the 'rm' command when it crashed...
We are able to reproduce it very easily. It happens when building larger rpm packages such as kernel or java with rpmbuild, and occurs at the point when rpmbuild does an rm -rf on the temporary build directory.
Here is some additional information from crash analysis:

  SYSTEM MAP: /boot/System.map-2.6.9-11.ELsmp
DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.9-11.ELsmp/vmlinux (2.6.9-11.ELsmp)
    DUMPFILE: vmcore
        CPUS: 2
        DATE: Thu Sep 22 15:59:16 2005
      UPTIME: 49 days, 17:12:50
LOAD AVERAGE: 0.58, 0.16, 0.07
       TASKS: 127
    NODENAME: xxxxxxxx
     RELEASE: 2.6.9-11.ELsmp
     VERSION: #1 SMP Fri May 20 18:25:30 EDT 2005
     MACHINE: x86_64 (3200 Mhz)
      MEMORY: 4.8 GB
       PANIC: ""
         PID: 8177
     COMMAND: "rm"
        TASK: 10122be97f0 [THREAD_INFO: 1010eb96000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

And a backtrace:

crash> bt -a
PID: 8177 TASK: 10122be97f0 CPU: 0 COMMAND: "rm"
 #0 [1010eb97d50] start_disk_dump at ffffffffa00ef1e5
 #1 [1010eb97d80] try_crashdump at ffffffff8014978e
 #2 [1010eb97d90] die at ffffffff8011190b
 #3 [1010eb97db0] do_general_protection at ffffffff80112255
 #4 [1010eb97df0] error_exit at ffffffff80110ad9
    RIP: ffffffff801dced5 RSP: 000001010eb97ea0 RFLAGS: 00010202
    RAX: 2e74722f62696c2f RBX: 00000101123a6068 RCX: 000001000000e000
    RDX: 0000000000000000 RSI: 000000000000006c RDI: 00000101123a6060
    RBP: 000001010d052000 R8: 000001010eb97db8 R9: 0000000000000000
    R10: 000001010eb97e18 R11: ffffffff80170638 R12: 00000101123a6060
    R13: 000000000050d538 R14: 00000101123a6120 R15: 000000000050a040
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #5 [1010eb97e78] rb_first at ffffffff801dced5
 #6 [1010eb97ea0] mpol_free_shared_policy at ffffffff8016da67
 #7 [1010eb97ec0] shmem_destroy_inode at ffffffff80170649
 #8 [1010eb97ed0] sys_unlink at ffffffff80181672
 #9 [1010eb97f30] sys_getdents64 at ffffffff80183df4
#10 [1010eb97f50] sys_fcntl at ffffffff80183152
#11 [1010eb97f80] system_call at ffffffff8011003e
    RIP: 0000003ce03b9319 RSP: 0000007fbffff440 RFLAGS: 00000246
    RAX: 0000000000000057 RBX: ffffffff8011003e RCX: 0000000000000002
    RDX: 0000000000000002 RSI: 000000000050d54b RDI: 000000000050d54b
    RBP: 0000000000000002 R8: 0000007fbffff530 R9: 0000007fbffff534
    R10: 00000000000002f8 R11: 0000000000000293 R12: 000000000050d538
    R13: 000000000050d54b R14: 000000000050a040 R15: 0000007fbffff800
    ORIG_RAX: 0000000000000057 CS: 0033 SS: 002b

PID: 0 TASK: 10009f84030 CPU: 1 COMMAND: "swapper"
 #0 [10009fabfa0] smp_call_function_interrupt at ffffffff8011bc45
 #1 [10009fabfb0] call_function_interrupt at ffffffff801108b1
--- <IRQ stack> ---
 #2 [10037ed5e98] call_function_interrupt at ffffffff801108b1
    RIP: ffffffff8010e6cc RSP: 0000010037ed5f48 RFLAGS: 00000246
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000010009f84030 RDI: 00000100052ca5e0
    RBP: 0000000000000001 R8: 0000010037ed4000 R9: 0000000000000001
    R10: 0000000000000080 R11: 0000000000000001 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    ORIG_RAX: fffffffffffffffa CS: 0010 SS: 0018
 #3 [10037ed5f48] cpu_idle at ffffffff8010e65c
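Note on the "rm" backtrace above: the RAX value in the exception frame (2e74722f62696c2f) does not look like a pointer. Read as eight little-endian bytes, it is printable ASCII. The sketch below decodes it and checks whether the value is a canonical x86_64 address, assuming the usual 48-bit virtual address layout. The helper name is_canonical is made up for this illustration.

```python
# Treat the RAX value from the exception frame as raw memory contents.
rax = 0x2e74722f62696c2f

# x86_64 is little-endian, so the lowest byte comes first in memory.
text = rax.to_bytes(8, "little").decode("ascii")


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit virtual addressing assumed)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


print(repr(text))          # '/lib/rt.'
print(is_canonical(rax))   # False
```

The bytes spell the start of a path string, which suggests that the structure rb_first was walking had been overwritten with string data. A non-canonical address would also be consistent with the fault arriving through do_general_protection rather than as a page fault. This fits the memory-corruption suspicion raised earlier in this bug, but it is a reading of one register, not a confirmed root cause.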
Does this problem still occur with the latest RHEL4-U4 kernel? We have never been able to reproduce this problem so we could never figure out the cause. Larry Woodman
(In reply to comment #10)
> Does this problem still occur with the latest RHEL4-U4 kernel? We have never
> been able to reproduce this problem so we could never figure out the cause.

See comment 4. There's been no change since then. (We still haven't seen it again since then.)
The problem appears to have been fixed; it's no longer reproducible.