Bug 151100
| Summary: | Kernel panic - not syncing: Oops | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Dave Miller <justdave> |
| Component: | kernel | Assignee: | Larry Woodman <lwoodman> |
| Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.0 | CC: | blizzard, davej, managed, riel |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2007-07-10 15:31:43 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Dave Miller
2005-03-14 22:04:09 UTC
Panicked again today with this error. Got a screenshot this time; will attach shortly.

Created attachment 112238 [details]
screenshot of console after panic
Hi Dave, I really can't make any sense of these stack traces yet. There appear to be multiple crashes here that don't look related. Do you think some memory corruption is taking place? Can you attach a serial console to capture a complete oops/panic message? Can you get a dump? Thanks, Larry

This machine has been stable for a few months now (and there have been additional kernel upgrades since then). I would guess that these were flukes, or that the problem has been fixed in one of the more recent kernels.

Created attachment 119170 [details]
Kernel panic screenshot from 2.6.9-11.ELsmp
We are having the same problem, except on 2.6.9-11.ELsmp. The behaviour can be reproduced easily. I have attached the screenshot above (sorry I couldn't get text output; I'm unable to analyse the diskdump file at the moment because the kernel-debuginfo packages don't seem to exist anymore).

The bug in comment #5 looks very similar to bug 156854. It has the same mpol_free_shared_policy+53. The weird thing is that bug 156854 was supposed to be fixed already.

It's not 100% clear to me that the bug in comment #5 is the same as the original bug that was reported in this incident. How are you able to reproduce the bug? It says it was executing the 'rm' command when it crashed...

We are able to reproduce it very easily. It happens when building larger RPM packages such as kernel or java with rpmbuild, and occurs at the point when rpmbuild does an rm -rf on the temporary build directory.

Here is some additional information from crash analysis:
SYSTEM MAP: /boot/System.map-2.6.9-11.ELsmp
DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.9-11.ELsmp/vmlinux (2.6.9-11.ELsmp)
DUMPFILE: vmcore
CPUS: 2
DATE: Thu Sep 22 15:59:16 2005
UPTIME: 49 days, 17:12:50
LOAD AVERAGE: 0.58, 0.16, 0.07
TASKS: 127
NODENAME: xxxxxxxx
RELEASE: 2.6.9-11.ELsmp
VERSION: #1 SMP Fri May 20 18:25:30 EDT 2005
MACHINE: x86_64 (3200 Mhz)
MEMORY: 4.8 GB
PANIC: ""
PID: 8177
COMMAND: "rm"
TASK: 10122be97f0 [THREAD_INFO: 1010eb96000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
And a backtrace:
crash> bt -a
PID: 8177 TASK: 10122be97f0 CPU: 0 COMMAND: "rm"
#0 [1010eb97d50] start_disk_dump at ffffffffa00ef1e5
#1 [1010eb97d80] try_crashdump at ffffffff8014978e
#2 [1010eb97d90] die at ffffffff8011190b
#3 [1010eb97db0] do_general_protection at ffffffff80112255
#4 [1010eb97df0] error_exit at ffffffff80110ad9
RIP: ffffffff801dced5 RSP: 000001010eb97ea0 RFLAGS: 00010202
RAX: 2e74722f62696c2f RBX: 00000101123a6068 RCX: 000001000000e000
RDX: 0000000000000000 RSI: 000000000000006c RDI: 00000101123a6060
RBP: 000001010d052000 R8: 000001010eb97db8 R9: 0000000000000000
R10: 000001010eb97e18 R11: ffffffff80170638 R12: 00000101123a6060
R13: 000000000050d538 R14: 00000101123a6120 R15: 000000000050a040
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#5 [1010eb97e78] rb_first at ffffffff801dced5
#6 [1010eb97ea0] mpol_free_shared_policy at ffffffff8016da67
#7 [1010eb97ec0] shmem_destroy_inode at ffffffff80170649
#8 [1010eb97ed0] sys_unlink at ffffffff80181672
#9 [1010eb97f30] sys_getdents64 at ffffffff80183df4
#10 [1010eb97f50] sys_fcntl at ffffffff80183152
#11 [1010eb97f80] system_call at ffffffff8011003e
RIP: 0000003ce03b9319 RSP: 0000007fbffff440 RFLAGS: 00000246
RAX: 0000000000000057 RBX: ffffffff8011003e RCX: 0000000000000002
RDX: 0000000000000002 RSI: 000000000050d54b RDI: 000000000050d54b
RBP: 0000000000000002 R8: 0000007fbffff530 R9: 0000007fbffff534
R10: 00000000000002f8 R11: 0000000000000293 R12: 000000000050d538
R13: 000000000050d54b R14: 000000000050a040 R15: 0000007fbffff800
ORIG_RAX: 0000000000000057 CS: 0033 SS: 002b
PID: 0 TASK: 10009f84030 CPU: 1 COMMAND: "swapper"
#0 [10009fabfa0] smp_call_function_interrupt at ffffffff8011bc45
#1 [10009fabfb0] call_function_interrupt at ffffffff801108b1
--- <IRQ stack> ---
#2 [10037ed5e98] call_function_interrupt at ffffffff801108b1
RIP: ffffffff8010e6cc RSP: 0000010037ed5f48 RFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000010009f84030 RDI: 00000100052ca5e0
RBP: 0000000000000001 R8: 0000010037ed4000 R9: 0000000000000001
R10: 0000000000000080 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: fffffffffffffffa CS: 0010 SS: 0018
#3 [10037ed5f48] cpu_idle at ffffffff8010e65c
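The faulting frame above is rb_first, reached through do_general_protection, and the RAX value in that register dump (2e74722f62696c2f) does not look like a kernel pointer. A minimal sketch for inspecting such a value follows. It assumes 48-bit virtual addressing on this x86_64 machine, and the helper names `decode_register_as_ascii` and `is_canonical` are hypothetical, chosen only for illustration.

```python
def decode_register_as_ascii(value: int, width: int = 8) -> str:
    """Read a register value as little-endian bytes and render them as ASCII.

    Bytes outside the printable ASCII range are shown as '.'.
    """
    raw = value.to_bytes(width, byteorder="little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Check whether a 64-bit address is canonical.

    Assumption: va_bits-wide virtual addresses, so bits 63..(va_bits - 1)
    must be all zeros or all ones.
    """
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - (va_bits - 1))) - 1
    return upper == 0 or upper == all_ones


if __name__ == "__main__":
    rax = 0x2E74722F62696C2F  # RAX from the rb_first frame in the dump above
    print(decode_register_as_ascii(rax))
    print(is_canonical(rax))
```

For the RAX value in this dump, the bytes read as `/lib/rt.` and the address is non-canonical. If RAX was the pointer rb_first dereferenced, that would explain the general protection fault, and it would be consistent with the memory corruption Larry asked about (a path string written over a tree pointer), though the dump alone does not prove it.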
Does this problem still occur with the latest RHEL4-U4 kernel? We have never been able to reproduce this problem, so we could never figure out the cause. Larry Woodman

(In reply to comment #10)
> Does this problem still occur with the latest RHEL4-U4 kernel? We have never
> been able to reproduce this problem so we could never figure out the cause.

See comment 4. There's been no change since then. (We still haven't seen it again since then.)

The problem appears to have been fixed; it's no longer reproducible.