Bug 151100

Summary:

Kernel panic - not syncing: Oops

Product:

Red Hat Enterprise Linux 4

Reporter:

Dave Miller <justdave>

Component:

kernel

Assignee:

Larry Woodman <lwoodman>

Status:

CLOSED WORKSFORME

QA Contact:

Brian Brock <bbrock>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

4.0

CC:

blizzard, davej, managed, riel

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Last Closed:

2007-07-10 15:31:43 UTC

Attachments:

Description	Flags
screenshot of console after panic	none
Kernel panic screenshot from 2.6.9-11.ELsmp	none

Description Dave Miller 2005-03-14 22:04:09 UTC

Description of problem:
The server's kernel panicked with the error message shown in the
summary.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-6.16.EL

Additional info:

This is the third time this same machine has panicked since we
installed RHEL 4 on it.  It's had a different error every time, so
separate bugs have been filed for each incident.  The previous bugs
are bug 150044 and bug 150743.

The panic log did NOT get written to the logfile.  The following is
what was visible on the console prior to rebooting:

RBP: 00000100065cb440 R08: 00000101f8f1cff0 R09: 00000101f8f1c088
R10: 00000101f8f1cfa8 R11: 0000000000000001 R12: 0000010000012780
R13: 0000000000000000 R14: 0000000000000000 R15: 000001022fde7e48
FS:  0000000000000000(0000) GS:ffffffff804c0c00(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000070 CR3: 000000000e3f4000 CR4: 00000000000006e0
Process kswapd0 (pid: 66, threadinfo 000001022fde6000, task 000001022fd947f0)
Stack: 00000101f8f1cff0 ffffffff80174ca5 00000000000000d0 0000000000000000
       0000000000000001 00000100065cb440 0000010000012780 ffffffff8015ebdd
       0000001700000001 ffffffff00000000
Call Trace:<ffffffff80174ca5>{try_to_free_buffers+67} <ffffffff8015ebdd>{shrink_zone+3369}
       <ffffffff8015f3a8>{balance_pgdat+506} <ffffffff8015f5f2>{kswapd+252}
       <ffffffff80133686>{autoremove_wake_function+0} <ffffffff80130bd5>{finish_task_switch+55}
       <ffffffff80133686>{autoremove_wake_function+0} <ffffffff80130c24>{schedule_tail+11}
       <ffffffff80110c87>{child_rip+8} <ffffffff8015f4f6>{kswapd+0}
       <ffffffff80110c7f>{child_rip+0}

Code: f0 0f ba 68 70 10 8b 11 8b 41 18 83 e2 06 09 d0 75 51 48 8b
RIP <ffffffff80174a20>{drop_buffers+39} RSP <000001022fde7b28>
CR2: 0000000000000070
 <0>Kernel panic - not syncing: Oops
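
For readers triaging the oops above, the first six bytes of the Code: line can be decoded by hand and compared with CR2. The sketch below is a minimal, hypothetical decoder (the name decode_locked_bit_op is made up for illustration and is not a kernel or library API). It handles only the LOCK-prefixed 0F BA /r ib form with an 8-bit displacement, and it assumes the Code: bytes begin at the faulting RIP, which the console output does not state.

```python
# Register names indexed by the ModRM r/m field (no REX prefix in these bytes).
BASE_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
# Bit-test opcodes in the 0F BA group, selected by the ModRM reg field.
BIT_OPS = {4: "bt", 5: "bts", 6: "btr", 7: "btc"}


def decode_locked_bit_op(code: bytes):
    """Decode 'lock <bit-op> $imm8, disp8(%base)'. Hypothetical helper."""
    if code[0] != 0xF0 or code[1:3] != b"\x0f\xba":
        raise ValueError("not a LOCK-prefixed 0F BA instruction")
    modrm = code[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8, no-SIB form is handled")
    if reg not in BIT_OPS:
        raise ValueError("reg field does not select a bit-test opcode")
    disp = int.from_bytes(code[4:5], "little", signed=True)
    imm = code[5]
    return BIT_OPS[reg], BASE_REGS[rm], disp, imm


# First six bytes of the "Code:" line, and CR2 from the register dump.
code = bytes.fromhex("f00fba687010")
cr2 = 0x70

op, base, disp, imm = decode_locked_bit_op(code)
# If this instruction faulted, CR2 = base register + displacement.
base_value = cr2 - disp
print(f"lock {op} ${imm:#x}, {disp:#x}(%{base}); implied %{base} = {base_value:#x}")
```

Under those assumptions the bytes decode to a locked bts of bit 0x10 at offset 0x70 from %rax, and CR2 = 0x70 implies %rax was 0, i.e. a NULL pointer dereference inside drop_buffers. RAX itself is not visible in the captured part of the console output, so this cannot be confirmed from the dump alone.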

Comment 1 Dave Miller 2005-03-23 00:58:23 UTC

Panicked again today with the same error.  Got a screenshot this time; will
attach shortly.

Comment 2 Dave Miller 2005-03-23 01:01:35 UTC

Created attachment 112238
screenshot of console after panic

Comment 3 Larry Woodman 2005-05-11 14:12:08 UTC

Hi Dave, I really can't make any sense of these stack traces yet.  There
appear to be multiple crashes here that don't look related.  Do you think
some memory corruption is taking place?  Can you attach a serial console to
capture a complete oops/panic message?  Can you get a dump?

Thanks, Larry

Comment 4 Dave Miller 2005-08-28 04:52:26 UTC

This machine has been stable for a few months now (and there have been
additional kernel upgrades since then).  I would guess that these were flukes,
or the problem has been fixed in one of the more recent kernels.

Comment 5 Anchor Systems Managed Hosting 2005-09-23 00:59:15 UTC

Created attachment 119170
Kernel panic screenshot from 2.6.9-11.ELsmp

Comment 6 Anchor Systems Managed Hosting 2005-09-23 01:02:38 UTC

We are having the same problem, except on 2.6.9-11.ELsmp. The behaviour can be
duplicated easily. I have attached the screenshot above. Sorry I couldn't get
text output; I'm unable to analyse the diskdump file at the moment because the
kernel-debuginfo packages don't seem to exist anymore.

Comment 7 Dan Carpenter 2005-09-23 06:49:16 UTC

The bug in comment #5 looks very similar to bug 156854.  It has the same
mpol_free_shared_policy+53.  The weird thing is that bug 156854 was supposed to
be fixed already.

It's not 100% clear to me that the bug in comment #5 is the same as the original
bug that was reported in this incident.

How are you able to reproduce the bug?  It says it was executing the 'rm' command
when it crashed...

Comment 8 Anchor Systems Managed Hosting 2005-09-23 06:55:30 UTC

We are able to reproduce it very easily. It happens when building larger rpm
packages such as kernel or java with rpmbuild, and occurs at the point when
rpmbuild does an rm -rf on the temporary build directory.

Comment 9 Anchor Systems Managed Hosting 2005-09-27 02:55:55 UTC

Here is some additional information from crash analysis:

  SYSTEM MAP: /boot/System.map-2.6.9-11.ELsmp
DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.9-11.ELsmp/vmlinux (2.6.9-11.ELsmp)
    DUMPFILE: vmcore
        CPUS: 2
        DATE: Thu Sep 22 15:59:16 2005
      UPTIME: 49 days, 17:12:50
LOAD AVERAGE: 0.58, 0.16, 0.07
       TASKS: 127
    NODENAME: xxxxxxxx
     RELEASE: 2.6.9-11.ELsmp
     VERSION: #1 SMP Fri May 20 18:25:30 EDT 2005
     MACHINE: x86_64  (3200 Mhz)
      MEMORY: 4.8 GB
       PANIC: ""
         PID: 8177
     COMMAND: "rm"
        TASK: 10122be97f0  [THREAD_INFO: 1010eb96000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

And a backtrace:

crash> bt -a
PID: 8177   TASK: 10122be97f0       CPU: 0   COMMAND: "rm"
 #0 [1010eb97d50] start_disk_dump at ffffffffa00ef1e5
 #1 [1010eb97d80] try_crashdump at ffffffff8014978e
 #2 [1010eb97d90] die at ffffffff8011190b
 #3 [1010eb97db0] do_general_protection at ffffffff80112255
 #4 [1010eb97df0] error_exit at ffffffff80110ad9
    RIP: ffffffff801dced5  RSP: 000001010eb97ea0  RFLAGS: 00010202
    RAX: 2e74722f62696c2f  RBX: 00000101123a6068  RCX: 000001000000e000
    RDX: 0000000000000000  RSI: 000000000000006c  RDI: 00000101123a6060
    RBP: 000001010d052000   R8: 000001010eb97db8   R9: 0000000000000000
    R10: 000001010eb97e18  R11: ffffffff80170638  R12: 00000101123a6060
    R13: 000000000050d538  R14: 00000101123a6120  R15: 000000000050a040
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [1010eb97e78] rb_first at ffffffff801dced5
 #6 [1010eb97ea0] mpol_free_shared_policy at ffffffff8016da67
 #7 [1010eb97ec0] shmem_destroy_inode at ffffffff80170649
 #8 [1010eb97ed0] sys_unlink at ffffffff80181672
 #9 [1010eb97f30] sys_getdents64 at ffffffff80183df4
#10 [1010eb97f50] sys_fcntl at ffffffff80183152
#11 [1010eb97f80] system_call at ffffffff8011003e
    RIP: 0000003ce03b9319  RSP: 0000007fbffff440  RFLAGS: 00000246
    RAX: 0000000000000057  RBX: ffffffff8011003e  RCX: 0000000000000002
    RDX: 0000000000000002  RSI: 000000000050d54b  RDI: 000000000050d54b
    RBP: 0000000000000002   R8: 0000007fbffff530   R9: 0000007fbffff534
    R10: 00000000000002f8  R11: 0000000000000293  R12: 000000000050d538
    R13: 000000000050d54b  R14: 000000000050a040  R15: 0000007fbffff800
    ORIG_RAX: 0000000000000057  CS: 0033  SS: 002b

PID: 0      TASK: 10009f84030       CPU: 1   COMMAND: "swapper"
 #0 [10009fabfa0] smp_call_function_interrupt at ffffffff8011bc45
 #1 [10009fabfb0] call_function_interrupt at ffffffff801108b1
--- <IRQ stack> ---
 #2 [10037ed5e98] call_function_interrupt at ffffffff801108b1
    RIP: ffffffff8010e6cc  RSP: 0000010037ed5f48  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000010009f84030  RDI: 00000100052ca5e0
    RBP: 0000000000000001   R8: 0000010037ed4000   R9: 0000000000000001
    R10: 0000000000000080  R11: 0000000000000001  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: fffffffffffffffa  CS: 0010  SS: 0018
 #3 [10037ed5f48] cpu_idle at ffffffff8010e65c
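
One observation on the register dump in frame #4 above: the RAX value is not a valid x86_64 address, and its bytes are all printable ASCII. The sketch below uses only values from the backtrace. It checks whether each address is canonical (assuming 48-bit virtual addressing) and decodes RAX in little-endian byte order. The helper name is_canonical is illustrative.

```python
def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all equal (48-bit virtual addressing assumed)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


rax = 0x2E74722F62696C2F  # RAX from frame #4 of the "rm" backtrace
rip = 0xFFFFFFFF801DCED5  # RIP from the same frame (rb_first)

# x86_64 is little-endian, so the lowest byte of the register comes first.
as_text = rax.to_bytes(8, "little").decode("ascii")
print(is_canonical(rip), is_canonical(rax), repr(as_text))
```

RAX decodes to the string "/lib/rt.", which looks like a fragment of a file path rather than a pointer. A non-canonical address is consistent with the do_general_protection frame, and path text sitting where a pointer should be would fit the memory-corruption question raised in comment 3, though this alone does not identify the cause.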

Comment 10 Larry Woodman 2006-12-08 14:17:42 UTC

Does this problem still occur with the latest RHEL4-U4 kernel?  We have never
been able to reproduce this problem so we could never figure out the cause.

Larry Woodman

Comment 11 Dave Miller 2006-12-09 04:59:56 UTC

(In reply to comment #10)
> Does this problem still occur with the latest RHEL4-U4 kernel?  We have never
> been able to reproduce this problem so we could never figure out the cause.

See comment 4.  There's been no change since then; we still haven't seen it
again.

Comment 12 Larry Woodman 2007-07-10 15:31:43 UTC

The problem appears to have been fixed; it's no longer reproducible.