Bug 117941 - frequent kernel panics
Summary: frequent kernel panics
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jim Paradis
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-03-10 09:28 UTC by David Juran
Modified: 2013-08-06 01:04 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-05-12 01:08:37 UTC
Target Upstream Version:
Embargoed:


Attachments
kernel panic console output (1.60 KB, text/plain), 2004-03-19 12:48 UTC, Tomas Tengling
Yet another kernel panic console output (2.81 KB, text/plain), 2004-03-19 12:50 UTC, Tomas Tengling
kernel panic console output (2.72 KB, text/plain), 2004-03-20 10:26 UTC, Tomas Tengling
kernel panic console output (585 bytes, text/plain), 2004-03-24 14:10 UTC, Tomas Tengling
kernel panic console output (2.54 KB, text/plain), 2004-03-24 16:12 UTC, Tomas Tengling
kernel panic console output (2.61 KB, text/plain), 2004-03-25 12:06 UTC, Tomas Tengling
kernel panic console output (2.43 KB, text/plain), 2004-03-25 12:08 UTC, Tomas Tengling
kernel panic console output (2.75 KB, text/plain), 2004-03-27 11:31 UTC, Tomas Tengling
kernel panic console output (2.73 KB, text/plain), 2004-03-30 14:09 UTC, Tomas Tengling
kernel panic console output (2.70 KB, text/plain), 2004-03-30 14:11 UTC, Tomas Tengling
kernel panic console output (2.69 KB, text/plain), 2004-04-06 09:46 UTC, Tomas Tengling
kernel panic console output (2.61 KB, text/plain), 2004-04-06 09:48 UTC, Tomas Tengling
kernel panic console output (2.53 KB, text/plain), 2004-04-08 13:49 UTC, Tomas Tengling
kernel panic console output (2.59 KB, text/plain), 2004-04-08 13:50 UTC, Tomas Tengling
kernel panic console output (2.60 KB, text/plain), 2004-04-08 13:51 UTC, Tomas Tengling


Links
System: Red Hat Product Errata
ID: RHSA-2004:188
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 2
Last Updated: 2004-05-11 04:00:00 UTC

Description David Juran 2004-03-10 09:28:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030922

Description of problem:
We have several dual Opteron machines that die with a kernel panic, all
in a similar fashion; see the panic text below.

Unable to handle kernel NULL pointer dereference at virtual address
0000000000000000
 printing rip:
ffffffff802cd1d0
PML4 f580e067 PGD a739d067 PMD 0
Oops: 0000
CPU 1
Pid: 31175, comm: matador Not tainted
RIP: 0010:[<ffffffff802cd1d0>]{strlen+0}
RSP: 0000:00000100f13fb880  EFLAGS: 00010216
RAX: 0000000000000001 RBX: 00000100cd491b28 RCX: 000001002d7d6880
RDX: 0000000000000001 RSI: 0000010071b02a40 RDI: 0000000000000000
RBP: 0000010071b02a40 R08: 0000000000000090 R09: 00000100ed578700
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000004b44000
R13: 00000100f13fbce8 R14: 00000000000007f4 R15: 0000000000000004

FS:  0000002a955804c0(0000) GS:ffffffff80603980(005b) knlGS:00000000406ad2a0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000079c4000 CR4: 00000000000006e0

Call Trace: [<ffffffff801b76dc>]{load_elf32_binary+4092}
       [<ffffffff801b83c5>]{load_elf32_binary+7397}
[<ffffffffa009b300>]{:sunrpc:rpc_run_timer+0}
       [<ffffffff80178e5c>]{notify_change+636}
[<ffffffff8016849c>]{do_coredump+540}
       [<ffffffff80130269>]{__dequeue_signal+393}
[<ffffffff801323a9>]{get_signal_to_deliver+1113}
       [<ffffffff8010feb1>]{do_signal+97}
[<ffffffff80132505>]{sys_rt_sigprocmask+213}
       [<ffffffff801a79ec>]{sys32_rt_sigprocmask+156}
[<ffffffff8011033f>]{intret_signal+45}

Process matador (pid: 31175, stackpage=100f13fb000)
Stack: 00000100f13fb880 0000000000000000 ffffffff801b76dc 0000009000000005
       0000010000000001 00000000000007f4 0000000000000001 00000100cd491800
       ffffffff801b83c5 00000100f0347000 00000100f13fba98 000002b8beb5c800
       ffffffffffffffff 0000000000002000 0000003d64136040 0000010000000000
       0000000100000000 0000010071b02a40 00000100641360d8 00000100f0347000
       00000100f13fbae8 04b2100000000001 00000000fffdb000 0002300000023000
       0000100000000007 000007f400000004 0000000000000000 0000000000000fe0
       0000000000000000 000000000120027f 0000002300000000 0000002b00000000
       0000ffff00001f80 c264f5e126dbf800 000000000000400e a10b44b71d983000
       0000000000004000 e8881ca5a4fe0000 0000000000003ff9 8637bd05af6c6800

Call Trace: [<ffffffff801b76dc>]{load_elf32_binary+4092}
       [<ffffffff801b83c5>]{load_elf32_binary+7397}
[<ffffffffa009b300>]{:sunrpc:rpc_run_timer+0}
       [<ffffffff80178e5c>]{notify_change+636}
[<ffffffff8016849c>]{do_coredump+540}
       [<ffffffff80130269>]{__dequeue_signal+393}
[<ffffffff801323a9>]{get_signal_to_deliver+1113}
       [<ffffffff8010feb1>]{do_signal+97}
[<ffffffff80132505>]{sys_rt_sigprocmask+213}
       [<ffffffff801a79ec>]{sys32_rt_sigprocmask+156}
[<ffffffff8011033f>]{intret_signal+45}


Code: 80 3f 00 48 89 f8 74 08 48 ff c0 80 38 00 75 f8 48 29 f8 c3

Kernel panic: Fatal exception
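
A note on reading the oops above: the faulting RIP is strlen+0, RDI (the first argument register in the x86-64 calling convention) is 0, and CR2 (the faulting address) is also 0, which is consistent with strlen() having been called with a NULL pointer. The "Code:" bytes appear to decode to a plain byte-scanning loop whose very first instruction reads the byte at RDI. The sketch below models that loop in Python to show why the fault lands on the first instruction. The `memory` dict, `PageFault`, `read_byte`, `load_string` and `strlen_model` are hypothetical names made up for illustration; this is not kernel code.

```python
class PageFault(Exception):
    """Hypothetical stand-in for the CPU fault raised on an unmapped read."""

    def __init__(self, address):
        super().__init__(f"unable to handle read at {address:#018x}")
        self.address = address


def read_byte(memory, address):
    # `memory` maps address -> byte value; any missing address is "unmapped".
    if address not in memory:
        raise PageFault(address)
    return memory[address]


def strlen_model(memory, rdi):
    """Mirror the instruction sequence from the Code: line, step by step."""
    first = read_byte(memory, rdi)      # 80 3f 00  cmpb $0,(%rdi)  <- strlen+0
    rax = rdi                           # 48 89 f8  mov  %rdi,%rax
    if first != 0:                      # 74 08     je   (skip loop if empty)
        while True:
            rax += 1                    # 48 ff c0  inc  %rax
            if read_byte(memory, rax) == 0:   # 80 38 00 / 75 f8  cmpb; jne
                break
    return rax - rdi                    # 48 29 f8  sub  %rdi,%rax ; c3 ret


def load_string(memory, address, text):
    # Place a NUL-terminated string into the model memory.
    for offset, byte in enumerate(text.encode() + b"\0"):
        memory[address + offset] = byte


memory = {}
load_string(memory, 0x1000, "matador")
print(strlen_model(memory, 0x1000))     # a valid pointer is scanned normally

try:
    strlen_model(memory, 0x0)           # RDI = 0, as in the register dump
except PageFault as fault:
    print(f"fault at strlen+0, CR2 = {fault.address:#018x}")
```

In this model the NULL case never reaches the loop: the first read already faults, which matches an offset of +0 in the RIP line.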


Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-9.0.1.EL

How reproducible:
Always

Steps to Reproduce:
1. let the computer run for a while...


Additional info:

Comment 1 David Juran 2004-03-10 09:40:33 UTC
Here is another similar panic which I got twice tonight...

Unable to handle kernel NULL pointer dereference at virtual address
0000000000000000
 printing rip:
ffffffff802cd210
PML4 3a87f067 PGD c4fa067 PMD 0 
Oops: 0000
CPU 1 
Pid: 7567, comm: SNCF_CCR_CON Not tainted
RIP: 0010:[<ffffffff802cd210>]{strlen+0}
RSP: 0000:0000010043273880  EFLAGS: 00010216
RAX: 0000000000000001 RBX: 0000010091e2c328 RCX: 00000100bea09740
RDX: 0000000000000001 RSI: 00000100f58203c0 RDI: 0000000000000000
RBP: 00000100f58203c0 R08: 0000000000000090 R09: 00000100fa3b7a80
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000016164000
R13: 0000010043273ce8 R14: 0000000000000574 R15: 0000000000000004
FS:  0000002a955804c0(0000) GS:ffffffff806039c0(005b) knlGS:00000000405b5780
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000079c4000 CR4: 00000000000006e0


Call Trace: [<ffffffff801b771c>]{load_elf32_binary+4092} 
       [<ffffffff801b8405>]{load_elf32_binary+7397}
[<ffffffffa009b300>]{:sunrpc:rpc_run_timer+0} 
       [<ffffffff80178e9c>]{notify_change+636}
[<ffffffff801684dc>]{do_coredump+540} 
       [<ffffffff80130269>]{__dequeue_signal+393}
[<ffffffff801323a9>]{get_signal_to_deliver+1113} 
       [<ffffffff8010feb1>]{do_signal+97}
[<ffffffff801454a8>]{generic_file_write+296} 
       [<ffffffffa00c069e>]{:nfs:nfs_file_write+206}
[<ffffffff801109af>]{error_signal_test+0} 
       
Process SNCF_CCR_CON (pid: 7567, stackpage=10043273000)
Stack: 0000010043273880 0000000000000000 ffffffff801b771c 0000009000000005
       0000010000000001 0000000000000574 0000000000000001 0000010091e2c000
       ffffffff801b8405 00000100fa13dc00 0000010043273a98 000002b89f44f800
       ffffffffffffffff 0000000000002000 0000002939238040 0000010000000000
       0000000100000000 00000100f58203c0 0000010039238208 00000100fa13dc00
       0000010043273ae8 1614500000000001 00000000fffdf000 0001f0000001f000
       0000100000000007 0000057400000004 0000000000000000 0000000000000fe0
       0000000000000000 000000000020027f 0000002300000000 0000002b00000000
       0000ffff00001f80 ade8000000000000 000000000000400a b71b00000218e000
       0000000000004011 8000000000000000 000000000000bffe d6bf94d5e57a4000
Call Trace: [<ffffffff801b771c>]{load_elf32_binary+4092} 
       [<ffffffff801b8405>]{load_elf32_binary+7397}
[<ffffffffa009b300>]{:sunrpc:rpc_run_timer+0} 
       [<ffffffff80178e9c>]{notify_change+636}
[<ffffffff801684dc>]{do_coredump+540} 
       [<ffffffff80130269>]{__dequeue_signal+393}
[<ffffffff801323a9>]{get_signal_to_deliver+1113}
       [<ffffffff8010feb1>]{do_signal+97}
[<ffffffff801454a8>]{generic_file_write+296} 
       [<ffffffffa00c069e>]{:nfs:nfs_file_write+206}
[<ffffffff801109af>]{error_signal_test+0} 
       

Code: 80 3f 00 48 89 f8 74 08 48 ff c0 80 38 00 75 f8 48 29 f8 c3

Kernel panic: Fatal exception


Comment 2 Tomas Tengling 2004-03-19 12:48:08 UTC
Created attachment 98677 [details]
kernel panic console output

Console output on our 'catoosa' computer when it got a panic tonight.
From the same set of 8 computers where the panics above happened.
All of them are 'up2date'.

Comment 3 Tomas Tengling 2004-03-19 12:50:32 UTC
Created attachment 98678 [details]
Yet another kernel panic console output

Console output on our 'colusa' computer when it got a panic tonight.

Comment 4 Tomas Tengling 2004-03-19 14:47:36 UTC
Some hardware information about the computers where the kernel panics:

Motherboard Rioworks HDAMA
Phoenix Server BIOS 3 Release 6.0, HDAMA ver 1.82 11/27/03 15:30:08,
Rhapsody Mainboard
Dual AMD Opteron 246 CPUs
4 GB ECC DDR memory (Ventura Tech VDDR-512-P320ERLP) (8 memory slots
in total, 0 free)
36 GB U320 SCSI disk (Seagate Cheetah ST336607LC)
Slim Mitsumi CD-ROM (ATA) & floppy
Adaptec 29320 U320 SCSI PCI-X 133 card in the only slot
1U rackmounted case with 460 W power supply

Comment 5 Tomas Tengling 2004-03-20 10:26:08 UTC
Created attachment 98707 [details]
kernel panic console output

Console output on our 'oroville' computer when it got a panic this morning.

Comment 6 Tomas Tengling 2004-03-24 14:10:29 UTC
Created attachment 98827 [details]
kernel panic console output

Console output on our 'eastmoriches' computer when it got a panic a few minutes
ago.

Comment 7 Tomas Tengling 2004-03-24 14:24:26 UTC
Comment on attachment 98827 [details]
kernel panic console output

Unable to handle kernel NULL pointer dereference at virtual address
0000000000000000
 printing rip:
ffffffff802cd210
PML4 11c8d067 PGD d8974067 PMD 0 
Oops: 0000
CPU 1 
Pid: 24350, comm: matador Not tainted
RIP: 0010:[<ffffffff802cd210>]{strlen+0}
RSP: 0000:0000010055b4d880  EFLAGS: 00010216
RAX: 0000000000000001 RBX: 000001002aee2f28 RCX: 000001007f5ad400
RDX: 0000000000000001 RSI: 00000100eccb3c00 RDI: 0000000000000000
RBP: 00000100eccb3c00 R08: 0000000000000090 R09: 00000100be6ae400
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000005055000
R13: 0000010055b4dce8 R14: 0000000000000874 R15: 0000000000000004
FS:  0000002a955814c0(0000) GS:ffffffff806039c0(005b) knlGS:00000000406ae2a0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000000daee000 CR4: 00000000000006e0

Call Trace: [<ffffffff801b771c>]{load_elf32_binary+4092} 
       [<ffffffff801b8405>]{load_elf32_binary+7397}
[<ffffffffa009b300>]{:sunrpc:rpc_run_timer+0} 
       [<ffffffff80178e9c>]{notify_change+636}
[<ffffffff801684dc>]{do_coredump+540} 
       [<ffffffff80130269>]{__dequeue_signal+393}
[<ffffffff801323a9>]{get_signal_to_deliver+1113} 
       [<ffffffff8010feb1>]{do_signal+97}
[<ffffffff801454a8>]{generic_file_write+296} 
       [<ffffffffa00c069e>]{:nfs:nfs_file_write+206}
[<ffffffff801109af>]{error_signal_test+0} 

Process matador (pid: 24350, stackpage=10055b4d000)
Stack: 0000010055b4d880 0000000000000000 ffffffff801b771c 0000009000000005 
       0000010000000001 0000000000000874 0000000000000001 000001002aee2c00 
       ffffffff801b8405 00000100afce6200 0000010055b4da98 000002b8d0130800 
       ffffffffffffffff 0000000000002000 00000041e7822040 0000010000000000 
       0000000100000000 00000100eccb3c00 00000100e7822598 00000100afce6200 
       0000010055b4dae8 0503200000000001 00000000fffdb000 0002300000023000 
       0000100000000007 0000087400000004 0000000000000000 0000000000000fe0 
       0000000000000000 000000000120027f 0000002300000000 0000002b00000000 
       0000ffff00001f80 d4dd0971f8cd7000 0000000000004007 8637bd05af6c6800 
       0000000000003feb 8000000000000000 0000000000004002 9fae147ae147b000 
Call Trace: [<ffffffff801b771c>]{load_elf32_binary+4092} 
       [<ffffffff801b8405>]{load_elf32_binary+7397}
[<ffffffffa009b300>]{:sunrpc:rpc_run_timer+0} 
       [<ffffffff80178e9c>]{notify_change+636}
[<ffffffff801684dc>]{do_coredump+540} 
       [<ffffffff80130269>]{__dequeue_signal+393}
[<ffffffff801323a9>]{get_signal_to_deliver+1113} 
       [<ffffffff8010feb1>]{do_signal+97}
[<ffffffff801454a8>]{generic_file_write+296} 
       [<ffffffffa00c069e>]{:nfs:nfs_file_write+206}
[<ffffffff801109af>]{error_signal_test+0} 

Code: 80 3f 00 48 89 f8 74 08 48 ff c0 80 38 00 75 f8 48 29 f8 c3

Kernel panic: Fatal exception

Comment 8 Tomas Tengling 2004-03-24 16:12:57 UTC
Created attachment 98830 [details]
kernel panic console output

Console output on our 'morgantown' computer when it got a panic a few minutes
ago.

Comment 9 Tomas Tengling 2004-03-25 12:06:32 UTC
Created attachment 98847 [details]
kernel panic console output

Console output on our 'chester' computer when it got a panic this morning.

Comment 10 Tomas Tengling 2004-03-25 12:08:02 UTC
Created attachment 98848 [details]
kernel panic console output

Console output on our 'beckwourth' computer when it got a panic this morning.

Comment 11 Tomas Tengling 2004-03-27 11:31:42 UTC
Created attachment 98900 [details]
kernel panic console output

Console output on our 'hamburg' computer when it got a panic this morning.

Comment 12 Tomas Tengling 2004-03-30 14:09:09 UTC
Created attachment 98961 [details]
kernel panic console output

Console output on our 'hamburg' computer when it got a panic today.

Comment 13 Tomas Tengling 2004-03-30 14:11:10 UTC
Created attachment 98962 [details]
kernel panic console output

Console output on our 'morgantown' computer when it got a panic today. This
time it was running the non-SMP kernel but it still crashed!

Comment 14 Tomas Tengling 2004-03-31 14:09:55 UTC
I see that there is a 2.4.21-12 kernel in the beta channel.  Could it
contain fixes that solve our problem?

Do you want any more outputs from our kernel panics or have you got
enough of them by now?


Comment 15 Tomas Tengling 2004-04-06 09:46:51 UTC
Created attachment 99132 [details]
kernel panic console output

Console output on our 'oroville' computer when it got a panic today.

Comment 16 Tomas Tengling 2004-04-06 09:48:58 UTC
Created attachment 99133 [details]
kernel panic console output

Console output on our 'colusa' computer when it got a panic a few minutes ago.

Is anything happening regarding this issue?  It's causing major problems for
us.

Comment 17 Tomas Tengling 2004-04-08 13:49:49 UTC
Created attachment 99236 [details]
kernel panic console output

Console output on our 'catoosa' computer when it got a panic today.

Comment 18 Tomas Tengling 2004-04-08 13:50:44 UTC
Created attachment 99237 [details]
kernel panic console output

Console output on our 'hamburg' computer when it got a panic today.

Comment 19 Tomas Tengling 2004-04-08 13:51:32 UTC
Created attachment 99238 [details]
kernel panic console output

Console output on our 'beckwourth' computer when it got a panic today.

Comment 20 Don Howard 2004-04-09 00:44:28 UTC
Each of these crashes is occurring at the same address with similar
backtraces, so there is no need to post any further kernel panics.
 
Thank you. =) 
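
For anyone triaging a pile of console captures like these, the comparison can be automated. The following is a minimal sketch in Python that reduces an oops to a signature (the RIP symbol plus the ordered call-trace symbols) so that two captures can be compared while ignoring raw addresses, which differ between kernel builds. The name `oops_signature` and the regular expression are assumptions for illustration, based only on the 2.4-style format seen in this report; the two sample strings are excerpts (RIP line and the first five frames) from the description and from comment 1.

```python
import re

# Matches entries such as [<ffffffff801b76dc>]{load_elf32_binary+4092}
# and module frames such as [<ffffffffa009b300>]{:sunrpc:rpc_run_timer+0}.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\{([^}]+)\}")


def oops_signature(text):
    """Return (rip_symbol, call_trace_symbols) with addresses stripped."""
    rip_symbol = None
    trace = []
    in_trace = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("RIP:"):
            match = FRAME_RE.search(stripped)
            if match:
                rip_symbol = match.group(2)
            continue
        if stripped.startswith("Call Trace:"):
            if trace:
                break  # the console prints the trace twice; keep the first copy
            in_trace = True
        elif in_trace and stripped and not stripped.startswith("[<"):
            in_trace = False  # e.g. the "Process ..." line ends the trace
        if in_trace:
            trace.extend(m.group(2) for m in FRAME_RE.finditer(stripped))
    return rip_symbol, trace


first = """\
RIP: 0010:[<ffffffff802cd1d0>]{strlen+0}
Call Trace: [<ffffffff801b76dc>]{load_elf32_binary+4092}
       [<ffffffff801b83c5>]{load_elf32_binary+7397}
[<ffffffffa009b300>]{:sunrpc:rpc_run_timer+0}
       [<ffffffff80178e5c>]{notify_change+636}
[<ffffffff8016849c>]{do_coredump+540}
"""

second = """\
RIP: 0010:[<ffffffff802cd210>]{strlen+0}
Call Trace: [<ffffffff801b771c>]{load_elf32_binary+4092}
       [<ffffffff801b8405>]{load_elf32_binary+7397}
[<ffffffffa009b300>]{:sunrpc:rpc_run_timer+0}
       [<ffffffff80178e9c>]{notify_change+636}
[<ffffffff801684dc>]{do_coredump+540}
"""

print(oops_signature(first) == oops_signature(second))
```

Full traces from different machines may still differ in their tail frames (the signal can be delivered from different system calls), so comparing only the leading frames, as in the excerpts above, may be the more useful check.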

Comment 21 Don Howard 2004-04-14 01:46:17 UTC

*** This bug has been marked as a duplicate of 113890 ***

Comment 22 John Flanagan 2004-05-12 01:08:37 UTC
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-188.html


