Bug 91230 - Kernel BUG at page_alloc WITHOUT nvidia driver
Status: CLOSED NOTABUG
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
 
Reported: 2003-05-20 03:48 EDT by Sebastien BLAISOT
Modified: 2007-04-18 12:53 EDT (History)

Last Closed: 2003-06-10 11:35:32 EDT


Attachments:
traces of previous hangs (119.72 KB, text/plain)
2003-06-10 03:11 EDT, Sebastien BLAISOT

Description Sebastien BLAISOT 2003-05-20 03:48:15 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.3) Gecko/20030313

Description of problem:
I've seen some kernel bug traces in my /var/log/messages (see below).
Today I lost my mysqld daemon, or at least part of it: some threads were
still running, but the service no longer worked.

The kernel is kernel-2.4.20-13.7; the system is RH 7.3 with the latest updates.


I've already seen this kind of thing with a perl process.

Any ideas?

The logs are below (I don't know whether the three are related).


Version-Release number of selected component (if applicable): 
2.4.20-13.7


How reproducible:
Couldn't Reproduce

Steps to Reproduce:
Unable to reproduce it.

Additional info:

May 18 04:22:02 csvisu1g kernel: memory.c:170: bad pmd 000000c3.


May 18 05:00:18 csvisu1g kernel: Unable to handle kernel NULL pointer
dereference at virtual address 000000bd
May 18 05:00:18 csvisu1g kernel:  printing eip:
May 18 05:00:18 csvisu1g kernel: c013ab90
May 18 05:00:18 csvisu1g kernel: *pde = 00000000
May 18 05:00:18 csvisu1g kernel: Oops: 0000
May 18 05:00:18 csvisu1g kernel: autofs eepro100 mii ehci-hcd usb-uhci usbcore
ext3 jbd DAC960 aic7xxx sd_mod scsi_mod
May 18 05:00:18 csvisu1g kernel: CPU:    0
May 18 05:00:18 csvisu1g kernel: EIP:    0010:[<c013ab90>]    Not tainted
May 18 05:00:18 csvisu1g kernel: EFLAGS: 00010206
May 18 05:00:18 csvisu1g kernel:
May 18 05:00:18 csvisu1g kernel: EIP is at page_referenced [kernel] 0x140
(2.4.20-13.7)
May 18 05:00:18 csvisu1g kernel: eax: 00000000   ebx: c1da26f4   ecx: f7feb880 
 edx: 00000001
May 18 05:00:18 csvisu1g kernel: esi: 00000041   edi: 00000003   ebp: c1da26d8 
 esp: f7fa5f7c
May 18 05:00:18 csvisu1g kernel: ds: 0018   es: 0018   ss: 0018
May 18 05:00:18 csvisu1g kernel: Process kscand (pid: 6, stackpage=f7fa5000)
May 18 05:00:18 csvisu1g kernel: Stack: 00000000 00000000 00000001 f7fa5fac
c1da26f4 c1da26d8 00000003 000001f4
May 18 05:00:18 csvisu1g kernel:        c013397d f7fa5fac c1c8658c c02def1c
00000001 f7fa4000 c02dedd0 00000003
May 18 05:00:18 csvisu1g kernel:        000001f4 c0135769 c02dedd0 00000003
00000001 f7fa4000 00000002 00000000
May 18 05:00:18 csvisu1g kernel: Call Trace:   [<c013397d>] scan_active_list
[kernel] 0x5d (0xf7fa5f9c))
May 18 05:00:18 csvisu1g kernel: [<c0135769>] kscand [kernel] 0xc9 (0xf7fa5fc0))
May 18 05:00:18 csvisu1g kernel: [<c0105000>] stext [kernel] 0x0 (0xf7fa5fe8))
May 18 05:00:18 csvisu1g kernel: [<c0107146>] arch_kernel_thread [kernel] 0x26
(0xf7fa5ff0))
May 18 05:00:18 csvisu1g kernel: [<c01356a0>] kscand [kernel] 0x0 (0xf7fa5ff8))
May 18 05:00:18 csvisu1g kernel:
May 18 05:00:18 csvisu1g kernel:
May 18 05:00:18 csvisu1g kernel: Code: 8b 46 7c bf 1e 00 00 00 85 c0 0f 84 16 01
00 00 8b 1d 30 b2
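
The faulting address in the oops above can be cross-checked against the register dump. The following is a minimal sketch, not part of the original report: it decodes only the first three bytes of the "Code:" line (8b 46 7c) as a 32-bit mov with an 8-bit displacement and recomputes the address from esi. The helper name decode_mov_disp8 is hypothetical, and the decoder deliberately handles just this one instruction form.

```python
# Values copied from the oops above.
fault_address = 0x000000BD            # "virtual address 000000bd"
esi = 0x00000041                      # "esi: 00000041"
code = bytes.fromhex("8b 46 7c")      # first three bytes of the "Code:" line

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_disp8(insn: bytes):
    """Decode opcode 8b with mod=01: load r32 from [base + signed 8-bit disp].

    Hypothetical helper for illustration; it rejects every other encoding.
    """
    opcode, modrm, disp = insn[0], insn[1], insn[2]
    if opcode != 0x8B:
        raise ValueError("not the 8b form of mov")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the plain base+disp8 form is handled")
    if disp >= 0x80:                  # disp8 is sign-extended
        disp -= 0x100
    return REGS[reg], REGS[rm], disp

dest, base, disp = decode_mov_disp8(code)
computed = (esi + disp) & 0xFFFFFFFF
print(f"mov {disp:#x}(%{base}),%{dest} -> address {computed:#010x}")
```

The computed address matches the reported 000000bd, which suggests esi held the small value 0x41 where the kernel expected a valid pointer. That is consistent with corrupted data, but does not by itself say whether the cause is hardware or software.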



May 20 08:58:21 csvisu1g kernel:  ------------[ cut here ]------------
May 20 08:58:21 csvisu1g kernel: kernel BUG at page_alloc.c:131!
May 20 08:58:21 csvisu1g kernel: invalid operand: 0000
May 20 08:58:21 csvisu1g kernel: autofs eepro100 mii ehci-hcd usb-uhci usbcore
ext3 jbd DAC960 aic7xxx sd_mod scsi_mod
May 20 08:58:21 csvisu1g kernel: CPU:    0
May 20 08:58:21 csvisu1g kernel: EIP:    0010:[<c0135b66>]    Not tainted
May 20 08:58:21 csvisu1g kernel: EFLAGS: 00010206
May 20 08:58:21 csvisu1g kernel:
May 20 08:58:21 csvisu1g kernel: EIP is at __free_pages_ok [kernel] 0x106
(2.4.20-13.7)
May 20 08:58:21 csvisu1g kernel: eax: 02001000   ebx: c1da26d8   ecx: c1000030 
 edx: 00000041
May 20 08:58:21 csvisu1g kernel: esi: 00000000   edi: 00000000   ebp: 00000000 
 esp: c9e15cf0
May 20 08:58:21 csvisu1g kernel: ds: 0018   es: 0018   ss: 0018
May 20 08:58:21 csvisu1g kernel: Process mysqld (pid: 19021, stackpage=c9e15000)
May 20 08:58:21 csvisu1g kernel: Stack: c1000030 c1cf1288 c02dedd0 c1c40030
c1da26d8 00000000 c0129d0a c1da26d8
May 20 08:58:21 csvisu1g kernel:        c1da26d8 00000000 00000000 f77eba28
c012a018 00000000 00000001 c9e15d98
May 20 08:58:21 csvisu1g kernel:        00000000 f48688c0 ceb782d0 00000000
00000000 f48688c0 c9e15dcc f77eb980
May 20 08:58:21 csvisu1g kernel: Call Trace:   [<c0129d0a>] remove_inode_page
[kernel] 0x1a (0xc9e15d08))
May 20 08:58:21 csvisu1g kernel: [<c012a018>] truncate_list_pages [kernel] 0x1a8
(0xc9e15d20))
May 20 08:58:21 csvisu1g kernel: [<f8878161>] ext3_reserve_inode_write [ext3]
0x31 (0xc9e15d50))
May 20 08:58:21 csvisu1g kernel: [<c012a09b>] truncate_inode_pages [kernel] 0x3b
(0xc9e15d94))
May 20 08:58:21 csvisu1g kernel: [<c0127c5b>] vmtruncate [kernel] 0x9b (0xc9e15dac))
May 20 08:58:21 csvisu1g kernel: [<c01514e6>] inode_setattr [kernel] 0x26
(0xc9e15dd0))
May 20 08:58:21 csvisu1g kernel: [<f887800d>] ext3_setattr [ext3] 0x19d
(0xc9e15df0))
May 20 08:58:21 csvisu1g kernel: [<c0151665>] notify_change [kernel] 0x55
(0xc9e15e40))
May 20 08:58:21 csvisu1g kernel: [<f8856404>] DAC960_ProcessRequest [DAC960]
0xf4 (0xc9e15e58))
May 20 08:58:21 csvisu1g kernel: [<f8856430>] DAC960_RequestFunction [DAC960]
0x20 (0xc9e15e74))
May 20 08:58:21 csvisu1g kernel: [<c013bfc6>] do_truncate [kernel] 0x46
(0xc9e15e8c))
May 20 08:58:21 csvisu1g kernel: [<c0147d41>] open_namei [kernel] 0x421
(0xc9e15ed4))
May 20 08:58:21 csvisu1g kernel: [<c014673e>] getname [kernel] 0x5e (0xc9e15f0c))
May 20 08:58:21 csvisu1g kernel: [<c0147832>] __user_walk [kernel] 0x32
(0xc9e15f28))
May 20 08:58:21 csvisu1g kernel: [<c013cf26>] dentry_open [kernel] 0xe6
(0xc9e15f48))
May 20 08:58:21 csvisu1g kernel: [<c013ce1c>] filp_open [kernel] 0x3c (0xc9e15f70))
May 20 08:58:21 csvisu1g kernel: [<c013d154>] sys_open [kernel] 0x34 (0xc9e15fa8))
May 20 08:58:22 csvisu1g kernel: [<c01088c3>] system_call [kernel] 0x33
(0xc9e15fc0))
May 20 08:58:22 csvisu1g kernel:
May 20 08:58:22 csvisu1g kernel:
May 20 08:58:22 csvisu1g kernel: Code: 0f 0b 83 00 16 3c 23 c0 8b 43 18 89 f9 89
dd 83 e0 eb 89 43
Comment 1 Sebastien BLAISOT 2003-05-20 06:21:09 EDT
In case it's useful, I ran the oops through ksymoops. Here are the results:

ksymoops 2.4.4 on i686 2.4.20-13.7.  Options used
     -V (default)
     -k /proc/ksyms (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20-13.7/ (default)
     -m /boot/System.map-2.4.20-13.7 (default)

kernel BUG at page_alloc.c:131!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c0135b66>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 02001000   ebx: c1da26d8   ecx: c1000030   edx: 00000041
esi: 00000000   edi: 00000000   ebp: 00000000   esp: c9e15cf0
ds: 0018   es: 0018   ss: 0018
Process mysqld (pid: 19021, stackpage=c9e15000)
Stack: c1000030 c1cf1288 c02dedd0 c1c40030 c1da26d8 00000000 c0129d0a c1da26d8
       c1da26d8 00000000 00000000 f77eba28 c012a018 00000000 00000001 c9e15d98
       00000000 f48688c0 ceb782d0 00000000 00000000 f48688c0 c9e15dcc f77eb980
Call Trace:   [<c0129d0a>] remove_inode_page [kernel] 0x1a (0xc9e15d08))
[<c012a018>] truncate_list_pages [kernel] 0x1a8 (0xc9e15d20))
[<f8878161>] ext3_reserve_inode_write [ext3] 0x31 (0xc9e15d50))
[<c012a09b>] truncate_inode_pages [kernel] 0x3b (0xc9e15d94))
[<c0127c5b>] vmtruncate [kernel] 0x9b (0xc9e15dac))
[<c01514e6>] inode_setattr [kernel] 0x26 (0xc9e15dd0))
[<f887800d>] ext3_setattr [ext3] 0x19d (0xc9e15df0))
[<c0151665>] notify_change [kernel] 0x55 (0xc9e15e40))
[<f8856404>] DAC960_ProcessRequest [DAC960] 0xf4 (0xc9e15e58))
[<f8856430>] DAC960_RequestFunction [DAC960] 0x20 (0xc9e15e74))
[<c013bfc6>] do_truncate [kernel] 0x46 (0xc9e15e8c))
[<c0147d41>] open_namei [kernel] 0x421 (0xc9e15ed4))
[<c014673e>] getname [kernel] 0x5e (0xc9e15f0c))
[<c0147832>] __user_walk [kernel] 0x32 (0xc9e15f28))
[<c013cf26>] dentry_open [kernel] 0xe6 (0xc9e15f48))
[<c013ce1c>] filp_open [kernel] 0x3c (0xc9e15f70))
[<c013d154>] sys_open [kernel] 0x34 (0xc9e15fa8))
Code: 0f 0b 83 00 16 3c 23 c0 8b 43 18 89 f9 89 dd 83 e0 eb 89 43

>>EIP; c0135b66 <__free_pages_ok+106/370>   <=====
Trace; c0129d0a <remove_inode_page+1a/20>
Trace; c012a018 <truncate_list_pages+1a8/1f0>
Trace; f8878161 <[ext3]ext3_reserve_inode_write+31/b0>
Trace; c012a09b <truncate_inode_pages+3b/70>
Trace; c0127c5b <vmtruncate+9b/120>
Trace; c01514e6 <inode_setattr+26/e0>
Trace; f887800d <[ext3]ext3_setattr+19d/1e0>
Trace; c0151665 <notify_change+55/150>
Trace; f8856404 <[DAC960]DAC960_ProcessRequest+f4/100>
Trace; f8856430 <[DAC960]DAC960_RequestFunction+20/30>
Trace; c013bfc6 <do_truncate+46/60>
Trace; c0147d41 <open_namei+421/580>
Trace; c014673e <getname+5e/a0>
Trace; c0147832 <__user_walk+32/40>
Trace; c013cf26 <dentry_open+e6/190>
Trace; c013ce1c <filp_open+3c/60>
Trace; c013d154 <sys_open+34/80>
Code;  c0135b66 <__free_pages_ok+106/370>
00000000 <_EIP>:
Code;  c0135b66 <__free_pages_ok+106/370>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0135b68 <__free_pages_ok+108/370>
   2:   83 00 16                  addl   $0x16,(%eax)
Code;  c0135b6b <__free_pages_ok+10b/370>
   5:   3c 23                     cmp    $0x23,%al
Code;  c0135b6d <__free_pages_ok+10d/370>
   7:   c0 8b 43 18 89 f9 89      rorb   $0x89,0xf9891843(%ebx)
Code;  c0135b74 <__free_pages_ok+114/370>
   e:   dd 83 e0 eb 89 43         fldl   0x4389ebe0(%ebx)
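
A note on reading the disassembly above: the bytes after ud2a are probably data rather than instructions. The sketch below assumes the 2.4 i386 BUG() layout with verbose BUG reporting enabled (ud2, then a 16-bit line number, then a 32-bit pointer to the file name); that layout is an assumption about this kernel build, not something stated in the report.

```python
import struct

# First eight bytes of the "Code:" line from the BUG report above.
code = bytes.fromhex("0f 0b 83 00 16 3c 23 c0")

UD2 = b"\x0f\x0b"
if code[:2] != UD2:
    raise ValueError("Code bytes do not start with ud2")

# Assumed layout after ud2: little-endian u16 line number, u32 file-name pointer.
line, file_ptr = struct.unpack_from("<HI", code, 2)
print(f"line {line}, file name string at {file_ptr:#010x}")
```

Under that assumption the decoded line number is 131, which agrees with "kernel BUG at page_alloc.c:131!", so the addl/cmp/rorb lines in the ksymoops output are most likely a mis-decoding of this embedded data.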

Comment 2 Sebastien BLAISOT 2003-05-26 03:55:51 EDT
Could somebody look at this?
Comment 3 Alan Cox 2003-06-08 09:26:50 EDT
It's hard to tell. It looks like memory corruption (either hardware or kernel).
Upgrading to the latest kernel might help, but I see no reason to be sure of
that. Can you run memtest86 on the system for a few hours (www.memtest86.com) so
we can see whether it appears to be bad RAM?

Were there any older kernels that were stable?
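
One rough way to look for signs of corruption in the two oopses from the description is to compare their register dumps. The sketch below is illustrative only: the register lines are copied from the report, while the helper names (registers, shared_values) are hypothetical. It lists non-zero values that appear in both dumps.

```python
import re

# Register lines copied from the two oopses in the description.
OOPS_KSCAND = """
eax: 00000000   ebx: c1da26f4   ecx: f7feb880   edx: 00000001
esi: 00000041   edi: 00000003   ebp: c1da26d8   esp: f7fa5f7c
"""
OOPS_MYSQLD = """
eax: 02001000   ebx: c1da26d8   ecx: c1000030   edx: 00000041
esi: 00000000   edi: 00000000   ebp: 00000000   esp: c9e15cf0
"""

def registers(text):
    """Map register name -> value for the eight general-purpose registers."""
    pairs = re.findall(r"\b(e[a-z]{2}): ([0-9a-f]{8})", text)
    return {name: int(value, 16) for name, value in pairs}

def shared_values(a, b):
    """Non-zero values present in both dumps, with the registers holding them."""
    common = (set(a.values()) & set(b.values())) - {0}
    return {
        v: ([r for r, x in a.items() if x == v],
            [r for r, x in b.items() if x == v])
        for v in sorted(common)
    }

shared = shared_values(registers(OOPS_KSCAND), registers(OOPS_MYSQLD))
for value, (first, second) in shared.items():
    print(f"{value:08x}: {first} in kscand oops, {second} in mysqld oops")
```

Both dumps contain c1da26d8 and 00000041 even though they are two days apart and in different processes. That hints that the same in-memory structure was involved both times, but it is only a hint and does not distinguish bad RAM from a kernel bug.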
Comment 4 Sebastien BLAISOT 2003-06-10 03:11:30 EDT
Created attachment 92293 [details]
traces of previous hangs

These are the kernel messages I have in my /var/log/messages files.
Comment 5 Sebastien BLAISOT 2003-06-10 03:19:27 EDT
I've attached the traces of all previous crashes, in case they help.
Unfortunately, I didn't run them through ksymoops, and they don't all come from
the same kernel, since I was always running the latest available kernel at the
time. The most recent crash, however, is with the latest available kernel.

I also suspected a memory problem and replaced the memory stick last Friday.
I'm currently running memtest86 on another computer, using the original RAM
from the machine that kept crashing.

This is very difficult to reproduce; I haven't found a way to trigger the
problem reliably.
Comment 6 Sebastien BLAISOT 2003-06-10 11:35:32 EDT
OK, after 8 hours of memtest86 there were no errors. But once I enabled ECC (in
the memtest86 configuration), it found 4 errors in only 1 hour.

It seems to be a hardware problem.

Thanks for your help.
