Bug 56426

Summary:	random OOPS upon heavy load with AMD processor
Product:	[Retired] Red Hat Linux	Reporter:	Hetz Ben Hamo <hetz>
Component:	kernel	Assignee:	Arjan van de Ven <arjanv>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Brock Organ <borgan>
Severity:	high	Docs Contact:
Priority:	high
Version:	7.2	CC:	sbrighi
Target Milestone:	---
Target Release:	---
Hardware:	athlon
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2004-09-30 15:39:17 UTC	Type:	---

Description Hetz Ben Hamo 2001-11-17 19:09:26 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.7 [en] (Win98; U)

Description of problem:
I'm using an 800 MHz AMD machine with 768 MB of RAM. I checked the RAM with
several diagnostic programs (quick and burn-in tests) and everything seems
OK. I booted kernel 2.4.9-13 with the noathlon option and still see the same
result, as explained below.

When the machine is under heavy load (I'm using a backup program called
MONDO, which compresses my filesystem with bzip2 and burns it to CD-Rs), I
sometimes get an oops from random processes. A copy of the oops is below.

I have checked my hardware (CPU fan, board, memory) and everything seems to
be OK, so it doesn't look like a hardware failure. In fact, I'm posting this
from Konqueror after getting an oops 2 minutes ago, and I haven't rebooted
yet.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install RH 7.2 on an AMD machine (board: ASUS A7V)
2. Start the backup and load some apps (I load kmail, konqueror and netscape
and use Java)
3. After it finishes recording the first CD and continues the backup, I get
a random oops. Sometimes it's in kswapd, sometimes in bzip2, and the memory
address is random
 

Actual Results:  Usually, after the first CD is burned and it continues to
compress and back up, it gives me a random oops after several minutes.

Here is my last one from 2 minutes ago:

Nov 17 20:43:28 gorgeous kernel: VM: refill_inactive, wrong page on list.
Nov 17 20:43:28 gorgeous kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000048
Nov 17 20:43:28 gorgeous kernel:  printing eip:
Nov 17 20:43:28 gorgeous kernel: c012b4b2
Nov 17 20:43:28 gorgeous kernel: *pde = 00000000
Nov 17 20:43:28 gorgeous kernel: Oops: 0002
Nov 17 20:43:28 gorgeous kernel: CPU:    0
Nov 17 20:43:28 gorgeous kernel: EIP:    0010:[refill_inactive_scan+70/424]
   Not tainted
Nov 17 20:43:28 gorgeous kernel: EIP:    0010:[<c012b4b2>]    Not tainted
Nov 17 20:43:28 gorgeous kernel: EFLAGS: 00010286
Nov 17 20:43:28 gorgeous kernel: eax: c031abfc   ebx: 00000006   ecx:
c0218620   edx: 00000048
Nov 17 20:43:28 gorgeous kernel: esi: c1453268   edi: c145324c   ebp:
000004cd   esp: effdffc4
Nov 17 20:43:28 gorgeous kernel: ds: 0018   es: 0018   ss: 0018
Nov 17 20:43:28 gorgeous kernel: Process kswapd (pid: 5, stackpage=effdf000)
Nov 17 20:43:28 gorgeous kernel: Stack: effde000 00000000 00000006 000000c0
00000000 0008e000 c012b862 00000006
Nov 17 20:43:28 gorgeous kernel:        00010f00 eff89fb8 c0105000 c010566e
00000000 c012b7c0 c02cbfd8
Nov 17 20:43:28 gorgeous kernel: Call Trace: [kswapd+162/228] kswapd
[kernel] 0xa2
Nov 17 20:43:28 gorgeous kernel: Call Trace: [<c012b862>] kswapd [kernel] 0xa2
Nov 17 20:43:28 gorgeous kernel: [_stext+0/40] stext [kernel] 0x0
Nov 17 20:43:28 gorgeous kernel: [<c0105000>] stext [kernel] 0x0
Nov 17 20:43:28 gorgeous kernel: [kernel_thread+38/48] kernel_thread
[kernel] 0x26
Nov 17 20:43:28 gorgeous kernel:[<c010566e>] kernel_thread [kernel] 0x26
Nov 17 20:43:28 gorgeous kernel: [kswapd+0/228] kswapd [kernel] 0x0
Nov 17 20:43:28 gorgeous kernel: [<c012b7c0>] kswapd [kernel] 0x0
Nov 17 20:43:28 gorgeous kernel:
Nov 17 20:43:28 gorgeous kernel:
Nov 17 20:43:28 gorgeous kernel: Code: 89 02 c7 46 04 00 00 00 00 c7 06 00
00 00 00 ff 0d f4 ab 31
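
A note for anyone reading the trace above: the number on the "Oops:" line is
the x86 page-fault error code, and its low three bits say what kind of access
faulted. The sketch below decodes it; the function name decode_oops_code is
made up for illustration and is not part of any kernel tool. For this oops
(0002) it reports a kernel-mode write to a page that is not present. That
agrees with the first Code: bytes (89 02, i.e. mov %eax,(%edx)) and
edx = 00000048, assuming the Code: line starts at the faulting instruction.

```python
# Decode the x86 page-fault error code printed on the "Oops:" line.
# Only the low three bits are interpreted here.
def decode_oops_code(code: int) -> dict:
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault happened in kernel mode, 1 = in user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


if __name__ == "__main__":
    # "Oops: 0002" from the refill_inactive_scan trace
    print(decode_oops_code(0x0002))
```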


Expected Results:  It should continue to work as usual

Additional info:

Here is another (earlier) oops:
Nov 17 14:43:48 gorgeous kernel: Unable to handle kernel paging request at
virtual address 206f6e48
Nov 17 14:43:48 gorgeous kernel:  printing eip:
Nov 17 14:43:48 gorgeous kernel: c0158742
Nov 17 14:43:48 gorgeous kernel: *pde = 00000000
Nov 17 14:43:48 gorgeous kernel: Oops: 0000
Nov 17 14:43:48 gorgeous kernel: CPU:    0
Nov 17 14:43:48 gorgeous kernel: EIP:   
0010:[journal_try_to_free_buffers+82/144]    Not tainted
Nov 17 14:43:48 gorgeous kernel: EIP:    0010:[<c0158742>]    Not tainted
Nov 17 14:43:48 gorgeous kernel: EFLAGS: 00010207
Nov 17 14:43:48 gorgeous kernel: eax: 00000000   ebx: 206f6e20   ecx:
c1c15ef0   edx: 206f6e20
Nov 17 14:43:48 gorgeous kernel: esi: ee333620   edi: 00000000   ebp:
00000000   esp: c1c15ef0
Nov 17 14:43:48 gorgeous kernel: ds: 0018   es: 0018   ss: 0018
Nov 17 14:43:48 gorgeous kernel: Process kswapd (pid: 5, stackpage=c1c15000)
Nov 17 14:43:48 gorgeous kernel: Stack: 00000001 c17cd680 000001d0 00000012
0000697f c0150ef2 c1c4d000 c17cd680
Nov 17 14:43:48 gorgeous kernel:        000001d0 c0131d29 c17cd680 000001d0
000001d0 c17cd680 c01299b5 c17cd680
Nov 17 14:43:48 gorgeous kernel:        000001d0 00000000 c1c14000 00000100
000001d0 c0244548 effeb0a0 eb69e7a0
Nov 17 14:43:48 gorgeous kernel: Call Trace: [ext3_releasepage+34/48]
[try_to_release_page+41/80] [shrink_cache+453/704] [shrink_caches+82/128]
[try_to_free_pages+44/80]
Nov 17 14:43:48 gorgeous kernel: Call Trace: [<c0150ef2>] [<c0131d29>]
[<c01299b5>] [<c0129bf2>] [<c0129c4c>]
Nov 17 14:43:48 gorgeous kernel:    [kswapd_balance_pgdat+81/160]
[kswapd_balance+38/64] [kswapd+161/192] [kswapd+0/192] [_stext+0/48]
[kernel_thread+38/48]
Nov 17 14:43:48 gorgeous kernel:    [<c0129cf1>] [<c0129d66>] [<c0129ea1>]
[<c0129e00>] [<c0105000>] [<c0105726>]
Nov 17 14:43:48 gorgeous kernel:    [kswapd+0/192]
Nov 17 14:43:48 gorgeous kernel:    [<c0129e00>]
Nov 17 14:43:48 gorgeous kernel:
Nov 17 14:43:48 gorgeous kernel: Code: 8b 5b 28 f6 42 19 02 74 10 89 e0 50
52 e8 ec fe ff ff 5a 85
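
One observation on the journal_try_to_free_buffers oops above: the bad
pointer in ebx/edx (206f6e20) consists entirely of printable ASCII bytes,
and the faulting address is exactly ebx + 0x28, which matches the first
Code: bytes (8b 5b 28, i.e. mov 0x28(%ebx),%ebx). The sketch below just
checks that arithmetic; pointer_as_text is a hypothetical helper name. A
pointer that looks like text hints that the structure was overwritten with
string data, but this alone does not tell a hardware fault from a kernel
bug.

```python
import struct


def pointer_as_text(value: int) -> str:
    """Show the 4 bytes of a 32-bit value in little-endian memory order.

    Printable ASCII bytes are shown as characters, anything else as '.'.
    """
    raw = struct.pack("<I", value)
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


if __name__ == "__main__":
    ebx = 0x206F6E20    # ebx/edx from the register dump
    fault = 0x206F6E48  # "virtual address" from the oops
    print(repr(pointer_as_text(ebx)))  # ' no '
    print(hex(fault - ebx))            # 0x28
```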

Comment 1 Need Real Name 2002-07-17 12:34:20 UTC

I have the identical problem with RH 7.3 on an Athlon XP 1600+.
I don't know much about the 2.4 kernel's memory model, but it seems that of
my 458 MB of initially free physical memory, 360 MB become cache memory.
Apparently it never needs to be cached!

Comment 2 Need Real Name 2002-07-18 18:29:40 UTC

A hint.
My colleague is using a PC with identical components, MB ASUS A7N266E and Athlon
XP 1600+ 2, but it is free from the bug with kernels 2.4.18-3 to 18-5 (I also
tried 18-5e, but the results are identical to the others and to 2.4.9..).
So I rebooted his machine and found a difference in the BIOS release. Both are
AWARD MEDALLION V6.0, but I have rev. 1001.A and my lucky colleague has 1001.C.
So the differences between them are a good direction to investigate. I'm
going to update my flash and .....

Comment 3 Bugzilla owner 2004-09-30 15:39:17 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/