Bug 101040 - Massive oops & filesystem corruption on AMD Athlon
Summary: Massive oops & filesystem corruption on AMD Athlon
Keywords:
Status: CLOSED DUPLICATE of bug 99507
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: athlon
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Ernie Petrides
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 97942
 
Reported: 2003-07-28 19:09 UTC by Felipe Alfaro Solana
Modified: 2007-11-30 22:06 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-02-21 18:57:48 UTC
Target Upstream Version:
Embargoed:


Attachments
output of the "dmesg" command (15.12 KB, text/plain)
2003-07-28 19:10 UTC, Felipe Alfaro Solana
output of the "lspci -vvv" command (7.26 KB, text/plain)
2003-07-28 19:11 UTC, Felipe Alfaro Solana
output of "cat /proc/interrupts" (425 bytes, text/plain)
2003-07-28 19:11 UTC, Felipe Alfaro Solana
output of "cat /proc/ioports" (929 bytes, text/plain)
2003-07-28 19:12 UTC, Felipe Alfaro Solana

Description Felipe Alfaro Solana 2003-07-28 19:09:31 UTC
Description of problem: 
After installing Taroon Beta 1 on my AMD Athlon Compaq Presario laptop, I
booted it and saw a large number of oopses during boot. The kernel oopses in
"do_generic_file_read" so rapidly that the "dmesg" kernel ring buffer
overflows.

I have attached to this bug report the output of "dmesg" and "lspci", plus copies of
"/proc/ioports" and "/proc/interrupts".
 
Version-Release number of selected component (if applicable): 
Stock Red Hat Linux Taroon Beta 1 kernel. 
 
How reproducible: 
Always. 
 
Steps to Reproduce: 
1. Install Taroon AS Beta 1 on a Compaq Presario 706EA or a similar computer
(I don't know whether this problem depends on the hardware configuration).
2. Choose to install "Everything".
3. After installation and upon rebooting, the kernel starts oopsing. The output of
the "dmesg" command after booting into runlevel 1 is attached to this bug report.
     
Actual results: 
The system oopses continuously and ends up corrupting the filesystem, requiring 
a full system reinstall. 
 
Expected results: 
The system should be stable with the selected hardware configuration. For
comparison, Red Hat Linux Beta is completely stable on the same machine.
 
Additional info: 
The problem is reproducible with either ext2 or ext3 filesystems. 
The machine used to install RHL AS is a Compaq Presario 706EA.

Comment 1 Felipe Alfaro Solana 2003-07-28 19:10:37 UTC
Created attachment 93190
output of the "dmesg" command

The kernel oopses at a very fast rate, so I have been unable to capture
the full kernel ring buffer before it overflows.

Comment 2 Felipe Alfaro Solana 2003-07-28 19:11:04 UTC
Created attachment 93191
output of the "lspci -vvv" command

Comment 3 Felipe Alfaro Solana 2003-07-28 19:11:42 UTC
Created attachment 93192
output of "cat /proc/interrupts"

Comment 4 Felipe Alfaro Solana 2003-07-28 19:12:10 UTC
Created attachment 93193
output of "cat /proc/ioports"

Comment 5 Ernie Petrides 2003-07-29 18:10:51 UTC
Thank you for your detailed bug report.  Red Hat is aware of this
problem, and several options for fixing it are currently under
review.  The original bug report is #99507.

*** This bug has been marked as a duplicate of 99507 ***

Comment 6 Bill Nottingham 2003-08-01 19:16:02 UTC
*** Bug 101481 has been marked as a duplicate of this bug. ***

Comment 7 Nitin Dahyabhai 2003-08-06 02:42:38 UTC
I can't see bug 99507, but I'm getting the same problem on an untainted
NForce2-based platform.

Comment 8 Ernie Petrides 2003-08-07 03:58:49 UTC
Nitin, I didn't want to add your "outside" e-mail address to the
cc: list of bug #99507 (which is restricted to beta partners), so
here is a copy of my final comment for 99507 posted on 2003-07-31:

This problem is believed to be caused by a flaw in the Athlon XP
CPU which, under unusual circumstances, allows a "prefetch" machine
instruction with certain alignments and/or cache conflicts to cause
a page fault on address 0 (in violation of the instruction's
functional description).

A work-around was committed to Taroon today: the Athlon-specific
prefetch routine now tests for NULL addresses.  The change is applied by
linux-2.4.18-smallpatches.patch to include/asm-i386/processor.h, and
will be incorporated into any kernel version 2.4.21-1.1931.2.364 or
later (and will thus be part of Taroon B2).
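
For readers unfamiliar with this kind of guard, the idea of testing for NULL before prefetching might look roughly like the sketch below. This is an illustration only, not the contents of linux-2.4.18-smallpatches.patch: the name guarded_prefetch and the prefetch_issued counter are hypothetical, the GCC/Clang builtin __builtin_prefetch stands in for the kernel's Athlon-specific routine, and skipping the hint on NULL is an assumption about what "test for NULL addresses" does.

```c
#include <stddef.h>

/* Hypothetical counter, present only so the behaviour can be observed. */
static int prefetch_issued;

/*
 * Hypothetical guarded prefetch: assumes the work-around amounts to
 * skipping the prefetch hint when the address is NULL, so the CPU
 * never sees a prefetch of address 0.
 */
static inline void guarded_prefetch(const void *addr)
{
    if (addr == NULL)
        return;                 /* no hint issued for a NULL address */

    __builtin_prefetch(addr);   /* compiler builtin (GCC/Clang), a stand-in here */
    prefetch_issued++;
}
```

A prefetch is only a performance hint, so dropping it for NULL should not change program results; the cost is one extra comparison on each call.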


Comment 9 Red Hat Bugzilla 2006-02-21 18:57:48 UTC
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.

