Bug 426701

Summary: Kernel panic - not syncing: CPU context corrupt. Happens with 500 GB Seagate FreeAgent USB drive when backing up a large 360 GB file
Product: [Fedora] Fedora
Reporter: G Jacobs <oldkawman>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 7
CC: chris.brown, triage
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-06-17 02:57:04 UTC
Attachments:
Description                                              Flags
dmesg-output                                             none
lspci-output                                             none
dmidecode-output                                         none
I swapped out cpu for another one, new one, dual core    none
I swapped out cpu for another one, new one, dual core    none
lspci-newcpu-output                                      none
dmidecode-newCPU-output                                  none
dmesg-output-single core cpu                             none
lspci-output-3700 single core                            none
dmidecode-out-3700 single core                           none

Description G Jacobs 2007-12-24 15:41:12 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.10) Gecko/20071128 Fedora/2.0.0.10-2.fc7 Firefox/2.0.0.10

Description of problem:
I get this after about 120 GB have been copied from the SATA drive to the USB drive:

ata3: timeout waiting for ADMA Idle stat=0x440
ata3: timeout waiting for ADMA Idle stat=0x440
cpu0: Machine Check Exception:0000000000000004
Bank4:b200000000070f0f
Kernel panic - not syncing: CPU context corrupt
Clocksource tsc unstable (delta= 4687084458ns)

The USB drive is a Seagate FreeAgent, 500 GB.

Version-Release number of selected component (if applicable):
kernel-2.6.23.8-34.fc7

How reproducible:
Always


Steps to Reproduce:
1. As root, cp the 360 GB file to the USB drive.
2. Wait; the panic happens every time.

Actual Results:
ata3: timeout waiting for ADMA Idle stat=0x440
ata3: timeout waiting for ADMA Idle stat=0x440
cpu0: Machine Check Exception:0000000000000004
Bank4:b200000000070f0f
Kernel panic - not syncing: CPU context corrupt
Clocksource tsc unstable (delta= 4687084458ns)

Expected Results:
The file should have copied without a hitch, as it did on FC4 and FC5.

I suspect this has something to do with the new drive management. It makes things more complex when you add drives, because their names change depending on the number of drives in use.

Additional info:
Opteron 180 dual core, Epox NF4 Ultra, 4 GB DDR400 RAM, all stock clocks, 2 ATA drives, 4 SATA drives

Comment 1 G Jacobs 2007-12-24 20:21:04 UTC
This also happens with a 400 GB Seagate USB hard drive, but only with files
larger than 100 GB; an 84 GB file copied to the same drive with no problem.

Comment 2 Chuck Ebbert 2008-01-02 20:17:47 UTC
cpu0: Machine Check Exception:0000000000000004
Bank4:b200000000070f0f
Kernel panic - not syncing: CPU context corrupt


This looks like a hardware problem.

Comment 3 Christopher Brown 2008-01-16 04:47:10 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

As indicated, I am closing this as NOTABUG because it appears to be faulty
hardware. If you do not believe this to be the case, please re-open the bug and
attach the output of:

# dmidecode
# lspci -vvxxx
# dmesg

as separate files of type text/plain to this bug.
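
For anyone gathering the three outputs requested above, the following is a minimal sketch, assuming Python 3.7 or later and that the commands are on the PATH (dmidecode normally needs root). The function name `collect`, the file names, and the output directory are illustrative choices, not something the triage project prescribes.

```python
import subprocess
from pathlib import Path

# The three commands requested in this comment. The output file names are
# hypothetical; any names work as long as each output is its own text file.
COMMANDS = {
    "dmidecode-output.txt": ["dmidecode"],
    "lspci-output.txt": ["lspci", "-vvxxx"],
    "dmesg-output.txt": ["dmesg"],
}


def collect(commands, outdir):
    """Run each command and save its stdout as a separate plain-text file.

    Returns the list of file names that were written. Commands that cannot
    be started (for example, not installed) are skipped with a message.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, argv in commands.items():
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, errors="replace"
            )
        except OSError as exc:
            print(f"skipped {argv[0]}: {exc}")
            continue
        (outdir / filename).write_text(result.stdout)
        written.append(filename)
    return written
```

Calling `collect(COMMANDS, "bug-attachments")` as root would leave up to three text files in a `bug-attachments` directory, ready to attach one at a time.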

Comment 4 G Jacobs 2008-01-18 02:00:56 UTC
Created attachment 292096 [details]
dmesg-output

I swapped out all hardware except the hard drives and the CPU, and I still get it:

ata6: timeout waiting for ADMA IDLE, stat=0x0440
ata6: timeout waiting for ADMA LEGACY, stat=0x0440
CPU0: Machine Check Exception:0000000000000004
Bank4:b200000000070f0f
Kernel panic - not syncing: CPU context corrupt
Clocksource tsc unstable (delta=4687113908ns)

Comment 5 G Jacobs 2008-01-18 02:01:32 UTC
Created attachment 292097 [details]
lspci-output

Comment 6 G Jacobs 2008-01-18 02:01:59 UTC
Created attachment 292098 [details]
dmidecode-output

Comment 7 Christopher Brown 2008-01-18 04:39:33 UTC
The CPU is the piece of kit that is throwing exceptions - I ran it through
parsemce and got:

$ ./a.out -e 4 -b 4 -s b200000000070f0f -a 0
Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(4): b200000000070f0f @ 0
        External tag parity error
        CPU state corrupt. Restart not possible
        Error enabled in control register
        Error not corrected.
        Bus and interconnect error
        Participation: Generic
        Timeout: 
        Request: Generic error
        Transaction type : Invalid
        Memory/IO : Other

so perhaps corrupt cache on your CPU...?
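
The architectural part of that decode can be reproduced by hand. The sketch below is not parsemce; it is a small, hypothetical Python helper that assumes the standard x86 machine-check layout (flag bits 63 to 57 of the bank status register, and the "bus and interconnect" error-code form 0000 1PPT RRRR IILL in the low 16 bits). The model-specific code in bits 31:16 is only extracted, not interpreted, because its meaning depends on the CPU vendor and bank.

```python
# Flag bits of a machine-check bank status register (MCi_STATUS).
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error not corrected
    60: "EN",     # error reporting enabled in the control register
    59: "MISCV",  # MISC register valid
    58: "ADDRV",  # ADDR register valid
    57: "PCC",    # processor context corrupt, restart not possible
}

# Flag bits of the global status register (MCG_STATUS).
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}

PARTICIPATION = ["source", "responder", "observer", "generic"]
MEMORY_IO = ["memory", "reserved", "I/O", "other"]


def set_flags(value, table):
    """Return the names of the flags set in value, highest bit first."""
    return [
        name
        for bit, name in sorted(table.items(), reverse=True)
        if (value >> bit) & 1
    ]


def decode_bus_error(status):
    """Decode the low 16 bits if they match 0000 1PPT RRRR IILL, else None."""
    code = status & 0xFFFF
    if code >> 11 != 1:
        return None
    return {
        "participation": PARTICIPATION[(code >> 9) & 3],
        "timeout": bool((code >> 8) & 1),
        "request": (code >> 4) & 0xF,  # 0 means generic error
        "memory_io": MEMORY_IO[(code >> 2) & 3],
        "model_specific": (status >> 16) & 0xFFFF,  # vendor-defined
    }


status = 0xB200000000070F0F  # Bank4 value from this report
print(set_flags(0x4, MCG_FLAGS))
print(set_flags(status, STATUS_FLAGS))
print(decode_bus_error(status))
```

The flags it reports for this value (MCIP set without RIPV; VAL, UC, EN and PCC set; generic participation, timeout bit set, generic request, "other" for memory/IO) line up with the parsemce lines above.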

Comment 8 G Jacobs 2008-01-18 22:11:03 UTC
Created attachment 292218 [details]
I swapped out cpu for another one, new one, dual core

I swapped CPU chips and installed a new X2 4800+. Same problem, same kernel
panic.

Comment 9 G Jacobs 2008-01-18 22:11:17 UTC
Created attachment 292219 [details]
I swapped out cpu for another one, new one, dual core

I swapped CPU chips and installed a new X2 4800+. Same problem, same kernel
panic.

Comment 10 G Jacobs 2008-01-18 22:13:10 UTC
Created attachment 292220 [details]
lspci-newcpu-output

Comment 11 G Jacobs 2008-01-18 22:14:25 UTC
Created attachment 292221 [details]
dmidecode-newCPU-output

Comment 12 G Jacobs 2008-01-18 22:42:00 UTC
I installed another CPU, a new X2 4800+. I also copied the large file to another
internal hard drive, also SATA, and tried to copy from that one. I get exactly
the same error as with the Opteron. I do have a single-core CPU, a 3700+, that I
will try to swap in this weekend. So far I have tried 2 motherboards (an Epox
and an MSI), 2 different X2 chips, 2 different USB drives (a 500 GB and a 400 GB
Seagate), 3 different internal hard drives (2 Seagate and 1 Hitachi), and
different RAM (Corsair Value RAM and GeIL). I have no problem doing this from my
XP3200+ running FC4 with all these same USB drives, hard drives, and both sets
of DDR memory.

Comment 13 G Jacobs 2008-01-19 17:10:52 UTC
Created attachment 292262 [details]
dmesg-output-single core cpu

I swapped out the second dual-core CPU for a single-core one. I get the same
error with the USB drive and the same error when copying with cp from one
internal SATA drive to the other. I was able to copy from a Samsung SATA drive
to the Seagate internal SATA drive previously. I do not have any other internal
SATA drives on hand right now, but will have a couple in a week or so. Could
this be a SATA error causing the kernel panic?

Comment 14 G Jacobs 2008-01-19 17:12:11 UTC
Created attachment 292263 [details]
lspci-output-3700 single core

Comment 15 G Jacobs 2008-01-19 17:13:19 UTC
Created attachment 292264 [details]
dmidecode-out-3700 single core

Comment 16 Christopher Brown 2008-01-21 15:53:12 UTC
(In reply to comment #13)
> I swapped out the second dual-core CPU for a single-core one. I get the same
> error with the USB drive and the same error when copying with cp from one
> internal SATA drive to the other. I was able to copy from a Samsung SATA drive
> to the Seagate internal SATA drive previously. I do not have any other internal
> SATA drives on hand right now, but will have a couple in a week or so. Could
> this be a SATA error causing the kernel panic?

Not sure - it needs one of the kernel team to comment on this. Thanks for the
exhaustive testing ... re-opened bug.



Comment 17 G Jacobs 2008-01-31 01:40:22 UTC
OK, I tried Samsung HD501LJ 500 GB SATA drives with a 16 MB buffer. They work
with no problems. The Seagate ST3500320AS 500 GB SATA drives with a 32 MB buffer
are the problem. I can write to them from the Samsung SATA drives or from the
USB drives; writing to them is no problem. Reading from them during a copy
causes the kernel panic, whether copying to a USB drive or to any other SATA
drive, Samsung or Seagate. When copying from the Seagate to either of the USB
drives I get the panic at about 139 GB. I get the same error when copying to one
of the other drives, Seagate or Samsung, at about 149 GB. I just noticed the
Seagate is a new drive with a 32 MB buffer.
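
Since the failure appears to follow reads from the Seagate drive, a read-only pass over the file would take the destination drive out of the picture. The following is a rough sketch under that assumption; `read_through`, the chunk size, and the log file are hypothetical and were not used in this report. It reads the file sequentially, discards the data, and syncs the byte offset reached to a log after every chunk, so the last logged offset should survive a panic.

```python
import os

CHUNK = 64 * 1024 * 1024  # 64 MiB per read; an arbitrary choice


def read_through(path, log_path, chunk=CHUNK):
    """Read path sequentially, discarding the data, and log progress.

    After every chunk the total number of bytes read so far is appended to
    log_path and forced to disk with fsync. Returns the total bytes read.
    """
    offset = 0
    with open(path, "rb", buffering=0) as src, open(log_path, "w") as log:
        while True:
            data = src.read(chunk)
            if not data:
                break
            offset += len(data)
            log.write(f"{offset}\n")
            log.flush()
            os.fsync(log.fileno())
    return offset
```

The log file should live on a drive other than the one under test. If the machine panics partway through, the last line of the log gives the approximate offset, which can be compared with the 139 GB and 149 GB figures above.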

Comment 18 Bug Zapper 2008-05-14 15:11:34 UTC
This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.

Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs:
http://docs.fedoraproject.org/release-notes/

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 19 Bug Zapper 2008-06-17 02:57:02 UTC
Fedora 7 changed to end-of-life (EOL) status on June 13, 2008. 
Fedora 7 is no longer maintained, which means that it will not 
receive any further security or bug fix updates. As a result we 
are closing this bug. 

If you can reproduce this bug against a currently maintained version 
of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.