Bug 169664

Summary: Kernel Crashes - Corrupts Drive Data - Athlon X2 Issues
Product: [Fedora] Fedora
Reporter: Marc Perkel <marc>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 4
CC: pfrields, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-05-05 01:38:22 UTC

Description Marc Perkel 2005-09-30 20:21:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7

Description of problem:
The system crashes shortly after boot. The screen fills with disk-driver-related errors, the kernel panics, and data on the drive is corrupted.

As far as I know, this applies only to AMD processors running the 64-bit version, and it may be limited to specific hardware. The problem appears on my Asus motherboard with the NVIDIA chipset, and only with 4 GB of RAM installed; with 2 GB the problem goes away. I believe it is related to mapping memory above the 4 GB limit, with 32-bit drivers not knowing how to address high memory. It might also be related to the Athlon X2 dual core.

The problem has been fixed in the 2.6.13 kernel. I don't know which change fixed it, but the new kernel is rock solid. So the solution to this problem is to provide a kernel based on 2.6.13.

In addition, a second bug related to CPU interrupt timers has been discovered, and a patch has been created to fix it. The symptom is the clock advancing too fast, by about 10 seconds per minute. It is apparently related to the two processors and lost ticks. This explains it:

http://bugzilla.kernel.org/show_bug.cgi?id=5105

It contains a patch that everyone says works, and it works for me.

diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -959,9 +959,6 @@ static __init int unsynchronized_tsc(voi
  	   are handled in the OEM check above. */
  	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
  		return 0;
- 	/* All in a single socket - should be synchronized */
- 	if (cpus_weight(cpu_core_map[0]) == num_online_cpus())
- 		return 0;
 #endif
  	/* Assume multi socket systems are not synchronized */
  	return num_online_cpus() > 1;

So, I'm brain dead right now, but what is needed is a new kernel based on 2.6.13.2 or later with this patch applied.
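
To put the clock problem in numbers, here is a small calculation based on the rough figure of about 10 seconds gained per minute reported above. The function name is made up for illustration, and the rate is an approximation, not a measurement, so treat the results as ballpark values only.

```python
def clock_gain_seconds(real_elapsed_seconds, gain_per_minute=10.0):
    """Seconds the system clock runs ahead after the given real time.

    gain_per_minute defaults to the approximate rate from this report.
    """
    return real_elapsed_seconds * gain_per_minute / 60.0


one_hour = clock_gain_seconds(60 * 60)
one_day = clock_gain_seconds(24 * 60 * 60)
print("gain after one hour:", one_hour, "seconds")
print("gain after one day:", one_day / 3600.0, "hours")
```

At that rate the clock would be about 10 minutes fast after an hour and about 4 hours fast after a day.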

Version-Release number of selected component (if applicable):
All current kernels

How reproducible:
Always

Steps to Reproduce:
1. Boot the stock kernel
2. Wait 2 minutes
3. Run fsck all night trying to fix the drives
  

Additional info:

Comment 1 Dave Jones 2005-11-10 19:38:27 UTC
2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.

Thank you.


Comment 2 Dave Jones 2006-02-03 06:53:33 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 3 John Thacker 2006-05-05 01:38:22 UTC
Closing per previous comment.