Bug 57881 - Hardlock in case of smp kernels with dual athlons on A7M266-D.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: athlon
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-12-30 18:06 UTC by Ivo Sarak
Modified: 2005-10-31 22:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-06-08 00:02:05 UTC
Embargoed:



Description Ivo Sarak 2001-12-30 18:06:52 UTC
Description of Problem:
SMP kernels (2.4.7-10smp and 2.4.9-13smp) hardlock after CPU
initialisation on an ASUSTeK A7M266-D equipped with two Athlon XP 1700+ CPUs.


Version-Release number of selected component (if applicable):
Red Hat Linux 7.2.

How Reproducible:
Every time.

Steps to Reproduce:
No special steps: just boot the system with an SMP kernel and it happens.


Actual Results:
After some messages about CPU0 and CPU1:

ENABLING IO-APIC IRQS
... CHANGING IO-APIC physical APIC ID to 2 ... OK
Timer: vector=0x31 pin1=2 pin2=0

followed by a hardlock, with no kernel oops or panic.

Expected Results:
System running happily...

Additional Information:
I have not tried running other SMP-capable OSes on this board.

Comment 1 Ivo Sarak 2001-12-30 20:44:50 UTC
I broke my Red Hat Linux 7.2 installation by installing the 2.4.16-0 kernel from
rawhide, but even this kernel freezes at the same spot.


Comment 2 Ivo Sarak 2001-12-30 23:05:37 UTC
With 2.4.9-17.6smp (from kernel-smp-2.4.9-17.6.athlon.rpm) the lockup
arrives after some additional messages:

Testing the IO APIC .......
.............done.

Using local APIC timer interrupts.
Calibrating APIC timer...
... CPU clock speed is 1466.6273 MHz.
... host bus clock speed is 266.6595 MHz.
CPU:0,clocks:2666595,slice:888865
CPU0<T0:2666592,T1:1777712,D:15,S:888865,C:2666595>



Comment 3 Ivo Sarak 2001-12-30 23:32:17 UTC
OK, I have worked around the problem for now by disabling MPS 1.4 Support in the
BIOS. Kernel 2.4.9-17.6smp now sees 2 CPUs and the system is running happily.
The problem I still see is with "poweroff", which sometimes turns the power off
but usually results in a reboot.


Comment 4 Arjan van de Ven 2001-12-31 08:56:14 UTC
Hmmmm... smells like a bios bug in the mp(s)table.....

Comment 5 Ivo Sarak 2001-12-31 09:14:11 UTC
There is an additional option in the BIOS to turn off the MP Table. When I
disabled it, the SMP kernel did come up, but complained about the SMP board not
being found and used only one CPU.

What is this MPS 1.4 good for?

How can I fix this poweroff problem?


Comment 6 Arjan van de Ven 2002-01-01 15:59:02 UTC
MPS (the MultiProcessor Specification) defines a BIOS table that describes how
many CPUs there are and how they are connected (including how they are connected
to the (sometimes virtual) APIC chips).

There are 2 versions of the MPS specification, version 1.1 and version 1.4; it
appears that the 1.4 table in your BIOS is not compatible with Linux (I tend to
blame the BIOS but....). Turning off "1.4 support" will most likely result in the
1.1 format being used by the BIOS; turning MPS off altogether means there's no
information on how many CPUs there are.....
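
To make the 1.1 versus 1.4 distinction concrete, here is a minimal Python
sketch of how the 16-byte "MP floating pointer structure" that the BIOS leaves
in memory could be decoded. The field layout follows my reading of the Intel
MultiProcessor Specification; the function names and the example table address
are hypothetical, and this is not the code the Linux kernel uses.

```python
import struct

# Layout of the 16-byte MP floating pointer structure, as assumed here:
#   bytes 0-3   signature "_MP_"
#   bytes 4-7   physical address of the MP configuration table (little endian)
#   byte  8     length of this structure in 16-byte units
#   byte  9     spec revision: 1 means MPS 1.1, 4 means MPS 1.4
#   byte  10    checksum (all 16 bytes must sum to 0 modulo 256)
#   bytes 11-15 feature bytes
SPEC_VERSIONS = {1: "1.1", 4: "1.4"}


def parse_mp_floating_pointer(raw: bytes) -> dict:
    """Decode one MP floating pointer structure (hypothetical helper)."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    signature, table_addr, length, spec_rev, _checksum = struct.unpack_from(
        "<4sIBBB", raw
    )
    if signature != b"_MP_":
        raise ValueError("missing _MP_ signature")
    if sum(raw) % 256 != 0:
        raise ValueError("bad checksum")
    return {
        "config_table_address": table_addr,
        "length_bytes": length * 16,
        "spec_version": SPEC_VERSIONS.get(spec_rev, "unknown"),
        "feature_bytes": raw[11:16],
    }


def build_example(spec_rev: int) -> bytes:
    """Build a synthetic structure for testing; the address is made up."""
    body = bytearray(
        struct.pack("<4sIBBB5s", b"_MP_", 0x000F5B30, 1, spec_rev, 0, bytes(5))
    )
    body[10] = (-sum(body)) % 256  # fix up the checksum byte
    return bytes(body)


if __name__ == "__main__":
    print(parse_mp_floating_pointer(build_example(4))["spec_version"])
```

The BIOS "MPS 1.4 Support" switch presumably changes the revision byte and the
table contents that this structure points at; the sketch only reads the header
and does not try to validate the configuration table itself.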

The poweroff is a BIOS call, and again, lots of BIOSes get it wrong, especially
on SMP. On SMP so many get it wrong that Linux doesn't even try it by default.

You can try passing "apm=realmode-power-off" on the kernel command line (e.g. add
that to the vmlinuz line in /boot/grub/grub.conf if you use grub) and then the
kernel is even more conservative in how much it relies on the BIOS.
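
As an illustration of that edit, here is a small Python sketch that appends a
parameter to every "kernel" line (the line that names vmlinuz) in grub.conf
text. The helper name and the sample entry, including its paths and root
device, are made up for the example; check your own grub.conf before changing
it, and note that editing the file by hand works just as well.

```python
# Hypothetical sample entry; the paths and root device are illustrative only.
SAMPLE_GRUB_CONF = """\
title Red Hat Linux (2.4.9-17.6smp)
        root (hd0,0)
        kernel /vmlinuz-2.4.9-17.6smp ro root=/dev/hda2
        initrd /initrd-2.4.9-17.6smp.img
"""


def add_kernel_param(grub_conf: str, param: str) -> str:
    """Return grub.conf text with param appended to each kernel line.

    Lines that already carry the parameter are left unchanged, so the
    function can be applied repeatedly without duplicating it.
    """
    out = []
    for line in grub_conf.splitlines():
        words = line.split()
        if words and words[0] == "kernel" and param not in words:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(add_kernel_param(SAMPLE_GRUB_CONF, "apm=realmode-power-off"), end="")
```

The sketch only transforms text in memory; writing the result back to
/boot/grub/grub.conf (after keeping a backup) is left to the reader.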




Comment 7 Ivo Sarak 2002-01-01 21:39:01 UTC
Are MPS 1.4 and poweroff somehow related to each other?
Before I disabled MPS 1.4, even Win98SE behaved like Linux on shutdown.
After disabling it, Windows is able to shut down, but Linux just stops
after the "Power down." message.

Adding apm=realmode-power-off didn't change the situation.

What are the side effects of not using MPS 1.4?


Comment 8 Ivo Sarak 2002-01-25 09:55:02 UTC
These problems are not CPU-related: the same problems occur even when I
use Athlon MP 1600+ CPUs.


Comment 9 Alan Cox 2002-02-11 16:40:32 UTC
I see the same problem on my board. The BIOS seems to be full of bugs so I have
no faith in the MP table.

It's showing problems that appear to include
o Incorrect PCI initialisation
o MP 1.4 table hang (ditto - MP 1.1 is ok, 1.4 fails - your bug made me realise
  that I was seeing that same dependency)
o Won't boot if you plug in the HD LED connector
o Won't boot with a 33MHz PCI card in a 64bit slot unless you use soft jumpers

I'm hoping ASUS will issue a better BIOS soon


Comment 10 Ivo Sarak 2002-03-03 19:10:23 UTC
I updated to the latest BIOS (1005d2e) and the latest kernel version (2.4.9-31)
from up2date: the board now runs with MPS 1.4 enabled.

Comment 11 stuart 2002-03-08 02:02:32 UTC
My lab found a solution that appears to take care of the booting issue for the SMP kernel.

In lilo.conf (assuming you are using lilo) add the line:
append="noapic"

to the SMP kernel's section, then rerun /sbin/lilo to apply the change.
Apparently there is some conflict between the A7M266-D BIOS and Red Hat when it
goes looking for the second CPU.

However, we have a different problem:
In RH v7.2 the KDE environment is rather unstable. It spontaneously crashes to
the login screen, sometimes failing to log in at all (enter name, password and
the system shows a few blank screens, then ends up back at login). If you manage
to get into KDE, several applications, including Konqueror, Konsole, Netscape,
and linuxconf, fail with
" The application 'name' (name) crashed & caused the signal 11 (SIGSEGV) "
(verbatim). Once the crash occurs the system is sometimes stable afterwards, but
generally everything starts to collapse after that. I have tried two different
copies of RH v7.2, one of v7.1, and Mandrake v8.1. Mandrake has been more stable
(Konqueror crashed once), while the RH installations keep crashing. We've put
RH v7.2 back on to try to figure out what the problem is, and because the
lilo.conf file has a different setup and we do not know how to edit it to let
the smp kernel (enterprise) load properly.

Asus A7M266-D
2 x AMD MP1800+ (retail)
2 x 512MB Crucial Registered DDR2100 C.L. 2.5
WD400BB Caviar 7200 RPM, ATA100
Abit Siluro GeForce2 MX200 32MB
D-Link DFE-530TX+ Ethernet card
Generic 44X CD-ROM
Generic 3.5" 1.44 MB Floppy
431W EG465P-VE (FM) Enermax PS (ATX 2.03)

RedHat v7.2

Comment 12 Alan Cox 2003-06-08 00:02:05 UTC
Boot hang was a BIOS problem; newer BIOS has the tables right. Also note you
need a PS/2 mouse present in such a system, or the very latest (as I write) BIOS
for the Tyan boards, or you will get occasional hangs - also not a Linux bug.


