Red Hat Bugzilla – Bug 57881
Hardlock in case of smp kernels with dual athlons on A7M266-D.
Last modified: 2005-10-31 17:00:50 EST
Description of Problem:
smp kernels (2.4.7-10smp & 2.4.9-13smp) hardlock after cpu
initialisation on an AsusTek A7M266-D equipped with two Athlon XP 1700+ cpus.
Version-Release number of selected component (if applicable):
RedHat Linux 7.2.
Steps to Reproduce:
Nothing special to reproduce, just start the system with an smp kernel and you have it.
After some messages about CPU0&CPU1:
ENABLING IO-APIC IRQS
... CHANGING IO-APIC physical APIC ID to 2 ... OK
Timer: vector=0x31 pin1=2 pin2=0
and hardlock, no kernel OOPS or PANIC...
System running happily...
I have not tried to run other smp capable OSes on this board.
I broke my RedHat Linux 7.2 by installing 2.4.16-0 kernel from rawhide, but even
this kernel will freeze at the same spot.
With 2.4.9-17.6smp (from kernel-smp-2.4.9-17.6.athlon.rpm) the lockup
arrives after some additional messages:
Testing the IO APIC .......
Using local APIC timer interrupts.
Calibrating APIC timer...
... CPU clock speed is 1466.6273 MHz.
... host bus clock speed is 266.6595 MHz.
OK, I have worked around the problem for now by disabling MPS 1.4 Support in the BIOS.
Now kernel 2.4.9-17.6smp sees 2 CPUs and the system is running happily.
The problem I still see is with "poweroff", which sometimes turns the power off
but usually results in a reboot.
Hmmmm... smells like a bios bug in the mp(s)table.....
There is an additional option in the BIOS to turn off the MP Table. When I disabled it,
the smp kernel did come up, but complained about the smp board not being found and was
using only one CPU.
What is MPS 1.4 good for?
How can I fix this poweroff problem?
MPS is a bios table that describes how many CPUs there are, and how they
are connected (and how they are connected to the (sometimes virtual) APIC chips).
There are 2 versions of the MPS specification, version 1.1 and version 1.4; it
appears that the 1.4 table in your bios is not compatible with linux (I tend to
blame the bios but....). Turning off "1.4 support" will most likely result in the
1.1 format being used by the bios; turning MPS off altogether means there's no
information on how many CPUs there are.....
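For anyone who wants to see which MPS revision a bios actually advertises: the revision is a single byte in the 16-byte "MP floating pointer structure" (signature "_MP_"), where 1 means version 1.1 and 4 means version 1.4. Below is a minimal Python sketch of decoding that structure from a raw 16-byte buffer. The function names, the sample table address and the synthetic buffer are made up for illustration; how you obtain the real bytes from bios memory is not covered here.

```python
import struct

MP_SIGNATURE = b"_MP_"
# Spec revision byte as defined by the MP specification.
SPEC_REVISIONS = {1: "1.1", 4: "1.4"}


def parse_mp_floating_pointer(raw: bytes) -> dict:
    """Decode a 16-byte MP floating pointer structure (illustrative helper)."""
    if len(raw) < 16:
        raise ValueError("need at least 16 bytes")
    # Layout: signature, physical address of the MP config table,
    # length in 16-byte units, spec revision, checksum, 5 feature bytes.
    sig, table_addr, length, spec_rev, _checksum = struct.unpack_from("<4sIBBB", raw, 0)
    if sig != MP_SIGNATURE:
        raise ValueError("signature is not _MP_")
    # All 16 bytes, checksum included, must sum to zero modulo 256.
    if sum(raw[:16]) & 0xFF != 0:
        raise ValueError("bad checksum")
    return {
        "table_address": table_addr,
        "length_paragraphs": length,
        "spec_version": SPEC_REVISIONS.get(spec_rev, "unknown"),
        "features": bytes(raw[11:16]),
    }


def build_sample(spec_rev: int, table_addr: int = 0x000F0000) -> bytes:
    """Build a synthetic structure for testing; the address is arbitrary."""
    body = struct.pack("<4sIBB", MP_SIGNATURE, table_addr, 1, spec_rev)
    features = bytes(5)
    checksum = (-sum(body + features)) & 0xFF
    return body + bytes([checksum]) + features


if __name__ == "__main__":
    print(parse_mp_floating_pointer(build_sample(4))["spec_version"])
```

This only reads the revision field; it says nothing about whether the table the pointer refers to is well formed, which is where the bug in this report seems to be.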
The poweroff is a bios call, and again, lots of bioses get it wrong, especially
on SMP. On SMP so many get it wrong that linux doesn't even try it by default.
You can try passing "apm=realmode-power-off" on the kernel command line (e.g. add
that to the vmlinuz line in /boot/grub/grub.conf if you use grub) and then the
kernel is even more conservative in how much it relies on the bios.
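If editing the file by hand feels error prone, the change amounts to appending the parameter to every "kernel" line. A rough Python sketch, assuming a grub.conf in the usual Red Hat layout; the sample stanza (title, root device, image names) is hypothetical and the function name is my own. It works on text only and does not write to /boot.

```python
def add_kernel_param(grub_conf_text: str, param: str) -> str:
    """Append param to every 'kernel' line that does not already have it."""
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        # Only touch lines whose first word is the grub 'kernel' command.
        if words and words[0] == "kernel" and param not in words[1:]:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical stanza for illustration; real paths and devices will differ.
SAMPLE = """\
title Red Hat Linux (2.4.9-17.6smp)
\troot (hd0,0)
\tkernel /vmlinuz-2.4.9-17.6smp ro root=/dev/hda2
\tinitrd /initrd-2.4.9-17.6smp.img
"""

if __name__ == "__main__":
    print(add_kernel_param(SAMPLE, "apm=realmode-power-off"), end="")
```

Running it twice on the same text leaves the kernel line unchanged the second time, so it is safe to re-apply. Keep a copy of the original grub.conf before replacing it.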
Are MPS 1.4 and the poweroff problem somehow related to each other?
Before I disabled MPS 1.4 even Win98SE behaved like Linux on shutdown;
after disabling it Windows is able to shut down, but Linux just stops
after the "Power down." message.
Adding apm=realmode-power-off didn't change the situation.
What are the side effects of not using MPS 1.4?
These problems are not CPU related, because the same problems are there even when I
use Athlon MP 1600+'s.
I see the same problem on my board. The BIOS seems to be full of bugs so I have
no faith in the MP table.
It's showing problems that appear to include
o Incorrect PCI initialisation
o MP 1.4 table hang (ditto - MP 1.1 is ok, 1.4 fails - your bug made me realise
that I was seeing that same dependency)
o Won't boot if you plug in the HD led connector
o Won't boot with a 33MHz PCI card in a 64bit slot unless you use soft jumpers
I'm hoping ASUS will issue a better BIOS soon.
I updated to the latest BIOS (1005d2e) and to the latest version of the kernel (2.4.9-31)
from up2date: the board is now running with MPS 1.4 enabled.
My lab found a solution that appears to take care of the booting issue for the SMP kernel.
In lilo.conf (assuming you are using lilo) add the line:
to the smp kernel information and run /sbin/lilo with ./lilo. Apparently there is some conflict between the A7M266-D bios and RedHat when it goes
looking for the second CPU.
However, we have a different problem:
In RH v7.2 the KDE environment is rather unstable. It spontaneously crashes to
the login screen, sometimes failing to log in at all (enter name, password and
the system shows a few blank screens, then ends up back at login). If you manage
to get into KDE, several applications, including Konqueror, Konsole, Netscape,
and linuxconf, fail with " The application 'name' (name) crashed & caused the
signal 11 (SIGSEGV) " (verbatim).
Once the crash occurs the system is sometimes stable afterwards, but generally
everything starts to collapse after that. I have tried two different copies of
RH v7.2, one of v7.1, and Mandrake v8.1. Mandrake has been more stable and had
the Konqueror crash only once, while the RH installations keep crashing. We've
put RH v7.2 back on to try to figure out what the problem is, and because the
Mandrake lilo.conf file has a different setup and we do not know how to edit it
to let the smp kernel (enterprise) load properly.
2 x AMD MP1800+ (retail)
2 x 512MB Crucial Registered DDR2100 C.L. 2.5
WD400BB Caviar 7200 RPM, ATA100
Abit Siluro GeForce2 MX200 32MB
D-Link DFE-530TX+ Ethernet card
Generic 44X CD-ROM
Generic 3.5" 1.44 MB Floppy
431W EG465P-VE (FM) Enermax PS (ATX 2.03)
The boot hang was a bios problem; the newer bios has the tables right. Also note you
need a PS/2 mouse present in such a system, or the very latest (as I write) BIOS
for the Tyan boards, or you will get occasional hangs - also not a Linux bug.