Description of Problem:
SMP kernels (2.4.7-10smp and 2.4.9-13smp) hard-lock after CPU initialisation on an ASUSTeK A7M266-D equipped with two Athlon XP 1700+ CPUs.

Version-Release number of selected component (if applicable):
Red Hat Linux 7.2

How Reproducible:
Every time.

Steps to Reproduce:
Nothing special to reproduce; just boot the system with an SMP kernel and it happens.

Actual Results:
After some messages about CPU0 and CPU1:
ENABLING IO-APIC IRQS
... CHANGING IO-APIC physical APIC ID to 2 ... OK
Timer: vector=0x31 pin1=2 pin2=0
the system hard-locks, with no kernel oops or panic.

Expected Results:
System running happily.

Additional Information:
I have not tried to run other SMP-capable OSes on this board.
I broke my Red Hat Linux 7.2 install by putting the 2.4.16-0 kernel from Rawhide on it, but even this kernel freezes at the same spot.
With 2.4.9-17.6smp (from kernel-smp-2.4.9-17.6.athlon.rpm), the lockup comes after some additional messages:
Testing the IO APIC ....... .............done.
Using local APIC timer interrupts.
Calibrating APIC timer...
... CPU clock speed is 1466.6273 MHz.
... host bus clock speed is 266.6595 MHz.
CPU:0,clocks:2666595,slice:888865
CPU0<T0:2666592,T1:1777712,D:15,S:888865,C:2666595>
OK, I have worked around the problem for now by disabling MPS 1.4 support in the BIOS. Kernel 2.4.9-17.6smp now sees both CPUs and the system runs happily. The problem I still see is with "poweroff", which sometimes turns the power off but usually results in a reboot.
Hmmm... smells like a BIOS bug in the MP(S) table.
There is an additional BIOS option to turn off the MP table. When I disabled it, the SMP kernel did come up, but it complained that no SMP board was found and used only one CPU. What is MPS 1.4 good for? And how can I fix this poweroff problem?
MPS is a BIOS table that describes how many CPUs there are and how they are connected, including how they are connected to the (sometimes virtual) APIC chips. There are two versions of the MPS specification, 1.1 and 1.4; it appears that the 1.4 table in your BIOS is not compatible with Linux (I tend to blame the BIOS, but...). Turning off "1.4 support" will most likely make the BIOS use the 1.1 format instead; turning MPS off altogether means there is no information on how many CPUs there are. The poweroff is a BIOS call, and again, lots of BIOSes get it wrong, especially on SMP. On SMP so many get it wrong that Linux doesn't even try it by default. You can try passing "apm=realmode-power-off" on the kernel command line (e.g. add it to the vmlinuz line in /boot/grub/grub.conf if you use GRUB); the kernel is then even more conservative about how much it relies on the BIOS.
Are MPS 1.4 and the poweroff problem somehow related to each other? Before I disabled MPS 1.4, even Win98SE behaved like Linux on shutdown; after disabling it, Windows can shut down, but Linux just stops after the "Power down." message. Adding apm=realmode-power-off didn't change anything. What are the side effects of not using MPS 1.4?
These problems are not CPU-related: the same problems occur even when I use Athlon MP 1600+ CPUs.
I see the same problem on my board. The BIOS seems to be full of bugs, so I have no faith in the MP table. The problems it shows appear to include:
o Incorrect PCI initialisation
o MP 1.4 table hang (ditto: MP 1.1 is OK, 1.4 fails; your bug made me realise that I was seeing the same dependency)
o Won't boot if you plug in the HD LED connector
o Won't boot with a 33MHz PCI card in a 64-bit slot unless you use soft jumpers
I'm hoping ASUS will issue a better BIOS soon.
I updated to the latest BIOS (1005d2e) and the latest kernel (2.4.9-31) from up2date; the board now runs with MPS 1.4 enabled.
My lab found a solution that appears to take care of the booting issue for the SMP kernel. In lilo.conf (assuming you are using LILO), add the line append="noapic" to the SMP kernel's entry and then rerun /sbin/lilo. Apparently there is some conflict between the A7M266-D BIOS and Red Hat when it goes looking for the second CPU. However, we have a different problem: in RH v7.2 the KDE environment is rather unstable. It spontaneously crashes to the login screen, and sometimes login fails altogether (we enter the name and password, the system shows a few blank screens, then ends up back at the login screen). If you do manage to get into KDE, several applications, including Konqueror, Konsole, Netscape, and linuxconf, fail with "The application 'name' (name) crashed & caused the signal 11 (SIGSEGV)" (verbatim). After such a crash the system is sometimes stable, but generally everything starts to collapse. I have tried two different copies of RH v7.2, one of v7.1, and Mandrake v8.1. Mandrake has been more stable (Konqueror crashed only once), while the RH installations keep crashing. We've put RH v7.2 back on, both to try to figure out what the problem is and because its lilo.conf has a different setup and we do not know how to edit it to make the SMP (enterprise) kernel load properly.
Hardware:
Asus A7M266-D
2 x AMD MP1800+ (retail)
2 x 512MB Crucial Registered DDR2100 C.L. 2.5
WD400BB Caviar 7200 RPM, ATA100
Abit Siluro GeForce2 MX200 32MB
D-Link DFE-530TX+ Ethernet card
Generic 44X CD-ROM
Generic 3.5" 1.44 MB floppy
431W EG465P-VE (FM) Enermax PS (ATX 2.03)
Red Hat v7.2
The boot hang was a BIOS problem; the newer BIOS has the tables right. Also note that you need either a PS/2 mouse present in such a system or the very latest (as I write) BIOS for the Tyan boards, or you will get occasional hangs. That is also not a Linux bug.