System: Dual Intel Pentium processor (see details below)

Problem: Sometimes under high load conditions the machine would simply lock up. Ping would work, and so would http, but only to the extent that my browser would say "site connected, waiting for reply". SSHD would behave strangely, with the screen echo of characters lagging behind the keyboard by one keystroke. The "reboot" or "shutdown" command would stop working (the command would just return after printing its standard message that the system is going down for reboot). There would be no option but to press the reset button.

Strangely though, none of the logs of any service would show anything untoward. This has happened many times at irregular intervals, but only once did I find this message in my /var/log/messages: "kernel: stuck on TLB IPI wait (CPU#0)".

Some people are of the view that it might be a problem with the SMP part of the kernel. Would disabling SMP help?

Thanks.

System CPU/Mem details:

Calibrating delay loop... 445.64 BogoMIPS
Memory: 516828k/524224k available (1324k kernel code, 420k reserved, 5596k data, 56k init)
Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
per-CPU timeslice cutoff: 100.15 usecs.
CPU1: Intel Pentium III (Katmai) stepping 03
calibrating APIC timer ...
..... CPU clock speed is 447.6938 MHz.
..... system bus clock speed is 99.4873 MHz.
Booting processor 0 eip 2000
Calibrating delay loop... 447.28 BogoMIPS
OK.
CPU0: Intel Pentium III (Katmai) stepping 03
Total of 2 processors activated (892.93 BogoMIPS).
enabling symmetric IO mode... ...done.
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC pin 0, 9, 10, 11, 16, 17, 18, 20, 22, 23 not connected.
number of MP IRQ sources: 17.
number of IO-APIC registers: 24.
testing the IO APIC.......................
.... register #00: 02000000
....... : physical APIC id: 02
.... register #01: 00170011
....... : max redirection entries: 0017
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 000 00 0 0 0 0 0 1 1 59
02 0FF 0F 0 0 0 0 0 1 1 51
03 000 00 0 0 0 0 0 1 1 61
04 000 00 0 0 0 0 0 1 1 69
05 000 00 0 0 0 0 0 1 1 71
06 000 00 0 0 0 0 0 1 1 79
07 000 00 0 0 0 0 0 1 1 81
08 000 00 0 0 0 0 0 1 1 89
09 000 00 1 0 0 0 0 0 0 00
0a 000 00 1 0 0 0 0 0 0 00
0b 000 00 1 0 0 0 0 0 0 00
0c 000 00 0 0 0 0 0 1 1 91
0d 000 00 1 0 0 0 0 0 0 00
0e 000 00 0 0 0 0 0 1 1 99
0f 000 00 0 0 0 0 0 1 1 A1
10 000 00 1 0 0 0 0 0 0 00
11 000 00 1 0 0 0 0 0 0 00
12 000 00 1 0 0 0 0 0 0 00
13 0FF 0F 1 1 0 1 0 1 1 A9
14 000 00 1 0 0 0 0 0 0 00
15 0FF 0F 1 1 0 1 0 1 1 B1
16 000 00 1 0 0 0 0 0 0 00
17 000 00 1 0 0 0 0 0 0 00
IRQ to pin mappings:
IRQ0 -> 2
IRQ1 -> 1
IRQ3 -> 3
IRQ4 -> 4
IRQ5 -> 5
IRQ6 -> 6
IRQ7 -> 7
IRQ8 -> 8
IRQ12 -> 12
IRQ13 -> 13
IRQ14 -> 14
IRQ15 -> 15
IRQ19 -> 19
IRQ21 -> 21
.................................... done.

It typically indicates an SMP deadlock. Does 6.2 still show this?
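
Since the message was only spotted once in /var/log/messages, it may help to check how often it really appears and for which CPU. The following is only a rough sketch in Python: the function name find_tlb_stalls is made up for illustration, and the pattern assumes the message is logged exactly as quoted in the report above.

```python
import re

# Matches the kernel message quoted in the report and captures the CPU number.
# Assumption: the log line contains the text exactly in this form.
TLB_STALL = re.compile(r"kernel: stuck on TLB IPI wait \(CPU#(\d+)\)")


def find_tlb_stalls(lines):
    """Return a list of (line_number, cpu) for each line with the stall message.

    `lines` can be any iterable of strings, for example an open log file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = TLB_STALL.search(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits
```

To use it, pass an open file, for instance `find_tlb_stalls(open("/var/log/messages", errors="replace"))`; reading that file normally requires root. If the lockup prevents syslog from flushing, the message may never reach the disk, so an empty result does not rule out the deadlock.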