Bug 9660

Summary: stuck on TLB IPI wait (CPU#0)
Product: [Retired] Red Hat Linux
Reporter: Anand Surelia <anand>
Component: kernel
Assignee: Michael K. Johnson <johnsonm>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Version: 6.1
CC: anand
Hardware: i386   
OS: Linux   
Last Closed: 2000-08-08 19:52:26 UTC

Description Anand Surelia 2000-02-21 21:14:21 UTC
System: Dual Intel Pentium processor (see details below)
Problem: Sometimes under high load conditions the machine would simply
lock up. Ping would work, and so would http, but only to the extent
that my browser would say "site connected, waiting for reply". SSHD would
behave strangely, with the screen echo of characters lagging behind the
keyboard by one keystroke. The "reboot" or "shutdown" command would stop
working (the command would just return after printing its standard message
that the system is going down for reboot). There would be no option but to
press the reset button.
Strangely though, none of the logs of any service would show anything
untoward. This has happened many times at irregular intervals, but only
once did I find this message in my /var/log/messages - "kernel: stuck on
TLB IPI wait (CPU#0)".
Some people are of the view that it might be a problem with the SMP part
of the kernel. Would disabling SMP help?
Thanks.

System CPU/Mem details:
Calibrating delay loop... 445.64 BogoMIPS
Memory: 516828k/524224k available (1324k kernel code, 420k reserved, 5596k data, 56k init)
Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
per-CPU timeslice cutoff: 100.15 usecs.
CPU1: Intel Pentium III (Katmai) stepping 03
calibrating APIC timer ...
..... CPU clock speed is 447.6938 MHz.
..... system bus clock speed is 99.4873 MHz.
Booting processor 0 eip 2000
Calibrating delay loop... 447.28 BogoMIPS
OK.
CPU0: Intel Pentium III (Katmai) stepping 03
Total of 2 processors activated (892.93 BogoMIPS).
enabling symmetric IO mode... ...done.
ENABLING IO-APIC IRQs
init IO_APIC IRQs
 IO-APIC pin 0, 9, 10, 11, 16, 17, 18, 20, 22, 23 not connected.
number of MP IRQ sources: 17.
number of IO-APIC registers: 24.
testing the IO APIC.......................
.... register #00: 02000000
.......    : physical APIC id: 02
.... register #01: 00170011
.......     : max redirection entries: 0017
.......     : IO APIC version: 0011
.... register #02: 00000000
.......     : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 000 00  1    0    0   0   0    0    0    00
 01 000 00  0    0    0   0   0    1    1    59
 02 0FF 0F  0    0    0   0   0    1    1    51
 03 000 00  0    0    0   0   0    1    1    61
 04 000 00  0    0    0   0   0    1    1    69
 05 000 00  0    0    0   0   0    1    1    71
 06 000 00  0    0    0   0   0    1    1    79
 07 000 00  0    0    0   0   0    1    1    81
 08 000 00  0    0    0   0   0    1    1    89
 09 000 00  1    0    0   0   0    0    0    00
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 000 00  0    0    0   0   0    1    1    91
 0d 000 00  1    0    0   0   0    0    0    00
 0e 000 00  0    0    0   0   0    1    1    99
 0f 000 00  0    0    0   0   0    1    1    A1
 10 000 00  1    0    0   0   0    0    0    00
 11 000 00  1    0    0   0   0    0    0    00
 12 000 00  1    0    0   0   0    0    0    00
 13 0FF 0F  1    1    0   1   0    1    1    A9
 14 000 00  1    0    0   0   0    0    0    00
 15 0FF 0F  1    1    0   1   0    1    1    B1
 16 000 00  1    0    0   0   0    0    0    00
 17 000 00  1    0    0   0   0    0    0    00
IRQ to pin mappings:
IRQ0 -> 2
IRQ1 -> 1
IRQ3 -> 3
IRQ4 -> 4
IRQ5 -> 5
IRQ6 -> 6
IRQ7 -> 7
IRQ8 -> 8
IRQ12 -> 12
IRQ13 -> 13
IRQ14 -> 14
IRQ15 -> 15
IRQ19 -> 19
IRQ21 -> 21
.................................... done.
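
Because the message turned up only once in /var/log/messages, a small script can help check whether it recurs after later lockups. The following is a minimal sketch, not part of the original report: the function names find_tlb_stalls and scan_log are hypothetical, and the pattern assumes the message text is exactly as quoted above, with only the CPU number varying.

```python
import re

# Kernel message as quoted in the report; only the CPU number is assumed to vary.
TLB_STUCK = re.compile(r"stuck on TLB IPI wait \(CPU#(\d+)\)")


def find_tlb_stalls(lines):
    """Return a list of (line_number, cpu) for each line containing the message."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = TLB_STUCK.search(line)
        if match:
            hits.append((lineno, int(match.group(1))))
    return hits


def scan_log(path="/var/log/messages"):
    """Scan a syslog file (hypothetical helper; reading it may require root)."""
    with open(path, errors="replace") as log:
        return find_tlb_stalls(log)
```

Calling scan_log() with no argument reads the path named in the report; any other log file can be passed instead. An empty result only means the message was not logged, which matches the report's observation that most lockups left no trace.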

Comment 1 Alan Cox 2000-08-08 19:52:23 UTC
It typically indicates an SMP deadlock - does 6.2 still show this?