Bug 129015

Summary: smp kernel hangs with dual Xeon CPUs
Product: Red Hat Enterprise Linux 3
Reporter: John Klingler <john>
Component: kernel
Assignee: Ernie Petrides <petrides>
Status: CLOSED WORKSFORME
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3.0
CC: frank.zimmermann, john, petrides, riel, yves.joan
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-08-30 19:23:33 UTC
Attachments:
Description                          Flags
dmesg output                         none
dmesg output for dual Xeon system    none

Description John Klingler 2004-08-03 00:06:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4)
Gecko/20030624 Netscape/7.1

Description of problem:
The 2.4.21-4.ELsmp and the newer 2.4.21-15.0.3.ELsmp kernels both hang
with dual Xeon processors every few hours. It seems random but is
probably not. A few minutes ago it hung while I was looking up a tag
definition with nedit. Several times it hung while compiling and
linking. If nothing but normal system activity is running, it
occasionally survives overnight, but usually it does not.

When I say hung, I mean you can't ping it, there's no cursor and it
never comes back (if you consider two days sufficient to say "never").
You have to manually reset the system. 

It does not seem to happen with the 2.4.21-4.EL uniprocessor kernel. I
think there is a kernel race condition in your threading code. As you
may know, with Hyper-Threading each Xeon appears to the kernel as two
CPUs and can execute two threads simultaneously, so a dual Xeon system
presents four CPUs. Perhaps this creates a problem.

I have tried 2.4.21-4.ELsmp on two dual Xeon systems with the same
results. 

Version-Release number of selected component (if applicable):
2.4.21-4.ELsmp and 2.4.21-15.0.3.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Run any compute-intensive or multithreaded task.

Actual Results: The system hangs. You can't ping it, there's no
cursor, and it does not respond to Control+Alt+Delete or anything
else. It must be physically reset.

Additional info:

The 2.4.21-4.EL uniprocessor kernel does not appear to hang, although
I only tested it for 36 hours.

Comment 1 John Klingler 2004-08-04 00:20:35 UTC
One operation that frequently locks up the system is the final link
when building the X server. Once, when it froze during the link, I could
still ping the OS; however, I could not ssh in, and the ssh session I
already had was not responding. I waited half an hour before giving
up and resetting.

Comment 2 Frank Zimmermann 2004-08-06 08:19:51 UTC
Created attachment 102477
dmesg output

Comment 3 Frank Zimmermann 2004-08-06 08:23:56 UTC
Since upgrading to 2.4.21-15.0.4.ELsmp, I have seen exactly the same
behavior. The machine is only lightly loaded, yet it hangs about once
every day or night.

It's a Maxdata Platinum 3000 dual-processor machine (440GX) with
2 x PIII 800 SECC2, 100 FSB, 256OD.

For details, see the dmesg output.


Comment 4 John Klingler 2004-08-06 20:25:00 UTC
Created attachment 102491
dmesg output for dual Xeon system

Comment 5 John Klingler 2004-08-06 20:27:45 UTC
Our systems don't seem to hang when running the uniprocessor kernel.
They also don't appear to hang in runlevel 3 with the smp kernel, the
main difference being that the X server is not running. The problem
also does not seem to occur in runlevel 5, which runs the X server, if
I haven't done anything on the system since the last reset.

I am a developer and do a lot of debugging with ddd and recompiling
with gmake and gcc. Often, I leave the X server stopped in the
debugger when I go home. The system is always hung when I come in the
next morning. Even if I have shut down the X server and exited ddd
before leaving, it is still hung by the next morning. I'm starting to
wonder whether gdb, which ddd is a front end for, is corrupting
something in the kernel, setting a time bomb, so to speak.

Another possibility, mentioned by one of our hardware developers, is
that the motherboard BIOS is not setting up the MPS (MultiProcessor
Specification) tables correctly; these tables determine how the OS
thinks the CPUs communicate with each other. We are using SuperMicro
motherboards.

I have attached the dmesg output for one of these systems.

P.S. I could not reproduce the problem with vncserver, which someone
else reported as a trigger. Perhaps it was some operation they were
doing in vnc that caused the problem.


Comment 6 John Klingler 2004-08-06 21:32:16 UTC
Comment on attachment 102491
dmesg output for dual Xeon system

Linux version 2.4.21-15.0.3.ELsmp (bhcompile.redhat.com) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-37)) #1 SMP Tue Jun 29 18:04:47 EDT 2004
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fef0000 (usable)
 BIOS-e820: 000000003fef0000 - 000000003fef8000 (ACPI data)
 BIOS-e820: 000000003fef8000 - 000000003ff00000 (ACPI NVS)
 BIOS-e820: 000000003ff00000 - 000000003ff80000 (usable)
 BIOS-e820: 000000003ff80000 - 0000000040000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved)
 BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f6810
hm, page 000f6000 reserved twice.
hm, page 000f7000 reserved twice.
hm, page 0009f000 reserved twice.
hm, page 000a0000 reserved twice.
On node 0 totalpages: 262016
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 32640 pages.
ACPI: Searched entire block, no RSDP was found.
ACPI: RSDP located at physical address c00f6870
RSD PTR  v0 [PTLTD ]
__va_range(0x3fef41c5, 0x28): idx=33 mapped at fffdd000
ACPI table found: RSDT v1 [PTLTD    RSDT   1540.0]
__va_range(0x3fef7e60, 0x24): idx=33 mapped at fffdd000
__va_range(0x3fef7e60, 0x74): idx=33 mapped at fffdd000
ACPI table found: FACP v1 [INTEL  K_CANYON 1540.0]
__va_range(0x3fef7ed4, 0x24): idx=33 mapped at fffdd000
__va_range(0x3fef7ed4, 0xb4): idx=33 mapped at fffdd000
ACPI table found: APIC v1 [PTLTD	 APIC	1540.0]
__va_range(0x3fef7ed4, 0xb4): idx=33 mapped at fffdd000
LAPIC (acpi_id[0x0000] id[0x0] enabled[1])
CPU 0 (0x0000) enabledProcessor #0 Pentium 4(tm) XEON(tm) APIC version 20

LAPIC (acpi_id[0x0001] id[0x6] enabled[1])
CPU 1 (0x0600) enabledProcessor #6 Pentium 4(tm) XEON(tm) APIC version 20

LAPIC (acpi_id[0x0002] id[0x1] enabled[1])
CPU 2 (0x0100) enabledProcessor #1 Pentium 4(tm) XEON(tm) APIC version 20

LAPIC (acpi_id[0x0003] id[0x7] enabled[1])
CPU 3 (0x0700) enabledProcessor #7 Pentium 4(tm) XEON(tm) APIC version 20

IOAPIC (id[0x2] address[0xfec00000] global_irq_base[0x0])
IOAPIC (id[0x3] address[0xfec80000] global_irq_base[0x18])
IOAPIC (id[0x4] address[0xfec80400] global_irq_base[0x30])
IOAPIC (id[0x5] address[0xfec81000] global_irq_base[0x48])
IOAPIC (id[0x8] address[0xfec81400] global_irq_base[0x60])
INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x1] trigger[0x1])
INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x1] trigger[0x3])
LAPIC_NMI (acpi_id[0x0000] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0001] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0002] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0003] polarity[0x1] trigger[0x1] lint[0x1])
4 CPUs total
Local APIC address fee00000
__va_range(0x3fef7f88, 0x24): idx=33 mapped at fffdd000
__va_range(0x3fef7f88, 0x28): idx=33 mapped at fffdd000
ACPI table found: BOOT v1 [PTLTD  $SBFTBL$ 1540.0]
__va_range(0x3fef7fb0, 0x24): idx=33 mapped at fffdd000
__va_range(0x3fef7fb0, 0x50): idx=33 mapped at fffdd000
ACPI table found: SPCR v1 [PTLTD  $UCRTBL$ 1540.0]
Enabling the CPU's according to the ACPI table
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID:   Product ID: Kings Canyon APIC at: 0xFEE00000
I/O APIC #2 Version 32 at 0xFEC00000.
I/O APIC #3 Version 32 at 0xFEC80000.
I/O APIC #4 Version 32 at 0xFEC80400.
I/O APIC #5 Version 32 at 0xFEC81000.
I/O APIC #8 Version 32 at 0xFEC81400.
Processors: 4
xAPIC support is present
Enabling APIC mode: Flat.	Using 5 I/O APICs
Kernel command line: ro root=LABEL=/
Initializing CPU#0
Detected 2399.380 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 4784.12 BogoMIPS
Memory: 1024412k/1048064k available (1687k kernel code, 20004k reserved, 1290k data, 228k init, 130496k highmem)
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 65536 (order: 6, 262144 bytes)
Page-cache hash table entries: 262144 (order: 8, 1048576 bytes)
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:	 After generic, caps: bfebfbff 00000000 00000000 00000000
CPU:		 Common caps: bfebfbff 00000000 00000000 00000000
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch.au)
mtrr: detected mtrr type: Intel
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
Intel machine check reporting enabled on CPU#0.
CPU:	 After generic, caps: bfebfbff 00000000 00000000 00000000
CPU:		 Common caps: bfebfbff 00000000 00000000 00000000
CPU0: Intel(R) Xeon(TM) CPU 2.40GHz stepping 07
per-CPU timeslice cutoff: 1462.99 usecs.
task migration cache decay timeout: 10 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Booting processor 1/1 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4797.23 BogoMIPS
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
ACPI tables and CPU MSR values mismatch about cpu number 
CPU: Physical Processor ID: 3
Intel machine check reporting enabled on CPU#1.
CPU:	 After generic, caps: bfebfbff 00000000 00000000 00000000
CPU:		 Common caps: bfebfbff 00000000 00000000 00000000
CPU1: Intel(R) Xeon(TM) CPU 2.40GHz stepping 07
Booting processor 2/6 eip 2000
Initializing CPU#2
masked ExtINT on CPU#2
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4797.23 BogoMIPS
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
Intel machine check reporting enabled on CPU#2.
CPU:	 After generic, caps: bfebfbff 00000000 00000000 00000000
CPU:		 Common caps: bfebfbff 00000000 00000000 00000000
CPU2: Intel(R) Xeon(TM) CPU 2.40GHz stepping 07
Booting processor 3/7 eip 2000
Initializing CPU#3
masked ExtINT on CPU#3
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4797.23 BogoMIPS
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
Intel machine check reporting enabled on CPU#3.
CPU:	 After generic, caps: bfebfbff 00000000 00000000 00000000
CPU:		 Common caps: bfebfbff 00000000 00000000 00000000
CPU3: Intel(R) Xeon(TM) CPU 2.40GHz stepping 07
Total of 4 processors activated (19175.83 BogoMIPS).
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
Setting 3 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 3 ... ok.
Setting 4 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 4 ... ok.
Setting 5 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 5 ... ok.
Setting 8 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 8 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-10, 2-11, 2-17, 2-20, 2-21, 2-22, 2-23, 3-2, 3-3,
3-6, 3-7, 3-8, 3-9, 3-10, 3-11, 3-12, 3-13, 3-14, 3-15, 3-16, 3-17, 3-18, 3-19,
3-20, 3-21, 3-22, 3-23, 4-2, 4-3, 4-4, 4-5, 4-6, 4-7, 4-8, 4-9, 4-10, 4-11,
4-12, 4-13, 4-14, 4-15, 4-16, 4-17, 4-18, 4-19, 4-20, 4-21, 4-22, 4-23, 5-2,
5-3, 5-4, 5-5, 5-6, 5-7, 5-8, 5-9, 5-10, 5-11, 5-12, 5-13, 5-14, 5-15, 5-16,
5-17, 5-18, 5-19, 5-20, 5-21, 5-22, 5-23, 8-1, 8-2, 8-3, 8-5, 8-6, 8-7, 8-8,
8-9, 8-10, 8-11, 8-12, 8-13, 8-14, 8-15, 8-16, 8-17, 8-18, 8-19, 8-20, 8-21,
8-22, 8-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 27.
number of IO-APIC #2 registers: 24.
number of IO-APIC #3 registers: 24.
number of IO-APIC #4 registers: 24.
number of IO-APIC #5 registers: 24.
number of IO-APIC #8 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02008000
.......    : physical APIC id: 02
.......    : Delivery Type: 1
.......    : LTS	  : 0
.... register #01: 00178020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0020
.... register #02: 00000000
.......     : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 000 00  1	 0    0   0   0    0	0    00
 01 00F 0F  0	 0    0   0   0    1	1    39
 02 00F 0F  0	 0    0   0   0    1	1    31
 03 00F 0F  0	 0    0   0   0    1	1    41
 04 00F 0F  0	 0    0   0   0    1	1    49
 05 00F 0F  0	 0    0   0   0    1	1    51
 06 00F 0F  0	 0    0   0   0    1	1    59
 07 00F 0F  0	 0    0   0   0    1	1    61
 08 00F 0F  0	 0    0   0   0    1	1    69
 09 00F 0F  0	 0    0   0   0    1	1    71
 0a 000 00  1	 0    0   0   0    0	0    00
 0b 000 00  1	 0    0   0   0    0	0    00
 0c 00F 0F  0	 0    0   0   0    1	1    79
 0d 00F 0F  0	 0    0   0   0    1	1    81
 0e 00F 0F  0	 0    0   0   0    1	1    89
 0f 00F 0F  0	 0    0   0   0    1	1    91
 10 00F 0F  1	 1    0   1   0    1	1    99
 11 000 00  1	 0    0   0   0    0	0    00
 12 00F 0F  1	 1    0   1   0    1	1    A1
 13 00F 0F  1	 1    0   1   0    1	1    A9
 14 000 00  1	 0    0   0   0    0	0    00
 15 000 00  1	 0    0   0   0    0	0    00
 16 000 00  1	 0    0   0   0    0	0    00
 17 000 00  1	 0    0   0   0    0	0    00

IO APIC #3......
.... register #00: 03000000
.......    : physical APIC id: 03
.......    : Delivery Type: 0
.......    : LTS	  : 0
.... register #01: 00178020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0020
.... register #02: 03000000
.......     : arbitration: 03
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 00F 0F  1	 1    0   1   0    1	1    B1
 01 00F 0F  1	 1    0   1   0    1	1    B9
 02 000 00  1	 0    0   0   0    0	0    00
 03 000 00  1	 0    0   0   0    0	0    00
 04 00F 0F  1	 1    0   1   0    1	1    C1
 05 00F 0F  1	 1    0   1   0    1	1    C9
 06 000 00  1	 0    0   0   0    0	0    00
 07 000 00  1	 0    0   0   0    0	0    00
 08 000 00  1	 0    0   0   0    0	0    00
 09 000 00  1	 0    0   0   0    0	0    00
 0a 000 00  1	 0    0   0   0    0	0    00
 0b 000 00  1	 0    0   0   0    0	0    00
 0c 000 00  1	 0    0   0   0    0	0    00
 0d 000 00  1	 0    0   0   0    0	0    00
 0e 000 00  1	 0    0   0   0    0	0    00
 0f 000 00  1	 0    0   0   0    0	0    00
 10 000 00  1	 0    0   0   0    0	0    00
 11 000 00  1	 0    0   0   0    0	0    00
 12 000 00  1	 0    0   0   0    0	0    00
 13 000 00  1	 0    0   0   0    0	0    00
 14 000 00  1	 0    0   0   0    0	0    00
 15 000 00  1	 0    0   0   0    0	0    00
 16 000 00  1	 0    0   0   0    0	0    00
 17 000 00  1	 0    0   0   0    0	0    00

IO APIC #4......
.... register #00: 04000000
.......    : physical APIC id: 04
.......    : Delivery Type: 0
.......    : LTS	  : 0
.... register #01: 00178020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0020
.... register #02: 04000000
.......     : arbitration: 04
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 00F 0F  1	 1    0   1   0    1	1    D1
 01 00F 0F  1	 1    0   1   0    1	1    D9
 02 000 00  1	 0    0   0   0    0	0    00
 03 000 00  1	 0    0   0   0    0	0    00
 04 000 00  1	 0    0   0   0    0	0    00
 05 000 00  1	 0    0   0   0    0	0    00
 06 000 00  1	 0    0   0   0    0	0    00
 07 000 00  1	 0    0   0   0    0	0    00
 08 000 00  1	 0    0   0   0    0	0    00
 09 000 00  1	 0    0   0   0    0	0    00
 0a 000 00  1	 0    0   0   0    0	0    00
 0b 000 00  1	 0    0   0   0    0	0    00
 0c 000 00  1	 0    0   0   0    0	0    00
 0d 000 00  1	 0    0   0   0    0	0    00
 0e 000 00  1	 0    0   0   0    0	0    00
 0f 000 00  1	 0    0   0   0    0	0    00
 10 000 00  1	 0    0   0   0    0	0    00
 11 000 00  1	 0    0   0   0    0	0    00
 12 000 00  1	 0    0   0   0    0	0    00
 13 000 00  1	 0    0   0   0    0	0    00
 14 000 00  1	 0    0   0   0    0	0    00
 15 000 00  1	 0    0   0   0    0	0    00
 16 000 00  1	 0    0   0   0    0	0    00
 17 000 00  1	 0    0   0   0    0	0    00

IO APIC #5......
.... register #00: 05000000
.......    : physical APIC id: 05
.......    : Delivery Type: 0
.......    : LTS	  : 0
.... register #01: 00178020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0020
.... register #02: 05000000
.......     : arbitration: 05
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 00F 0F  1	 1    0   1   0    1	1    E1
 01 00F 0F  1	 1    0   1   0    1	1    E9
 02 000 00  1	 0    0   0   0    0	0    00
 03 000 00  1	 0    0   0   0    0	0    00
 04 000 00  1	 0    0   0   0    0	0    00
 05 000 00  1	 0    0   0   0    0	0    00
 06 000 00  1	 0    0   0   0    0	0    00
 07 000 00  1	 0    0   0   0    0	0    00
 08 000 00  1	 0    0   0   0    0	0    00
 09 000 00  1	 0    0   0   0    0	0    00
 0a 000 00  1	 0    0   0   0    0	0    00
 0b 000 00  1	 0    0   0   0    0	0    00
 0c 000 00  1	 0    0   0   0    0	0    00
 0d 000 00  1	 0    0   0   0    0	0    00
 0e 000 00  1	 0    0   0   0    0	0    00
 0f 000 00  1	 0    0   0   0    0	0    00
 10 000 00  1	 0    0   0   0    0	0    00
 11 000 00  1	 0    0   0   0    0	0    00
 12 000 00  1	 0    0   0   0    0	0    00
 13 000 00  1	 0    0   0   0    0	0    00
 14 000 00  1	 0    0   0   0    0	0    00
 15 000 00  1	 0    0   0   0    0	0    00
 16 000 00  1	 0    0   0   0    0	0    00
 17 000 00  1	 0    0   0   0    0	0    00

IO APIC #8......
.... register #00: 08000000
.......    : physical APIC id: 08
.......    : Delivery Type: 0
.......    : LTS	  : 0
.... register #01: 00178020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0020
.... register #02: 08000000
.......     : arbitration: 08
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 00F 0F  1	 1    0   1   0    1	1    32
 01 000 00  1	 0    0   0   0    0	0    00
 02 000 00  1	 0    0   0   0    0	0    00
 03 000 00  1	 0    0   0   0    0	0    00
 04 00F 0F  1	 1    0   1   0    1	1    3A
 05 000 00  1	 0    0   0   0    0	0    00
 06 000 00  1	 0    0   0   0    0	0    00
 07 000 00  1	 0    0   0   0    0	0    00
 08 000 00  1	 0    0   0   0    0	0    00
 09 000 00  1	 0    0   0   0    0	0    00
 0a 000 00  1	 0    0   0   0    0	0    00
 0b 000 00  1	 0    0   0   0    0	0    00
 0c 000 00  1	 0    0   0   0    0	0    00
 0d 000 00  1	 0    0   0   0    0	0    00
 0e 000 00  1	 0    0   0   0    0	0    00
 0f 000 00  1	 0    0   0   0    0	0    00
 10 000 00  1	 0    0   0   0    0	0    00
 11 000 00  1	 0    0   0   0    0	0    00
 12 000 00  1	 0    0   0   0    0	0    00
 13 000 00  1	 0    0   0   0    0	0    00
 14 000 00  1	 0    0   0   0    0	0    00
 15 000 00  1	 0    0   0   0    0	0    00
 16 000 00  1	 0    0   0   0    0	0    00
 17 000 00  1	 0    0   0   0    0	0    00
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ18 -> 0:18
IRQ19 -> 0:19
IRQ24 -> 1:0
IRQ25 -> 1:1
IRQ28 -> 1:4
IRQ29 -> 1:5
IRQ48 -> 2:0
IRQ49 -> 2:1
IRQ72 -> 3:0
IRQ73 -> 3:1
IRQ96 -> 4:0
IRQ100 -> 4:4
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 2399.3290 MHz.
..... host bus clock speed is 133.2958 MHz.
cpu: 0, clocks: 1332958, slice: 266591
CPU0<T0:1332944,T1:1066352,D:1,S:266591,C:1332958>
cpu: 1, clocks: 1332958, slice: 266591
cpu: 3, clocks: 1332958, slice: 266591
cpu: 2, clocks: 1332958, slice: 266591
CPU1<T0:1332944,T1:799760,D:2,S:266591,C:1332958>
CPU2<T0:1332944,T1:533168,D:3,S:266591,C:1332958>
CPU3<T0:1332944,T1:266576,D:4,S:266591,C:1332958>
cpu_sibling_map[0] = 2
cpu_sibling_map[1] = 3
cpu_sibling_map[2] = 0
cpu_sibling_map[3] = 1
mapping CPU#0's runqueue to CPU#2's runqueue.
mapping CPU#1's runqueue to CPU#3's runqueue.
zapping low mappings.
Process timing init...done.
Starting migration thread for cpu 0
Starting migration thread for cpu 1
Starting migration thread for cpu 2
Starting migration thread for cpu 3
PCI: PCI BIOS revision 2.10 entry at 0xfd875, last bus=10
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Ignoring BAR0-3 of IDE controller 00:1f.1
Transparent bridge - Intel Corp. 82801BA/CA/DB/EB PCI Bridge
PCI: Discovered primary peer bus 10 [IRQ]
PCI: Discovered primary peer bus 11 [IRQ]
PCI: Discovered primary peer bus 12 [IRQ]
PCI: Using IRQ router PIIX [8086/2480] at 00:1f.0
PCI->APIC IRQ transform: (B0,I29,P0) -> 16
PCI->APIC IRQ transform: (B0,I29,P1) -> 19
PCI->APIC IRQ transform: (B0,I29,P2) -> 18
PCI->APIC IRQ transform: (B3,I0,P0) -> 48
PCI->APIC IRQ transform: (B3,I1,P0) -> 49
PCI->APIC IRQ transform: (B4,I2,P0) -> 28
PCI->APIC IRQ transform: (B4,I2,P1) -> 29
PCI->APIC IRQ transform: (B5,I0,P0) -> 24
PCI->APIC IRQ transform: (B5,I1,P0) -> 25
PCI->APIC IRQ transform: (B7,I1,P0) -> 96
PCI->APIC IRQ transform: (B7,I2,P0) -> 100
PCI->APIC IRQ transform: (B9,I0,P0) -> 72
PCI->APIC IRQ transform: (B9,I1,P0) -> 73
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16)
apm: disabled - APM is not SMP safe.
Total HugeTLB memory allocated, 0
Starting kswapd
allocated 32 pages and 32 bhs reserved for the highmem bounces
VFS: Disk quotas vdquot_6.5.1
aio_setup: num_physpages = 65504
aio_setup: sizeof(struct page) = 60
Hugetlbfs mounted.
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 256 RAM disks of 8192K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH3: IDE controller at PCI slot 00:1f.1
PCI: Enabling device 00:1f.1 (0005 -> 0007)
ICH3: chipset revision 2
ICH3: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0x2060-0x2067, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0x2068-0x206f, BIOS settings: hdc:pio, hdd:pio
hda: WDC WD200BB-75CLB0, ATA DISK drive
blk: queue c0501740, I/O limit 4095Mb (mask 0xffffffff)
hdc: CDU5211, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 39102336 sectors (20020 MB) w/2048KiB Cache, CHS=2434/255/63, UDMA(100)
ide-floppy driver 0.99.newide
Partition check:
 hda: hda1 hda2 hda3
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
Initializing IPsec netlink socket
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 163k freed
VFS: Mounted root (ext2 filesystem).
Journalled Block Device driver loaded
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 228k freed
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-uhci.c: $Revision: 1.275 $ time 18:13:42 Jun 29 2004
usb-uhci.c: High bandwidth mode enabled
PCI: Setting latency timer of device 00:1d.0 to 64
usb-uhci.c: USB UHCI at I/O 0x2000, IRQ 16
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
PCI: Setting latency timer of device 00:1d.1 to 64
usb-uhci.c: USB UHCI at I/O 0x2020, IRQ 19
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 2
hub.c: USB hub found
hub.c: 2 ports detected
PCI: Setting latency timer of device 00:1d.2 to 64
usb-uhci.c: USB UHCI at I/O 0x2040, IRQ 18
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 3
hub.c: USB hub found
hub.c: 2 ports detected
usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <vojtech>
hid-core.c: USB HID support drivers
mice: PS/2 mouse device common for all mice
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,2), internal journal
Adding Swap: 2040244k swap-space (priority -1)
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ip_tables: (C) 2000-2002 Netfilter core team
Intel(R) PRO/1000 Network Driver - version 5.2.30.1-k1
Copyright (c) 1999-2004 Intel Corporation.
divert: allocating divert_blk for eth0
eth0: Intel(R) PRO/1000 Network Connection
divert: allocating divert_blk for eth1
eth1: Intel(R) PRO/1000 Network Connection
ip_tables: (C) 2000-2002 Netfilter core team
e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex
Installing knfsd (copyright (C) 1996 okir.de).
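The cpu_sibling_map lines in the dmesg output above record how the kernel paired the four logical CPUs into Hyper-Threading siblings. As a minimal sketch (a hypothetical helper, not part of any Red Hat tool; the function names are illustrative), the Python below parses those lines and groups the CPUs into physical packages. The excerpt is copied verbatim from the attachment. The pairing it finds, CPUs 0/2 and 1/3, is consistent with the "Physical Processor ID" lines in the same log (0 for CPU#0 and CPU#2, 3 for CPU#1 and CPU#3).

```python
import re

# The four cpu_sibling_map lines below are copied verbatim from the
# dmesg output attached to this bug (attachment 102491).
DMESG_EXCERPT = """\
cpu_sibling_map[0] = 2
cpu_sibling_map[1] = 3
cpu_sibling_map[2] = 0
cpu_sibling_map[3] = 1
"""

SIBLING_RE = re.compile(r"cpu_sibling_map\[(\d+)\]\s*=\s*(\d+)")


def parse_sibling_map(text):
    """Return {logical_cpu: sibling_cpu} parsed from dmesg text."""
    return {int(cpu): int(sib) for cpu, sib in SIBLING_RE.findall(text)}


def is_symmetric(mapping):
    """True if every CPU's sibling points back to that CPU."""
    return all(mapping.get(sib) == cpu for cpu, sib in mapping.items())


def physical_packages(mapping):
    """Group logical CPUs into sorted sibling pairs, one per physical package."""
    return sorted({tuple(sorted((cpu, sib))) for cpu, sib in mapping.items()})


if __name__ == "__main__":
    siblings = parse_sibling_map(DMESG_EXCERPT)
    print("symmetric:", is_symmetric(siblings))
    # The package index here is just enumeration order, not the
    # "Physical Processor ID" printed by the kernel.
    for pkg, cpus in enumerate(physical_packages(siblings)):
        print(f"package {pkg}: logical CPUs {cpus}")
```

Run as-is, this prints "symmetric: True" followed by the pairs (0, 2) and (1, 3). Passing the full dmesg text instead of the excerpt works the same way, since the regular expression ignores all other lines.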

Comment 7 John Klingler 2004-08-12 21:28:33 UTC
Our QA engineer found he could reproduce the problem in about 30
seconds by running the command "dd if=/dev/hda of=/dev/null"
(assuming the hard drive is /dev/hda). We also found that it could
only be reproduced while the X server is running. We have no idea why
that is the case.

QA also discovered the 2.4.20-20.9smp kernel does not have this problem. 

(He recompiled it with CONFIG_SCSI_MULTI_LUN=y to meet our needs, but
we do not think that will be necessary for most users.)
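The QA reproduction amounts to reading the whole disk sequentially and discarding the data. For scripted stress runs, a minimal Python stand-in for that dd command might look like the sketch below. The function name and the 1 MiB chunk size are illustrative assumptions, and the default path /dev/hda is taken from the report. It needs read permission on the device (normally root), and it only generates the same sequential read load; it does not detect a hang by itself.

```python
import os
import sys

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def read_device(path, chunk_size=CHUNK_SIZE):
    """Read `path` sequentially to EOF, discarding the data like of=/dev/null.

    Returns the total number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            buf = os.read(fd, chunk_size)
            if not buf:  # empty read means end of file/device
                break
            total += len(buf)
    finally:
        os.close(fd)
    return total


if __name__ == "__main__":
    # /dev/hda is the disk named in the report; pass another path as needed.
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/hda"
    print(f"read {read_device(device)} bytes from {device}")
```

Using os.open/os.read rather than a buffered file object keeps each read a single system call of roughly the requested size, which is closer to what dd does.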

Comment 9 Ernie Petrides 2005-06-07 01:19:10 UTC
Hello, John.  I could not reproduce this problem.  Are recent kernels
(such as U5 2.4.21-32.EL or the recent E6 security erratum 2.4.21-32.0.1.EL
released in advisory RHSA-2005:472) still causing a problem?

If so, could you please capture the (likely) console panic/oops output
on a serial console connection?  Also, were your problems only able to be
reproduced with a custom-built kernel (with CONFIG_SCSI_MULTI_LUN enabled)?

Thanks in advance for the additional info.  -ernie


Comment 10 John Klingler 2005-06-07 17:40:01 UTC
Hi, Ernie. I'm sorry I didn't think of capturing the Oops message. The
2.4.21-15.0.3.ELsmp kernel was not customized or recompiled in any way. It was
a clean install from the CD, not an upgrade: a custom "Install Everything"
installation using the default partitioning and a static network configuration.
Perhaps the system you tested on had a minimal (workstation) install. If so,
several services and applications, such as the canna server, would not be
configured to start at boot.

Did you try the quick reproduction method our QA engineer discovered, "dd
if=/dev/hda of=/dev/null"? If that didn't reproduce it, then apart from the
background processes started by a full install, I can only think of hardware
or BIOS causes.

John

Comment 11 Ernie Petrides 2005-06-07 18:21:54 UTC
Hi, John.  I did use a test system with a full install and just now
verified that the canna server does start.  I also ran the "dd" test
exactly as specified in comment #7 and did not incur any problems.

I agree with you that there are likely to be hardware and/or BIOS
differences between our systems.  If you could provide the serial
console output from a crash, this would help us to narrow in on
the problem (which might already be fixed in later updates).

So, reverting state to NEEDINFO until we can get crash info or a
generic way to reproduce the problem here (or until you discover
that the problem has already been resolved in a recent update).


Comment 12 Ernie Petrides 2005-08-30 19:23:33 UTC
Closing due to lack of response.