Bug 32917 - RH7.1 hangs on 5+ CPU system
Summary: RH7.1 hangs on 5+ CPU system
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2001-03-23 21:49 UTC by Wendy Hung
Modified: 2007-04-18 16:32 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-06-26 17:07:47 UTC
Embargoed:



Description Wendy Hung 2001-03-23 21:49:37 UTC
Installed RH7.1 QA0319 on Netfinity 8500R (8-way system).  SMP kernel autodetected and auto-installed.
Upon reboot, system hangs at processor initialization screen (prior to kernel boot: prompt screen).

Not a hard hang -- Num Lock still responds.
Uni-processor kernel boots OK.  Bug occurs with 5 or more CPUs.

Comment 1 Matt Wilson 2001-03-23 22:11:40 UTC
Did this get to the lilo screen, or did it hang before lilo?  If it made it to
lilo, what messages were printed to the console?


Comment 2 Wendy Hung 2001-03-23 22:19:53 UTC
Correction: the hang occurs after selecting the 'linux' kernel at the lilo screen, but before the login screen.
The on-screen messages refer to starting up the CPUs.


Comment 3 Arjan van de Ven 2001-03-23 22:24:26 UTC
The 0319 snapshot contains a lot of debugging code to catch memory-allocation
related errors. We recently found a very important bug in the kernel that
somehow mostly triggered on SMP machines with 4 or more CPUs. We have since
fixed this bug in kernel 2.4.2-0.1.35. Hopefully this kernel will be available
to beta testers soon (QA is testing it right now) either as a "kernel rpm" or
as a "full snapshot". It would be very much appreciated if you could test
such a new kernel once it becomes available.
(This kernel is not yet present in the 0322 snapshot, but should be in any newer
 ones if/when they become available.)

Comment 4 Glen Foster 2001-03-24 16:44:15 UTC
I have placed the 0.1.35 kernel rpms on ftp.beta.redhat.com for your
examination/use.  Connect via ftp to ftp.beta.redhat.com, login as user "beta". 
From there, the relative paths for all the RPMs are:

pub/errata/7.1/SRPMS/kernel-2.4.2-0.1.35.src.rpm
pub/errata/7.1/i386/devfsd-2.4.2-0.1.35.i386.rpm
pub/errata/7.1/i386/kernel-2.4.2-0.1.35.i386.rpm
pub/errata/7.1/i386/kernel-BOOT-2.4.2-0.1.35.i386.rpm
pub/errata/7.1/i386/kernel-doc-2.4.2-0.1.35.i386.rpm
pub/errata/7.1/i386/kernel-headers-2.4.2-0.1.35.i386.rpm
pub/errata/7.1/i386/kernel-source-2.4.2-0.1.35.i386.rpm
pub/errata/7.1/i586/kernel-2.4.2-0.1.35.i586.rpm
pub/errata/7.1/i586/kernel-smp-2.4.2-0.1.35.i586.rpm
pub/errata/7.1/i686/kernel-2.4.2-0.1.35.i686.rpm
pub/errata/7.1/i686/kernel-enterprise-2.4.2-0.1.35.i686.rpm
pub/errata/7.1/i686/kernel-smp-2.4.2-0.1.35.i686.rpm


Comment 5 Wendy Hung 2001-03-26 22:47:13 UTC
Installed the kernel-headers and kernel-smp rpms.  Ran mkinitrd and edited /etc/lilo.conf.
Upon reboot into the 2.4.2-0.1.35smp kernel, same hang as reported.
Messages on screen:

...
Asserting INIT
Waiting for send to finish...
+ Deasserting INIT
 Waiting for send to finish...
+# Startup loops:2
Sending STARTUP #1
After apic_write,
Startup point 1
Waiting for send to finish...
+Sending STARTUP #2
After apic_write,
Starting point 1
Waiting for send to finish...
+After Startup
Before Callout 1
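
The install steps at the top of this comment end with a new entry in /etc/lilo.conf. As a minimal sketch of what such an entry might look like, the Python below renders a stanza for a given kernel version. The root device (/dev/sda1), the label (linux-smp), and the initrd file name are assumptions for illustration and are not taken from this report; the /boot/vmlinuz-<version> path follows the usual Red Hat kernel package layout.

```python
def lilo_stanza(version, root_device, label):
    """Render a lilo.conf image stanza for one installed kernel.

    All three arguments are illustrative; substitute the values that
    match the actual system.
    """
    lines = [
        f"image=/boot/vmlinuz-{version}",
        f"\tlabel={label}",
        # Assumed initrd name, e.g. as created beforehand with:
        #   mkinitrd /boot/initrd-<version>.img <version>
        f"\tinitrd=/boot/initrd-{version}.img",
        "\tread-only",
        f"\troot={root_device}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values: root device and label are not from the report.
stanza = lilo_stanza("2.4.2-0.1.35smp", "/dev/sda1", "linux-smp")
print(stanza)
```

After editing /etc/lilo.conf, /sbin/lilo has to be re-run so that the boot map picks up the new entry.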


Comment 6 Wendy Hung 2001-03-29 18:05:18 UTC
Same bug with QA0327.2 (2.4.2-0.1.40smp kernel).

Comment 7 Arjan van de Ven 2001-03-29 18:09:21 UTC
Does a non-Red Hat 2.4 kernel boot?
If not, this sounds like a BIOS bug.


Comment 8 Wendy Hung 2001-03-29 21:26:07 UTC
Yes, a non-redhat 2.4.2 kernel boots successfully and sees all 8 processors.
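
One way to confirm how many processors a booted kernel sees is to count the processor entries in /proc/cpuinfo. The sketch below is illustrative and is not what was run for this report; the sample text is made up, and it assumes the x86 layout in which each CPU has its own line beginning with "processor".

```python
def count_processors(cpuinfo_text):
    """Count 'processor : N' entries in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _value = line.partition(":")
        # Only lines whose key is exactly 'processor' mark a CPU entry.
        if sep and key.strip() == "processor":
            count += 1
    return count


# Made-up two-CPU sample in the x86 /proc/cpuinfo layout.
SAMPLE = (
    "processor\t: 0\n"
    "vendor_id\t: GenuineIntel\n"
    "\n"
    "processor\t: 1\n"
    "vendor_id\t: GenuineIntel\n"
)
sample_count = count_processors(SAMPLE)
print(sample_count)
```

On a live system the same function can be fed the contents of /proc/cpuinfo, for example `count_processors(open("/proc/cpuinfo").read())`.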


Comment 9 Wendy Hung 2001-04-06 17:04:01 UTC
Fixed in QA0404 2.4.2-0.1.49smp kernel.

Comment 10 Wendy Hung 2001-05-22 14:19:28 UTC
Not fixed in RH 7.1 Gold.

Comment 11 Arjan van de Ven 2001-05-22 14:23:45 UTC
2.4.2-0.1.49 and the gold kernel are virtually identical (except for the
corruption fix)..........

I'm getting confused here.


Comment 12 Arjan van de Ven 2001-06-26 13:36:13 UTC
If this is still happening with the released errata kernel, I'd like to know.

Comment 13 Wendy Hung 2001-06-26 17:07:42 UTC
Bug fixed in kernel errata (2.4.3-12smp)
http://www.redhat.com/support/errata/RHSA-2001-084.html

Also works using RH 7.1 SBE (2.4.3-6smp)
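
Since several kernel builds are mentioned in this report, it can help to put them in order. The following is a rough sketch that sorts the version strings quoted in the comments by their numeric fields. It is a simplification that flattens version and release into one tuple and drops the flavor suffix; it is not RPM's own version comparison, and the helper name kernel_key is hypothetical.

```python
import re


def kernel_key(version):
    """Turn a string such as '2.4.2-0.1.35smp' into a tuple of ints."""
    # Drop a trailing flavor suffix such as 'smp' or 'enterprise'.
    numeric = re.sub(r"[a-z]+$", "", version)
    return tuple(int(part) for part in re.split(r"[.-]", numeric))


# Kernel builds quoted in the comments of this report.
tested = [
    "2.4.3-12smp",
    "2.4.2-0.1.49smp",
    "2.4.2-0.1.35smp",
    "2.4.3-6smp",
    "2.4.2-0.1.40smp",
]
ordered = sorted(tested, key=kernel_key)
for version in ordered:
    print(version)
```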

