Bug 480844 - CONFIG_X86_BIGSMP not set causes hang on Nehalem DP systems
Summary: CONFIG_X86_BIGSMP not set causes hang on Nehalem DP systems
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 10
Hardware: i386
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Kyle McMartin
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-01-20 20:43 UTC by John Villalovos
Modified: 2015-09-01 03:53 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-09 21:19:20 UTC
Type: ---
Embargoed:


Description John Villalovos 2009-01-20 20:43:49 UTC
We have encountered a Fedora 10 32-bit installation issue on a Nehalem (NHM) DP system with HT on. When booting from the Fedora 10 32-bit installation DVD, the kernel initializes the first 8 CPUs and then hangs.

Recent mainline Linux kernel changes now require 32-bit kernel configs to turn on CONFIG_X86_BIGSMP in order to boot successfully on a system with more than 8 CPUs. All Intel Nehalem dual-processor systems have 16 logical CPUs.

We checked the latest kernel update for Fedora 10 from http://mirrors.usc.edu/pub/linux/distributions/fedora/linux/updates/10/SRPMS/kernel-2.6.27.9-159.fc10.src.rpm

It still doesn't have CONFIG_X86_BIGSMP set for 32-bit kernels.
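
For anyone wanting to repeat the check on another kernel config, here is a minimal sketch. It assumes the usual kernel .config text format, where an enabled option appears as `OPTION=value` and a disabled one as `# OPTION is not set`. The helper name `config_option_state` and the sample config text are hypothetical and only for illustration.

```python
import re


def config_option_state(config_text, option):
    """Return the option's value (e.g. 'y', 'm', '32'), or None if it is
    marked "is not set" or does not appear in the config text at all."""
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    for line in config_text.splitlines():
        line = line.strip()
        match = set_re.match(line)
        if match:
            return match.group(1)
        if line == "# %s is not set" % option:
            return None
    return None


# Hypothetical excerpt of a 32-bit config, not copied from the Fedora SRPM.
sample = """\
CONFIG_X86_32=y
CONFIG_SMP=y
# CONFIG_X86_BIGSMP is not set
CONFIG_NR_CPUS=32
"""

if config_option_state(sample, "CONFIG_X86_BIGSMP") is None:
    print("CONFIG_X86_BIGSMP is not enabled")
```

The same function can be pointed at the text of a real config file read from disk.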

Could you enable this config option for all 32-bit kernels? Otherwise Fedora 10 installation will fail on platforms with more than 8 logical CPUs (platforms containing NHM, Dunnington, Tigerton, Tulsa, etc.).

Comment 1 Kyle McMartin 2009-02-08 01:30:16 UTC
Can you provide a console log? Even without BIGSMP, the machine should not be hanging, but should be limited to 8 CPUs (quite likely 4 real and 4 HT...)

Can you try booting with maxcpus set to less than 8? Say, maxcpus set to 4? If this doesn't succeed either how about nosmp?

I realize some of these boxes are likely pre-release, so feel free to take this bug private if need be. (That said, is it intentional that you're going i386? F-11 will likely be defaulting to x86_64 kernels in compat mode even for i386 installs where appropriate.)

regards, Kyle

Comment 2 John Villalovos 2009-02-09 18:16:16 UTC
Suresh,

Can you answer the question in Comment 1 ?

Comment 3 Suresh Siddha 2009-02-09 19:13:33 UTC
I don't have the console logs in hand. But I believe Red Hat already has the systems (and this can happen on any old x86 platform having more than 8 logical CPUs), so it should be very easy to collect the console log. Perhaps John can help.

The maxcpus=8 boot option is one of the workarounds for this problem.

If we don't set BIGSMP and we find more than 8 CPUs, we will still be able to bring them online, but during their APIC initialization multiple CPUs will end up configured with the same APIC LDR value. This confuses interrupt handling and creates weird hangs, etc.
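
The LDR collision can be illustrated with a minimal sketch. It assumes the flat logical destination mode, where each CPU is given a one-hot bit in an 8-bit logical ID field, and it assumes that any bit falling outside that field is simply lost. The truncation model and the names `flat_logical_id` and `find_collisions` are illustrative assumptions, not the kernel's actual code.

```python
from collections import defaultdict

LDR_BITS = 8  # width of the logical ID field in flat mode


def flat_logical_id(cpu):
    """One-hot logical ID for `cpu`, truncated to the 8-bit field
    (assumed model: bits beyond the field are dropped)."""
    return (1 << cpu) & ((1 << LDR_BITS) - 1)


def find_collisions(num_cpus):
    """Map each logical ID shared by more than one CPU to those CPUs."""
    by_id = defaultdict(list)
    for cpu in range(num_cpus):
        by_id[flat_logical_id(cpu)].append(cpu)
    return {lid: cpus for lid, cpus in by_id.items() if len(cpus) > 1}


for n in (8, 16):
    print(n, "cpus ->", find_collisions(n))
```

Under this model, 8 CPUs all get distinct IDs, while with 16 CPUs the IDs for CPUs 8 through 15 all collapse to the same value, so a logically addressed interrupt can no longer single out any one of them.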

Perhaps we can add more defensive code to either panic with an appropriate message or just ignore any CPUs beyond 8. But that's a different thing.

Comment 4 Kyle McMartin 2009-02-09 20:03:14 UTC
I'm more than happy to turn this on in Fedora, but it sounds like a pretty serious upstream bug to not do the right thing when we overflow.

I'll commit this to Fedora this afternoon so it should be in the next builds.

cheers, Kyle

Comment 5 Kyle McMartin 2009-02-09 21:19:20 UTC
OK, it's been committed to rawhide, F-10 and F-10-2_6_27, after I checked that it wouldn't have any adverse effects (hopefully...). We had NR_CPUS set to 32 on i386 anyway, so it was likely just an oversight.

I'll try to see if I can sort out access to such a machine internally to look into why it doesn't gracefully handle BIGSMP being unset.

cheers, Kyle

Comment 6 Michael K Johnson 2009-04-22 02:07:22 UTC
A trivial patch limiting the configuration to what is bootable has been sent upstream; see:
http://thread.gmane.org/gmane.linux.kernel/825782

That does not preclude a better patch that prevents the boot from hanging,
but it at least avoids the misconfiguration.

