Red Hat Bugzilla – Bug 480844
CONFIG_X86_BIGSMP not set causes hang on Nehalem DP systems
Last modified: 2015-08-31 23:53:32 EDT
We have encountered a Fedora 10 32-bit installation issue on a Nehalem (NHM) DP system with Hyper-Threading enabled. When booting from the Fedora 10 32-bit installation DVD, the kernel initializes the first 8 CPUs and then hangs.
Recent mainline Linux kernel changes now require 32-bit kernel configs to enable CONFIG_X86_BIGSMP in order to boot on a system with more than 8 CPUs. All Intel Nehalem dual-processor systems have 16 logical CPUs.
We checked the latest kernel update for Fedora 10, http://mirrors.usc.edu/pub/linux/distributions/fedora/linux/updates/10/SRPMS/kernel-184.108.40.206-159.fc10.src.rpm,
and it still doesn't have CONFIG_X86_BIGSMP set for 32-bit kernels.
Could you enable this config option for all 32-bit kernels? Otherwise Fedora 10 installation will fail on platforms with more than 8 logical CPUs (platforms containing NHM, Dunnington, Tigerton, Tulsa, etc.).
Can you provide a console log? Even without BIGSMP, the machine should not hang; it should instead be limited to 8 CPUs (quite likely 4 physical cores and 4 hyperthreads...).
Can you try booting with maxcpus set to fewer than 8, say maxcpus=4? If that also fails, how about nosmp?
I realize some of these boxes are likely pre-release, so feel free to take this bug private if need be. (That said, is it intentional that you're going i386? F-11 will likely be defaulting to x86_64 kernels in compat mode even for i386 installs where appropriate.)
Can you answer the question in Comment 1?
I don't have the console logs at hand. But I believe Red Hat already has such systems (and this can happen on any older x86 platform with more than 8 logical CPUs), so it should be easy to collect a console log. Perhaps John can help.
Booting with maxcpus=8 is one workaround for this problem.
If BIGSMP is not set and we find more than 8 CPUs, we can still bring them online, but during their APIC initialization multiple CPUs end up configured with the same APIC LDR value. This confuses interrupt handling and causes strange hangs.
Perhaps we can add more defensive code to either panic with an appropriate message or simply ignore CPUs beyond the eighth. But that's a separate issue.
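One simplified way to see why more than 8 CPUs break without BIGSMP: in flat logical APIC mode each CPU's logical ID is a single bit in an 8-bit field, so only 8 CPUs can have distinct IDs. The sketch below is a toy model under that assumption, not kernel code; `flat_logical_id` and `ambiguous_cpus` are hypothetical names, and modeling the overflow as the bit being truncated to zero is an illustrative simplification.

```python
# Toy model (assumption): flat logical APIC mode gives each CPU one bit
# of an 8-bit logical ID field, so at most 8 distinct IDs exist.
FLAT_LOGICAL_ID_BITS = 8


def flat_logical_id(cpu: int) -> int:
    """Hypothetical helper: logical ID a CPU would get in flat mode.

    The CPU's bit is truncated to the 8-bit field, so CPUs 8 and up
    all collapse to the same value (0) in this model.
    """
    return (1 << cpu) & ((1 << FLAT_LOGICAL_ID_BITS) - 1)


def ambiguous_cpus(ncpus: int) -> list[int]:
    """Return CPUs whose logical ID is zero or already used by an earlier CPU."""
    seen = 0
    bad = []
    for cpu in range(ncpus):
        lid = flat_logical_id(cpu)
        if lid == 0 or seen & lid:
            bad.append(cpu)
        seen |= lid
    return bad


if __name__ == "__main__":
    for cpu in range(16):
        print(f"cpu {cpu:2d} -> logical ID {flat_logical_id(cpu):#04x}")
    print("CPUs that cannot be addressed uniquely:", ambiguous_cpus(16))
```

In this model an 8-CPU system has no ambiguous CPUs, while on a 16-CPU Nehalem DP system CPUs 8 through 15 all share one logical ID, which matches the "same APIC LDR values" symptom described above.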
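The two defensive options mentioned (panic with a clear message, or ignore the extra CPUs) can be sketched as a small policy function. This is a hypothetical illustration, not the upstream fix: `usable_cpus`, `BootConfigError`, and the `panic_on_overflow` switch are invented names, a Python exception stands in for a kernel panic, and a warning stands in for a boot-log message.

```python
import warnings

# Assumption: flat logical APIC addressing tops out at 8 CPUs.
MAX_FLAT_CPUS = 8


class BootConfigError(RuntimeError):
    """Hypothetical stand-in for a kernel panic with a descriptive message."""


def usable_cpus(detected: int, bigsmp: bool, panic_on_overflow: bool = False) -> int:
    """Hypothetical policy: how many CPUs to bring online.

    With BIGSMP, every detected CPU is used. Without it, either refuse
    to continue (panic_on_overflow=True) or ignore CPUs beyond the limit.
    """
    if bigsmp or detected <= MAX_FLAT_CPUS:
        return detected
    msg = (f"{detected} CPUs detected but CONFIG_X86_BIGSMP is not set; "
           f"only {MAX_FLAT_CPUS} can be addressed in flat logical mode")
    if panic_on_overflow:
        raise BootConfigError(msg)
    warnings.warn(msg + f", ignoring CPUs {MAX_FLAT_CPUS} and above")
    return MAX_FLAT_CPUS
```

For example, `usable_cpus(16, bigsmp=False)` returns 8 with a warning, which is roughly what booting with maxcpus=8 achieves by hand; either branch turns a silent hang into a diagnosable outcome.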
I'm more than happy to turn this on in Fedora, but failing to do the right thing when we overflow sounds like a pretty serious upstream bug.
I'll commit this to Fedora this afternoon so it should be in the next builds.
OK, it's been committed to rawhide, F-10 and F-10-2_6_27, after I checked that it wouldn't have any adverse effects (hopefully...). We had NR_CPUS set to 32 on i386 anyway, so it was likely just an oversight.
I'll try to see if I can sort out access to such a machine internally to look into why it doesn't gracefully handle BIGSMP being unset.
A trivial patch limiting the configuration to what is bootable has been sent upstream, see:
That does not preclude a better patch that prevents the boot from hanging, but it at least avoids the misconfiguration.