Bug 1051284 - Unable to boot more than 16 CPUs on SGI Altix UV1000
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 22
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Prarit Bhargava
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-01-10 00:52 UTC by John Paul Adrian Glaubitz
Modified: 2015-03-25 12:38 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-25 12:38:24 UTC
Type: Bug
Embargoed:



Description John Paul Adrian Glaubitz 2014-01-10 00:52:33 UTC
Hi!

The physics department of my university has inherited an SGI Altix UV1000 server which utilizes an external NUMA bus called "NUMA Link" to interconnect 32 blades into one machine with 64 physical CPUs (Intel Xeon X7560) with 8 cores each; with Hyper-Threading, that totals 1024 logical CPUs, plus 2 TiB of shared NUMA memory.
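
For reference, the CPU and memory figures above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and two threads per core is the usual Hyper-Threading assumption, not something measured on the machine.

```python
# Topology figures taken from the description above.
blades = 32
sockets = 64                 # physical CPUs (Intel Xeon X7560)
cores_per_socket = 8
threads_per_core = 2         # assumption: Hyper-Threading exposes 2 threads per core
total_memory_gib = 2 * 1024  # 2 TiB expressed in GiB

sockets_per_blade = sockets // blades
logical_cpus = sockets * cores_per_socket * threads_per_core
memory_per_blade_gib = total_memory_gib // blades

print("sockets per blade:", sockets_per_blade)
print("logical CPUs:", logical_cpus)
print("memory per blade (GiB):", memory_per_blade_gib)
```

The per-blade results (two physical CPUs, 64 GiB) match what the machine reports when it boots as a single blade with NUMA Link disabled.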
 
The machine originally ships with either SuSE Linux Enterprise Server or Red Hat Enterprise Linux. Our machine was originally installed with SLES11SP3 and ran SuSE's kernel 3.0.76. We would like to install a non-commercial Linux like Debian or Fedora, for example, since we don't have a support contract and usually run Debian stable.

Unfortunately, getting a machine with so many CPUs to boot isn't trivial. The kernel takes a very long time to boot, there are many pitfalls, and it took me quite some time to get beyond the boot loader.

None of the current Linux distributions I have tried so far booted the machine with NUMA Link enabled out of the box. They all crashed right after GRUB, specifically at the "Booting the kernel" message with earlyprintk enabled. Disabling NUMA Link results in the machine booting with only two physical CPUs (one blade) and 64 GiB RAM, which showed no problems with any Linux distribution tested.

Now, since SLES11SP3 works fine on this machine, I gave the reinstallation a shot and to my surprise, SLES11SP3 works fine with NUMA Link enabled.

Out of curiosity, I replaced the SLES-shipped kernel, 3.0.76, with a current kernel taken from Debian unstable (Linux 3.12-1-amd64) and Fedora Rawhide (3.13.0-0.rc7.git2.1.fc21.x86_64) and it turns out, these kernels actually both boot on the UV1000 with NUMA Link enabled and therefore all 1024 logical CPUs and 2 TiB RAM. Since SLES uses ELILO instead of GRUB, I suppose there is a problem with GRUB booting on such machines, but I will try to file a different bug report on GRUB once I know more.

Anyway, unfortunately, both Debian's 3.12 and Rawhide's fail to boot more than 16 logical CPUs and I have no idea why:
[   17.314880] smpboot: CPU0: Intel(R) Xeon(R) CPU           X7560  @ 2.27GHz (fam: 06, model: 2e, stepping: 06)
[   17.435458] Performance Events: PEBS fmt1+, 16-deep LBR, Nehalem events, Intel PMU driver.
[   17.445789] perf_event_intel: CPU erratum AAJ80 worked around
[   17.452416] perf_event_intel: CPUID marked event: 'bus cycles' unavailable
[   17.460320] ... version:                3
[   17.465008] ... bit width:              48
[   17.469791] ... generic registers:      4
[   17.474479] ... value mask:             0000ffffffffffff
[   17.480622] ... max period:             000000007fffffff
[   17.486754] ... fixed-purpose events:   3
[   17.491440] ... event mask:             000000070000000f
[   17.968186] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[   18.071663] x86: Booting SMP configuration:
[   18.076554] .... node   #0, CPUs:            #1    #2    #3    #4    #5    #6    #7
[   18.257836] .... node   #1, CPUs:      #8    #9   #10   #11   #12   #13   #14   #15
[   18.532809] .... node   #2, CPUs:     #16
[   23.611007] smpboot: CPU16: Not responding
[   23.618018]    #17
[   28.693540] smpboot: CPU17: Not responding
[   28.700498]    #18
[   33.776001] smpboot: CPU18: Not responding
[   33.782959]    #19

And so on. This problem does not occur with the 3.0.76 kernel shipped by Novell in SLES11SP3.
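
To get a list of the affected CPUs out of a long boot log like the one above, something along these lines might help. It is a minimal sketch, not a tested tool: the function name `unresponsive_cpus` is hypothetical, and the regular expression only assumes the message format shown in the excerpt.

```python
import re

# Matches lines such as "[   23.611007] smpboot: CPU16: Not responding".
NOT_RESPONDING = re.compile(r"smpboot: CPU(\d+): Not responding")


def unresponsive_cpus(log_text):
    """Return the sorted CPU numbers that the log reports as not responding."""
    return sorted({int(m.group(1)) for m in NOT_RESPONDING.finditer(log_text)})


# Sample copied from the boot log excerpt above.
sample = """\
[   18.532809] .... node   #2, CPUs:     #16
[   23.611007] smpboot: CPU16: Not responding
[   23.618018]    #17
[   28.693540] smpboot: CPU17: Not responding
[   28.700498]    #18
[   33.776001] smpboot: CPU18: Not responding
"""

print(unresponsive_cpus(sample))  # [16, 17, 18]
```

Going by the timestamps in the excerpt, each CPU that does not respond costs roughly five seconds (18.53, 23.61, 28.69, 33.78) before the kernel moves on to the next one.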

Since Linux 3.13 just received some improvements for such large NUMA systems lately, I thought it would be very interesting to have such a machine ready at hand and help test the code.

Current kernel command line is:

Kernel command line: BOOT_IMAGE=scsi0:\efi\SuSE\vmlinuz-3.13.0-0.rc7.git2.1.fc21.x86_64 root=/dev/disk/by-id/scsi-3600508e00000000050f581bcee87700b-part2  resume=/dev/disk/by-id/scsi-3600508e00000000050f581bcee87700b-part3 splash=silent crashkernel=512M showopts earlyprintk=ttyS0,115200n8 console=ttyS0,115200n8  add_efi_memmap
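
When comparing command lines between kernels, it can be handy to split them into key/value pairs first. The following is a rough sketch only: `parse_cmdline` is a hypothetical helper, the sample is shortened from the line above, and quoted values containing spaces are not handled (a repeated key keeps its last value).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


# Shortened from the command line above; the raw string keeps the backslashes.
cmdline = (
    r"BOOT_IMAGE=scsi0:\efi\SuSE\vmlinuz-3.13.0-0.rc7.git2.1.fc21.x86_64 "
    "splash=silent crashkernel=512M showopts "
    "earlyprintk=ttyS0,115200n8 console=ttyS0,115200n8 add_efi_memmap"
)

params = parse_cmdline(cmdline)
print(params["console"])
print(params["add_efi_memmap"])
```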

Are there other options that should be added that could help? I have seen a long list specified by Red Hat [1]. However, since SuSE's kernel boots just fine without most of these, I don't think this is the issue here (even though the original configuration by SGI had most of these enabled).

The machine is still booting, so I can't upload any log files yet. I will attach these later once the machine has finished booting (it's currently trying to bring up CPU #366 out of 1024).

Adrian

> [1] https://access.redhat.com/site/articles/42548

Comment 1 Josh Boyer 2014-01-10 01:02:20 UTC
Being perfectly honest, you are likely better off either interacting directly with upstream on this, or by trying the RHEL7 Beta that was just released and seeing how that fares.  The Fedora kernel team don't have much experience with huge iron, and we don't really have access to such hardware with enough time to really debug.

The one thing I will note is that rawhide kernels have a large number of debug options enabled.  That can really slow down a lot of things.  You might want to try kernel-3.13.0-0.rc7.git0.1.fc21 as that one does not have the debug options enabled.

Comment 2 John Paul Adrian Glaubitz 2014-01-10 01:09:20 UTC
Thanks for your answer, Josh!

(In reply to Josh Boyer from comment #1)
> Being perfectly honest, you are likely better off either interacting
> directly with upstream on this

My idea was to report it to Fedora since Fedora is very close to upstream or upstream itself in many cases.

I just found out that there is actually a bugtracker now at kernel.org, so I'll move my bug report there!

> or by trying the RHEL7 Beta that was just
> released and seeing how that fares.

I have tried the RHEL7 beta but without luck so far. Trying the installation media directly with NUMA Link enabled fails. However, I didn't do a full install, which is normally performed in single-node mode, so it might work after rebooting the installed system with NUMA Link enabled.

> The Fedora kernel team don't have much
> experience with huge iron, and we don't really have access to such hardware
> with enough time to really debug.

I would guess so.

> The one thing I will note is that rawhide kernels have a large number of
> debug options enabled.  That can really slow down a lot of things.  You
> might want to try kernel-3.13.0-0.rc7.git0.1.fc21 as that one does not have
> the debug options enabled.

Thanks, very good to know.

I will leave this bug report open for a short while and eventually move it to kernel.org. Didn't know this existed, so sorry for getting it here where it doesn't belong.

Adrian

Comment 3 Josh Boyer 2014-01-10 12:41:03 UTC
No apologies necessary.  This bug can stay open as long as the issue is present as it is something that clearly impacts Fedora.  I just wanted to set proper expectations on it rather than let it sit here without comment.  Thanks.

Comment 4 Prarit Bhargava 2014-01-10 12:51:37 UTC
Adrian, it might be interesting to use the CMC to disable all the sockets except socket 0 & socket 1 and reboot (so you only get 16 cpus) to see if that works.  I also know that there is a lot of SGI UV specific code in the kernel but I'm not 100% sure if it is all enabled in Fedora.  I'll take a look and get back to you.

Also, are you sure you're using the 64-bit version of Fedora?  I'd hate to find out you were booting the 32-bit version ;)

P.

Comment 5 Prarit Bhargava 2014-01-10 13:02:57 UTC
(In reply to Prarit Bhargava from comment #4)
> Adrian, it might be interesting to use the CMC to disable all the sockets
> except socket 0 & socket 1 and reboot (so you only get 16 cpus) to see if
> that works.  I also know that there is a lot of SGI UV specific code in the
> kernel but I'm not 100% sure if it is all enabled in Fedora.  I'll take a look
> and get back to you.

The current Fedora .config has

# CONFIG_X86_UV is not set

... that's why the system does not properly boot.
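
For anyone wanting to check their own kernel, a small script can report whether an option is set in a .config-style file. This is a sketch under assumptions: `config_option_state` is a hypothetical helper, and apart from the `CONFIG_X86_UV` line quoted above the sample lines are made up for illustration.

```python
import re


def config_option_state(config_text, option):
    """Return the option's value (e.g. 'y' or 'm'), or None if it is not set."""
    # Only uncommented "OPTION=value" lines count; "# OPTION is not set" does not.
    pattern = re.compile(r"^" + re.escape(option) + r"=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Illustrative sample; only the CONFIG_X86_UV line is taken from this report.
sample = """\
CONFIG_X86_64=y
# CONFIG_X86_UV is not set
CONFIG_NUMA=y
"""

print(config_option_state(sample, "CONFIG_X86_UV"))  # None
print(config_option_state(sample, "CONFIG_NUMA"))    # y
```

Many distributions install the running kernel's configuration under /boot as config-<kernel release>, so the text of that file could be passed in instead of the sample; that location is an assumption and may differ on your system.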

Josh, I have no objection and I cannot see any issues with enabling this in Fedora FWIW.  I'll post a patch on fedora kernel devel list and we can go from there.

Unfortunately Adrian, this means that F20 does not support UV :(.

P.

Comment 6 John Paul Adrian Glaubitz 2014-01-10 13:11:41 UTC
(In reply to Prarit Bhargava from comment #5)
> The current Fedora .config has
> 
> # CONFIG_X86_UV is not set
> 
> ... that's why the system does not properly boot.

Yep, that's what I figured out yesterday as well while debugging the smpboot code in the kernel, but I was way too tired to leave a comment here afterwards.

> Josh, I have no objection and I cannot see any issues with enabling this in
> Fedora FWIW.  I'll post a patch on fedora kernel devel list and we can go
> from there.

Oh, it's not so pressing. I don't think there are many people around who have such a machine standing in their server rooms that would justify enabling it by default.

> Unfortunately Adrian, this means that F20 does not support UV :(.

That's not a problem at all. By filing this bug report, I was inspired to have a closer look at the sources myself and figured out that this particular option is not enabled in most distribution kernels.

Adrian

Comment 7 Prarit Bhargava 2014-01-10 13:19:54 UTC
(In reply to John Paul Adrian Glaubitz from comment #6)
>
> Oh, it's not so pressing. I don't think there are many people around who
> have such a machine standing in their server rooms that would justify
> enabling it by default.
> 

Can't hurt anything to turn it on ;).  I just posted a small patch upstream to enable it in the x86_64 config.

P.

Comment 8 Josh Boyer 2014-01-10 13:53:55 UTC
(In reply to Prarit Bhargava from comment #7)
> (In reply to John Paul Adrian Glaubitz from comment #6)
> >
> > Oh, it's not so pressing. I don't think there are many people around who
> > have such a machine standing in their server rooms that would justify
> > enabling it by default.
> > 
> 
> Can't hurt anything to turn it on ;).  I just posted a small patch upstream
> to enable it in the x86_64 config.

It probably won't hurt, but will it really help?  John said RHEL7 failed, and these options are enabled there.

Comment 9 John Paul Adrian Glaubitz 2014-01-10 14:03:20 UTC
(In reply to Josh Boyer from comment #8)
> It probably won't hurt, but will it really help?  John said RHEL7 failed,
> and these options are enabled there.

Yes, but I blame that on myself for not correctly following the installation instructions.

I didn't do a full RHEL install; instead, I just tried booting the boot ISO with NUMA Link enabled, which is not how it's supposed to be done.

SGI's knowledge base explicitly states to disable NUMA Link during installation and re-enable it once everything is set up.

My guess is that the bootloader on the installation media doesn't support the UV out of the box. Isn't SYSLINUX used on these instead of GRUB (I didn't pay attention to that)?

I'm pretty sure that RHEL runs on the UV as it's been certified by SGI (at least for the 6.x versions).

Thanks for all your input so far. Thinking out loud in public helps ;).

Adrian

Comment 10 Josh Boyer 2014-01-10 14:05:54 UTC
Ok, thanks for explaining.  I'll look at enabling the options in rawhide later today.

Comment 11 John Paul Adrian Glaubitz 2014-01-10 19:59:41 UTC
Hi Josh!

(In reply to Josh Boyer from comment #10)
> Ok, thanks for explaining.  I'll look at enabling the options in rawhide
> later today.

I saw your commit, thanks!

However, I don't think it's likely there will be any substantial number of users for this. These systems cost several million and usually come with a support contract for the hardware and the installed software, which is usually an enterprise distribution like RHEL or SLES.

I only know of one other department, also at my university, that has such a machine, and they are actually planning to deploy CentOS, which, being the free fork of RHEL, has a UV-enabled kernel as well.

The machine is now up and running, using the SLES kernel and Debian Wheezy, but I might do some tests with Fedora in the future and report back. I think there are still issues with GRUB, so I'm currently using ELILO.

In any case, if I stumble across other bugs, I'll file them. For anyone curious, htop actually isn't really suited for so many cores [1] ;).

Adrian

> [1] http://i.imgur.com/LPcHUdS.png

Comment 12 Jaroslav Reznik 2015-03-03 15:22:53 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

