Bug 239766 - [QC] kernel fails to boot on LS41 with maxcpus=1
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Hardware: All
OS: Linux
Priority: medium
Severity: urgent
Assigned To: Steven Rostedt
Reported: 2007-05-11 03:09 EDT by IBM Bug Proxy
Modified: 2008-02-27 14:57 EST (History)
Fixed In Version: 2.6.21-14.el5rt
Doc Type: Bug Fix
Last Closed: 2007-05-31 18:42:00 EDT

Attachments
maxcpus-ignore-offline-cpus.patch (4.93 KB, text/plain)
2007-05-21 12:50 EDT, IBM Bug Proxy

External Trackers
IBM Linux Technology Center: 34431
Description IBM Bug Proxy 2007-05-11 03:09:46 EDT
LTC Owner is: dvhltc@us.ibm.com
LTC Originator is: dvhltc@us.ibm.com

Reported by perf team, needs to be validated and possibly fixed.  This does not
seem to be a problem on an x460, as reported by the BULL team.


I was able to boot with maxcpus=1 on elm3b102 (LS41):

dvhart@elm3b102:~$ uname -a
Linux elm3b102.beaverton.ibm.com 2.6.16-rtj12.11.3smp #1 SMP PREEMPT Tue Apr 24
14:08:21 PDT 2007 i686 athlon i386 GNU/Linux

dvhart@elm3b102:~$ cat /proc/cmdline 
ro root=LABEL=/ console=tty0 console=ttyS1,19200 crashkernel=64M@16M maxcpus=1

dvhart@elm3b102:~$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8212
stepping        : 2
cpu MHz         : 2000.276
cache size      : 1024 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext
3dnow pni cx16 lahf_lm cmp_legacy svm cr8legacy ts fid vid ttp tm stc
bogomips        : 4002.97
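
The check above (compare maxcpus= in /proc/cmdline against the processors listed in /proc/cpuinfo) can be scripted when repeating it across machines. This is only a minimal sketch: the function names are hypothetical, and it assumes the usual "processor : N" layout of /proc/cpuinfo shown above.

```python
import re


def parse_maxcpus(cmdline):
    """Return the integer given to maxcpus= on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("maxcpus="):
            value = token.split("=", 1)[1]
            return int(value) if value.isdigit() else None
    return None


def count_processors(cpuinfo):
    """Count the 'processor : N' entries in /proc/cpuinfo-style text."""
    return len(re.findall(r"^processor\s*:\s*\d+", cpuinfo, flags=re.MULTILINE))


def maxcpus_honored(cmdline, cpuinfo):
    """True if no more CPUs are listed than maxcpus= allows."""
    limit = parse_maxcpus(cmdline)
    if limit is None:
        return True  # no limit requested, nothing to check
    # maxcpus=0 disables SMP, which still leaves the boot CPU running.
    return count_processors(cpuinfo) <= max(limit, 1)


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    with open("/proc/cpuinfo") as f:
        cpuinfo = f.read()
    print("maxcpus =", parse_maxcpus(cmdline))
    print("processors listed =", count_processors(cpuinfo))
    print("limit honored =", maxcpus_honored(cmdline, cpuinfo))
```

This only tells you something on a machine that finished booting; it does not help with the hang itself.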

I'll try to get Mark Peloquin to describe the approach he took that failed, and
how it failed.


maxcpus=2 also works



Setting maxcpus=1 on elm3b210 causes the system to hang on boot with the last
message being:

pci_hotplug: PCI Hot Plug PCI Core Version: 0.5

The entry as seen in Grub on boot is:

kernel /boot/vmlinuz-2.6.20-0119.rt8 ro root=LABEL=/ console=tty0,
console=ttyS1,19200 maxcpus=1

This appears to be quite a different kernel; the uname -a output is:

Linux elm3b210.beaverton.ibm.com 2.6.20-0119.rt8 #1 SMP PREEMPT Thu Feb 15
15:53:15 CET 2007 x86_64 x86_64 x86_64 GNU/Linux

It looks like the tests done by Darren were on an x86 2.6.16 base, while my system
has an x86_64 2.6.20 base.


confirmed on rhel5-rt 2.6.20-0119.rt8, trying with 2.6.21-rt now.


2.6.21-2.el5rt fails in the same place, trying stock rhel5 kernel.


maxcpus=1 works on stock RHEL5 (2.6.18-8.el5).  This limitation with the -rt
kernels is blocking -rt scalability analysis.


Also reproduced on an LS21. Adding initcall_debug to the boot line got the
following extra output:

pci_hotplug: PCI Hot Plug PCI Core version: 0.5
initcall 0xffffffff817a42a4: pci_hotplug_init+0x0/0x5a() returned 0.
initcall 0xffffffff817a42a4 ran for 24 msecs: pci_hotplug_init+0x0/0x5a()
Calling initcall 0xffffffff817a46f2: fb_console_init+0x0/0x12b()
initcall 0xffffffff817a46f2: fb_console_init+0x0/0x12b() returned 0.
initcall 0xffffffff817a46f2 ran for 0 msecs: fb_console_init+0x0/0x12b()
Calling initcall 0xffffffff817a4c37: acpi_reserve_resources+0x0/0xeb()
initcall 0xffffffff817a4c37: acpi_reserve_resources+0x0/0xeb() returned 0.
initcall 0xffffffff817a4c37 ran for 0 msecs: acpi_reserve_resources+0x0/0xeb()
Calling initcall 0xffffffff817a5abd: acpi_fan_init+0x0/0x5e()
initcall 0xffffffff817a5abd: acpi_fan_init+0x0/0x5e() returned 0.
initcall 0xffffffff817a5abd ran for 0 msecs: acpi_fan_init+0x0/0x5e()
Calling initcall 0xffffffff817a5bf8: irqrouter_init_sysfs+0x0/0x38()
initcall 0xffffffff817a5bf8: irqrouter_init_sysfs+0x0/0x38() returned 0.
initcall 0xffffffff817a5bf8 ran for 0 msecs: irqrouter_init_sysfs+0x0/0x38()
Calling initcall 0xffffffff817a5d8f: acpi_processor_init+0x0/0xdf()
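
When the captured console log is long, the initcall that never returned can be picked out mechanically. The following is a rough sketch, not an existing tool: the function name is hypothetical, and it assumes the exact "Calling initcall ..." / "... returned N." line format shown above, with no timestamp prefix on the lines.

```python
import re

# Line formats as they appear in the initcall_debug output quoted in this bug.
CALL_RE = re.compile(r"^Calling initcall (0x[0-9a-f]+): (\w+)\+")
RETURN_RE = re.compile(r"^initcall (0x[0-9a-f]+): (\w+)\+\S* returned")


def pending_initcalls(log):
    """Return names of initcalls that were called but never logged a return."""
    pending = {}  # address -> function name, in call order
    for line in log.splitlines():
        line = line.strip()
        m = CALL_RE.match(line)
        if m:
            pending[m.group(1)] = m.group(2)
            continue
        m = RETURN_RE.match(line)
        if m:
            # A return with no matching call (log started late) is ignored.
            pending.pop(m.group(1), None)
    return list(pending.values())


if __name__ == "__main__":
    sample = """\
Calling initcall 0xffffffff817a5bf8: irqrouter_init_sysfs+0x0/0x38()
initcall 0xffffffff817a5bf8: irqrouter_init_sysfs+0x0/0x38() returned 0.
initcall 0xffffffff817a5bf8 ran for 0 msecs: irqrouter_init_sysfs+0x0/0x38()
Calling initcall 0xffffffff817a5d8f: acpi_processor_init+0x0/0xdf()
"""
    print(pending_initcalls(sample))  # prints ['acpi_processor_init']
```

On the output above this reports acpi_processor_init, which matches where the boot stops.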

Comment 1 IBM Bug Proxy 2007-05-19 03:15:34 EDT
----- Additional Comments From dvhltc@us.ibm.com  2007-05-19 03:12 EDT -------
I tested mainline 2.6.21 and it does boot with maxcpus=1.  2.6.21-rt1, 2, and 4
all hang at:

Calling initcall 0xffffffff817a41df: acpi_processor_init+0x0/0xdf()

when booted with initcall_debug.  I traced this only as far as the call to
acpi_bus_register_driver.  So this was definitely introduced by the -rt patch. 
I'm not sure if I should try and see when it was introduced (as 2.6.16-rt22 does
not fail) or if I should head "down the acpi rabbit hole" as John S. put it... 
Comment 2 IBM Bug Proxy 2007-05-21 12:50:46 EDT
Created attachment 155114
Comment 3 IBM Bug Proxy 2007-05-21 12:50:50 EDT
----- Additional Comments From dvhltc@us.ibm.com  2007-05-21 12:43 EDT -------
Ignore bogus acpi info

Thomas Gleixner provided the attached patch.  When I first booted with this
patch I received the following in a loop:

irq 9: nobody cared (try booting with the "irqpoll" option)

Call Trace:
 [<ffffffff8106d5a4>] dump_trace+0xaa/0x32a
 [<ffffffff8106d865>] show_trace+0x41/0x5c
 [<ffffffff8106d895>] dump_stack+0x15/0x17
 [<ffffffff810c50b8>] __report_bad_irq+0x38/0x87
 [<ffffffff810c52cb>] note_interrupt+0x1c4/0x1fc
 [<ffffffff810c458d>] thread_simple_irq+0x6c/0x7e
 [<ffffffff810c4dc3>] do_irqd+0x14a/0x3e4
 [<ffffffff81033d3a>] kthread+0xf5/0x128
 [<ffffffff8105ff68>] child_rip+0xa/0x12

[<ffffffff8117736e>] (acpi_irq+0x0/0x1b)

I then tried to boot with acpi=noirq and I got all the way to a login prompt. 
We have seen these "nobody cared" and child_rip dump issues before, so I think
they are independent issues that should be tracked in their own bugs.
Comment 4 IBM Bug Proxy 2007-05-21 13:05:49 EDT
----- Additional Comments From dvhltc@us.ibm.com  2007-05-21 12:58 EDT -------
Ingo has included tglx's patch in 2.6.21-rt5.
Comment 6 IBM Bug Proxy 2007-05-24 13:10:48 EDT

           What    |Removed                     |Added
             Status|ASSIGNED                    |FIXEDAWAITINGTEST
         Resolution|                            |FIX_ALREADY_AVAIL

------- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-24 13:04 EDT -------
Verified fixed in 2.6.21-14.el5rt. 
