Bug 432979

Summary: [RHEL5 RT] kernel-rt-debug hangs while booting
Product: Red Hat Enterprise MRG Reporter: Jeff Burke <jburke>
Component: realtime-kernelAssignee: Jon Masters <jcm>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: medium    
Version: 1.0CC: bhu, williams
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
URL: http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=1958249
Whiteboard:
Fixed In Version: 2.6.24.7-60.el5rt Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-05-28 16:57:04 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
fix to rt_mutex code for printk in atomic (don't really take locks if in printk)
none
Patch to reduce ftrace printk output none

Description Jeff Burke 2008-02-15 16:10:56 UTC
Description of problem:
 Since upgrading to 2.6.24 The rt-debug variant of the kernel no longer works.

Version-Release number of selected component (if applicable):
 2.6.24.1-24rt

How reproducible:
 Always

Steps to Reproduce:
1. Just try and boot a rt-debug kernel
  
Actual results:
Initializing cgroup subsys cpuset
Linux version 2.6.24.1-24.el5rtdebug (brewbuilder.redhat.com)
(gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #1 SMP PREEMPT RT Mon Feb 11
17:37:33 EST 2008
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009d000 (usable)
 BIOS-e820: 000000000009d000 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ffa4980 (usable)
 BIOS-e820: 000000007ffa4980 - 000000007ffa7800 (ACPI data)
 BIOS-e820: 000000007ffa7800 - 0000000080000000 (reserved)
 BIOS-e820: 00000000f4000000 - 00000000fc000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
1151MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 0009d140
NX (Execute Disable) protection: active
Zone PFN ranges:
  DMA             0 ->     4096
  Normal       4096 ->   229376
  HighMem    229376 ->   524196
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    0:        0 ->   524196
DMI 2.3 present.
ACPI: RSDP 000FDFC0, 0014 (r0 IBM   )
ACPI: RSDT 7FFA7780, 0030 (r1 IBM    SERLEWIS     1000 IBM  45444F43)
ACPI: FACP 7FFA76C0, 0084 (r2 IBM    SERLEWIS     1000 IBM  45444F43)
ACPI: DSDT 7FFA4980, 2B02 (r1 IBM    SERLEWIS     1000 INTL  2002025)
ACPI: FACS 7FFA7500, 0040
ACPI: APIC 7FFA7600, 0090 (r1 IBM    SERLEWIS     1000 IBM  45444F43)
ACPI: SRAT 7FFA7540, 00C0 (r1 AMD    HAMMER          1 AMD         1)
ACPI: PM-Timer IO Port: 0x488
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
Processor #2 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
Processor #3 15:1 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 14, version 17, address 0xfec00000, GSI 0-15
ACPI: IOAPIC (id[0x0d] address[0xfec02000] gsi_base[16])
IOAPIC[1]: apic_id 13, version 17, address 0xfec02000, GSI 16-31
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
Enabling APIC mode:  Flat.  Using 2 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:74000000)
swsusp: Registered nosave memory region: 000000000009d000 - 00000000000a0000
swsusp: Registered nosave memory region: 00000000000a0000 - 00000000000e0000
swsusp: Registered nosave memory region: 00000000000e0000 - 0000000000100000
Real-Time Preemption Support (C) 2004-2007 Ingo Molnar
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 508327
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS1,19200
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
WARNING: experimental RCU implementation.
CPU 0 irqstacks, hard=c078c000 soft=c068d000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 1800.108 MHz processor.
Console: colour VGA+ 80x25
console [ttyS1] enabled
num_possible_cpus(): 4
CPU#0: allocated 2097152 bytes trace buffer.
CPU#0: allocated 2097152 bytes max-trace buffer.
CPU#1: allocated 2097152 bytes trace buffer.
CPU#1: allocated 2097152 bytes max-trace buffer.
CPU#2: allocated 2097152 bytes trace buffer.
CPU#2: allocated 2097152 bytes max-trace buffer.
CPU#3: allocated 2097152 bytes trace buffer.
CPU#3: allocated 2097152 bytes max-trace buffer.
allocated 8388608 bytes out-trace buffer.
tracer: a total of 25165824 bytes allocated.
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:      8
... MAX_LOCK_DEPTH:             30
... MAX_LOCKDEP_KEYS:         2048
... CLASSHASH_SIZE:           1024
... MAX_LOCKDEP_ENTRIES:     16384
... MAX_LOCKDEP_CHAINS:      32768
... CHAINHASH_SIZE:          16384
 memory used by lock dependency info: 1600 kB
 per task-struct memory footprint: 1680 bytes
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1991564k/2096784k available (2511k kernel code, 103936k reserved, 1557k
data, 556k init, 1179280k highmem)
virtual kernel memory layout:
    fixmap  : 0xfe4d0000 - 0xfffff000   (27836 kB)
    pkmap   : 0xfe200000 - 0xfe400000   (2048 kB)
    vmalloc : 0xf8800000 - 0xfe1fe000   (  89 MB)
    lowmem  : 0xc0000000 - 0xf8000000   ( 896 MB)
      .init : 0xc05ff000 - 0xc068a000   ( 556 kB)
      .data : 0xc0473d16 - 0xc05f9284   (1557 kB)
      .text : 0xc0200000 - 0xc0473d16   (2511 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 3602.89 BogoMIPS (lpj=1801449)
Security Framework initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
Initializing cgroup subsys debug
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(2) -> Core 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to ffffe000.
Checking 'hlt' instruction... OK.
lockdep: not fixing up alternatives.
ACPI: Core revision 20070126

Expected results:
 Should boot

Additional info:
jcm is currently working on this issue.
This may or may not be related:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=1958247
All CPUs are in the idle loop??

Comment 1 Jeff Burke 2008-02-20 15:20:10 UTC
Update from Jon 02/19/08 05:23 AM

So the situation with the debug kernel is:

*). We know it's caused due to the serial console (8250).
*). We race with ourselves to get port->lock in serial8250_console_write
*). We then try to call __rt_mutex_adjust_prio to adjust the prio of the
idle task, which triggers a few WARN_ONs that we can't print because
they need printk in order to actually get printed to the console.

Here's some helpful output for discussion purposes later. Note that I
cheat to display most of the output by dynamically switching vprintk
from the regular one to one that drives early_printk directly, then I
switch it back again immediately so as to not interfere too much.

We'll find this, but it's annoying me now.

Key:

	- "trying to lock" - printed each time we try to lock the 8250.
	- priority: task->prio of each running task

JCM: trying to lock: 120.
jcm: rest_init: init_idle_bootup_task.
jcm: rest_init: init_idle_bootup_task.
jcm: rest_init: __preempt_enable_no_resched.
jcm: rest_init: schedule.
kernel_init: enter.
jcm: going to smp_prepare_cpus.
jcm: smp_boot_cpus: enter.
jcm: going to call printk.
JCM: RT DIAGNOSTIC.
***** CURRENT TASK *****
no locks held by swapper/1.
***** ALL TASKS *****
*****TASK*****no locks held by swapper/1.
*****TASK*****no locks held by kthreadd/2.
JCM: DONE DIAGNOSTIC.
JCM: trying to lock: 121.
JCM: SCHEDULER PROBLEM.
JCM: p != rq->curr.
JCM: WARN_ON: p->pi_blocked_on.
jcm: rest_init: preempt_disable.
jcm: rest_init: cpu_idle.
jcm: cpu_idle: enter.
jcm: going to display task info!
JCM: RT status.
1 lock held by swapper/1:
 #0:  (&port_lock_key){--..}, at: [<c038e8ef>] serial8250_console_write
+0xea/0xf
priority: 120.
no locks held by kthreadd/2.
priority: 115.
<6>  task                PC stack   pid father
swapper       D [dd84f020] 00000046  2436     1      0
       dd850c9c 00000092 00000006 00000046 dd84f020 00000006 00000000
00000002
       00000008 dd84f718 dd84f020 00000000 dd84f020 dd84f290 c2f5ee80
dd850000
       c0577200 c0204400 dd850c58 fffb72e5 c0235b93 dd850c4c dd850c6c
c0206943
Call Trace:
 [<c046de66>] schedule+0xef/0x10e
 [<c046e7f0>] rt_spin_lock_slowlock+0x100/0x194
 [<c046eeaa>] __rt_spin_lock+0x49/0x4c
 [<c046f038>] rt_spin_lock+0x4f/0x5d
 [<c038e8ef>] serial8250_console_write+0xea/0x1ef
 [<c02309ff>] __call_console_drivers+0xb8/0xe4
 [<c0230a87>] _call_console_drivers+0x5c/0x60
 [<c0230c1a>] release_console_sem+0x152/0x19e
 [<c023113a>] normal_vprintk+0x2ba/0x309
 [<c02311bd>] vprintk+0x34/0x36
 [<c02311df>] printk+0x20/0x22
 [<c060a34d>] native_smp_prepare_cpus+0xdc/0x606
 [<c0600461>] kernel_init+0x91/0x4f5
 [<c020502b>] kernel_thread_helper+0x7/0x10
 =======================
---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<c046d469>] .... __schedule+0x2f/0x887
.....[<c046de66>] ..   ( <= schedule+0xef/0x10e)
.. [<c046f777>] .... __spin_lock+0x19/0x67
.....[<c046d566>] ..   ( <= __schedule+0x12c/0x887)

kthreadd      S [dd854040] 00000002  3496     2      0
       dd855f74 00000086 00000000 00000002 c2f5ee80 00000246 00000000
2a7f32a0
       00000000 dd855ef8 00000246 00000000 dd854040 dd8542b0 c2f5ee80
dd855000
       c0577200 c046dcff 00000f2b 00000000 00000000 00000000 c0250154
c046dd4a
Call Trace:
 [<c046de66>] schedule+0xef/0x10e
 [<c02434ca>] kthreadd+0x89/0x12e
 [<c020502b>] kernel_thread_helper+0x7/0x10
 =======================
---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<c046d469>] .... __schedule+0x2f/0x887
.....[<c046de66>] ..   ( <= schedule+0xef/0x10e)
.. [<c046f777>] .... __spin_lock+0x19/0x67
.....[<c046d566>] ..   ( <= __schedule+0x12c/0x887)


Comment 2 Jon Masters 2008-02-28 20:30:47 UTC
This was caused by a bug in the printk while atomic modifications to the
rt_mutex code. Steven posted a fix to this yesterday, clark is applying.

Comment 3 Jon Masters 2008-02-28 20:33:54 UTC
Created attachment 296255 [details]
fix to rt_mutex code for printk in atomic (don't really take locks if in printk)

Comment 4 Jon Masters 2008-02-28 20:34:21 UTC
(patch from rostedt)

Comment 5 Clark Williams 2008-05-02 22:08:16 UTC
We're still having problems with the debug kernel variant. rostedt sent me a
patch to reduce some prink spew from ftrace that may help. This will go into
2.6.24.4-49+

Clark

Comment 6 Clark Williams 2008-05-02 22:09:49 UTC
Created attachment 304428 [details]
Patch to reduce ftrace printk output

Comment 7 Clark Williams 2008-05-27 22:06:45 UTC
Have we seen any improvement booting debug kernels with -49+?

Comment 8 Jeff Burke 2008-05-28 16:06:43 UTC
Clark,
  I have not see this specific failure in a while. I believe that the patch from
Comment #4 did fix the issue.

Jeff

Comment 9 Clark Williams 2008-05-28 16:57:04 UTC
Ok, lets close this one.