Bug 432979
Summary: | [RHEL5 RT] kernel-rt-debug hangs while booting | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Jeff Burke <jburke> | ||||||
Component: | realtime-kernel | Assignee: | Jon Masters <jcm> | ||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||||
Severity: | high | Docs Contact: | |||||||
Priority: | medium | ||||||||
Version: | 1.0 | CC: | bhu, williams | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | All | ||||||||
OS: | Linux | ||||||||
URL: | http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=1958249 | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | 2.6.24.7-60.el5rt | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2008-05-28 16:57:04 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Jeff Burke
2008-02-15 16:10:56 UTC
Update from Jon 02/19/08 05:23 AM So the situation with the debug kernel is: *). We know it's caused due to the serial console (8250). *). We race with ourselves to get port->lock in serial8250_console_write *). We then try to call __rt_mutex_adjust_prio to adjust the prio of the idle task, which triggers a few WARN_ONs that we can't print because they need printk in order to actually get printed to the console. Here's some helpful output for discussion purposes later. Note that I cheat to display most of the output by dynamically switching vprintk from the regular one to one that drives early_printk directly, then I switch it back again immediately so as to not interfere too much. We'll find this, but it's annoying me now. Key: - "trying to lock" - printed each time we try to lock the 8250. - priority: task->prio of each running task JCM: trying to lock: 120. jcm: rest_init: init_idle_bootup_task. jcm: rest_init: init_idle_bootup_task. jcm: rest_init: __preempt_enable_no_resched. jcm: rest_init: schedule. kernel_init: enter. jcm: going to smp_prepare_cpus. jcm: smp_boot_cpus: enter. jcm: going to call printk. JCM: RT DIAGNOSTIC. ***** CURRENT TASK ***** no locks held by swapper/1. ***** ALL TASKS ***** *****TASK*****no locks held by swapper/1. *****TASK*****no locks held by kthreadd/2. JCM: DONE DIAGNOSTIC. JCM: trying to lock: 121. JCM: SCHEDULER PROBLEM. JCM: p != rq->curr. JCM: WARN_ON: p->pi_blocked_on. jcm: rest_init: preempt_disable. jcm: rest_init: cpu_idle. jcm: cpu_idle: enter. jcm: going to display task info! JCM: RT status. 1 lock held by swapper/1: #0: (&port_lock_key){--..}, at: [<c038e8ef>] serial8250_console_write +0xea/0xf priority: 120. no locks held by kthreadd/2. priority: 115. <6> task PC stack pid father swapper D [dd84f020] 00000046 2436 1 0 dd850c9c 00000092 00000006 00000046 dd84f020 00000006 00000000 00000002 00000008 dd84f718 dd84f020 00000000 dd84f020 dd84f290 c2f5ee80 dd850000 c0577200 c0204400 dd850c58 fffb72e5 c0235b93 dd850c4c dd850c6c c0206943 Call Trace: [<c046de66>] schedule+0xef/0x10e [<c046e7f0>] rt_spin_lock_slowlock+0x100/0x194 [<c046eeaa>] __rt_spin_lock+0x49/0x4c [<c046f038>] rt_spin_lock+0x4f/0x5d [<c038e8ef>] serial8250_console_write+0xea/0x1ef [<c02309ff>] __call_console_drivers+0xb8/0xe4 [<c0230a87>] _call_console_drivers+0x5c/0x60 [<c0230c1a>] release_console_sem+0x152/0x19e [<c023113a>] normal_vprintk+0x2ba/0x309 [<c02311bd>] vprintk+0x34/0x36 [<c02311df>] printk+0x20/0x22 [<c060a34d>] native_smp_prepare_cpus+0xdc/0x606 [<c0600461>] kernel_init+0x91/0x4f5 [<c020502b>] kernel_thread_helper+0x7/0x10 ======================= --------------------------- | preempt count: 00000002 ] | 2-level deep critical section nesting: ---------------------------------------- .. [<c046d469>] .... __schedule+0x2f/0x887 .....[<c046de66>] .. ( <= schedule+0xef/0x10e) .. [<c046f777>] .... __spin_lock+0x19/0x67 .....[<c046d566>] .. ( <= __schedule+0x12c/0x887) kthreadd S [dd854040] 00000002 3496 2 0 dd855f74 00000086 00000000 00000002 c2f5ee80 00000246 00000000 2a7f32a0 00000000 dd855ef8 00000246 00000000 dd854040 dd8542b0 c2f5ee80 dd855000 c0577200 c046dcff 00000f2b 00000000 00000000 00000000 c0250154 c046dd4a Call Trace: [<c046de66>] schedule+0xef/0x10e [<c02434ca>] kthreadd+0x89/0x12e [<c020502b>] kernel_thread_helper+0x7/0x10 ======================= --------------------------- | preempt count: 00000002 ] | 2-level deep critical section nesting: ---------------------------------------- .. [<c046d469>] .... __schedule+0x2f/0x887 .....[<c046de66>] .. ( <= schedule+0xef/0x10e) .. [<c046f777>] .... __spin_lock+0x19/0x67 .....[<c046d566>] .. ( <= __schedule+0x12c/0x887) This was caused by a bug in the printk while atomic modifications to the rt_mutex code. Steven posted a fix to this yesterday, clark is applying. Created attachment 296255 [details]
fix to rt_mutex code for printk in atomic (don't really take locks if in printk)
(patch from rostedt) We're still having problems with the debug kernel variant. rostedt sent me a patch to reduce some prink spew from ftrace that may help. This will go into 2.6.24.4-49+ Clark Created attachment 304428 [details]
Patch to reduce ftrace printk output
Have we seen any improvement booting debug kernels with -49+? Clark, I have not see this specific failure in a while. I believe that the patch from Comment #4 did fix the issue. Jeff Ok, lets close this one. |