Bug 236375
Summary: | [RHEL5 RT] BUG: scheduling with irqs disabled: | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Jeff Burke <jburke> | ||||
Component: | realtime-kernel | Assignee: | Steven Rostedt <srostedt> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 1.0 | ||||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
URL: | http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=1240940 | ||||||
Whiteboard: | |||||||
Fixed In Version: | 2.6.21-31.el5rt | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2007-06-20 16:04:22 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Jeff Burke
2007-04-13 14:38:25 UTC
This also happens on the 2.6.21-6.el5rt kernel <May/17 02:03 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18587 <May/17 02:03 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:03 pm>Call Trace: <May/17 02:03 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a <May/17 02:03 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c <May/17 02:03 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17 <May/17 02:03 pm> [<ffffffff8106468c>] schedule+0x82/0x102 <May/17 02:03 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:03 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21 <May/17 02:03 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4 <May/17 02:03 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813 <May/17 02:03 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84 <May/17 02:03 pm> [<0000000000401a87>] <May/17 02:05 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18590 <May/17 02:05 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:05 pm>Call Trace: <May/17 02:05 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a <May/17 02:05 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c <May/17 02:05 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17 <May/17 02:05 pm> [<ffffffff8106468c>] schedule+0x82/0x102 <May/17 02:05 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:05 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21 <May/17 02:05 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4 <May/17 02:05 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813 <May/17 02:05 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84 <May/17 02:05 pm> [<0000000000401a87>] <May/17 02:07 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18589 <May/17 02:07 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:07 pm>Call Trace: <May/17 02:07 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a <May/17 02:07 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c <May/17 02:07 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17 <May/17 02:07 pm> [<ffffffff8106468c>] schedule+0x82/0x102 <May/17 02:07 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:07 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21 <May/17 02:07 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4 <May/17 02:07 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813 <May/17 02:07 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84 <May/17 02:07 pm> [<0000000000401a87>] <May/17 02:07 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18586 <May/17 02:07 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:07 pm>Call Trace: <May/17 02:07 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a <May/17 02:07 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c <May/17 02:07 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17 <May/17 02:07 pm> [<ffffffff8106468c>] schedule+0x82/0x102 <May/17 02:07 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:07 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21 <May/17 02:07 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4 <May/17 02:07 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813 <May/17 02:07 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84 <May/17 02:07 pm> [<0000000000401a87>] <May/17 02:08 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18587 <May/17 02:08 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:08 pm>Call Trace: <May/17 02:08 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a <May/17 02:08 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c <May/17 02:08 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17 <May/17 02:08 pm> [<ffffffff8106468c>] schedule+0x82/0x102 <May/17 02:08 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:08 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21 <May/17 02:08 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4 <May/17 02:08 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813 <May/17 02:08 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84 <May/17 02:08 pm> [<0000000000401a87>] <May/17 02:11 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18589 <May/17 02:11 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:11 pm>Call Trace: <May/17 02:11 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a <May/17 02:11 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c <May/17 02:11 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17 <May/17 02:11 pm> [<ffffffff8106468c>] schedule+0x82/0x102 <May/17 02:11 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:11 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21 <May/17 02:11 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4 <May/17 02:11 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813 <May/17 02:11 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84 <May/17 02:11 pm> [<0000000000401a87>] <May/17 02:11 pm>logger: /usr/bin/rhts-test-runner.sh 1080 hearbeat... <May/17 02:12 pm>logger: /usr/bin/rhts-test-runner.sh 1140 hearbeat... <May/17 02:13 pm>logger: /usr/bin/rhts-test-runner.sh 1200 hearbeat... <May/17 02:13 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18587 <May/17 02:13 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:13 pm>Call Trace: <May/17 02:13 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a <May/17 02:13 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c <May/17 02:13 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17 <May/17 02:13 pm> [<ffffffff8106468c>] schedule+0x82/0x102 <May/17 02:13 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1 <May/17 02:13 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21 <May/17 02:13 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4 <May/17 02:13 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813 <May/17 02:13 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84 <May/17 02:13 pm> [<0000000000401a87>] http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=68172 This was on ibm-defiant.rhts.boston.redhat.com running the racer test. In 2.6.21-11.el5rt I see the following output: BUG: scheduling with irqs disabled: IRQ-9/0x00000001/105 caller is rt_spin_lock_slowlock+0xfe/0x1a1 Call Trace: [<ffffffff8106d5da>] dump_trace+0xaa/0x32a [<ffffffff8106d89b>] show_trace+0x41/0x5c [<ffffffff8106d8cb>] dump_stack+0x15/0x17 [<ffffffff8106468c>] schedule+0x82/0x102 [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1 [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21 [<ffffffff811bab3f>] serial8250_console_write+0x61/0x148 [<ffffffff81091b94>] __call_console_drivers+0x68/0x79 [<ffffffff81091c14>] _call_console_drivers+0x6f/0x73 [<ffffffff81017357>] release_console_sem+0x139/0x1cf [<ffffffff810923fb>] vprintk+0x2c0/0x314 [<ffffffff810924b6>] printk+0x67/0x69 [<ffffffff8106db5b>] printk_address+0x93/0x9d [<ffffffff8106dc58>] print_trace_address+0xc/0xe [<ffffffff8106d4ac>] dump_trace_unwind+0x56/0xda [<ffffffff810b1b94>] unwind_init_running+0x20/0x22 [<ffffffff8106d5da>] dump_trace+0xaa/0x32a [<ffffffff8106d89b>] show_trace+0x41/0x5c [<ffffffff8106d8cb>] dump_stack+0x15/0x17 [<ffffffff810c5007>] __report_bad_irq+0x38/0x87 [<ffffffff810c521a>] note_interrupt+0x1c4/0x1fc [<ffffffff810c452c>] thread_simple_irq+0x6c/0x7e [<ffffffff810c4c79>] do_irqd+0x13b/0x394 [<ffffffff81033d40>] kthread+0xf5/0x128 [<ffffffff8105ff68>] child_rip+0xa/0x12 BUG: scheduling while atomic: IRQ-9/0x00000001/105, CPU#0 Call Trace: [<ffffffff8106d5da>] dump_trace+0xaa/0x32a [<ffffffff8106d89b>] show_trace+0x41/0x5c [<ffffffff8106d8cb>] dump_stack+0x15/0x17 [<ffffffff810636b8>] __sched_text_start+0x98/0xd20 [<ffffffff810646ec>] schedule+0xe2/0x102 [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1 [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21 [<ffffffff811bab3f>] serial8250_console_write+0x61/0x148 [<ffffffff81091b94>] __call_console_drivers+0x68/0x79 [<ffffffff81091c14>] _call_console_drivers+0x6f/0x73 [<ffffffff81017357>] release_console_sem+0x139/0x1cf [<ffffffff810923fb>] vprintk+0x2c0/0x314 [<ffffffff810924b6>] printk+0x67/0x69 [<ffffffff8106db5b>] printk_address+0x93/0x9d [<ffffffff8106dc58>] print_trace_address+0xc/0xe [<ffffffff8106d4ac>] dump_trace_unwind+0x56/0xda [<ffffffff810b1b94>] unwind_init_running+0x20/0x22 [<ffffffff8106d5da>] dump_trace+0xaa/0x32a [<ffffffff8106d89b>] show_trace+0x41/0x5c [<ffffffff8106d8cb>] dump_stack+0x15/0x17 [<ffffffff810c5007>] __report_bad_irq+0x38/0x87 [<ffffffff810c521a>] note_interrupt+0x1c4/0x1fc [<ffffffff810c452c>] thread_simple_irq+0x6c/0x7e [<ffffffff810c4c79>] do_irqd+0x13b/0x394 [<ffffffff81033d40>] kthread+0xf5/0x128 [<ffffffff8105ff68>] child_rip+0xa/0x12 [<ffffffff8106d89b>] show_trace+0x41/0x5c syscalls/lseek/l [<ffffffff8106d8cb>] dump_stack+0x15/0x17 comment #3 is a different bug (different source of the problem, although it gives the same error). I put in a patch to do a might_sleep() in rt_spin_lock_fastlock (sent to LKML too). With the patch, I get the "scheduling with irqs disabled" everytime I have a program crash (purposely). So this is *not* fixed, and is *not* a dup of 237463. Created attachment 155954 [details]
fix for irqs disabled while sending segfault
Now it's solved. In do_page_fault if the faulting address from user code is > PAGE_OFFSET (say (void*)-1) then the code will jump right to bad_area_nosemaphore. The code then checks if it was userspace that caused the fault, and if so it sends a signal. But it skipped the local_irq_enable() part, so interrupts are disabled here. So faulting at something like (void*)1 would not cause a problem (IRQS would be enabled since the condition of address > PAGE_OFFSET fails). But if the user code faults at (void*)-1 interrupts would never be enabled, since the test for address > PAGE_OFFSET succeeds. Attached is a patch to always enable interrupts for user faults. this patch is now part of the -rt patch series |