Bug 236375

Summary: [RHEL5 RT] BUG: scheduling with irqs disabled:
Product: Red Hat Enterprise MRG Reporter: Jeff Burke <jburke>
Component: realtime-kernelAssignee: Steven Rostedt <srostedt>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 1.0   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
URL: http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=1240940
Whiteboard:
Fixed In Version: 2.6.21-31.el5rt Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-06-20 16:04:22 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
fix for irqs disabled while sending segfault none

Description Jeff Burke 2007-04-13 14:38:25 UTC
Description of problem:
 On dl360-01.rhts.boston.redhat.com - x86_64 While running test
/kernel/distribution/ltp/20070228 the following BUG happens.

BUG: scheduling with irqs disabled: crash02/0x00000000/16756
caller is rt_spin_lock_slowlock+0x104/0x1b5

Version-Release number of selected component (if applicable):
 2.6.20-12.el5rt

How reproducible:
 Unknown

Actual results:

Call Trace:
 [<ffffffff8106d603>] dump_trace+0xb6/0x3ce
 [<ffffffff8106d962>] show_trace+0x47/0x68
 [<ffffffff8106d998>] dump_stack+0x15/0x17
 [<ffffffff810646a4>] schedule+0x82/0x102
 [<ffffffff81065691>] rt_spin_lock_slowlock+0x104/0x1b5
 [<ffffffff81065f8e>] rt_spin_lock+0x1f/0x21
 [<ffffffff8109b352>] force_sig_info+0x28/0xb4
 [<ffffffff81068c95>] do_page_fault+0x801/0x813
 [<ffffffff81066aed>] error_exit+0x0/0x84
 [<0000000000401a87>]

Expected results:
 Test should pass

Additional info:

Comment 2 Jeff Burke 2007-05-18 00:25:54 UTC
This also happens on the 2.6.21-6.el5rt kernel

<May/17 02:03 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18587
<May/17 02:03 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:03 pm>Call Trace:
<May/17 02:03 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a
<May/17 02:03 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c
<May/17 02:03 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17
<May/17 02:03 pm> [<ffffffff8106468c>] schedule+0x82/0x102
<May/17 02:03 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:03 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21
<May/17 02:03 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4
<May/17 02:03 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813
<May/17 02:03 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84
<May/17 02:03 pm> [<0000000000401a87>]
<May/17 02:05 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18590
<May/17 02:05 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:05 pm>Call Trace:
<May/17 02:05 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a
<May/17 02:05 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c
<May/17 02:05 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17
<May/17 02:05 pm> [<ffffffff8106468c>] schedule+0x82/0x102
<May/17 02:05 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:05 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21
<May/17 02:05 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4
<May/17 02:05 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813
<May/17 02:05 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84
<May/17 02:05 pm> [<0000000000401a87>]
<May/17 02:07 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18589
<May/17 02:07 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:07 pm>Call Trace:
<May/17 02:07 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a
<May/17 02:07 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c
<May/17 02:07 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17
<May/17 02:07 pm> [<ffffffff8106468c>] schedule+0x82/0x102
<May/17 02:07 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:07 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21
<May/17 02:07 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4
<May/17 02:07 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813
<May/17 02:07 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84
<May/17 02:07 pm> [<0000000000401a87>]
<May/17 02:07 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18586
<May/17 02:07 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:07 pm>Call Trace:
<May/17 02:07 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a
<May/17 02:07 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c
<May/17 02:07 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17
<May/17 02:07 pm> [<ffffffff8106468c>] schedule+0x82/0x102
<May/17 02:07 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:07 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21
<May/17 02:07 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4
<May/17 02:07 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813
<May/17 02:07 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84
<May/17 02:07 pm> [<0000000000401a87>]
<May/17 02:08 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18587
<May/17 02:08 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:08 pm>Call Trace:
<May/17 02:08 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a
<May/17 02:08 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c
<May/17 02:08 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17
<May/17 02:08 pm> [<ffffffff8106468c>] schedule+0x82/0x102
<May/17 02:08 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:08 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21
<May/17 02:08 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4
<May/17 02:08 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813
<May/17 02:08 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84
<May/17 02:08 pm> [<0000000000401a87>]
<May/17 02:11 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18589
<May/17 02:11 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:11 pm>Call Trace:
<May/17 02:11 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a
<May/17 02:11 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c
<May/17 02:11 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17
<May/17 02:11 pm> [<ffffffff8106468c>] schedule+0x82/0x102
<May/17 02:11 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:11 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21
<May/17 02:11 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4
<May/17 02:11 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813
<May/17 02:11 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84
<May/17 02:11 pm> [<0000000000401a87>]
<May/17 02:11 pm>logger: /usr/bin/rhts-test-runner.sh 1080 hearbeat...
<May/17 02:12 pm>logger: /usr/bin/rhts-test-runner.sh 1140 hearbeat...
<May/17 02:13 pm>logger: /usr/bin/rhts-test-runner.sh 1200 hearbeat...
<May/17 02:13 pm>BUG: scheduling with irqs disabled: crash02/0x00000000/18587
<May/17 02:13 pm>caller is rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:13 pm>Call Trace:
<May/17 02:13 pm> [<ffffffff8106d59a>] dump_trace+0xaa/0x32a
<May/17 02:13 pm> [<ffffffff8106d85b>] show_trace+0x41/0x5c
<May/17 02:13 pm> [<ffffffff8106d88b>] dump_stack+0x15/0x17
<May/17 02:13 pm> [<ffffffff8106468c>] schedule+0x82/0x102
<May/17 02:13 pm> [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1
<May/17 02:13 pm> [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21
<May/17 02:13 pm> [<ffffffff8109aeb2>] force_sig_info+0x28/0xb4
<May/17 02:13 pm> [<ffffffff81068c5d>] do_page_fault+0x801/0x813
<May/17 02:13 pm> [<ffffffff81066a9d>] error_exit+0x0/0x84
<May/17 02:13 pm> [<0000000000401a87>]

http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=68172
This was on ibm-defiant.rhts.boston.redhat.com running the racer test.

Comment 3 Jeff Burke 2007-05-23 13:03:21 UTC
In 2.6.21-11.el5rt I see the following output:
BUG: scheduling with irqs disabled: IRQ-9/0x00000001/105
caller is rt_spin_lock_slowlock+0xfe/0x1a1

Call Trace:
 [<ffffffff8106d5da>] dump_trace+0xaa/0x32a
 [<ffffffff8106d89b>] show_trace+0x41/0x5c
 [<ffffffff8106d8cb>] dump_stack+0x15/0x17
 [<ffffffff8106468c>] schedule+0x82/0x102
 [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1
 [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21
 [<ffffffff811bab3f>] serial8250_console_write+0x61/0x148
 [<ffffffff81091b94>] __call_console_drivers+0x68/0x79
 [<ffffffff81091c14>] _call_console_drivers+0x6f/0x73
 [<ffffffff81017357>] release_console_sem+0x139/0x1cf
 [<ffffffff810923fb>] vprintk+0x2c0/0x314
 [<ffffffff810924b6>] printk+0x67/0x69
 [<ffffffff8106db5b>] printk_address+0x93/0x9d
 [<ffffffff8106dc58>] print_trace_address+0xc/0xe
 [<ffffffff8106d4ac>] dump_trace_unwind+0x56/0xda
 [<ffffffff810b1b94>] unwind_init_running+0x20/0x22
 [<ffffffff8106d5da>] dump_trace+0xaa/0x32a
 [<ffffffff8106d89b>] show_trace+0x41/0x5c
 [<ffffffff8106d8cb>] dump_stack+0x15/0x17
 [<ffffffff810c5007>] __report_bad_irq+0x38/0x87
 [<ffffffff810c521a>] note_interrupt+0x1c4/0x1fc
 [<ffffffff810c452c>] thread_simple_irq+0x6c/0x7e
 [<ffffffff810c4c79>] do_irqd+0x13b/0x394
 [<ffffffff81033d40>] kthread+0xf5/0x128
 [<ffffffff8105ff68>] child_rip+0xa/0x12

BUG: scheduling while atomic: IRQ-9/0x00000001/105, CPU#0

Call Trace:
 [<ffffffff8106d5da>] dump_trace+0xaa/0x32a
 [<ffffffff8106d89b>] show_trace+0x41/0x5c
 [<ffffffff8106d8cb>] dump_stack+0x15/0x17
 [<ffffffff810636b8>] __sched_text_start+0x98/0xd20
 [<ffffffff810646ec>] schedule+0xe2/0x102
 [<ffffffff8106566a>] rt_spin_lock_slowlock+0xfe/0x1a1
 [<ffffffff81065f46>] rt_spin_lock+0x1f/0x21
 [<ffffffff811bab3f>] serial8250_console_write+0x61/0x148
 [<ffffffff81091b94>] __call_console_drivers+0x68/0x79
 [<ffffffff81091c14>] _call_console_drivers+0x6f/0x73
 [<ffffffff81017357>] release_console_sem+0x139/0x1cf
 [<ffffffff810923fb>] vprintk+0x2c0/0x314
 [<ffffffff810924b6>] printk+0x67/0x69
 [<ffffffff8106db5b>] printk_address+0x93/0x9d
 [<ffffffff8106dc58>] print_trace_address+0xc/0xe
 [<ffffffff8106d4ac>] dump_trace_unwind+0x56/0xda
 [<ffffffff810b1b94>] unwind_init_running+0x20/0x22
 [<ffffffff8106d5da>] dump_trace+0xaa/0x32a
 [<ffffffff8106d89b>] show_trace+0x41/0x5c
 [<ffffffff8106d8cb>] dump_stack+0x15/0x17
 [<ffffffff810c5007>] __report_bad_irq+0x38/0x87
 [<ffffffff810c521a>] note_interrupt+0x1c4/0x1fc
 [<ffffffff810c452c>] thread_simple_irq+0x6c/0x7e
 [<ffffffff810c4c79>] do_irqd+0x13b/0x394
 [<ffffffff81033d40>] kthread+0xf5/0x128
 [<ffffffff8105ff68>] child_rip+0xa/0x12

 [<ffffffff8106d89b>] show_trace+0x41/0x5c
syscalls/lseek/l [<ffffffff8106d8cb>] dump_stack+0x15/0x17

Comment 4 Steven Rostedt 2007-06-01 23:13:51 UTC
comment #3 is a different bug (different source of the problem, although it
gives the same error).

I put in a patch to do a might_sleep() in rt_spin_lock_fastlock (sent to LKML
too).  With the patch, I get the "scheduling with irqs disabled" everytime I
have a program crash (purposely).  So this is *not* fixed, and is *not* a dup of
237463.


Comment 5 Steven Rostedt 2007-06-01 23:27:06 UTC
Created attachment 155954 [details]
fix for irqs disabled while sending segfault

Comment 6 Steven Rostedt 2007-06-01 23:27:53 UTC
Now it's solved.

In do_page_fault if the faulting address from user code is > PAGE_OFFSET (say
(void*)-1) then the code will jump right to bad_area_nosemaphore.  The code then
checks if it was userspace that caused the fault, and if so it sends a signal. 
But it skipped the local_irq_enable() part, so interrupts are disabled here.

So faulting at something like (void*)1 would not cause a problem (IRQS would be
enabled since the condition of address > PAGE_OFFSET fails).  But if the user
code faults at (void*)-1 interrupts would never be enabled, since the test for
address > PAGE_OFFSET succeeds.

Attached is a patch to always enable interrupts for user faults.



Comment 7 Clark Williams 2007-06-20 16:04:22 UTC
this patch is now part of the -rt patch series