Bug 236423

Summary: [RHEL5 RT] irq 16: nobody cared (try booting with the "irqpoll" option)
Product: Red Hat Enterprise MRG Reporter: Jeff Burke <jburke>
Component: realtime-kernelAssignee: Michal Schmidt <mschmidt>
Status: CLOSED DUPLICATE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 1.0CC: mschmidt
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
URL: http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=1244985
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-04-28 12:50:14 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
dmesg output
none
/proc/interrupts
none
lspci -vxxx output none

Description Jeff Burke 2007-04-13 20:03:20 UTC
Description of problem:
 While running ltp tests on dl360-01.rhts.boston.redhat.com the system aborts
the test. The log is scrolling with the following message.

<Apr/13 01:04 pm>irq 16: nobody cared (try booting with the "irqpoll" option)

Version-Release number of selected component (if applicable):
 2.6.20-14.el5rt

How reproducible:
 75%

Steps to Reproduce:
1. Run the ltp test on dl360-01.rhts.boston.redhat.com
  
Actual results:
<Apr/13 01:04 pm>irq 16: nobody cared (try booting with the "irqpoll" option)
<Apr/13 01:04 pm> [<c04051b3>] dump_trace+0x63/0x1eb
<Apr/13 01:04 pm> [<c0405358>] show_trace_log_lvl+0x1d/0x3a
<Apr/13 01:04 pm> [<c0405987>] show_trace+0x12/0x14
<Apr/13 01:04 pm> [<c0405a1a>] dump_stack+0x16/0x18
<Apr/13 01:04 pm> [<c045e714>] __report_bad_irq+0x39/0x81
<Apr/13 01:04 pm> [<c045e8e8>] note_interrupt+0x18c/0x1bc
<Apr/13 01:04 pm> [<c045de56>] thread_simple_irq+0x5e/0x6c
<Apr/13 01:04 pm> [<c045e500>] do_irqd+0xe1/0x299
<Apr/13 01:04 pm> [<c043bdb2>] kthread+0xb3/0xdd
<Apr/13 01:04 pm> [<c0404d97>] kernel_thread_helper+0x7/0x10
<Apr/13 01:04 pm> =======================
<Apr/13 01:04 pm>---------------------------
<Apr/13 01:04 pm>| preempt count: 00000001 ]
<Apr/13 01:04 pm>| 1-level deep critical section nesting:
<Apr/13 01:04 pm>----------------------------------------
<Apr/13 01:04 pm>.. [<c0634ad8>] .... __spin_lock_irq+0x1a/0x2b
<Apr/13 01:04 pm>.....[<c045de41>] ..   ( <= thread_simple_irq+0x49/0x6c)

<Apr/13 01:04 pm>handlers:
<Apr/13 01:04 pm>[<c059377e>] (usb_hcd_irq+0x0/0x52)
<Apr/13 01:04 pm>turning off IO-APIC fast mode.
<Apr/13 01:06 pm>irq 16: nobody cared (try booting with the "irqpoll" option)
<Apr/13 01:06 pm> [<c04051b3>] dump_trace+0x63/0x1eb
<Apr/13 01:06 pm> [<c0405358>] show_trace_log_lvl+0x1d/0x3a
<Apr/13 01:06 pm> [<c0405987>] show_trace+0x12/0x14
<Apr/13 01:06 pm> [<c0405a1a>] dump_stack+0x16/0x18
<Apr/13 01:06 pm> [<c045e714>] __report_bad_irq+0x39/0x81
<Apr/13 01:06 pm> [<c045e8e8>] note_interrupt+0x18c/0x1bc
<Apr/13 01:06 pm> [<c045de56>] thread_simple_irq+0x5e/0x6c
<Apr/13 01:06 pm> [<c045e500>] do_irqd+0xe1/0x299
<Apr/13 01:06 pm> [<c043bdb2>] kthread+0xb3/0xdd
<Apr/13 01:06 pm> [<c0404d97>] kernel_thread_helper+0x7/0x10
<Apr/13 01:06 pm> =======================
<Apr/13 01:06 pm>---------------------------
<Apr/13 01:06 pm>| preempt count: 00000001 ]
<Apr/13 01:06 pm>| 1-level deep critical section nesting:
<Apr/13 01:06 pm>----------------------------------------
<Apr/13 01:06 pm>.. [<c0634ad8>] .... __spin_lock_irq+0x1a/0x2b
<Apr/13 01:06 pm>.....[<c045de41>] ..   ( <= thread_simple_irq+0x49/0x6c)

<Apr/13 01:06 pm>handlers:
<Apr/13 01:06 pm>[<c059377e>] (usb_hcd_irq+0x0/0x52)
<Apr/13 01:08 pm>irq 16: nobody cared (try booting with the "irqpoll" option)
<Apr/13 01:08 pm> [<c04051b3>] dump_trace+0x63/0x1eb
<Apr/13 01:08 pm> [<c0405358>] show_trace_log_lvl+0x1d/0x3a
<Apr/13 01:08 pm> [<c0405987>] show_trace+0x12/0x14<Apr/13 01:08 pm>
<Apr/13 01:08 pm> [<c0405a1a>] dump_stack+0x16/0x18
<Apr/13 01:08 pm> [<c045e714>] __report_bad_irq+0x39/0x81
<Apr/13 01:08 pm> [<c045e8e8>] note_interrupt+0x18c/0x1bc
<Apr/13 01:08 pm> [<c045de56>] thread_simple_irq+0x5e/0x6c
<Apr/13 01:08 pm> [<c045e500>] do_irqd+0xe1/0x299
<Apr/13 01:08 pm> [<c043bdb2>] kthread+0xb3/0xdd
<Apr/13 01:08 pm> [<c0404d97>] kernel_thread_helper+0x7/0x10
<Apr/13 01:08 pm> =======================
<Apr/13 01:08 pm>---------------------------
<Apr/13 01:08 pm>| preempt count: 00000001 ]
<Apr/13 01:08 pm>| 1-level deep critical section nesting:
<Apr/13 01:08 pm>----------------------------------------
<Apr/13 01:08 pm>.. [<c0634ad8>] .... __spin_lock_irq+0x1a/0x2b
<Apr/13 01:08 pm>.....[<c045de41>] ..   ( <= thread_simple_irq+0x49/0x6c)

Expected results:
 This test should pass

Additional info:
Here is a link to the kernel
 http://porkchop.redhat.com/brewroot/packages/kernel-rt/2.6.20/14.el5rt/

Here is information on how to setup a local system to use the RHTS devel
environment.

You can down load the tests from here:
http://rhts.lab.boston.redhat.com/rpms/development/noarch/noarch/rh-tests-kernel-distribution-ltp-20070228-20070228-10.noarch.rpm

This test it written for the RHTS environment. So you can install the
RHTS devel package following the directions from here:
 http://intranet.corp.redhat.com/ic/intranet/RHTSMainPage.html
then install the test and do a make run.

Or if you do _not_ want to install the RHTS devel stuff comment out the
following:
Makefile
# Include a global make rules file
include /usr/share/rhts/lib/rhts-make.include

runtest.sh
# Source the common test script helpers
. /usr/bin/rhts_environment.sh

then do a make run.
Jeff

Comment 1 Jeff Burke 2007-04-13 20:06:32 UTC
Created attachment 152581 [details]
dmesg output

Comment 2 Jeff Burke 2007-04-13 20:07:14 UTC
Created attachment 152582 [details]
/proc/interrupts

Comment 3 Jeff Burke 2007-04-13 20:07:49 UTC
Created attachment 152583 [details]
lspci -vxxx output

Comment 4 Michal Schmidt 2007-04-18 13:46:10 UTC
Jeff,
I have access to a DL360, but it's an older model than yours (2xPentium III
CPUs, no PCI Express,...) and it doesn't have this issue. Can I get access to
dl360-01.rhts.boston.redhat.com so I can experiment with it?
Does booting with acpi=noirq make the issue go away? Bug 236150 is another
"irq:nobody cared" bug and it is mentioned there as a workaround.

Comment 5 Michal Schmidt 2007-04-20 09:11:05 UTC
I have just become aware of the RHTS reservation system. I'll experiment with
the machine myself.

Comment 6 Michal Schmidt 2007-04-28 12:46:41 UTC
On dl360-01.rhts.boston.redhat.com only usb_uhci should generate IRQ 16. What 
we're seeing here is interrupts from IRQ 20 (cciss driver) duplicated on IRQ 
16 for unknown reason. One does not need LTP to reproduce it. Any disk 
activity suffices (eg. cat /dev/cciss/c0d0 > /dev/null). usb_uhci's interrupt 
handler of course always returns IRQ_NONE, so after every 100000 
interrupts "irq 16: nobody cared" message is printed.
The interrupts are delivered to IRQ 16 if and only if threaded hardirqs are 
being used. This can be verified by compiling the kernel with:
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_SOFTIRQS is not set
CONFIG_PREEMPT_HARDIRQS=y
When booted with "hardirq-preempt=off", the interrupt count on IRQ 16 remains 
zero all the time. After enabling threaded hardirqs:
echo 1 > /proc/sys/kernel/hardirq_preemption
the problem appears. If hardirq_preemption is then disabled again, IRQ 16 
acitivity ceases.
I haven't yet found the reason why this happens.

Comment 7 Michal Schmidt 2007-04-28 12:50:14 UTC
This is the same problem as in bug 236150, only the IRQ numbers are different 
(IRQ 20->16 vs. IRQ 16->9).

*** This bug has been marked as a duplicate of 236150 ***