Bug 247379 - "irqpoll": misrouted interrupts deadlocks
Summary: "irqpoll": misrouted interrupts deadlocks
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
low
low
Target Milestone: ---
: ---
Assignee: Dave Anderson
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: 425461
TreeView+ depends on / blocked
 
Reported: 2007-07-08 11:47 UTC by Vasily Averin
Modified: 2008-05-21 14:45 UTC (History)
2 users (show)

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 14:45:44 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
NMI in serial console logs (28.55 KB, text/plain)
2007-07-08 11:49 UTC, Vasily Averin
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0314 0 normal SHIPPED_LIVE Updated kernel packages for Red Hat Enterprise Linux 5.2 2008-05-20 18:43:34 UTC

Description Vasily Averin 2007-07-08 11:47:23 UTC
Pavel Emelianov from OpenVZ/Virtuozzo linux kernel team has noticed the
following issue:

"While testing kernel on machine with "irqpoll" option I've caught such a
lockup:

__do_IRQ()
   spin_lock(&desc->lock);
           desc->chip->ack(); /* IRQ is ACKed */
note_interrupt()
misrouted_irq()
handle_IRQ_event()
           if (...)
      local_irq_enable_in_hardirq();
/* interrupts are enabled from now */
...
__do_IRQ() /* same IRQ we've started from */
   spin_lock(&desc->lock); /* LOCKUP */

Looking at misrouted_irq() code I've found that a potential deadlock like
this can also take place:

1CPU:
__do_IRQ()
   spin_lock(&desc->lock); /* irq = A */
misrouted_irq()
   for (i = 1; i < NR_IRQS; i++) {
      spin_lock(&desc->lock); /* irq = B */
      if (desc->status & IRQ_INPROGRESS) {

2CPU:
__do_IRQ()
   spin_lock(&desc->lock); /* irq = B */
misrouted_irq()
   for (i = 1; i < NR_IRQS; i++) {
      spin_lock(&desc->lock); /* irq = A */
      if (desc->status & IRQ_INPROGRESS) {

As the second lock on both CPUs is taken before checking that this irq is
being handled in another processor this may cause a deadlock.  This issue
is only theoretical."

This issue is fixed in linux mainstream by the following patches:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f72fa707604c015a6625e80f269506032d5430dc
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b42172fc7b569a0ef2b0fa38d71382969074c0e2

I've reproduced this issue when I used RHEL5 kerenl as kdump-kernel (kdump  uses
"irqpoll" option by default).

Comment 1 Vasily Averin 2007-07-08 11:49:34 UTC
Created attachment 158728 [details]
NMI in serial console logs

Comment 2 Dave Anderson 2007-10-30 20:26:16 UTC
Vasily,

> This issue is fixed in linux mainstream by the following patches:
> ...

Can you please confirm:

(1) The resultant RHEL5 patch should look like this:

--- linux-2.6.18.noarch/kernel/irq/handle.c.orig
+++ linux-2.6.18.noarch/kernel/irq/handle.c
@@ -233,10 +233,10 @@ fastcall unsigned int __do_IRQ(unsigned
                spin_unlock(&desc->lock);

                action_ret = handle_IRQ_event(irq, regs, action);
-
-               spin_lock(&desc->lock);
                if (!noirqdebug)
                        note_interrupt(irq, desc, action_ret, regs);
+
+               spin_lock(&desc->lock);
                if (likely(!(desc->status & IRQ_PENDING)))
                        break;
                desc->status &= ~IRQ_PENDING;

and: 

(2) If I provide you a test kernel, that you will be able to reproduce
    the conditions in order to confirm that the patch works?

Thanks,
  Dave Anderson



Comment 3 Vasily Averin 2007-10-31 08:54:18 UTC
Dave,

OpenVZ kernels uses such patch long time ago and I can confirm that it should
fix this issue.

Unfortunately we are unable to reproduce this issue. Of course we can load your
testkernel on the failed node, but its work can guarantee nothing.

Thank you,
Vasily Averin

Comment 4 Dave Anderson 2007-10-31 13:49:43 UTC
Ok, I figured that was probably the case...

Anyway, I appreciate your attaching the console log
of a confirmed case in a RHEL5-based kernel.

Thanks,
  Dave




Comment 5 RHEL Program Management 2007-11-01 18:25:29 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Dave Anderson 2007-11-05 15:28:13 UTC
(In reply to comment #3)
> ... 
> Unfortunately we are unable to reproduce this issue. Of course we can load
> your testkernel on the failed node, but its work can guarantee nothing.
> 
> Thank you,
> Vasily Averin

Hi Vasily,

I understand that this cannot be reproduced on demand, but for the sake
of this bugzilla, can you exercise this test kernel, and verify the patch?

An x86_64 binary rpm, the source rpm, and the linux-kernel-test.patch file
that is applied to the base kernel can be found here:

  http://people.redhat.com/anderson/BZ_247379

Thanks,
  Dave



Comment 7 Vasily Averin 2007-11-12 13:09:48 UTC
Patch looks correct.
Kernel has been running for 3 days already, it is stable and doesn't crash.

Comment 8 Dave Anderson 2007-11-12 14:42:02 UTC
Vasily, 

Thanks for taking the time to test it and check the patch.

Just to be clear on the the specific NMI watchdog timeout
scenario indicated in the attachment from comment #1, can
you confirm the following (or offer up the correct explanation): 

1. The "hald-addon-stor" task was doing a close(), presumably
on the ide cd rom device, which caused it to issue a commands
via ide_outsl() -- when an interrupt of some other device occurred.

(It appears to have been a timer/clock interrupt?)

2. The incoming IRQ took its desc->lock, ACK'd, and then handled,
its IRQ, and then called note_interrupt().

3. note_interrupt() in turn called misrouted_irq(), which
saw the ide-cd interrupt, and called handle_IRQ_event() for
that device.

4. handle_IRQ_event() re-enabled interrupts via local_irq_enable_in_hardirq().

5. we then see the trail through ide_intr(), which eventually
leads to the printk() of "hda: cdrom_pc_intr: The drive appears confused..."

6. the printk() temporarily disables interrupts, but when complete,
re-enables whatever flags status was in play at the time; as soon as
that happened, another (timer?) interrupt of the same type as the 
original kicked in, causing the lock-up.

That make sense?

 



Comment 9 Vasily Averin 2007-11-13 07:19:22 UTC
Your explanation is correct.
I would add that irq number should be saved in some register. I don't have
vmlinux  for your kernel but assume that it is RCX. RCX=0 ==> it was irq0, i.e
timer interrupt.

Comment 10 Vasily Averin 2007-11-13 07:35:55 UTC
Also I would note that it's very likely that first interrupt was irq0 too:

note_interrupt() calls misrouted_irq() under following conditions:
kernel/irq/spurious.c:note_interrupt()::
if ((irqfixup == 2 && irq == 0) || action_ret == IRQ_NONE)

We have irqfixup == 2 (irqpoll) and very likely irq=0

Comment 11 Dave Anderson 2007-11-13 13:51:26 UTC
I'll see if I can verify the IRQ number by the exception
frame contents.  I was also looking at the stale remnants
on the stack, i.e., the lock_timer_base, _activate_task, 
enqueue_task...

Thanks again for your help.



Comment 14 Don Zickus 2007-12-17 19:36:16 UTC
in 2.6.18-61.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 17 errata-xmlrpc 2008-05-21 14:45:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html



Note You need to log in before you can comment on or make changes to this bug.