Bug 65322 - Interrupts only routing to single CPU in Intel E7500-based smp system
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2002-05-21 23:55 UTC by Roderick Constance
Modified: 2007-04-18 16:42 UTC (History)
Last Closed: 2002-10-26 04:13:51 UTC


Attachments
steps to reproduce (7.35 KB, text/plain), 2002-05-22 00:06 UTC, Roderick Constance
dmesg (17.93 KB, text/plain), 2002-05-22 00:07 UTC, Roderick Constance
/proc/cpuinfo (1.64 KB, text/plain), 2002-05-22 00:08 UTC, Roderick Constance
lspci (1.87 KB, text/plain), 2002-05-22 00:08 UTC, Roderick Constance

Description Roderick Constance 2002-05-21 23:55:50 UTC
Description of Problem:

Interrupts are only routed to the lowest numbered processor instead of being 
distributed to all processors.  I can force an interrupt to be routed to an 
individual processor (using /proc/irq & smp_affinity) if only that _single_ 
processor is selected.  If more than one processor is selected, the interrupt 
is routed to the lowest numbered processor.
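
For reference, the smp_affinity value is a hexadecimal bitmask in which bit N selects CPU N. The following is only a minimal sketch of that encoding, not the reporter's actual procedure (which is in the attachment). The helper names are hypothetical, and writing to /proc/irq/<irq>/smp_affinity is assumed to require root.

```python
def cpus_to_mask(cpus):
    """Return the hex smp_affinity mask that selects the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu          # bit N selects CPU N
    return format(mask, "x")


def mask_to_cpus(mask_hex, ncpus):
    """Return the CPU numbers (below ncpus) selected by a hex mask."""
    # Some kernels print wide masks as comma-separated groups; drop the commas.
    mask = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(ncpus) if mask & (1 << cpu)]


def set_irq_affinity(irq, cpus):
    """Write the mask to /proc/irq/<irq>/smp_affinity (hypothetical helper, needs root)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpus_to_mask(cpus))
```

With these helpers, `cpus_to_mask([1])` gives `"2"` (a single processor, the case that works here), while `cpus_to_mask([0, 1])` gives `"3"` (more than one processor, the case where the interrupt falls back to the lowest-numbered one).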

This is on an Intel E7500-based (Tyan S2720) system with two 2.2GHz P4 Xeon 
CPUs.  Turning HyperThreading on or off makes no difference.

I couldn't find any information about this in Intel's documentation for the E7500 
chipset.

Version-Release number of selected component (if applicable):
Red Hat 7.3 errata kernel (2.4.18-4smp)


How Reproducible:
Always

Steps to Reproduce:
1. see attachment

Actual Results:
see attachment


Expected Results:
I expected interrupts to be routed to different CPUs when smp_affinity is 
ffffffff
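
One way to check this expectation is to take two snapshots of /proc/interrupts and see which CPU columns advanced for each IRQ. The sketch below assumes the usual layout (a header row of CPU names, then one row per IRQ with one count per CPU). The sample text is hypothetical and is not taken from the attachments.

```python
def parse_interrupts(text):
    """Map each numeric IRQ to its list of per-CPU interrupt counts."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():              # skip rows such as NMI, LOC, ERR
            continue
        counts[int(label)] = [int(x) for x in rest.split()[:ncpus]]
    return counts


def cpus_servicing(before, after):
    """For each IRQ, list the CPUs whose count increased between two snapshots."""
    result = {}
    for irq, new in after.items():
        old = before.get(irq, [0] * len(new))
        result[irq] = [cpu for cpu, (b, a) in enumerate(zip(old, new)) if a > b]
    return result


# Hypothetical snapshots illustrating the reported symptom: only CPU0 advances.
SAMPLE_BEFORE = """\
           CPU0       CPU1
  0:     100000          0    IO-APIC-edge  timer
 18:       5000          0   IO-APIC-level  eth0
NMI:          0          0
"""
SAMPLE_AFTER = """\
           CPU0       CPU1
  0:     100500          0    IO-APIC-edge  timer
 18:       5200          0   IO-APIC-level  eth0
NMI:          0          0
"""

if __name__ == "__main__":
    before = parse_interrupts(SAMPLE_BEFORE)
    after = parse_interrupts(SAMPLE_AFTER)
    print(cpus_servicing(before, after))
```

On a system where routing is distributed as expected, each busy IRQ would list more than one CPU. With the symptom described here, every IRQ lists only CPU 0.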

Additional Information:

Comment 1 Roderick Constance 2002-05-22 00:06:25 UTC
Created attachment 58144
steps to reproduce

Comment 2 Roderick Constance 2002-05-22 00:07:26 UTC
Created attachment 58145
dmesg

Comment 3 Roderick Constance 2002-05-22 00:08:03 UTC
Created attachment 58146
/proc/cpuinfo

Comment 4 Roderick Constance 2002-05-22 00:08:44 UTC
Created attachment 58147
lspci

Comment 5 Arjan van de Ven 2002-05-22 08:52:33 UTC
Thanks to Intel for changing the APIC rules... Note that W2K has the same issue.
We have a patch to fix this; the question is whether it's important enough, since
the patch is not risk-free.

Comment 6 Philip Pokorny 2002-05-22 17:18:45 UTC
Can you attach the patch to this bug?  What are the risks?  Any pointers to
Intel PDFs or chipset documentation that describe the changed behavior?

In your professional opinion, what is the performance impact of all interrupts
being handled by a single CPU?

Comment 7 Arjan van de Ven 2002-05-22 18:28:04 UTC
Under normal workloads there won't be any measurable difference (in fact, it
can be slightly better due to better cache use). However, it is measurable in a
SPECweb benchmark with 4 GigE cards.

Comment 8 Roderick Constance 2002-05-24 23:14:33 UTC
I've seen Ingo Molnar's apic-route patch on SourceForge for a 2.4.18 Linus 
kernel.  Is this similar to the Red Hat patch?  Does Red Hat have a patch for 
2.4.18-4 (7.3 errata kernel)?  If so, can you attach it?  Thanks


Comment 9 Ingo Molnar 2002-05-25 06:08:38 UTC
the irqbalance patch is in the 2.5 kernel series, but it obviously has not seen
the kind of extensive testing that the stable 2.4 kernel series has. This is why
such a patch, which changes such fundamental parts of the irq code, should be
treated with extreme care.

having said that, i got pretty good feedback wrt. irqbalance, it improves
performance on a number of (non-P4) server workloads, besides solving the P4 irq
routing bug. (due to the added and automatic cache-affinity properties of the
method.)

Comment 10 Jim Wright 2002-10-26 04:13:43 UTC
looks like the 2.4.18-17.7.x release has this patch.  just confirmed on a bigmem
machine.  ok to close.

