Bug 245247

Summary: Intel quad-core network IRQ blocked by another device w/ 3com 3c59x on RT kernel
Product: Red Hat Enterprise MRG
Component: realtime-kernel
Version: 1.0
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: John Shakshober <dshaks>
Assignee: Luis Claudio R. Goncalves <lgoncalv>
CC: agospoda, dshaks, lgoncalv, williams
Keywords: Regression
Fixed In Version: 2.6.21-38rt
Doc Type: Bug Fix
Last Closed: 2007-09-24 18:47:04 UTC
Attachments:
  dmesg output from system (flags: none)
  patch 1 / 3 - genirq-cleanup-mismerge-artifact (flags: none)
  patch 2 / 3 - genirq-suppress-resend-of-level-interrupts (flags: none)
  patch 3 / 3 - genirq-mark-io_apic-level-interrupts-to-avoid-resend (flags: none)
  genirq patches from tglx adapted to rhel-rt (flags: none)

Description John Shakshober 2007-06-21 21:08:48 UTC
Description of problem:

Ran 2.6.21-31.el5rt successfully with the performance suite on an 8-cpu
(4 socket, dual-core) system.  On an 8-cpu (2 socket, quad-core) system with a
3com 3c59x controller, the performance suite could not complete because the
network hung.

Version-Release number of selected component (if applicable):

2.6.21-31.el5rt

How reproducible:

Every time.

Steps to Reproduce:
1. boot system
2. run load or simply wait 2 hours
3. network card IRQ blocked by another device ... 
  
Actual results:

eth3: Resetting the Tx ring pointer.
eth3: no IPv6 routers present
NETDEV WATCHDOG: eth3: transmit timed out
eth3: transmit timed out, tx_status 00 status e681.
  diagnostics: net 0ccc media 8880 dma 0000003a fifo 8000
eth3: Interrupt posted but not delivered -- IRQ blocked by another device?
  Flags; bus-master 1, dirty 32(0) current 32(0)
  Transmit list 00000000 vs. ffff810037f11200.
  0: @ffff810037f11200  length 80000166 status 00010166
  1: @ffff810037f112a0  length 80000166 status 00010166
  2: @ffff810037f11340  length 80000154 status 00010154
  3: @ffff810037f113e0  length 80000036 status 00010036
  4: @ffff810037f11480  length 80000118 status 00010118
  5: @ffff810037f11520  length 8000005a status 0001005a
  6: @ffff810037f115c0  length 800000f8 status 0c0100f8
  7: @ffff810037f11660  length 80000154 status 00010154
  8: @ffff810037f11700  length 800000c5 status 0c0100c5
  9: @ffff810037f117a0  length 80000083 status 0c010083
  10: @ffff810037f11840  length 8000002a status 0001002a
  11: @ffff810037f118e0  length 80000118 status 00010118
  12: @ffff810037f11980  length 80000046 status 00010046
  13: @ffff810037f11a20  length 80000154 status 00010154
  14: @ffff810037f11ac0  length 8000002a status 8001002a
  15: @ffff810037f11b60  length 8000002a status 8001002a
eth3: Resetting the Tx ring pointer.
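
The `dirty 32(0) current 32(0)` line in the dump above can be read as two free-running counters, each followed by its slot in the transmit ring. A minimal sketch of that reading, assuming the ring has 16 entries (the dump lists slots 0 to 15) and that the value in parentheses is the counter modulo the ring size; `parse_ring_flags` is a hypothetical helper, not part of the driver:

```python
import re

TX_RING_SIZE = 16  # assumption: inferred from the 16 descriptors (0..15) in the dump


def parse_ring_flags(line):
    """Extract the dirty/current counters from a 'Flags;' line of the dump."""
    m = re.search(r"dirty (\d+)\((\d+)\) current (\d+)\((\d+)\)", line)
    if m is None:
        raise ValueError("not a Flags line: %r" % line)
    dirty, dirty_slot, cur, cur_slot = map(int, m.groups())
    return {
        "dirty": dirty,
        "current": cur,
        # True when the parenthesised values equal counter % ring size
        "slots_match": (dirty % TX_RING_SIZE == dirty_slot
                        and cur % TX_RING_SIZE == cur_slot),
        # descriptors queued but not yet cleaned up, under this reading
        "in_flight": cur - dirty,
    }


info = parse_ring_flags("  Flags; bus-master 1, dirty 32(0) current 32(0)")
print(info)
```

Under this reading, equal counters mean no descriptors were awaiting cleanup when the line was printed.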

Expected results:

Network card stays functional.

Additional info:

Comment 1 John Shakshober 2007-06-21 21:08:49 UTC
Created attachment 157571 [details]
dmesg output from system

Comment 2 Luis Claudio R. Goncalves 2007-06-25 16:30:38 UTC
Could you please attach the following:

1. /var/log/messages - preferably from boot to the network hang
2. the output of lspci and lspci -v
3. lsmod output
4. "ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm" during a hang



Comment 4 John Shakshober 2007-06-28 21:10:17 UTC
Output from 2.6.21-31.el5rt ... moving to rt32

[root@perf4 ~]# uname -a
Linux perf4.lab.boston.redhat.com 2.6.21-31.el5rt #1 SMP PREEMPT RT Mon Jun 18 16:44:12 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[root@perf4 ~]# lspci
00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller Hub (rev 92)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev 92)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev 92)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev 92)
00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev 92)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset Error Reporting Registers (rev 92)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset Error Reporting Registers (rev 92)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset Error Reporting Registers (rev 92)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 92)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 92)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 92)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 92)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.2 IDE interface: Intel Corporation 631xESB/632xESB/3100 Chipset SATA Storage Controller IDE (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
03:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
03:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
04:02.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
04:02.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
05:01.0 Fibre Channel: Emulex Corporation Thor-X LightPulse Fibre Channel Host Adapter (rev 01)
05:01.1 Fibre Channel: Emulex Corporation Thor-X LightPulse Fibre Channel Host Adapter (rev 01)
06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
07:01.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
0b:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
[root@perf4 ~]# lsmod
Module                  Size  Used by
hidp                   88832  2
l2cap                  92672  5 hidp
bluetooth             129028  2 hidp,l2cap
sunrpc                234472  1
loop                   53136  0
video                  54288  0
sbs                    51544  0
i2c_ec                 39552  1 sbs
dock                   45640  0
button                 42912  0
battery                45320  0
asus_acpi              53548  0
ac                     39688  0
ipv6                  493792  34
parport_pc             65192  1
lp                     49040  0
parport                77708  2 parport_pc,lp
ata_piix               50820  0
floppy                105000  0
sg                     73256  0
pcspkr                 36992  0
i2c_i801               43420  0
ata_generic            42884  0
libata                165664  2 ata_piix,ata_generic
i2c_core               60160  2 i2c_ec,i2c_i801
shpchp                 70556  0
lpfc                  219720  0
scsi_transport_fc      77444  1 lpfc
3c59x                  82612  0
e1000                 166208  0
mii                    39424  1 3c59x
serio_raw              41348  0
ide_cd                 76960  0
cdrom                  70696  1 ide_cd
dm_snapshot            52216  0
dm_zero                35456  0
dm_mirror              57280  0
dm_mod                103952  8 dm_snapshot,dm_zero,dm_mirror
aic79xx               212444  2
scsi_transport_spi     63360  1 aic79xx
sd_mod                 56704  3
scsi_mod              203576  7 sg,libata,lpfc,scsi_transport_fc,aic79xx,scsi_transport_spi,sd_mod
ext3                  177936  2
jbd                   101616  1 ext3
ehci_hcd               70412  0
ohci_hcd               56708  0
uhci_hcd               60960  0

Comment 5 Luis Claudio R. Goncalves 2007-07-11 21:38:21 UTC
Just to consolidate what we have so far:

* 2.6.21-25el5rt does not present the bug (shak)
* 2.6.21-31el5rt and later presented the bug (shak and lclaudio)
* Similar reports have been found on the internet (ranging from 2.4 to 2.6
kernels). They always _seem_ to involve IOAPIC. (lclaudio)
* between -25 and -31 several changes took place. Some of them:   (acme)
  - jump from patch-2.6.21-rt7 to patch-2.6.21.5-rt10
  - addition of preempt-irqs-x86-64-ioapic-mask-quirk.patch
* The bug was reproduced in 2.6.21-31el5rt vanilla (kernel-rt-vanilla) (lclaudio)


Comment 6 Tim Burke 2007-07-26 17:26:09 UTC
On Thu, Jul 26, 2007 at 09:17:37AM -0400, Tim Burke wrote:
| - Luis - Bugzilla Bug 245247
| <https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=245247>: Intel
| quad-core network IRQ blocked by another device w/ 3com 3c59x on RT kernel
| Vanilla kernel reproduced - which driver/subsystem is suspect?  (ie, is 
| there someone else we should drag in?)

Just to add more detail, this bug appears in 32 and 64bit land, UP and SMP,
RT and vanilla. Most distros have BZ entries on this one and none have real
solutions. The black magic workarounds work for some cases and not for
others.

Jeff Burke asked eng people to install Shak's 3com NIC in one machine of
the RHTS test farm, so that I can have more control over the machine and do
further testing.

Comment 7 Arnaldo Carvalho de Melo 2007-08-01 22:05:15 UTC
Testing with 2.6.22 final I couldn't reproduce this problem, details:

[root@mica ~]# uname -a
Linux mica.ghostprotocols.net 2.6.22 #1 SMP PREEMPT Wed Jul 25 12:05:55 BRT 2007 x86_64 x86_64 x86_64 GNU/Linux

-bash-3.00# lspci | grep Tornado
00:09.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)

-bash-3.00# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2000+
stepping        : 0
cpu MHz         : 1665.982
cache size      : 256 KB

Machine has only one processor.

-bash-3.00# cat /proc/interrupts
           CPU0
  0:    6427617    XT-PIC-XT        timer
  1:        490    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
  3:          2    XT-PIC-XT        ehci_hcd:usb1
  5:          0    XT-PIC-XT        uhci_hcd:usb4
 10:  112575891    XT-PIC-XT        eth0, uhci_hcd:usb3
 11:          0    XT-PIC-XT        uhci_hcd:usb2
 12:          4    XT-PIC-XT        i8042
 14:     395679    XT-PIC-XT        ide0
NMI:       6802
LOC:    6427728
ERR:      38196
MIS:          0
-bash-3.00#

dmesg bits:

3c59x: Donald Becker and others.
0000:00:09.0: 3Com PCI 3c905C Tornado at f880cf80.
eth0:  setting full-duplex.

Ran a mix of netcat + scp + nfsclient + ttcp + ping flooding with a mix of
packet sizes. Almost wire speed was achieved, and only five or six of these
messages were seen:

eth0: Too much work in interrupt, status e401.

Tests ran for several hours. I will change the slot and try with an RT kernel
overnight.
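
The `/proc/interrupts` listing in this comment shows eth0 sharing IRQ 10 with a USB controller. A small sketch for picking out shared lines from such a listing, assuming the layout shown above (IRQ number, one count column per CPU, controller name, comma-separated handlers); `shared_irqs` is a hypothetical helper:

```python
def shared_irqs(text, ncpus=1):
    """Return {irq: [handler, ...]} for numeric IRQ lines with more than one handler."""
    shared = {}
    for line in text.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        if not sep or not irq.isdigit():
            continue  # header line or NMI/LOC/ERR/MIS summary rows
        # ncpus count columns, then the controller name, then the handler list
        fields = rest.split(None, ncpus + 1)
        if len(fields) < ncpus + 2:
            continue  # no handler column on this line
        handlers = [h.strip() for h in fields[-1].split(",")]
        if len(handlers) > 1:
            shared[int(irq)] = handlers
    return shared


sample = """\
           CPU0
  0:    6427617    XT-PIC-XT        timer
  3:          2    XT-PIC-XT        ehci_hcd:usb1
 10:  112575891    XT-PIC-XT        eth0, uhci_hcd:usb3
 14:     395679    XT-PIC-XT        ide0
NMI:       6802
"""
print(shared_irqs(sample))
```

On a multi-CPU machine the same function could be called with `ncpus` set to the number of CPU columns in the header.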

Comment 8 Arnaldo Carvalho de Melo 2007-08-01 22:09:02 UTC
Oops, this is the machine with the tornado NIC:

-bash-3.00# uname -a
Linux tonchinha.ghostprotocols.net 2.6.22 #12 SMP PREEMPT Wed Aug 1 11:51:23 BRT 2007 i686 AMD Athlon(tm) XP 2000+ unknown GNU/Linux

mica.ghostprotocols.net is the poweredge 1950 used to do most of the network
stressing on this athlon with the tornado NIC.

Comment 9 Luis Claudio R. Goncalves 2007-08-06 21:28:26 UTC
Although I have also failed to reproduce this bug at home, I am currently
testing two possible solutions:

http://lkml.org/lkml/2007/8/6/30

and 

http://lkml.org/lkml/2007/8/6/301
with the patch
http://cvs.fedora.redhat.com/viewcvs/*checkout*/rpms/kernel/F-7/linux-2.6-irq-dont-mask-interrupts-_reversed_.patch?root=extras

The last patch did not apply cleanly.

I have been conducting tests on a C2D with 2.6.21-34.el5rt and -31.el5rt (x86_64).


Comment 10 Luis Claudio R. Goncalves 2007-08-10 14:38:01 UTC
None of the above patches worked for me.

On the two-socket quad-core (8 CPU) machine from the bug report I was able to
reproduce the problem in less than 10 seconds using two
'ping -f -s 60000 <other-ip>'.
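
As a rough guide to the load this generates: each `ping -s 60000` request is far larger than one Ethernet frame, so it travels as a train of IP fragments. The arithmetic below assumes IPv4 without options, a 1500-byte MTU and a 14-byte Ethernet header, none of which the report states. Under those assumptions a full fragment is a 1514-byte frame, the size that dominates the debug log further down in this comment.

```python
import math

MTU = 1500          # assumption: standard Ethernet MTU
IP_HEADER = 20      # assumption: IPv4 header without options
ICMP_HEADER = 8
ETH_HEADER = 14     # frame header, FCS not counted


def fragment_frames(icmp_payload):
    """Return the Ethernet frame sizes for one fragmented ICMP echo request."""
    datagram = ICMP_HEADER + icmp_payload        # bytes carried above the IP header
    per_fragment = (MTU - IP_HEADER) // 8 * 8    # fragment data is a multiple of 8 bytes
    count = math.ceil(datagram / per_fragment)
    last = datagram - per_fragment * (count - 1)
    sizes = [ETH_HEADER + IP_HEADER + per_fragment] * (count - 1)
    sizes.append(ETH_HEADER + IP_HEADER + last)
    return sizes


frames = fragment_frames(60000)
print(len(frames), frames[0], frames[-1])
```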

I have also tried mixing module parameter combinations, with no luck. I tried
different combinations for max_interrupt_work, rx_copybreak, watchdog, use_mmio
and the others. No matter what, I was able to trigger the bug in less than 10s.

I performed several tests with 'debug=7' and identified a sequence that
seems to happen before the bug gets triggered:

   boomerang_interrupt. status=0xe401
   eth3: interrupt, status e401, latency 2 ticks.
   eth3: In interrupt loop, status e401.
   boomerang_interrupt->boomerang_rx
   boomerang_rx(): status e001
   Receiving packet size 1514 status 200085ea.
   Receiving packet size 1514 status 200085ea.
   eth3: In interrupt loop, status e601.
   boomerang_interrupt->boomerang_rx
   boomerang_rx(): status e201
   boomerang_interrupt: wake queue
   eth3: exiting interrupt, status e000.
   boomerang_start_xmit()

These are two iterations of the do/while loop in 3c59x.c, function
boomerang_interrupt(), between lines 2301 and 2379. After this sequence, a few
more packets are received and then:

   NETDEV WATCHDOG: eth3: transmit timed out
   eth3: transmit timed out, tx_status 00 status e601.
     diagnostics: net 0ccc media 8880 dma 0000003a fifo 0000
   eth3: Interrupt posted but not delivered -- IRQ blocked by another device?
   boomerang_interrupt. status=0x8601
   eth3: interrupt, status 8601, latency 255 ticks.
   eth3: In interrupt loop, status 8601.
   boomerang_interrupt->boomerang_rx
   boomerang_rx(): status 8201
   Receiving packet size 1514 status 200085ea.
   Receiving packet size 1514 status 200085ea.

   ...

   boomerang_interrupt: wake queue
   eth3: In interrupt loop, status 8401.
   boomerang_interrupt->boomerang_rx
   boomerang_rx(): status 8001
   Receiving packet size 1514 status 200085ea.
   Receiving packet size 60 status 803c.
   Receiving packet size 60 status 803c.
   Receiving packet size 60 status 803c.
   eth3: exiting interrupt, status 8000.
     Flags; bus-master 1, dirty 30595(3) current 30595(3)
     Transmit list 00000000 vs. ffff810037f0a3e0.
     0: @ffff810037f0a200  length 800005ea status 000105ea
     1: @ffff810037f0a2a0  length 800005ea status 800105ea

   ...
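
The status words in these logs can be split into named bits. The sketch below uses the bit names as recalled from `enum vortex_status` in 3c59x.c; treat the table as an assumption and check it against the driver source for the kernel in use. Bits above the named ones are reported separately and left uninterpreted.

```python
# Bit names as recalled from enum vortex_status in 3c59x.c (assumption: verify
# against the driver source before relying on them).
STATUS_BITS = [
    (0x0001, "IntLatch"),
    (0x0002, "HostError"),
    (0x0004, "TxComplete"),
    (0x0008, "TxAvailable"),
    (0x0010, "RxComplete"),
    (0x0020, "RxEarly"),
    (0x0040, "IntReq"),
    (0x0080, "StatsFull"),
    (0x0100, "DMADone"),
    (0x0200, "DownComplete"),
    (0x0400, "UpComplete"),
    (0x0800, "DMAInProgress"),
    (0x1000, "CmdInProgress"),
]


def decode_status(status):
    """Return (names of set bits, remaining unnamed bits) for a 16-bit status word."""
    names = [name for mask, name in STATUS_BITS if status & mask]
    known = sum(mask for mask, _ in STATUS_BITS)
    return names, status & ~known & 0xFFFF


# Status words taken from the logs in this comment.
for word in (0xE401, 0xE601, 0xE000, 0x8601):
    names, other = decode_status(word)
    print("%04x: %s (other bits %04x)" % (word, ", ".join(names) or "none", other))
```

Under this table, the word printed with the transmit timeout (e601) still has the interrupt-latch bit set, which is consistent with the driver's "Interrupt posted but not delivered" message.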


Comment 11 Luis Claudio R. Goncalves 2007-08-10 14:53:16 UTC
While studying the new logs, Arnaldo pointed me to the following discussions in
@netdev, @linux-net and @LKML:

  2.6.20->2.6.21 - networking dies after random time
  http://www.mail-archive.com/linux-net@vger.kernel.org/msg01428.html
  http://marc.info/?l=linux-kernel&m=118202978609968&w=2

  2.6.23-rc2: WARNING at kernel/irq/resend.c:70 check_irq_resend()
  http://lkml.org/lkml/2007/8/8/399

Right now it seems that ne2k_pci with 8390, 3c59x, 8139cp and probably sky2 are
suffering from the same problem. Discussions are heading towards a probable bug
or subtle misbehavior in the lower layers (the irq resend code) that surfaces
as the bug in this Bugzilla entry. No clear answer as of now.


Comment 12 Luis Claudio R. Goncalves 2007-08-15 21:47:33 UTC
Tested almost all the patches from those threads and from new threads forked
from them. No good results yet. I will continue with my own tests, which go in
a different direction.

Comment 13 Luis Claudio R. Goncalves 2007-09-11 13:29:38 UTC
Created attachment 192481 [details]
patch 1 / 3 - genirq-cleanup-mismerge-artifact

As the discussions about irq resend evolved, Thomas Gleixner came up with a
solution that I will test here. There are reports of this fixing the real
problem, not only the symptoms. Patches attached.


For a complete explanation, please refer to these two posts:

http://lists-archives.org/linux-kernel/13398532-genirq-mark-io_apic-level-interrupts-to-avoid-resend.html


and

http://lists-archives.org/linux-kernel/13399129-genirq-mark-io_apic-level-interrupts-to-avoid-resend.html
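
The titles of the attached patches suggest the shape of the fix: level-triggered interrupts are marked as such and are no longer re-sent by software when an IRQ line is re-enabled, on the reasoning that the hardware keeps a level interrupt asserted while it is still active. The toy model below illustrates that decision only. It is not the kernel code, and the flag values and the `should_resend` name are made up for illustration; the posts linked above remain the authoritative explanation.

```python
# Illustrative flag values only; the kernel's real constants differ.
IRQ_PENDING = 0x1   # an interrupt arrived while the line was disabled
IRQ_REPLAY = 0x2    # a software resend is already in progress
IRQ_LEVEL = 0x4     # the line is level-triggered


def should_resend(status, suppress_level=True):
    """Decide whether a pending interrupt is replayed in software on enable."""
    mask = IRQ_PENDING | IRQ_REPLAY
    if suppress_level:
        mask |= IRQ_LEVEL  # behaviour suggested by the patch titles
    return (status & mask) == IRQ_PENDING


print(should_resend(IRQ_PENDING))                                    # edge, pending
print(should_resend(IRQ_PENDING | IRQ_LEVEL))                        # level, suppressed
print(should_resend(IRQ_PENDING | IRQ_LEVEL, suppress_level=False))  # level, old behaviour
```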

Comment 14 Luis Claudio R. Goncalves 2007-09-11 13:30:21 UTC
Created attachment 192491 [details]
patch 2 / 3 - genirq-suppress-resend-of-level-interrupts

Comment 15 Luis Claudio R. Goncalves 2007-09-11 13:31:13 UTC
Created attachment 192501 [details]
patch 3 / 3 - genirq-mark-io_apic-level-interrupts-to-avoid-resend

Comment 16 Luis Claudio R. Goncalves 2007-09-11 16:58:49 UTC
Created attachment 192701 [details]
genirq patches from tglx adapted to rhel-rt

Clark, here are the patches in a tar.gz file.

Comment 17 Luis Claudio R. Goncalves 2007-09-24 18:47:04 UTC
As there was no more noise on the subject upstream and the testers in the
threads were all happy with the fixes, I will close this bug.
This solution was also added to the Fedora 7 kernel.

Please, reopen this bug if you hit the problem again using 2.6.21-38rt or a
newer rhel-rt kernel.