Red Hat Bugzilla – Bug 330841
skge driver: interrupt stuck on SK-9821
Last modified: 2014-06-29 18:59:34 EDT
Description of problem:
The skge driver produces an extremely high number of interrupts (about 180,000
per second) for my SK-9821 adapters, without any network traffic. This starts
about 20 minutes after loading the driver and keeps the CPU busy at 20% load.
The network is usable nevertheless, but slow.
Version-Release number of selected component (if applicable):
All versions of the RHEL5 2.6.18 kernels. I tried a 184.108.40.206 kernel; it did not show
always. All 3 available machines show the same symptoms.
Steps to Reproduce:
2. wait about 20 minutes, watching /proc/interrupts
3. see number of interrupts starting to rise extremely
Actual results:
interrupts for skge are generated at a rate of about 180,000 per second, without
any network traffic
Expected results:
interrupts should depend on network traffic
The sk98lin driver (available with RHEL4, or when I compile the RHEL5 kernel
after enabling that driver as a module) does not show this problem.
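
A rate such as the one reported above can be measured by taking two snapshots of /proc/interrupts and dividing the difference by the elapsed time. The following is a minimal sketch, not a tool used in this report; the function names (`parse_interrupts`, `irq_rates`, `watch`) and the 10-second interval are assumptions made for illustration.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: count summed over all CPUs} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an "IRQ: counts..." row
        fields = rest.split()
        # Only the first n_cpus fields can be per-CPU counters.
        counts = [int(f) for f in fields[:n_cpus] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def irq_rates(before, after, seconds):
    """Interrupts per second for each IRQ between two parsed snapshots."""
    return {irq: (after[irq] - before.get(irq, 0)) / seconds for irq in after}


def watch(interval=10.0, path="/proc/interrupts"):
    """Print the five busiest IRQs over one sampling interval (Linux only)."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    rates = irq_rates(first, second, interval)
    for irq, rate in sorted(rates.items(), key=lambda kv: -kv[1])[:5]:
        print(f"{irq:>5}: {rate:12.1f} interrupts/s")
```

Calling `watch()` on an idle machine should show the skge IRQ near zero if the driver behaves as expected; a sustained rate in the range described above would match the symptom.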
Sorry, man. sk98lin isn't enabled because it has been replaced by skge. Sounds
like we need to fix up skge. Based solely on the description from this customer,
it sounds like this patch might help:
Author: Stephen Hemminger <firstname.lastname@example.org>
Date: Fri Mar 16 14:01:27 2007 -0700
skge: mask irqs when device down
When a port on the skge driver is not used, it should
mask off interrupts from that port.
Signed-off-by: Stephen Hemminger <email@example.com>
Signed-off-by: Jeff Garzik <firstname.lastname@example.org>
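
The idea in that commit message can be illustrated with a toy model. This is not the skge code and does not reflect the adapter's real registers: the class, the bit layout, and the method names below are all hypothetical. It only shows why an event on an unused port cannot assert the interrupt line once that port's bit is cleared from the mask.

```python
class ToyAdapter:
    """Toy model of a two-port adapter sharing one interrupt line."""

    # Hypothetical bit layout: one interrupt-source bit per port.
    PORT_IRQ_BITS = {0: 0x01, 1: 0x02}

    def __init__(self):
        self.irq_mask = 0  # a set bit means that source may raise the IRQ
        self.pending = 0   # a set bit means that source has an event waiting

    def port_up(self, port):
        # Enable interrupts for a port that is brought up.
        self.irq_mask |= self.PORT_IRQ_BITS[port]

    def port_down(self, port):
        # Mask off interrupts for a port that is not in use.
        self.irq_mask &= ~self.PORT_IRQ_BITS[port]

    def raise_event(self, port):
        # The hardware notes an event on this port.
        self.pending |= self.PORT_IRQ_BITS[port]

    def irq_asserted(self):
        # The line is asserted only for pending sources that are unmasked.
        return bool(self.pending & self.irq_mask)
```

In this model, an event on port 1 while only port 0 is up leaves `irq_asserted()` false, which is the behaviour the patch description asks for on a port that is down.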
I need to do some new test kernels today, so maybe I'll add this one.
Sorry the 'man' reference was to a previous private comment -- please excuse the
(In reply to comment #0)
> Description of problem:
> The skge driver produces an extremely high number of interrupts (about 180,000
> per second) for my SK-9821 adapters, without any network traffic.
Was this while the interface was down, or just with the link up and no traffic?
No problem with the 'man' - in Germany Kay is a male name.
To answer your question: this was with the link up and (almost) no traffic. So
the description "skge: mask irqs when device down" does not seem to apply here.
(In reply to comment #6)
> No problem with the 'man' - in Germany Kay is a male name.
Good to know. :-)
> To answer your question: this was with the link up and (almost) no traffic. So
> the description "skge: mask irqs when device down" does not seem to apply here.
Thanks for the feedback, I'll sift through the git logs and see if anything else
looks like a good candidate to resolve this. If your card is working fine in
220.127.116.11 then the fix must be around somewhere.
Kay, this is an extremely old bug, but I would like to fix it.
Can you still reproduce the problem with the skge device?
I was just looking at some possible patches and this may have been fixed in 2.6.18-128.el5.
Have you tried the 5.3 kernel and did it fix the issue?
Created attachment 366474
Please disregard my last comment. This was not resolved in 2.6.18-128.el5. This patch may fix it, however. I will add it to my test kernels and post a link here when the builds have completed. Until then, feel free to build this patch against the RHEL5 skge driver and test it.
The bug still exists on 2.6.18-164.el5
(In reply to comment #11)
> The bug still exists on 2.6.18-164.el5
I would expect that. I will build a test kernel sometime in the next week, but feel free to build and test the patch in comment #10 against 2.6.18-164 to see if it helps.
My test kernels have been updated to include a patch for this bugzilla.
Please test them and report back your results.
I downloaded the kernel and am running it now:
dikay@loop3:-people/dikay% uname -a
Linux loop3 2.6.18-174.el5.gtest.77 #1 SMP Tue Nov 17 15:42:39 EST 2009 i686 i686 i386 GNU/Linux
However, I still see the number of skge interrupts rising at millions per second. The output below is after one hour of uptime, on an otherwise idle machine:
dikay@loop3:-people/dikay% cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:    9977531          0          0          0    IO-APIC-edge  timer
  1:        174       1446         73       6209    IO-APIC-edge  i8042
  6:         31          0         13          0    IO-APIC-edge  floppy
  7:          2          0          0          0    IO-APIC-edge  parport0
  8:    1399005          0          0          0    IO-APIC-edge  rtc
  9:          1          0          0          0   IO-APIC-level  acpi
 14:       5615      38641          0      30909    IO-APIC-edge  ide0
 15:         26      11128       5234      77779    IO-APIC-edge  ide1
169:        627       8322        134      69166   IO-APIC-level  uhci_hcd:usb1
177:          0          0          0          0   IO-APIC-level  uhci_hcd:usb2
185:         15          0          0          0   IO-APIC-level  aic7xxx
193:      21731          0 1479663036          0   IO-APIC-level  skge
201:       2823          0          0          0   IO-APIC-level  Intel 82801BA-ICH2
209:       5896      53787     793335          0   IO-APIC-level  nvidia
NMI:          0          0          0          0
LOC:    9976477    9976475    9976474    9976473
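
For reference, the skge row above can be turned into an average rate with a few lines of arithmetic. This is a sketch under one assumption taken from the comment itself: that the counters cover exactly one hour (3600 seconds) of uptime. The helper name `total_interrupts` is made up for this example.

```python
def total_interrupts(row, n_cpus):
    """Sum the per-CPU counters of one /proc/interrupts row."""
    _, _, rest = row.partition(":")
    return sum(int(field) for field in rest.split()[:n_cpus])


# The skge row from the output above (4 CPUs).
skge_row = "193: 21731 0 1479663036 0 IO-APIC-level skge"

total = total_interrupts(skge_row, 4)

# Assumption: "one hour of uptime" is taken as exactly 3600 seconds.
uptime_seconds = 3600
average_rate = total / uptime_seconds

print(f"total skge interrupts: {total}")
print(f"average rate: {average_rate:.0f} per second")
```

Under that assumption the average works out to roughly 411,000 interrupts per second, with almost all of them delivered to CPU2.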
This is a really old bug, but I'd like to clear it out. If you are still interested and willing to test, the patch I am about to attach should resolve this.
Created attachment 517486
This patch seems to have the best chance to resolve the problem reported. This is a combination of the following upstream patches:
and based on the reporter's feedback that 18.104.22.168 worked fine, these two patches together seem like the best option without performing a full backport.
The reporter tested 29365c900963d4986b74a0dadea46872bf283d76 previously, but it did not resolve the problem. I think it should be included if we update skge at all, so I kept it and added 78bc218663e3bd6cbbaf6a363d2f88f17541adfb as I think it could be the real problem.
I cannot reproduce this on the skge system we have locally. My system is an older nForce motherboard and CPU and doesn't actually have more than 1 core, so that might be a factor.
I hesitate to do this, but I'm going to close this as insufficient data. Please reopen if you find the patch in comment #16 resolves your issue, or if you need anything else. If you do that soon, I can try to get the patch added to the next RHEL update.