Description of problem:
The skge driver produces an extremely high number of interrupts (about 180,000 per second) for my SK-9821 adapters, without any network traffic. This starts about 20 minutes after loading the driver and keeps the CPU busy at 20% load. The network is usable nevertheless, but slow.
Version-Release number of selected component (if applicable):
All versions of the RHEL5 2.6.18 kernels. I tried a 2.6.22.9 kernel; it did not show these symptoms.
How reproducible:
Always. All 3 available machines show the same symptoms.
Steps to Reproduce:
1. boot
2. wait about 20 minutes, watching /proc/interrupts
3. see the number of interrupts start to rise sharply
Actual results:
Interrupts for skge are generated at a rate of about 180,000 per second, without any network traffic.
Expected results:
Interrupts should depend on network traffic.
Additional info:
The sk98lin driver (available with RHEL4, or when I compile the RHEL5 kernel after enabling that driver as a module) does not show this problem.
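For anyone trying to reproduce this, "watching /proc/interrupts" can be turned into a per-second rate with a short script. The following is a minimal sketch, not part of the original report: the function names `parse_interrupts`, `count_for`, and `irq_rate` are made up for illustration, and it assumes the usual /proc/interrupts layout (a header row naming the CPUs, then one row per interrupt with per-CPU counts followed by a description).

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Rows such as ERR/MIS carry fewer counters than there are CPUs.
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def count_for(table, name):
    """Sum the counts of every interrupt line whose description mentions name."""
    return sum(
        count
        for count, description in table.values()
        if name in description.replace(",", " ").split()
    )


def irq_rate(name, interval=5.0, path="/proc/interrupts"):
    """Return the average interrupts per second for name over interval seconds."""
    def total():
        with open(path) as handle:
            return count_for(parse_interrupts(handle.read()), name)

    before = total()
    time.sleep(interval)
    return (total() - before) / interval
```

Calling `irq_rate("skge")` on an idle machine should return a small number if the driver behaves; a value in the hundred-thousands would match the symptom described here.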
Sorry, man. sk98lin isn't enabled because it had been replaced by skge. Sounds like we need to fix up skge. Based solely on the description from this customer, it sounds like this patch might help:
commit 4ebabfcb1d6af5191ef5c8305717ccbc24979f6c
Author: Stephen Hemminger <shemminger>
Date: Fri Mar 16 14:01:27 2007 -0700
    skge: mask irqs when device down
    When a port on the skge driver is not used, it should mask off interrupts from that port.
    Signed-off-by: Stephen Hemminger <shemminger>
    Signed-off-by: Jeff Garzik <jeff>
I need to do some new test kernels today, so maybe I'll add this one.
Sorry, the 'man' reference was to a previous private comment -- please excuse the informality. :)
(In reply to comment #0)
> Description of problem:
> The skge driver produces an extremely high number of interrupts (about 180,000
> per second) for my SK-9821 adapters, without any network traffic.
Was this while the interface was down, or just with the link up and no traffic?
No problem with the 'man' - in Germany Kay is a male name. To answer your question: this was with the link up and (almost) no traffic. So the description "skge: mask irqs when device down" does not seem to apply here.
(In reply to comment #6)
> No problem with the 'man' - in Germany Kay is a male name.
Good to know. :-)
> To answer your question: this was with the link up and (almost) no traffic. So
> the description "skge: mask irqs when device down" does not seem to apply here.
Thanks for the feedback, I'll sift through the git logs and see if anything else looks like a good candidate to resolve this. If your card is working fine in 2.6.22.9 then the fix must be around somewhere.
Kay, this is an extremely old bug, but I would like to fix it. Can you still reproduce the problem with the skge device?
I was just looking at some possible patches and this may have been fixed in 2.6.18-128.el5. Have you tried the 5.3 kernel and did it fix the issue?
Created attachment 366474 [details]
skge-screaming-interrupt.patch
Please disregard my last comment. This was not resolved in 2.6.18-128.el5. However, this patch may fix it. I will add it to my test kernels and post a link here when the builds have completed. Until then, feel free to build this patch against the RHEL5 skge driver and test it.
The bug still exists on 2.6.18-164.el5
(In reply to comment #11)
> The bug still exists on 2.6.18-164.el5
I would expect that. I will build a test kernel sometime in the next week, but feel free to build and test the patch in comment #10 against 2.6.18-164 to see if it helps.
My test kernels have been updated to include a patch for this bugzilla. http://people.redhat.com/agospoda/#rhel5 Please test them and report back your results.
I downloaded the kernel and am running it now:
dikay@loop3:-people/dikay% uname -a
Linux loop3 2.6.18-174.el5.gtest.77 #1 SMP Tue Nov 17 15:42:39 EST 2009 i686 i686 i386 GNU/Linux
however I still see the number of skge interrupts rising at millions per second - the output is after one hour of uptime, on an otherwise idle machine:
dikay@loop3:-people/dikay% cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:    9977531          0          0          0  IO-APIC-edge   timer
  1:        174       1446         73       6209  IO-APIC-edge   i8042
  6:         31          0         13          0  IO-APIC-edge   floppy
  7:          2          0          0          0  IO-APIC-edge   parport0
  8:    1399005          0          0          0  IO-APIC-edge   rtc
  9:          1          0          0          0  IO-APIC-level  acpi
 14:       5615      38641          0      30909  IO-APIC-edge   ide0
 15:         26      11128       5234      77779  IO-APIC-edge   ide1
169:        627       8322        134      69166  IO-APIC-level  uhci_hcd:usb1
177:          0          0          0          0  IO-APIC-level  uhci_hcd:usb2
185:         15          0          0          0  IO-APIC-level  aic7xxx
193:      21731          0 1479663036          0  IO-APIC-level  skge
201:       2823          0          0          0  IO-APIC-level  Intel 82801BA-ICH2
209:       5896      53787     793335          0  IO-APIC-level  nvidia
NMI:          0          0          0          0
LOC:    9976477    9976475    9976474    9976473
ERR:          0
MIS:          0
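As a rough cross-check of the numbers in this output, the average skge interrupt rate can be worked out from the per-CPU counts on the IRQ 193 line. This is only a back-of-the-envelope sketch: the counts are copied from the output above, and the uptime of 3600 seconds is an assumption taken from the "one hour of uptime" remark, not a measured value.

```python
# Per-CPU counts for IRQ 193 (skge), copied from the /proc/interrupts output.
skge_counts = [21731, 0, 1479663036, 0]

# Assumption: "one hour of uptime" as stated in the comment, not measured.
uptime_seconds = 3600

total = sum(skge_counts)
average_rate = total / uptime_seconds

print("total skge interrupts: %d" % total)
print("average rate: %.0f per second" % average_rate)
```

Under that assumption the average comes out at roughly 411,000 interrupts per second, almost all of them delivered to CPU2. If the storm only began some time after boot, as in the original report, the rate while it is active would be higher than this average.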
This is a really old bug, but I'd like to clear it out. If you are still interested and willing to test, the patch I am about to attach should resolve this.
Created attachment 517486 [details]
skge-possible-interrupt-fix.patch
This patch seems to have the best chance of resolving the reported problem. It is a combination of the following upstream patches:
29365c900963d4986b74a0dadea46872bf283d76
78bc218663e3bd6cbbaf6a363d2f88f17541adfb
Based on the reporter's feedback that 2.6.22.9 worked fine, these two patches together seem like the best option short of a full backport. The reporter tested 29365c900963d4986b74a0dadea46872bf283d76 previously, but it did not resolve the problem. I think it should be included if we update skge at all, so I kept it and added 78bc218663e3bd6cbbaf6a363d2f88f17541adfb, as I think it could address the real problem.
I cannot reproduce this on the skge system we have locally. My system is an older nForce motherboard and CPU and doesn't actually have more than 1 core, so that might be a factor. I hesitate to do this, but I'm going to close this as insufficient data. Please reopen if you find the patch in comment #16 resolves your issue or need anything else. If you do that soon I can try to get that added to the next RHEL update.