Bug 719446 - irqbalance does not handle renamed Ethernet device
Summary: irqbalance does not handle renamed Ethernet device
Keywords:
Status: CLOSED DUPLICATE of bug 798624
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: irqbalance
Version: 5.6
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Petr Holasek
QA Contact: Network QE
URL:
Whiteboard:
Depends On:
Blocks: 743405 798624
Reported: 2011-07-06 21:37 UTC by Jeremy Mueller
Modified: 2018-12-01 18:53 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 798624 (view as bug list)
Environment:
Last Closed: 2013-01-21 16:03:19 UTC
Target Upstream Version:


Attachments
SRPM with backported netdevs patch (34.56 KB, application/x-rpm)
2011-09-01 15:45 UTC, Petr Holasek

Description Jeremy Mueller 2011-07-06 21:37:33 UTC
Description of problem:

If you rename an Ethernet device to something like "dtc12", the irqbalance daemon will not balance the IRQs for that device.  It fails to detect the device class as Ethernet and instead categorizes it as "other".

This is essentially the same bug and problem as bug 682211.  That bug refers to the RHEL 6 method of renaming a device.


Version-Release number of selected component (if applicable):

irqbalance-0.55-15.el5


How reproducible:

Always


Steps to Reproduce:
1. system-config-network-tui
 - Rename an Ethernet device to something that does not start with "eth".
2. Reboot
  
Actual results:

Watching /proc/interrupts (watch -d cat /proc/interrupts) shows that the interrupts for the NICs stay on a single core:

           CPU0       CPU1       CPU2       CPU3
...
130:       1394          0          0          0         PCI-MSI  dtc12
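A quick way to spot such IRQs without watching the counters by eye is to parse /proc/interrupts and list the rows where only one CPU column is non-zero. The following is only a sketch: the helper names parse_interrupts and pinned_irqs are made up for illustration, and the sample text combines the IRQ 130 row above with the IRQ 146 row from the expected results below. On a live system you would pass in the contents of /proc/interrupts instead.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq: (per_cpu_counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # skip elided rows such as "..."
        fields = rest.split()
        counts = fields[:ncpus]
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue  # some summary rows carry fewer counters
        table[label.strip()] = ([int(c) for c in counts], " ".join(fields[ncpus:]))
    return table


def pinned_irqs(table):
    """Return IRQs whose interrupts were all delivered to a single CPU."""
    return sorted(
        irq for irq, (counts, _) in table.items()
        if sum(counts) > 0 and sum(1 for c in counts if c > 0) == 1
    )


# Sample rows copied from this report (IRQ 130: actual, IRQ 146: expected).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
...
130:       1394          0          0          0         PCI-MSI  dtc12
146:        286          0          0       1317         PCI-MSI  dtc12
"""

table = parse_interrupts(SAMPLE)
print(pinned_irqs(table))
```

Note that a single-CPU row is only a hint: an IRQ can be pinned deliberately, or simply not have been rebalanced yet.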

If you run irqbalance in debug mode (irqbalance --debug), it displays that interrupt as class other:

Package 0:  cpu mask is 0000000f (workload 0)
        Cache domain 2: cpu mask is 0000000c  (workload 0)
                CPU number 3  (workload 0)
                CPU number 2  (workload 0)
        Cache domain 0: cpu mask is 00000003  (workload 0)
                CPU number 1  (workload 0)
                CPU number 0  (workload 0)
...
Interrupt 130 (class other) has workload 12
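To check the class of many interrupts at once, the "Interrupt N (class X) has workload W" lines can be pulled out of the debug output with a regular expression. This is a sketch that assumes the line format shown in this report; the function name irq_classes is made up for illustration, and the second sample line is taken from the expected results below.

```python
import re

# Matches lines such as: Interrupt 130 (class other) has workload 12
IRQ_LINE = re.compile(r"Interrupt (\d+) \(class ([^)]+)\) has workload (\d+)")


def irq_classes(debug_output):
    """Map IRQ number -> (class name, workload) from `irqbalance --debug` text."""
    return {
        int(m.group(1)): (m.group(2), int(m.group(3)))
        for m in IRQ_LINE.finditer(debug_output)
    }


SAMPLE = """\
Package 0:  cpu mask is 0000000f (workload 0)
Interrupt 130 (class other) has workload 12
Interrupt 146 (class ethernet) has workload 9
"""

for irq, (cls, workload) in sorted(irq_classes(SAMPLE).items()):
    print(irq, cls, workload)
```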


Expected results:

The NICs should be balanced to different CPUs, and may change CPUs depending on load.

cat /proc/interrupts:
           CPU0       CPU1       CPU2       CPU3
...
146:        286          0          0       1317         PCI-MSI  dtc12

irqbalance --debug:
Package 0:  cpu mask is 0000000f (workload 0)
        Cache domain 2: cpu mask is 0000000c  (workload 0)
                CPU number 3  (workload 0)
                CPU number 2  (workload 0)
        Cache domain 0: cpu mask is 00000003  (workload 0)
                CPU number 1  (workload 0)
                CPU number 0  (workload 0)
...
Interrupt 146 (class ethernet) has workload 9


Additional info:

The cause of the problem is irqbalance's find_class function and the ethernet_modules array in classify.c.  find_class classifies an interrupt as Ethernet only if its name contains one of the strings "eth", "e100", "eepro100", "orinoco_cs", "wvlan_cs", "3c5", or "HiSax".
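The effect of that name matching can be illustrated with a short Python sketch. This is not the irqbalance code (which is C and also checks other device classes first); it only mimics the case-sensitive substring test described above, and the function name classify is made up for illustration.

```python
# Substrings that mark an interrupt name as Ethernet, as listed above
# ("orinoco_cs" is spelled here as the Linux driver is named).
ETHERNET_MODULES = ["eth", "e100", "eepro100", "orinoco_cs", "wvlan_cs", "3c5", "HiSax"]


def classify(irq_name):
    """Return "ethernet" if any known substring occurs in the name, else "other"."""
    if any(token in irq_name for token in ETHERNET_MODULES):
        return "ethernet"
    return "other"


for name in ("eth0", "dtc12", "vl666-5"):
    print(name, "->", classify(name))
```

Any interface name that does not happen to contain one of the listed substrings falls through to "other", which is why a renamed device such as "dtc12" is never treated as a NIC.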

If you re-enable the commented-out Ethernet handling in irqbalance's numa.c:pci_numa_scan(), irqbalance will detect most of the NIC IRQs correctly and handle them.  This works for PCI-MSI and APIC-level IRQs, but not for the multiple IRQs used by some PCI-MSI-X devices (bnx2).
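A name-independent way to find NIC IRQs is to ask sysfs which IRQs belong to each network interface. The sketch below is an assumption about one possible approach, not a description of what irqbalance or any patch in this bug does. It reads /sys/class/net/<iface>/device/msi_irqs (only present on kernels that export it) and falls back to /sys/class/net/<iface>/device/irq; the function name net_device_irqs is made up for illustration.

```python
import os


def net_device_irqs(sysfs_net="/sys/class/net"):
    """Map interface name -> sorted IRQ numbers, read from sysfs instead of names."""
    result = {}
    if not os.path.isdir(sysfs_net):
        return result
    for iface in sorted(os.listdir(sysfs_net)):
        dev = os.path.join(sysfs_net, iface, "device")
        irqs = set()
        # Prefer the per-vector MSI/MSI-X list when the kernel provides it.
        msi_dir = os.path.join(dev, "msi_irqs")
        if os.path.isdir(msi_dir):
            irqs.update(int(n) for n in os.listdir(msi_dir) if n.isdigit())
        # Otherwise fall back to the single IRQ recorded for the device.
        irq_file = os.path.join(dev, "irq")
        if not irqs and os.path.isfile(irq_file):
            with open(irq_file) as fh:
                value = fh.read().strip()
            if value.isdigit() and int(value) > 0:
                irqs.add(int(value))
        if irqs:  # virtual interfaces such as "lo" have no backing device
            result[iface] = sorted(irqs)
    return result


if __name__ == "__main__":
    for iface, irqs in net_device_irqs().items():
        print(iface, irqs)
```

Because the lookup starts from the interface rather than from the IRQ name, it does not depend on what the device was renamed to, and a device with several MSI-X vectors yields all of them where msi_irqs is available.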

Comment 1 Petr Holasek 2011-08-03 10:38:03 UTC
The patch https://bugzilla.redhat.com/attachment.cgi?id=516487 from bug #682211 could be applied to this problem, I guess.

Comment 3 Petr Holasek 2011-08-30 14:25:45 UTC
Jeremy, did the patch fix your issue?

Thanks!
Petr H

Comment 4 Jeremy Mueller 2011-08-30 16:47:35 UTC
Petr,

I could not apply the patch you supplied.  The changes to irqbalance.h and network.c failed because there are no functions called "dev_to_node" or "dev_to_bus" in the source code I have.  I tried applying the patch to the src RPM for irqbalance-0.55-15.el5 on RHEL 5.  I also checked the source for the RHEL 6 package and the irqbalance.org .56 source for the dev_to_bus function and didn't see it.

Thanks,

Jeremy

Comment 5 Petr Holasek 2011-08-30 16:59:32 UTC
Apologies, the patch was for the upstream SVN trunk at:

http://irqbalance.googlecode.com/svn/trunk/

But if you want to use the RHEL5 RPM, just let me know and I will prepare
a test build with the backported patch for you.

Thanks!
Petr H

Comment 6 Jeremy Mueller 2011-08-30 17:51:20 UTC
I'm not having much luck compiling irqbalance directly from SVN.  If you can provide a backported source RPM (or the binary RPM), I can test it.

I'll continue to work on the compile as I have time over the next few days.

Thanks,

Jeremy

Comment 7 Petr Holasek 2011-09-01 15:45:24 UTC
Created attachment 521038
SRPM with backported netdevs patch

Comment 8 Petr Holasek 2011-09-01 15:49:40 UTC
(In reply to comment #7)
> Created attachment 521038
> SRPM with backported netdevs patch

The compilation problem was caused by the older version of numactl-devel in
RHEL5. I backported only the bits of code related to your issue, so this SRPM
should be fine.

Thanks!
Petr H

Comment 9 Jeremy Mueller 2011-09-06 19:44:43 UTC
Petr,

I was able to compile, install, and test the SRPM you attached.  This patch fixes the problem.  Before the install, my NIC was detected as 'other' ("Interrupt 90 (class other) has workload 3").  After the install, it is detected as 'ethernet' ("Interrupt 90 (class ethernet) has workload 20").

I debugged it with "sudo irqbalance --debug" to verify the class selection.

Thanks,

Jeremy

Comment 10 Roland Friedwagner 2011-09-24 11:08:10 UTC
I can confirm this bug with the current RHEL5 package provided via RHN,
irqbalance-0.55-15.el5:

/proc/interrupts ...
 52:    1668207          0          0          0          0          0          0          0       PCI-MSI-X  vl666-5
 59:    6678541          0          0          0          0          0          0          0       PCI-MSI-X  iscsi-0
 60:    1843608          0          0          0          0          0          0          0       PCI-MSI-X  vl666-6
 67:    2569910          0          0          0          0          0          0          0       PCI-MSI-X  iscsi-1
 68:    2428199          0          0          0          0          0          0          0       PCI-MSI-X  vl666-7
 75:    2277454          0          0          0          0          0          0          0       PCI-MSI-X  iscsi-2
 83:    2656553          0          0          0          0          0          0          0       PCI-MSI-X  iscsi-3
...

The bugfix SRPM (irqbalance-0.55-15.netdevs.el5)
provided by Petr fixes this issue:

/proc/interrupts ...
 52:      28075      32075      28756      97089       6812      15596      16399      49034       PCI-MSI-X  vl666-5
 59:      82219     125452      85254      71544     222417     222170      69186     108966       PCI-MSI-X  iscsi-0
 60:        101      42262      34036      61607      72672       6124      22149      91436       PCI-MSI-X  vl666-6
 67:      92877     124405      93098      15893      80090      30386      31874      31852       PCI-MSI-X  iscsi-1
 68:       7280        380      25081      30239      26496      44053     113230      64850       PCI-MSI-X  vl666-7
 75:      31843      31822     108718     137668          0      15853      15934      60951       PCI-MSI-X  iscsi-2
 83:      15921      30470          0          0     149690      60925      27957      15934       PCI-MSI-X  iscsi-3
...

Thx & Kind Regards,
Roland

Comment 11 RHEL Program Management 2011-12-13 09:57:39 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 13 RHEL Program Management 2012-06-12 01:00:13 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 14 Petr Holasek 2013-01-21 16:03:19 UTC

*** This bug has been marked as a duplicate of bug 798624 ***

