Bug 617178 - IRQs received count increases on masked CPUs
Summary: IRQs received count increases on masked CPUs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: irqbalance
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Petr Holasek
QA Contact: Petr Beňas
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-07-22 11:56 UTC by Petr Beňas
Modified: 2016-10-04 04:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-23 14:48:24 UTC
Target Upstream Version:
Embargoed:



Description Petr Beňas 2010-07-22 11:56:48 UTC
Description of problem:
If you mask each CPU in turn, some of them do not seem to be masked at all.

Version-Release number of selected component (if applicable):
[root@dell-pesc1425-02 ~]# uname -r
2.6.32-44.1.el6.x86_64
[root@dell-pesc1425-02 ~]# rpm -qa irqbalance
irqbalance-0.55-25.el6.x86_64

How reproducible:
always

Steps to Reproduce:
Follow any part of this scenario or use the automated test linked in the QA Whiteboard field.

[root@dell-pesc1425-02 ~]# service irqbalance stop
Stopping irqbalance: [  OK  ]
[root@dell-pesc1425-02 ~]# irqbalance --debug
Package 0:  cpu mask is 00000005 (workload 0)
        Cache domain 0: cpu mask is 00000005  (workload 0) 
                CPU number 0  (workload 0)
                CPU number 2  (workload 0)
Package 1:  cpu mask is 0000000a (workload 0)
        Cache domain 1: cpu mask is 0000000a  (workload 0) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 0)
Interrupt 32 (class ethernet) has workload 23 
Interrupt 0 (class timer) has workload 0 
Interrupt 18 (class storage) has workload 0 
Interrupt 4 (class legacy) has workload 24 
Interrupt 14 (class other) has workload 0 

-----------------------------------------------------------------------------
IRQ delta is 28 
Rescanning cpu topology 
Package 0:  cpu mask is 00000005 (workload 0)
        Cache domain 0: cpu mask is 00000005  (workload 0) 
                CPU number 0  (workload 0)
                CPU number 2  (workload 0)
Package 1:  cpu mask is 0000000a (workload 0)
        Cache domain 1: cpu mask is 0000000a  (workload 0) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 0)
Package 0:  cpu mask is 00000005 (workload 9)
        Cache domain 0: cpu mask is 00000005  (workload 8) 
                CPU number 0  (workload 0)
                CPU number 2  (workload 8)
                  Interrupt 32 (ethernet/7) 
  Interrupt 14 (other/0) 
Package 1:  cpu mask is 0000000a (workload 10)
        Cache domain 1: cpu mask is 0000000a  (workload 10) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 0)
          Interrupt 18 (storage/0) 
          Interrupt 4 (legacy/8) 

^C
[root@dell-pesc1425-02 ~]# service irqbalance stop
Stopping irqbalance: [FAILED]
[root@dell-pesc1425-02 ~]# export IRQBALANCE_BANNED_CPUS=0000000000000000000000000000000000000000000000000000000000000111
[root@dell-pesc1425-02 ~]# echo $IRQBALANCE_BANNED_CPUS
0000000000000000000000000000000000000000000000000000000000000111
[root@dell-pesc1425-02 ~]# irqbalance --debug
Package 1:  cpu mask is 0000000a (workload 0)
        Cache domain 1: cpu mask is 0000000a  (workload 0) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 0)
Package 2:  cpu mask is 00000004 (workload 0)
        Cache domain 2: cpu mask is 00000004  (workload 0) 
                CPU number 2  (workload 0)
Interrupt 32 (class ethernet) has workload 23 
Interrupt 0 (class timer) has workload 0 
Interrupt 18 (class storage) has workload 0 
Interrupt 4 (class legacy) has workload 21 
Interrupt 14 (class other) has workload 0 



-----------------------------------------------------------------------------
IRQ delta is 28 
Rescanning cpu topology 
Package 1:  cpu mask is 0000000a (workload 0)
        Cache domain 1: cpu mask is 0000000a  (workload 0) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 0)
Package 2:  cpu mask is 00000004 (workload 0)
        Cache domain 2: cpu mask is 00000004  (workload 0) 
                CPU number 2  (workload 0)
Package 1:  cpu mask is 0000000a (workload 9)
        Cache domain 1: cpu mask is 0000000a  (workload 8) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 8)
                  Interrupt 32 (ethernet/7)
  Interrupt 14 (other/0) 
Package 2:  cpu mask is 00000004 (workload 9)
        Cache domain 2: cpu mask is 00000004  (workload 9) 
                CPU number 2  (workload 0)
          Interrupt 18 (storage/0) 
          Interrupt 4 (legacy/7) 
^C
[root@dell-pesc1425-02 ~]# service irqbalance start
Starting irqbalance: [  OK  ]
[root@dell-pesc1425-02 ~]# cat /proc/interrupts | tail -n+2 | egrep -v \
> 'LOC|NMI|TLB|MCP|CAL|RES' | awk '{ s+=$5 } END { print s }'
6872
[root@dell-pesc1425-02 ~]# cat /proc/interrupts | tail -n+2 | egrep -v 'LOC|NMI|TLB|MCP|CAL|RES' | awk '{ s+=$5 } END { print s }'
6895
[root@dell-pesc1425-02 ~]# cat /proc/interrupts | tail -n+2 | egrep -v 'LOC|NMI|TLB|MCP|CAL|RES' | awk '{ s+=$5 } END { print s }'
6926
[root@dell-pesc1425-02 ~]# cat /proc/interrupts 
            CPU0       CPU1       CPU2       CPU3       
   0:        162          0          0          2   IO-APIC-edge      timer
   1:          0          0          0          2   IO-APIC-edge      i8042
   4:          0          0        715        772   IO-APIC-edge      serial
   8:          0          0          0          1   IO-APIC-edge      rtc0
   9:          0          0          0          0   IO-APIC-fasteoi   acpi
  12:          0          0          0          4   IO-APIC-edge      i8042
  14:          0          0          0        109   IO-APIC-edge      ata_piix
  15:          0          0          0          0   IO-APIC-edge      ata_piix
  16:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
  17:          0          0          0          2   IO-APIC-fasteoi   radeon
  18:          0          0         33       4280   IO-APIC-fasteoi   ata_piix
  19:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
  23:          0          0          0          1   IO-APIC-fasteoi   ehci_hcd:usb1
  32:          0          0       2902       1908   IO-APIC-fasteoi   eth0
 NMI:        466        231        241        149   Non-maskable interrupts
 LOC:      35663      31565      29231      28810   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0   Performance monitoring interrupts
 PND:          0          0          0          0   Performance pending work
 RES:       1212       1626       2111       2175   Rescheduling interrupts
 CAL:       2250        148        147        197   Function call interrupts
 TLB:        361        307        690        845   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          2          2          2          2   Machine check polls
 ERR:          3
 MIS:          0
[root@dell-pesc1425-02 ~]# cat /proc/interrupts 
            CPU0       CPU1       CPU2       CPU3       
   0:        162          0          0          2   IO-APIC-edge      timer
   1:          0          0          0          2   IO-APIC-edge      i8042
   4:          0          0        855        772   IO-APIC-edge      serial
   8:          0          0          0          1   IO-APIC-edge      rtc0
   9:          0          0          0          0   IO-APIC-fasteoi   acpi
  12:          0          0          0          4   IO-APIC-edge      i8042
  14:          0          0          0        109   IO-APIC-edge      ata_piix
  15:          0          0          0          0   IO-APIC-edge      ata_piix
  16:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
  17:          0          0          0          2   IO-APIC-fasteoi   radeon
  18:          0          0         33       4280   IO-APIC-fasteoi   ata_piix
  19:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
  23:          0          0          0          1   IO-APIC-fasteoi   ehci_hcd:usb1
  32:          0          0       2902       1947   IO-APIC-fasteoi   eth0
 NMI:        466        231        241        149   Non-maskable interrupts
 LOC:      35700      31622      29263      28823   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0   Performance monitoring interrupts
 PND:          0          0          0          0   Performance pending work
 RES:       1212       1626       2111       2175   Rescheduling interrupts
 CAL:       2250        148        147        197   Function call interrupts
 TLB:        361        307        690        846   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          2          2          2          2   Machine check polls
 ERR:          3
 MIS:          0
[root@dell-pesc1425-02 ~]# cat /proc/interrupts 
            CPU0       CPU1       CPU2       CPU3       
   0:        162          0          0          2   IO-APIC-edge      timer
   1:          0          0          0          2   IO-APIC-edge      i8042
   4:          0          0        995        772   IO-APIC-edge      serial
   8:          0          0          0          1   IO-APIC-edge      rtc0
   9:          0          0          0          0   IO-APIC-fasteoi   acpi
  12:          0          0          0          4   IO-APIC-edge      i8042
  14:          0          0          0        109   IO-APIC-edge      ata_piix
  15:          0          0          0          0   IO-APIC-edge      ata_piix
  16:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
  17:          0          0          0          2   IO-APIC-fasteoi   radeon
  18:          0          0         38       4280   IO-APIC-fasteoi   ata_piix
  19:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
  23:          0          0          0          1   IO-APIC-fasteoi   ehci_hcd:usb1
  32:          0          0       2902       2303   IO-APIC-fasteoi   eth0
 NMI:        466        231        241        149   Non-maskable interrupts
 LOC:      36175      32078      29397      29127   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0   Performance monitoring interrupts
 PND:          0          0          0          0   Performance pending work
 RES:       1212       1626       2111       2175   Rescheduling interrupts
 CAL:       2250        150        147        197   Function call interrupts
 TLB:        361        307        690        847   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          2          2          2          2   Machine check polls
 ERR:          3
 MIS:          0
[root@dell-pesc1425-02 ~]# export IRQBALANCE_BANNED_CPUS=0000000000000000000000000000000000000000000000000000000000001011
[root@dell-pesc1425-02 ~]# echo $IRQBALANCE_BANNED_CPUS
0000000000000000000000000000000000000000000000000000000000001011
[root@dell-pesc1425-02 ~]# service irqbalance stop
Stopping irqbalance: [FAILED]
[root@dell-pesc1425-02 ~]# irqbalance --debug
Package 1:  cpu mask is 0000000a (workload 0)
        Cache domain 1: cpu mask is 0000000a  (workload 0) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 0)
Package 2:  cpu mask is 00000004 (workload 0)
        Cache domain 2: cpu mask is 00000004  (workload 0) 
                CPU number 2  (workload 0)
Interrupt 32 (class ethernet) has workload 26 
Interrupt 0 (class timer) has workload 0 
Interrupt 18 (class storage) has workload 0 
Interrupt 4 (class legacy) has workload 21 
Interrupt 14 (class other) has workload 0 

-----------------------------------------------------------------------------
IRQ delta is 28 
Rescanning cpu topology 
Package 1:  cpu mask is 0000000a (workload 0)
        Cache domain 1: cpu mask is 0000000a  (workload 0) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 0)
Package 2:  cpu mask is 00000004 (workload 0)
        Cache domain 2: cpu mask is 00000004  (workload 0) 
                CPU number 2  (workload 0)
Package 1:  cpu mask is 0000000a (workload 10)
        Cache domain 1: cpu mask is 0000000a  (workload 9) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 9)
                  Interrupt 32 (ethernet/8) 
  Interrupt 14 (other/0) 
Package 2:  cpu mask is 00000004 (workload 9)
        Cache domain 2: cpu mask is 00000004  (workload 9) 
                CPU number 2  (workload 0)
          Interrupt 18 (storage/0) 
          Interrupt 4 (legacy/7) 

[root@dell-pesc1425-02 ~]# export IRQBALANCE_BANNED_CPUS=0000000000000000000000000000000000000000000000000000000000001101
[root@dell-pesc1425-02 ~]# echo $IRQBALANCE_BANNED_CPUS
0000000000000000000000000000000000000000000000000000000000001101
[root@dell-pesc1425-02 ~]# irqbalance --debug
Package 1:  cpu mask is 0000000a (workload 0)
        Cache domain 1: cpu mask is 0000000a  (workload 0) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 0)
Package 2:  cpu mask is 00000004 (workload 0)
        Cache domain 2: cpu mask is 00000004  (workload 0) 
                CPU number 2  (workload 0)
[root@dell-pesc1425-02 ~]# irqbalance --debug
Package 1:  cpu mask is 0000000a (workload 0)
        Cache domain 1: cpu mask is 0000000a  (workload 0) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 0)
Package 2:  cpu mask is 00000004 (workload 0)
        Cache domain 2: cpu mask is 00000004  (workload 0) 
                CPU number 2  (workload 0)
[root@dell-pesc1425-02 ~]# export IRQBALANCE_BANNED_CPUS=0000000000000000000000000000000000000000000000000000000000001110
[root@dell-pesc1425-02 ~]# echo $IRQBALANCE_BANNED_CPUS
0000000000000000000000000000000000000000000000000000000000001110
[root@dell-pesc1425-02 ~]# irqbalance --debug
Package 0:  cpu mask is 00000005 (workload 0)
        Cache domain 0: cpu mask is 00000005  (workload 0) 
                CPU number 0  (workload 0)
                CPU number 2  (workload 0)
Package 1:  cpu mask is 0000000a (workload 0)
        Cache domain 1: cpu mask is 0000000a  (workload 0) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 0)
[root@dell-pesc1425-02 ~]# cat /etc/sysconfig/irqbalance | grep -v '#'
ONESHOT=

IRQ_AFFINITY_MASK=00000002
[root@dell-pesc1425-02 ~]# irqbalance --debug
Package 0:  cpu mask is 00000005 (workload 0)
        Cache domain 0: cpu mask is 00000005  (workload 0) 
                CPU number 0  (workload 0)
                CPU number 2  (workload 0)
Package 1:  cpu mask is 0000000a (workload 0)
        Cache domain 1: cpu mask is 0000000a  (workload 0) 
                CPU number 1  (workload 0)
                CPU number 3  (workload 0)
  
Actual results:
When IRQBALANCE_BANNED_CPUS is set, it seems to always mask CPU0.
The IRQ_AFFINITY_MASK option in the config file seems to be ignored completely.

Expected results:
No IRQs received on processors specified via the CPU mask.

Additional info:

Comment 2 RHEL Program Management 2010-07-22 12:18:11 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 Neil Horman 2010-07-22 20:05:03 UTC
I don't see where you've illustrated the problem in the data above.  I see that you started irqbalance above with a BANNED_CPUS mask of 1110, but I don't see a subsequent cat of /proc/interrupts that shows we didn't start getting irq's on that cpu

Comment 4 Petr Beňas 2010-07-26 07:31:29 UTC
Sorry, the example data is a bit confusing.

The first part tries to illustrate that when no CPUs are banned, irqbalance --debug lists four processors: 0, 1, 2, 3. With the 1110 mask, three CPUs are listed: 1, 2, 3. It seems CPU 0 was banned for irqbalance, although the banned one should be number 3.

The middle part of the example data is the most important. Note that the mask is 0111. I count IRQs received on CPU3 with awk, and the number is increasing. Three cats of /proc/interrupts follow (CPU3 receives interrupts from eth0).

The following part of the data continues where the first part left off. I change the mask through all the options and irqbalance --debug keeps showing CPUs 1, 2 and 3. This makes me think CPU0 is banned; correct me if I'm wrong.

In the last part, the CPU mask is set in the config file and all four CPUs are listed. Again, this makes me think no CPU is masked.

Here is the link to the automated test run:
https://beaker.engineering.redhat.com/logs/2010/58/8358/14917/182923///TESTOUT.log

The automated test counts IRQs from /proc/interrupts with awk the same way I did in the example data. The diff shows which interrupts were received.

I apologize again for the example data; I should have prepared it better. Feel free to ask if there is something not clear to you.

Comment 5 Neil Horman 2010-07-26 11:33:22 UTC
Wait, I see what you're doing wrong. You're specifying the mask in binary, and irqbalance expects it in hex, but since binary looks like hex to the parser, it's not throwing an error. Change your mask values to a hex radix and see if that solves the problem. When it does, I'll use this bug to update the docs to clarify how to specify the mask.
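
To illustrate the point above: a mask string written in binary is still a valid hex number, so it parses without error but sets different bits. The sketch below is a Python illustration, not irqbalance's actual parser; the helper name `cpus_in_mask` and the four-CPU count are assumptions for the example.

```python
def cpus_in_mask(mask: str, ncpus: int, base: int = 16) -> list[int]:
    """Return the CPU numbers (below ncpus) whose bit is set in mask."""
    value = int(mask, base)
    return [cpu for cpu in range(ncpus) if value & (1 << cpu)]


# The four mask strings used in the report, read as intended (binary)
# and as a hex parser would read them.
for mask in ("0111", "1011", "1101", "1110"):
    intended = cpus_in_mask(mask, 4, base=2)
    as_hex = cpus_in_mask(mask, 4, base=16)
    print(f"{mask}: intended {intended}, parsed as hex {as_hex}")
```

Read as hex, the first three strings set only bit 0 among CPUs 0 to 3 and the last sets none of them. That would be consistent with the --debug output in the description, where CPU 0 is missing for the first three masks and all four CPUs are listed for 1110.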

Comment 6 Petr Beňas 2010-07-26 12:38:07 UTC
All right. When updating the docs, please also add a note pointing out the different conventions of the IRQBALANCE_BANNED_CPUS variable and IRQ_AFFINITY_MASK in the config file.
I was writing the mask completely wrong. The variable requires ones for CPUs to be masked, while the mask in the config file requires zeros for masked processors. This is documented, but easy to overlook.
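
The opposite polarity of the two settings described above can be sketched as follows. This is an illustration in Python based only on the description in this comment (ones ban CPUs in IRQBALANCE_BANNED_CPUS, zeros mask CPUs in IRQ_AFFINITY_MASK); the function name `affinity_to_banned` and the four-CPU default are assumptions.

```python
def affinity_to_banned(affinity_mask: str, ncpus: int = 4) -> str:
    """Convert an 'allowed CPUs' hex mask into the equivalent 'banned CPUs' hex mask.

    Assumes the semantics described in the comment: a 1 bit in the affinity
    mask allows a CPU, while a 1 bit in the banned mask excludes it.
    """
    all_cpus = (1 << ncpus) - 1            # e.g. 0xf for four CPUs
    allowed = int(affinity_mask, 16) & all_cpus
    banned = ~allowed & all_cpus           # flip the bits, keep only real CPUs
    return f"{banned:08x}"


# IRQ_AFFINITY_MASK=00000002 from the description allows only CPU 1,
# which corresponds to banning CPUs 0, 2 and 3.
print(affinity_to_banned("00000002"))  # 0000000d
```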

But I see the reported bug even if the mask is specified in hex. Pasting example data.

[root@dell-pe1955-01 ~]# export IRQBALANCE_BANNED_CPUS=00000008
[root@dell-pe1955-01 ~]# service irqbalance start
Starting irqbalance: [  OK  ]
[root@dell-pe1955-01 ~]# cat /proc/interrupts | tail -n+2 | egrep -v \
>  'LOC|NMI|TLB|MCP|CAL|RES' | awk '{ s+=$5 } END { print s}'
48759
[root@dell-pe1955-01 ~]# ping www.redhat.com
PING origin-www.redhat.com (10.4.127.15) 56(84) bytes of data.
64 bytes from 10.4.127.15: icmp_seq=1 ttl=123 time=70.9 ms
64 bytes from 10.4.127.15: icmp_seq=2 ttl=123 time=70.9 ms
64 bytes from 10.4.127.15: icmp_seq=3 ttl=123 time=70.3 ms

--- origin-www.redhat.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2767ms
rtt min/avg/max/mdev = 70.326/70.744/70.986/0.296 ms
[root@dell-pe1955-01 ~]# cat /proc/interrupts | tail -n+2 | egrep -v  'LOC|NMI|TLB|MCP|CAL|RES' | awk '{ s+=$5 } END { print s}'
48803
[root@dell-pe1955-01 ~]# cat /proc/interrupts 
            CPU0       CPU1       CPU2       CPU3       
   0:        181         11         12         15   IO-APIC-edge      timer
   1:          3          2          1          2   IO-APIC-edge      i8042
   3:       1294         36        752        308   IO-APIC-edge      serial
   8:          0          0          1          0   IO-APIC-edge      rtc0
   9:          0          0          0          0   IO-APIC-fasteoi   acpi
  12:         34         33         33         33   IO-APIC-edge      i8042
  19:         74         77         74         79   IO-APIC-fasteoi   radeon
  20:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
  21:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 192:       1045      11557        872      11593   IO-APIC-fasteoi   ioc0
 225:      37193      36855      37266      36804   PCI-MSI-edge      eth0
 NMI:        335        106        118        109   Non-maskable interrupts
 LOC:     209401     238476     141882     150138   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0   Performance monitoring interrupts
 PND:          0          0          0          0   Performance pending work
 RES:       1927       1772       3285       2243   Rescheduling interrupts
 CAL:       7311       1355        183        217   Function call interrupts
 TLB:       7349       7001       9197       9160   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:         50         50         50         50   Machine check polls
 ERR:          3
 MIS:          0
[root@dell-pe1955-01 ~]# ping www.redhat.com
PING origin-www.redhat.com (10.4.127.15) 56(84) bytes of data.
64 bytes from 10.4.127.15: icmp_seq=1 ttl=123 time=70.8 ms
64 bytes from 10.4.127.15: icmp_seq=2 ttl=123 time=70.6 ms
64 bytes from 10.4.127.15: icmp_seq=3 ttl=123 time=70.8 ms

--- origin-www.redhat.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2858ms
rtt min/avg/max/mdev = 70.673/70.797/70.882/0.235 ms
[root@dell-pe1955-01 ~]# cat /proc/interrupts 
            CPU0       CPU1       CPU2       CPU3       
   0:        181         11         12         15   IO-APIC-edge      timer
   1:          3          2          1          2   IO-APIC-edge      i8042
   3:       1389         36        843        308   IO-APIC-edge      serial
   8:          0          0          1          0   IO-APIC-edge      rtc0
   9:          0          0          0          0   IO-APIC-fasteoi   acpi
  12:         34         33         33         33   IO-APIC-edge      i8042
  19:         74         77         74         79   IO-APIC-fasteoi   radeon
  20:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
  21:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 192:       1045      11559        872      11595   IO-APIC-fasteoi   ioc0
 225:      37230      36901      37307      36850   PCI-MSI-edge      eth0
 NMI:        335        106        118        109   Non-maskable interrupts
 LOC:     209598     238678     142005     150252   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0   Performance monitoring interrupts
 PND:          0          0          0          0   Performance pending work
 RES:       1927       1772       3285       2243   Rescheduling interrupts
 CAL:       7311       1355        183        217   Function call interrupts
 TLB:       7350       7002       9199       9164   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:         50         50         50         50   Machine check polls
 ERR:          3
 MIS:          0

Comment 7 Neil Horman 2010-07-26 13:15:53 UTC
OK, yeah, that's weird. Looks like we might be using a miscomputed mask when doing IRQ assignments, one that's leaving bits turned on that should be set off. Can I borrow dell-pe1955-01 so that I can tinker with this myself and track it back to its root cause?

Comment 8 Petr Beňas 2010-07-27 07:27:31 UTC
I don't think it happens only on dell-pe1955-01, probably any four processor machine will behave the same. But if you want to use this one, you can, I just returned it.

Comment 9 Neil Horman 2010-07-27 11:01:08 UTC
yeah, most likely, but since you have data from this system, I'd just as soon use the same one, to make sure that cache commonality and sibling pairing is identical.

Comment 10 RHEL Program Management 2010-08-18 21:32:16 UTC
Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Red Hat Enterprise Linux. Unfortunately, we
are unable to address this request in the current release. Because we
are in the final stage of Red Hat Enterprise Linux 6 development, only
significant, release-blocking issues involving serious regressions and
data corruption can be considered.

If you believe this issue meets the release blocking criteria as
defined and communicated to you by your Red Hat Support representative,
please ask your representative to file this issue as a blocker for the
current release. Otherwise, ask that it be evaluated for inclusion in
the next minor release of Red Hat Enterprise Linux.

Comment 13 Petr Holasek 2011-07-26 10:47:16 UTC
Hi,

I did some research on machine dell-pe1955-01 which was mentioned by Petr and
from my point of view irqbalance handles IRQBALANCE_BANNED_CPUS variable
well.

[root@dell-pe1955-01 ~]# export IRQBALANCE_BANNED_CPUS=00000008
[root@dell-pe1955-01 ~]# service irqbalance restart
Stopping irqbalance: [  OK  ]
Starting irqbalance: [  OK  ]
[root@dell-pe1955-01 ~]# find /proc -name smp_affinity -exec grep . /dev/null {} \;
/proc/irq/223/smp_affinity:04
/proc/irq/192/smp_affinity:02
/proc/irq/19/smp_affinity:02
/proc/irq/20/smp_affinity:0f
/proc/irq/21/smp_affinity:0f
/proc/irq/15/smp_affinity:0f
/proc/irq/14/smp_affinity:0f
/proc/irq/13/smp_affinity:0f
/proc/irq/12/smp_affinity:02
/proc/irq/11/smp_affinity:0f
/proc/irq/10/smp_affinity:0f
/proc/irq/9/smp_affinity:0f
/proc/irq/8/smp_affinity:0f
/proc/irq/7/smp_affinity:0f
/proc/irq/6/smp_affinity:0f
/proc/irq/5/smp_affinity:0f
/proc/irq/4/smp_affinity:0f
/proc/irq/3/smp_affinity:05
/proc/irq/2/smp_affinity:ff
/proc/irq/1/smp_affinity:0f
/proc/irq/0/smp_affinity:ff

=> all active IRQs are steered to CPUs 0, 1, 2, so CPU 3 is banned.

[root@dell-pe1955-01 ~]# cat /proc/interrupts | grep 223:
 223:     121772     120854     121767     121567   PCI-MSI-edge      eth0
[root@dell-pe1955-01 ~]# ping www.redhat.com
PING origin-www.redhat.com (10.4.127.15) 56(84) bytes of data.
64 bytes from 10.4.127.15: icmp_seq=1 ttl=124 time=161 ms
64 bytes from 10.4.127.15: icmp_seq=2 ttl=124 time=143 ms
64 bytes from 10.4.127.15: icmp_seq=3 ttl=124 time=152 ms
^C
--- origin-www.redhat.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2622ms
rtt min/avg/max/mdev = 143.204/152.380/161.143/7.343 ms
[root@dell-pe1955-01 ~]# cat /proc/interrupts | grep 223:
 223:     121837     120917     121832     121632   PCI-MSI-edge      eth0

For IRQ 223 the affinity is set to 04 (CPU 2), but NIC interrupts
go to all 4 CPUs, so the interrupt controller disregards the affinity
set by irqbalance. I am going to try this test on other machines.
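
The check made by hand above (comparing the IRQ's affinity mask with the per-CPU counter increase) can be sketched as below. This is only an illustration in Python using the two eth0 lines quoted in this comment; the function name `cpus_outside_affinity` is hypothetical.

```python
def cpus_outside_affinity(affinity_hex: str, before: list[int], after: list[int]) -> list[int]:
    """Return CPUs whose counter grew although their bit is clear in the affinity mask."""
    mask = int(affinity_hex, 16)
    return [
        cpu
        for cpu, (old, new) in enumerate(zip(before, after))
        if new > old and not mask & (1 << cpu)
    ]


# IRQ 223 (eth0) counters before and after the ping, from the output above.
before = [121772, 120854, 121767, 121567]
after = [121837, 120917, 121832, 121632]
print(cpus_outside_affinity("04", before, after))  # [0, 1, 3]
```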

