Bug 622560

Summary: irqbalance unable to move IRQs for my NICs or HBAs
Product: Red Hat Enterprise Linux 6 Reporter: Barry Marson <bmarson>
Component: irqbalance Assignee: Neil Horman <nhorman>
Status: CLOSED CURRENTRELEASE QA Contact: Evan McNabb <emcnabb>
Severity: high Docs Contact:
Priority: low    
Version: 6.0 CC: anton, dshaks, jhladky, mwagner, nhorman, syeghiay, ttracy
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: irqbalance-0.55-27 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-11-10 20:41:39 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
untested patch recognize uninitialized affinity_hint code none

Description Barry Marson 2010-08-09 18:39:11 UTC
Description of problem:
Upgraded to RHEL6-snap10 and now irqbalance seems unable to move my IRQs to different CPUs.  They are all pinned to CPU0, even when CPU0 is 100% busy servicing soft interrupts for minutes on end.

Version-Release number of selected component (if applicable):
RHEL6.0-20100805.0

How reproducible:
shown on my bigi testbed.

Steps to Reproduce:
1.
2.
3.
  
Actual results:

IRQ lines 16-19 all show activity on CPU0 only

            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
   0:   10550638          0          0          0          0          0          0          0   IO-APIC-edge      timer
   1:          8          0          0          0          0          0          0          0   IO-APIC-edge      i8042
   3:          2          0          0          0          0          0          0          0   IO-APIC-edge    
   4:          2          0          0          0          0          0          0          0   IO-APIC-edge    
   8:          1          0          0          0          0          0          0          0   IO-APIC-edge      rtc0
   9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
  12:        204          0          0          0          0          0          0          0   IO-APIC-edge      i8042
  14:        159          0          0          0          0          0          0          0   IO-APIC-edge      ata_piix
  15:          0          0          0          0          0          0          0          0   IO-APIC-edge      ata_piix
  16:    6702403          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb2, uhci_hcd:usb5, qla2xxx, qla2xxx
  17:   20456475          0          0          0          0          0          0          0   IO-APIC-fasteoi   qla2xxx, qla2xxx, eth2
  18:   16403105          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb4, eth4, eth3
  19:    5070080          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3, eth5
  22:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   hpilo
  23:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
  24:       9827       1087          0          0          0          0          0          0   IO-APIC-fasteoi   eth0
  48:      22323          0          0          0          0          0          0          0   IO-APIC-fasteoi   cciss0
 NMI:        873        678        586        494        402        310        218        126   Non-maskable interrupts
 LOC:          0   10550199   10550127   10550053   10549983   10549899   10549839   10549766   Local timer interrupts
 SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
 PND:          0          0          0          0          0          0          0          0   Performance pending work
 RES:     524127    1456325    1382514    1408790    1431045    1470127    1428419    1382738   Rescheduling interrupts
 CAL:      35098    2140300    2063262    2119711    2115830    2176595    2105168    2024001   Function call interrupts
 TLB:       1470       1577       1709       1626       1591       1193       1220       1401   TLB shootdowns
 TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
 MCP:         36         36         36         36         36         36         36         36   Machine check polls
 ERR:          7
 MIS:          0
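
The same pattern can be picked out of /proc/interrupts output without reading the table by eye. The following is a minimal Python sketch, not part of irqbalance: the function name `single_cpu_irqs` and the `min_count` threshold are made up for illustration, and it assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ with one count per CPU).

```python
def single_cpu_irqs(text, min_count=1000):
    """Return {irq: cpu_index} for numbered IRQs serviced by exactly one CPU.

    min_count is a hypothetical threshold that filters out nearly idle lines.
    """
    ncpus = 0
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("CPU"):
            # Header row: every field is a CPU name.
            ncpus = len(fields)
            continue
        label = fields[0].rstrip(":")
        if ncpus == 0 or not label.isdigit():
            continue  # skip NMI, LOC, ERR and other non-numbered rows
        counts = fields[1:1 + ncpus]
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue  # malformed or truncated row
        counts = [int(c) for c in counts]
        busy = [cpu for cpu, n in enumerate(counts) if n > 0]
        if len(busy) == 1 and sum(counts) >= min_count:
            result[int(label)] = busy[0]
    return result
```

On a live system it could be called as `single_cpu_irqs(open("/proc/interrupts").read())`. Rows such as the timer on IRQ 0 will also be reported, so the result still needs a human to judge which lines matter.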


Expected results:


Additional info:

Comment 1 Neil Horman 2010-08-09 19:23:35 UTC
Created attachment 437687 [details]
untested patch recognize uninitialized affinity_hint code

Looks like Intel messed this up upstream too.  We don't properly recognize when an affinity_hint file isn't initialized; we just take its value as accurate, even if the hint marks all CPUs as eligible for interrupts, which defeats the purpose of irqbalance.

This patch should correct that.  Barry, if you could test it out, that would be great.  Anton, if it works, please check it into RHEL and I'll put it upstream ASAP.  Thanks!
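
The check described in this comment can be sketched roughly as follows. This is an illustration in Python, not the attached patch (irqbalance itself is written in C). The helper names `parse_cpumask` and `hint_is_usable` are hypothetical, the mask is assumed to be in the comma-separated hexadecimal form used by the files under /proc/irq/, and treating an all-zero mask as "no hint" is an extra assumption beyond what the comment states.

```python
def parse_cpumask(text):
    """Parse a kernel-style hex CPU mask such as '00000000,000000ff' into an int."""
    return int(text.strip().replace(",", ""), 16)


def hint_is_usable(mask, ncpus):
    """Return True only if the hint actually narrows the choice of CPUs."""
    all_cpus = (1 << ncpus) - 1
    mask &= all_cpus          # ignore bits beyond the CPUs present
    if mask == 0:             # assumption: an empty mask means no hint was set
        return False
    if mask == all_cpus:      # every CPU eligible: the hint carries no information
        return False
    return True
```

With this kind of test, a hint that names every CPU is ignored and the balancer stays free to place the IRQ itself, while a hint such as `0x01` on an 8-CPU machine is still honored.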

Comment 2 Barry Marson 2010-08-10 13:05:46 UTC
Verified the patch from comment #1, tested with the RPM built at: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2667962

Works.

Thanks,
Barry

Comment 3 Neil Horman 2010-08-10 13:17:48 UTC
Thanks Barry!

Anton, this is a problem in RHEL & Upstream.  I'm committing it upstream and I've asked about getting this approved for 6.0GA.  Please check it in as soon as possible.  Thanks!

Comment 4 Neil Horman 2010-08-10 15:23:33 UTC
Heard aarapov was on PTO, so I can handle this.

Comment 8 releng-rhel@redhat.com 2010-11-10 20:41:39 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.