Bug 911649
Summary: | HP SmartArray hpsa module interrupts are imbalanced | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Roland Friedwagner <roland.friedwagner>
Component: | irqbalance | Assignee: | Petr Holasek <pholasek>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 6.3 | CC: | dhoward, ewwhite, roland.friedwagner, thenzl, viktor.villafuerte
Target Milestone: | rc | Keywords: | Reopened
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2013-09-19 09:59:11 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |

Attachments:
Created attachment 697825 [details]
spec file changes
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

Hello, the misclassification issue was fixed in the new irqbalance in RHEL 6.4. If the update did not fix your issue, feel free to reopen this bug. Thanks, Petr

Reopened because the new irqbalance (irqbalance-1.0.4-3.el6.x86_64) in RHEL 6.4 (Bug 878708) does not balance any IRQ.

I am not sure whether your problem can be considered a duplicate of bz878708. If you think so, let me know and I'll close this bz as a duplicate. Otherwise, please send me the output of the following commands after letting them run for a few minutes:

# service irqbalance stop
# irqbalance --debug

Thank you, Petr Holasek

I am currently not authorized to look up bz878708. But I've collected the debug output from irqbalance as requested (uploaded as 2013-09-19_irqbalance-1.0.4-4.el6_4.x86_64_debug.log) and found that the hpsa interrupts are handled by irqbalance now. Thanks & regards, Roland

Created attachment 799808 [details]
irqbalance --debug log
Roland, thank you for your cooperation! Based on your feedback, I am closing this BZ as CURRENTRELEASE. Regards, Petr

I continue to see unbalanced hpsa controller interrupts under irqbalance-1.0.4-10.el6 on EL6.6. Was this patch ever incorporated?

Edmund, could you please attach the output of /proc/interrupts and one minute of irqbalance debug output? It can be collected with:

# service irqbalance stop && irqbalance --debug > irqbalance_debug

Created attachment 982840 [details]
Unbalanced hpsa /proc/interrupts
Created attachment 982842 [details]
irqbalance debug output
Thank you for the outputs. The hpsa interrupts were correctly classified as "storage", but they were balanced at the cache level instead of the core level; that shouldn't be the problem, though. Could you please add the output of the following command after the irqbalance daemon has been running for a while?

# for i in $(seq 0 100); do grep . /proc/irq/$i/smp_affinity /dev/null 2>/dev/null; done

The smp_affinity output is below:

# for i in $(seq 0 100); do grep . /proc/irq/$i/smp_affinity /dev/null 2>/dev/null; done
/proc/irq/0/smp_affinity:ffffffff,ffffffff
/proc/irq/1/smp_affinity:00000000,0003f03f
/proc/irq/2/smp_affinity:ffffffff,ffffffff
/proc/irq/3/smp_affinity:00000000,0003f03f
/proc/irq/4/smp_affinity:00000000,0003f03f
/proc/irq/5/smp_affinity:00000000,0003f03f
/proc/irq/6/smp_affinity:00000000,00ffffff
/proc/irq/7/smp_affinity:00000000,0003f03f
/proc/irq/8/smp_affinity:00000000,0003f03f
/proc/irq/9/smp_affinity:00000000,0003f03f
/proc/irq/10/smp_affinity:00000000,0003f03f
/proc/irq/11/smp_affinity:00000000,00ffffff
/proc/irq/12/smp_affinity:00000000,0003f03f
/proc/irq/13/smp_affinity:00000000,00ffffff
/proc/irq/14/smp_affinity:00000000,00ffffff
/proc/irq/15/smp_affinity:00000000,00ffffff
/proc/irq/16/smp_affinity:00000000,0003f03f
/proc/irq/17/smp_affinity:00000000,0003f03f
/proc/irq/20/smp_affinity:00000000,0003f03f
/proc/irq/21/smp_affinity:00000000,0003f03f
/proc/irq/72/smp_affinity:00000000,00000001
/proc/irq/73/smp_affinity:00000000,00000001
/proc/irq/74/smp_affinity:00000000,00000001
/proc/irq/75/smp_affinity:00000000,00000002
/proc/irq/76/smp_affinity:00000000,00000004
/proc/irq/77/smp_affinity:00000000,00000008
/proc/irq/78/smp_affinity:00000000,00000010
/proc/irq/79/smp_affinity:00000000,0003f03f
/proc/irq/80/smp_affinity:00000000,0003f03f
/proc/irq/81/smp_affinity:00000000,0003f03f
/proc/irq/82/smp_affinity:00000000,0003f03f
/proc/irq/83/smp_affinity:00000000,0003f03f
/proc/irq/84/smp_affinity:00000000,0003f03f
/proc/irq/85/smp_affinity:00000000,0003f03f
/proc/irq/86/smp_affinity:00000000,0003f03f
/proc/irq/87/smp_affinity:00000000,0003f03f
/proc/irq/88/smp_affinity:00000000,0003f03f
/proc/irq/89/smp_affinity:00000000,00000020
/proc/irq/90/smp_affinity:00000000,00001000
/proc/irq/91/smp_affinity:00000000,00002000
/proc/irq/92/smp_affinity:00000000,00008000
/proc/irq/93/smp_affinity:00000000,00010000

Thanks for the output. It seems that irqbalance does a good job there and the problem is in hardware, namely the APIC. The kernel would also refuse to set an incorrect smp_affinity, but in any case you can file a bug against the hpsa driver.

I was just upgrading another system and noticed that this was not an issue on 2.6.32-431.17.1.el6, but it seems to have started with 2.6.32-504.3.3.el6. I can't tell if this is due to an HPSA driver change or the kernel revision.

Created attachment 983964 [details]
Balanced HPSA /proc/interrupts
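The smp_affinity values in the listing above are CPU bitmasks, printed as comma-separated 32-bit hexadecimal words with the most significant word first. As a reading aid, here is a minimal Python sketch that turns such a mask into a list of CPU numbers. The function name `cpus_from_affinity` is made up for this illustration and is not part of irqbalance or the kernel.

```python
from typing import List


def cpus_from_affinity(mask: str) -> List[int]:
    """Return the CPU numbers whose bits are set in an smp_affinity mask."""
    # Dropping the commas joins the 32-bit words into one hexadecimal number;
    # bit N of that number corresponds to CPU N.
    value = int(mask.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


# Masks copied from the listing above.
print(cpus_from_affinity("00000000,0003f03f"))  # CPUs 0-5 and 12-17
print(cpus_from_affinity("00000000,00001000"))  # CPU 12 only
```

Read this way, IRQs 72 to 78 and 89 to 93 in that listing are each pinned to a single CPU, while the 0003f03f entries may be delivered to any of twelve CPUs.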
Thank you. I'd recommend filing a bug against kernel/StorageDrivers and referring to this bugzilla in the report, e.g. to provide the information about the correctly set smp_affinity.

Hi Peto :) Just to clarify this... I've got a very similar problem, where updating caused the same problems as mentioned above. I was also forced to downgrade the kernel back to 2.6.32-431.17.1.el6.x86_64, which seemed to have helped with running down CPUs. However, I did the affinity checks as suggested above and the results still seem a bit odd to me. The only difference there is the version of the irqbalance package.

1) irqbalance-1.0.4-10.el6.x86_64
2.6.32-431.17.1.el6.x86_64

for i in $(seq 0 100); do grep . /proc/irq/$i/smp_affinity /dev/null 2>/dev/null; done
/proc/irq/0/smp_affinity:ffffffff
/proc/irq/1/smp_affinity:000000ff
/proc/irq/2/smp_affinity:ffffffff
/proc/irq/3/smp_affinity:000000ff
/proc/irq/4/smp_affinity:000000ff
/proc/irq/5/smp_affinity:000000ff
/proc/irq/6/smp_affinity:000000ff
/proc/irq/7/smp_affinity:000000ff
/proc/irq/8/smp_affinity:000000ff
/proc/irq/9/smp_affinity:000000ff
/proc/irq/10/smp_affinity:000000ff
/proc/irq/11/smp_affinity:000000ff
/proc/irq/12/smp_affinity:000000ff
/proc/irq/13/smp_affinity:000000ff
/proc/irq/14/smp_affinity:000000ff
/proc/irq/15/smp_affinity:000000ff
/proc/irq/17/smp_affinity:000000ff
/proc/irq/20/smp_affinity:000000ff
/proc/irq/22/smp_affinity:000000ff
/proc/irq/23/smp_affinity:000000ff
/proc/irq/50/smp_affinity:000000ff
/proc/irq/51/smp_affinity:000000ff
/proc/irq/52/smp_affinity:000000ff
/proc/irq/53/smp_affinity:000000ff
/proc/irq/54/smp_affinity:000000ff
/proc/irq/55/smp_affinity:000000ff
/proc/irq/56/smp_affinity:000000ff
/proc/irq/57/smp_affinity:000000ff
/proc/irq/58/smp_affinity:00000008
/proc/irq/59/smp_affinity:00000008
/proc/irq/60/smp_affinity:00000008
/proc/irq/61/smp_affinity:00000008
/proc/irq/62/smp_affinity:00000008
/proc/irq/63/smp_affinity:00000008
/proc/irq/64/smp_affinity:00000008
/proc/irq/65/smp_affinity:00000008

2) irqbalance-1.0.4-9.el6_5.x86_64
2.6.32-431.17.1.el6.x86_64

for i in $(seq 0 100); do grep . /proc/irq/$i/smp_affinity /dev/null 2>/dev/null; done
/proc/irq/0/smp_affinity:ffffffff
/proc/irq/1/smp_affinity:000000ff
/proc/irq/2/smp_affinity:ffffffff
/proc/irq/3/smp_affinity:000000ff
/proc/irq/4/smp_affinity:000000ff
/proc/irq/5/smp_affinity:000000ff
/proc/irq/6/smp_affinity:000000ff
/proc/irq/7/smp_affinity:000000ff
/proc/irq/8/smp_affinity:000000ff
/proc/irq/9/smp_affinity:000000ff
/proc/irq/10/smp_affinity:000000ff
/proc/irq/11/smp_affinity:00000088
/proc/irq/12/smp_affinity:000000ff
/proc/irq/13/smp_affinity:000000ff
/proc/irq/14/smp_affinity:000000ff
/proc/irq/15/smp_affinity:000000ff
/proc/irq/17/smp_affinity:000000ff
/proc/irq/20/smp_affinity:00000022
/proc/irq/22/smp_affinity:00000088
/proc/irq/23/smp_affinity:00000044
/proc/irq/50/smp_affinity:00000088
/proc/irq/51/smp_affinity:00000088
/proc/irq/52/smp_affinity:00000022
/proc/irq/53/smp_affinity:00000022
/proc/irq/54/smp_affinity:00000044
/proc/irq/55/smp_affinity:00000022
/proc/irq/56/smp_affinity:00000088
/proc/irq/57/smp_affinity:00000022
/proc/irq/58/smp_affinity:00000010
/proc/irq/59/smp_affinity:00000020
/proc/irq/60/smp_affinity:00000040
/proc/irq/61/smp_affinity:00000080
/proc/irq/62/smp_affinity:00000001
/proc/irq/63/smp_affinity:00000002
/proc/irq/64/smp_affinity:00000004
/proc/irq/65/smp_affinity:00000008

I've detailed this in Bug #1185890, but it is not public yet.

No worries, I'll keep an eye on that one. Thanks, v

Hi Viktor :) The different smp_affinities across versions are probably caused by bug bz1170351 (Broken irqbalance deepest cache backport), which is going to be fixed in RHEL 6.7.

Edmund, just another try: are the interrupts distributed among processors when you run irqbalance with the --hintpolicy=exact option?

Petr, I'll try that, but the issue was narrowed down to bz1170351 in my other bz1185890. Downgrading irqbalance to irqbalance-1.0.4-9.el6_5.x86_64 is a temporary resolution. The permanent fix is slated for release in 6.7.

Running irqbalance with --hintpolicy=exact did not change the interrupt distribution.
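To make the difference between the two irqbalance versions compared above easier to see, the affinity listings can be grouped by mask. The following Python sketch does that for the IRQ 58 to 65 entries copied from both listings. The helper `group_irqs_by_mask` and the variable names are invented for this illustration; the code works on pasted text and does not read /proc itself.

```python
from collections import defaultdict


def group_irqs_by_mask(listing):
    """Map each smp_affinity mask to the sorted IRQ numbers that use it."""
    groups = defaultdict(list)
    for token in listing.split():
        # Each token looks like /proc/irq/58/smp_affinity:00000008
        path, sep, mask = token.partition(":")
        parts = path.split("/")
        if not sep or len(parts) != 5 or parts[4] != "smp_affinity":
            continue  # skip anything that is not an affinity entry
        groups[mask].append(int(parts[3]))
    return {mask: sorted(irqs) for mask, irqs in groups.items()}


def make_listing(masks):
    """Build listing text for IRQs 58..65 from eight masks (illustration only)."""
    return "\n".join(
        "/proc/irq/%d/smp_affinity:%s" % (irq, mask)
        for irq, mask in zip(range(58, 66), masks)
    )


# IRQs 58-65 as reported with irqbalance-1.0.4-10.el6
newer = make_listing(["00000008"] * 8)
# IRQs 58-65 as reported with irqbalance-1.0.4-9.el6_5
older = make_listing(["00000010", "00000020", "00000040", "00000080",
                      "00000001", "00000002", "00000004", "00000008"])

print(len(group_irqs_by_mask(newer)), "distinct mask(s) with 1.0.4-10")
print(len(group_irqs_by_mask(older)), "distinct mask(s) with 1.0.4-9")
```

With the 1.0.4-10 data all eight IRQs share mask 00000008 (CPU 3), whereas the 1.0.4-9 data gives each of them its own CPU.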
Created attachment 697824 [details]
Add hpsa to storage_modules list in classify.c

Description of problem:
Hewlett Packard is moving from the cciss to the hpsa driver module for their Smart Array Controller Chip. The new module name (hpsa) is not in the storage_modules list in classify.c, and all Smart Array Controller interrupts are handled by CPU0.

Version-Release number of selected component (if applicable):
0.55-35

Actual results:
$ grep hpsa /proc/interrupts
52: 253368 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa0
53: 25294 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa0
54: 25360 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa0
55: 19647 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa0
56: 8907 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa0
57: 5888 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa0
58: 210612 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa0
59: 22698 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa0
60: 78013564 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa1
61: 2020926 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa1
62: 1562522 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa1
63: 760476 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa1
64: 271462 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa1
65: 195098 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa1
66: 5489932 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa1
67: 232921 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa1

Expected results:
IRQ counters increase on all CPUs.

Additional info:
The patch simply adds "hpsa" to the storage_modules list.
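The imbalance reported above can also be checked mechanically by summing the per-CPU counters of all rows that belong to the driver. The Python sketch below does this for pasted `grep hpsa /proc/interrupts` output. The function name `cpu_totals` is hypothetical, and the sketch assumes the simple row layout shown in this report: an IRQ number, one counter per CPU, the interrupt type, and the device name as the last field.

```python
def cpu_totals(interrupts_text, device):
    """Sum the per-CPU counters of every /proc/interrupts row for `device`."""
    totals = []
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Rows of interest start with "<irq>:" and end with the device name.
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        if not fields[-1].startswith(device):
            continue
        # The counters are the run of integers right after the IRQ number.
        counts = []
        for field in fields[1:]:
            if not field.isdigit():
                break
            counts.append(int(field))
        if len(counts) > len(totals):
            totals.extend([0] * (len(counts) - len(totals)))
        for cpu, count in enumerate(counts):
            totals[cpu] += count
    return totals


# Two rows copied from the report above.
sample = """\
52: 253368 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa0
60: 78013564 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge hpsa1
"""

totals = cpu_totals(sample, "hpsa")
busy = [cpu for cpu, count in enumerate(totals) if count > 0]
print("CPUs that serviced hpsa interrupts:", busy)
```

For these rows only CPU 0 has a non-zero total, which matches the behaviour described in the report.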