Bug 217373

Summary: irqbalance segmentation fault
Product: Fedora
Reporter: Bryce <root>
Component: irqbalance
Assignee: Neil Horman <nhorman>
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Version: 6
CC: horsley1953, triage
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: bzcl34nup
Doc Type: Bug Fix
Last Closed: 2008-05-06 16:57:55 UTC
Attachments:
Description Flags
patch to increase max interrupts on irqbalance for xen kernels none

Description Bryce 2006-11-27 16:13:33 UTC
I recently stuck FC6 on my x86_64 box (actually a Pentium D 950, which supports 64 bit).
I cannot get irqbalance to run under a variety of kernels, including the stock
FC6 one (atm I'm using 2.6.19-rc6). I also tried recompiling it from the src.rpm.

[root@emerald-x64 irqbalance]# ./irqbalance 
Segmentation fault
[root@emerald-x64 irqbalance]# gdb  ./irqbalance 
(gdb) run
Starting program: /usr/src/redhat/BUILD/irqbalance-1.13/irqbalance/irqbalance 

Program received signal SIGSEGV, Segmentation fault.
0x0000555555555ac9 in parse_proc_interrupts (incremental=0) at procinterrupts.c:108
108                                     interrupts[irqnumber].count += count;
(gdb) info locals
word = <value optimized out>
count = 37386
cursor = 0x7fff3aef4031 "         0   PCI-MSI-edge      eth0\n"
column = 1
irqnumber = 8411
file = (FILE *) 0x55555575b010
linebuffer = "8411:\000\000\000\000\000\00037386\000         0   PCI-MSI-edge      eth0\n\000\000\n\000sb1,\000ehci_hcd:usb5\n\000e36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm\n", '\0' <repeats 59 times>...
__PRETTY_FUNCTION__ = "parse_proc_interrupts"
(gdb) list
103                                     if (!ret)  /* non numeric end stuff */
104                                             irqnumber = MAX_INTERRUPTS-1; 
105                             /* then N columns of counts, where N is the number of cpu's */
106                             } else if (column <= cpucount) {
107                                     sscanf(word,"%lli",&count);
108                                     interrupts[irqnumber].count += count;
109                             /* and lastly the names of the drivers */
110                             } else if ( ( (incremental==0) || (interrupts[irqnumber].type==IRQ_INACTIVE) ) 
111                                                             && column>cpucount+1)
112                                     classify_type(irqnumber, word);
(gdb) quit
The program is running.  Exit anyway? (y or n) y
[root@emerald-x64 irqbalance]# cat /proc/interrupts 
           CPU0       CPU1       
  0:    6685870          0   IO-APIC-edge      timer
  1:      12375          0   IO-APIC-edge      i8042
  6:          5          0   IO-APIC-edge      floppy
  7:          0          0   IO-APIC-edge      parport0
  8:          0          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:     133628          0   IO-APIC-edge      i8042
 14:      19918          0   IO-APIC-edge      ide0
 17:        791          0   IO-APIC-fasteoi   uhci_hcd:usb2
 18:          7          0   IO-APIC-fasteoi   uhci_hcd:usb3
 19:      45666          0   IO-APIC-fasteoi   uhci_hcd:usb4, HDA Intel
 20:     222106          0   IO-APIC-fasteoi   uhci_hcd:usb1, ehci_hcd:usb5
 21:          3          0   IO-APIC-fasteoi   ohci1394
 23:      36848          0   IO-APIC-fasteoi   libata
 24:   21900226          0   IO-APIC-fasteoi   libata
8411:      38270          0   PCI-MSI-edge      eth0
NMI:       3491       2771 
LOC:   13483510   13483783 
ERR:          0
[root@emerald-x64 irqbalance]# 
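
The offending row can be examined in isolation. The following is a minimal sketch, not irqbalance's actual parser: `parse_irq_label` is a hypothetical helper that extracts the numeric label at the start of a /proc/interrupts row and reports non-numeric rows (NMI, LOC, ERR) separately.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not part of irqbalance): parse the leading "N:"
 * label of one /proc/interrupts row.
 * Returns the IRQ number, or -1 if the label is not a plain number
 * followed by a colon (for example "NMI:", "LOC:", "ERR:"). */
static long parse_irq_label(const char *line)
{
    char *end;
    long irq;

    assert(line != NULL);
    errno = 0;
    irq = strtol(line, &end, 10);   /* strtol skips leading spaces itself */
    if (end == line || *end != ':' || errno == ERANGE || irq < 0)
        return -1;
    return irq;
}
```

Fed the eth0 row above, this returns 8411, which matches the `irqnumber` value gdb shows at the point of the crash.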


Ideas? I'm kinda stumped

Phil
=--=

Comment 1 Neil Horman 2006-11-27 18:09:44 UTC
I think this is a Xen-enabled kernel, right?  This is probably a segfault that I
recently fixed in RHEL5.  I must have forgotten to do this in FC6.  Please try
the attached patch in irqbalance and confirm that it fixes the problem.  Thanks!


Comment 2 Neil Horman 2006-11-27 18:11:48 UTC
Created attachment 142197 [details]
patch to increase max interrupts on irqbalance for xen kernels

Comment 3 Bryce 2006-11-27 18:22:16 UTC
Sooo close but no cigar 8)
Actually I talked to Arjan about it (since he originally wrote the code). The
issue is the interrupt number. In this case the kernel has assigned the utterly
WILD number of 8411 as an interrupt number to the ethernet controller. The way
irqbalance works, it won't find a slot for that, as even with your patch it only
has slots for IRQs from 0 up to 1023. Now on the good side, Arjan has actually
been busy rewriting this code, though he's awaiting Intel's lawyers to give him
signoff to release it.

So in conclusion:
IRQ number 8411 is WAY too big for irqbalance, at least in the manner it handles it
currently (even with your patch).
Replacement irqbalance code is due out in a day or so's time.
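
To illustrate the failure mode: the crash is at `interrupts[irqnumber].count += count` with `irqnumber = 8411`, while the table only covers 0 to 1023. The sketch below is not the irqbalance source. `add_irq_count` and the struct layout are hypothetical, and the table size of 1024 is taken from the range quoted above. It shows the kind of range check that would turn the out-of-bounds write into a reported error.

```c
#include <assert.h>

/* Assumed table size: the thread says the patched table covers IRQs 0..1023. */
#define MAX_INTERRUPTS 1024

/* Simplified stand-in for the real per-IRQ record. */
struct irq_slot {
    long long count;
};

static struct irq_slot interrupts[MAX_INTERRUPTS];

/* Hypothetical helper: add count to the slot for irqnumber.
 * Returns 0 on success, -1 if irqnumber has no slot in the table. */
static int add_irq_count(int irqnumber, long long count)
{
    assert(count >= 0);  /* counts in /proc/interrupts are never negative */
    if (irqnumber < 0 || irqnumber >= MAX_INTERRUPTS)
        return -1;       /* e.g. 8411: indexing here would run past the array */
    interrupts[irqnumber].count += count;
    return 0;
}
```

With this check, `add_irq_count(8411, 38270)` returns -1 instead of writing past the end of the array. The interrupt still goes unbalanced, so this only avoids the segfault and does not solve the underlying problem.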

Phil
=--=

Comment 4 Neil Horman 2006-11-27 18:41:00 UTC
Well, he's right, 8411 is a wild number.  Unfortunately, I'm waiting for Arjan
to send me his new code too, and I'm not sure that it will solve this problem.
To be honest, barring Arjan doing a linked list in the new irqbalance that isn't
directly indexed by irq number, this is likely going to be a WONTFIX.  Is there
any way you can get that ethernet controller assigned a lower irq value?
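
For reference, a list that is keyed by IRQ number rather than indexed by it could look roughly like the sketch below. This is only an illustration of the idea, not Arjan's rewrite. The names `irq_node`, `irq_get` and `irq_free_all` are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* One record per IRQ actually seen; the number is stored, not used as an index. */
struct irq_node {
    int number;
    long long count;
    struct irq_node *next;
};

/* Hypothetical helper: return the node for number, creating it (with a
 * zero count) and prepending it to the list if it does not exist yet.
 * Returns NULL only if allocation fails. */
static struct irq_node *irq_get(struct irq_node **head, int number)
{
    struct irq_node *n;

    assert(head != NULL);
    for (n = *head; n != NULL; n = n->next)
        if (n->number == number)
            return n;

    n = calloc(1, sizeof(*n));
    if (n == NULL)
        return NULL;
    n->number = number;
    n->next = *head;
    *head = n;
    return n;
}

/* Release every node and reset the list to empty. */
static void irq_free_all(struct irq_node **head)
{
    assert(head != NULL);
    while (*head != NULL) {
        struct irq_node *next = (*head)->next;
        free(*head);
        *head = next;
    }
}
```

Lookup is linear in the number of IRQs present, which is small on a machine like the one in this report (16 numbered rows), and an IRQ numbered 8411 costs the same as one numbered 8.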

Comment 5 Tom Horsley 2006-12-13 02:56:48 UTC
We just installed FC6 x86_64 on a whopping huge server at work (4 dual core
opterons), and I see what is probably the same irqbalance segfault
when the system boots (though it seems to work without it - don't know what
the long term implications of no irqbalance might be).


Comment 6 Bryce 2006-12-13 11:40:15 UTC
Having antagonized Arjan for ages, there's a newer update that I've tried and it works

http://www.irqbalance.org/download.php

Phil
=--=

Comment 7 Neil Horman 2006-12-13 14:37:49 UTC
yeah, I pushed -0.55-2 for fc6 last night.  This bug will close once the release
team gets it into the fc6 updates repository

Comment 8 Neil Horman 2006-12-13 21:44:14 UTC
pushed

Comment 9 Bug Zapper 2008-04-04 04:57:53 UTC
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 10 Bug Zapper 2008-05-06 16:57:54 UTC
This bug is open for a Fedora version that is no longer maintained and
will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.