Bug 268561

Summary: irqbalance seems to be a no-op on RHEL5.1snap3
Product: Red Hat Enterprise Linux 5
Component: irqbalance
Version: 5.1
Hardware: ia64
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Reporter: Nick Dokos <nicholas.dokos>
Assignee: Neil Horman <nhorman>
Last Closed: 2008-03-03 18:34:58 UTC
Attachments:
  AIM7 fserver @ 10K: throughput and interrupt counts
  AIM7 fserver graphs - various combos of kernel/irqbalance, 16- and 32-way

Description Nick Dokos 2007-08-30 19:20:20 UTC
Description of problem:
AIM7 fserver runs on RHEL5.1snap3 (as well as on beta1) show a severe
performance regression relative to RHEL5. The system is a 32-way IA64 system
(rx8640, Montecito) with 128 GB of memory and 288 filesystems. Although the
system is NUMA-capable, these experiments use the default (interleaved)
configuration.

Part of the problem is that irqbalance seems to do nothing on RHEL5.1snap3 on
this system. Replacing it with the irqbalance from RHEL5 gives a much more even
interrupt distribution (with much lower counts per CPU) and a 6% bump in the
throughput (this was just a point run at a load of 10000 jobs - I am now doing
full runs to get a clearer picture). See the attachment for the actual numbers.

irqbalance does not account for all of the regression, but it seems to account
for a substantial chunk of it. It is encouraging to note that snap3 is much
better than beta1, although still worse than RHEL5.

Version-Release number of selected component (if applicable):

irqbalance-0.55-6.el5.ia64.rpm

How reproducible:

Always.

Steps to Reproduce:
1. Run AIM7 fserver on the system described above; check /proc/interrupts.
2. Replace irqbalance with the one from RHEL5 and run it again; check
/proc/interrupts.
Actual results:

See attachments.

Expected results:


Additional info:

Comment 1 Nick Dokos 2007-08-30 19:20:20 UTC
Created attachment 181661 [details]
AIM7 fserver @ 10K: throughput and interrupt counts

Comment 2 Neil Horman 2007-08-30 20:17:04 UTC
Can you provide me with the specific version of irqbalance that you are running,
along with a sysreport, so that I can tell what kind of interrupts these are (as
well as see what your /etc/sysconfig/irqbalance file looks like)?

Clearly, it appears that irqbalance is really spreading interrupts out on your
system, which doesn't normally happen.

Can you also check, after the system boots, that irqbalance is continuing to run?
Use both

service irqbalance status

and

ps -e | grep irqbalance

It may be that you hit an irqbalance bug or a legitimate exit condition which
resulted in irq affinity getting set to 0xffffffff for all (or most) cpus, hence
your abnormally even distribution.  Thanks!
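
If it helps, something like this (an untested sketch, assuming the usual
/proc/irq/<N>/smp_affinity layout) would show whether the affinities have all
collapsed to the all-CPUs mask:

  # Print the affinity mask for every IRQ; if most of them show every bit
  # set (all CPUs allowed), irqbalance has probably stopped managing them.
  for irq in /proc/irq/[0-9]*; do
      printf '%s: %s\n' "${irq##*/}" "$(cat "$irq"/smp_affinity)"
  done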


Comment 3 Nick Dokos 2007-08-30 21:33:09 UTC
o The interrupt counts I've shown are from the 12 QLogic FC adapters (actually 6
dual-port ones) with the qla2xxx driver. Here is a typical line from
/proc/interrupts, with the spaces squashed (a small script that summarizes such
lines is sketched at the end of this comment):

 56: 3294 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 0 8884564 0 0 0 0 0 24 0 0 0 0 0 0
 IO-SAPIC-level  qla2xxx

o irqbalance was enabled: chkconfig said

  irqbalance     	0:off	1:off	2:on	3:on	4:on	5:on	6:off

and was running (checked with ps awlx).

o The RPM package says

  irqbalance-0.55-6.el5.ia64.rpm

Is that the version number you are looking for? If not, let me know how to
get it and I'll get back to you.

o I'm not sure what you mean by "really spreading interrupts" because to
me it does not seem to spread them out at all: just about all the interrupts
from a particular adapter port seem to go to one CPU (e.g. the line above
shows CPU 19 getting almost all of them).

When I replaced irqbalance with the RHEL5 version (the RPM package that
it came from originally was irqbalance-1.13-9.el5.ia64.rpm - why did the
number go backwards?), interrupts *are* spread out to many CPUs: that's
a good thing - we get improved throughput that way.

o /etc/sysconfig/irqbalance has not been changed from what was installed:


# irqbalance is a daemon process that distributes interrupts across
# CPUS on SMP systems.  The default is to rebalance once every 10
# seconds.  There is one configuration option:
#
# ONESHOT=yes
#    after starting, wait for a minute, then look at the interrupt
#    load and balance it once; after balancing exit and do not change
#    it again.
ONESHOT=

#
# IRQ_AFFINITY_MASK
#    64 bit bitmask which allows you to indicate which cpu's should
#    be skipped when reblancing irqs.  Cpu numbers which have their 
#    corresponding bits set to zero in this mask will not have any
#    irq's assigned to them on rebalance
#
#IRQ_AFFINITY_MASK=
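
For what it's worth, here is a rough way (illustrative only, assuming the
standard /proc/interrupts layout: a header row of CPUn columns followed by one
count column per CPU on each IRQ line) to summarize which CPUs the qla2xxx
interrupts are actually landing on:

  # For every qla2xxx line in /proc/interrupts, print only the CPUs with a
  # non-zero count, which makes the skew easy to see at a glance.
  awk 'NR == 1 { ncpu = NF; next }
       /qla2xxx/ {
           printf "IRQ %s", $1
           for (i = 0; i < ncpu; i++)
               if ($(i + 2) > 0) printf " cpu%d=%d", i, $(i + 2)
           printf "\n"
       }' /proc/interrupts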



Comment 4 Neil Horman 2007-08-30 23:38:36 UTC
Based on our email conversation I've looked at the data again, and it's the RHEL5 GA
version of irqbalance that's not balancing interrupts properly.  The 5.1 version
seems to be balancing them just fine (isolating irqs to cpus to minimize
cacheline misses).  Performance differential notwithstanding, this appears to me
to be NOTABUG.  There is certainly a performance issue here that we can look at,
but unless you can provide additional evidence that irqbalance is not doing the
job it was designed to do, this is NOTABUG.

Comment 5 Nick Dokos 2008-03-03 17:55:57 UTC
I ran AIM7 fserver at 32-way and 16-way, in the interleaved memory
configuration, on an RX8640. The runs were on
the 5.1 kernel but with the 5.0 irqbalance (which is much more
aggressive about balancing IRQs across all CPUs) and on the 5.0 kernel
with the 5.1 irqbalance. The graph shows a comparison of these runs
with the "normal" runs: 5.1 kernel with 5.1 irqbalance and 5.0 kernel
with the 5.0 irqbalance.

The results are mixed (see the graph in the attachment):

o There is still a regression between 5.1 and 5.0, larger at 32-way
than at 16-way.

o The 5.1 kernel/5.0 irqbalance combination helps the
situation marginally, but certainly not nearly enough to make
up for the regression noted in point 1 above.

o However, the 5.0 kernel/5.1 irqbalance combination is *better*
than the 5.0 kernel/5.0 irqbalance combination and therefore
*increases* the regression margin noted in point 1 above at 32-way,
but is worse than the "normal" combination at 16-way.

The conclusion: there is no silver bullet here; irqbalance cannot
account for the whole regression on its own.

What's the way forward? Close this BZ and open a different one to
keep track of the regression?


Comment 6 Nick Dokos 2008-03-03 17:57:34 UTC
Created attachment 296645 [details]
AIM7 fserver graphs - various combos of kernel/irqbalance, 16- and 32-way

Comment 7 Neil Horman 2008-03-03 18:34:58 UTC
Yes, a new bug would be the way forward. But before you open something on it,
does this hardware by any chance do irq load balancing in hardware at
all?  SGI brought this up a while back, and is currently trying to come up with
a way to detect systems with hw-balanced irq routing, and disable irqbalance in
those cases.  What is your baseline performance if you just don't have
irqbalance enabled?
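
A sketch of what I mean by a baseline run (assuming the stock RHEL5 init
scripts):

  # Stop the daemon for the duration of the run and keep it from coming
  # back on a reboot; restore it afterwards.
  service irqbalance stop
  chkconfig irqbalance off
  # ... run the AIM7 fserver workload here ...
  chkconfig irqbalance on
  service irqbalance start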

Also, what kind of benchmark is this?  Do you have a pointer documenting the
contents of this test?  Is it the sort of workload where you expect incoming
interrupts to have a significant impact?  I ask because, looking at your graph, I
don't really see any strong correlation between which version of irqbalance you
use and the throughput you receive.  Comparing the same kernel between the 5.0
and 5.1 versions of irqbalance shows very close performance metrics.  About the
largest discrepancy I see is between RHEL5GA with its own irqbalance vs RHEL5GA +
the 5.1 irqbalance.  I see a momentary discrepancy of approximately 5000
jobs/min, which, while large, converges to equal performance as the workload
increases.  Unless these jobs are bound by I/O and we're waiting on an interrupt
of some sort, I'm somewhat skeptical that irqbalance has as much to do with this
as you think.

Comment 8 Nick Dokos 2008-03-04 20:06:40 UTC
No, the hardware doesn't do any irq balancing. I was trying to find the
data that we had with/without irqbalance but I haven't got the whole story
together yet (short on time: I'll be out of the office for the next two
weeks, so I'm not going to be able to put it together before then).

fserver is one of the four AIM7 standard workloads: it's the most I/O
intensive. It attempts to simulate a large fileserver environment,
with some CPU intensive tasks and some disk I/O intensive tasks (mostly
asynchronous but with some synchronous IO mixed in), plus some tasks
doing network I/O. They are very much I/O bound: the reason we use a
large number of disks is that we are trying to keep the I/O from
becoming a bottleneck and masking the behavior of the kernel: with
fewer disks, the curves reach a maximum throughput and quickly droop
down to almost nothing.

We had measured with and without irqbalance running in RHEL5.0 (I
think) on an interleaved memory configuration.  There, we had seen
significant improvement *with* (the old) irqbalance. But I do agree
that the difference has gotten much less significant with the later
kernel (and as I mentioned, at least in one case, the 5.0 kernel
behaved *better* with the 5.1 irqbalance - I believe that's the one
you are pointing out: the green squares graph vs the tan triangles
graph). So it's clear that the situation is more complicated than we
thought at first.

For the 5.1 kernel on this hardware, running irqbalance with the --oneshot
option, just to make sure that the interrupts don't all end up on CPU 0, is
probably enough. I think that the 5.0 kernel benefited from the more active
irqbalance on this benchmark, but I still owe you the evidence for that.
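
A sketch of what I have in mind (assuming the stock init script honors
/etc/sysconfig/irqbalance as documented above):

  # Switch to one-shot mode: balance once shortly after start-up, then
  # exit, leaving the affinities alone for the rest of the run.
  sed -i 's/^ONESHOT=.*/ONESHOT=yes/' /etc/sysconfig/irqbalance
  service irqbalance restart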