Red Hat Bugzilla – Bug 240392
irq smp_affinity should be all cpus by default
Last modified: 2008-02-27 14:58:17 EST
LTC Owner is: email@example.com
LTC Originator is: firstname.lastname@example.org
If an irq's smp_affinity is not set to all cpus, then its possible for a single
badly behaving -rt thread to hang a system.
This occurs because the IRQ threads run w/ the smp affinity mask that the irq is
set for. So not only is the interrupt targetted to the smp_affinity mask, but
the irq process must run on those cpus as well.
By default, smp affinity masks are set to a single cpu, which means if a high
priority realtime process runs on that cpu and does not yield, it will block the
irq from being processed, even if there are idle cpus in the system. If the irq
being blocked is the SCSI disk, that can result in a system lock up.
By setting the irq smp_affinity mask to all cpus, you allow the IRQ threads to
run on the idle cpus. This avoids the issue.
IBM has created a irqbindall init script as part of its realtime utilities
package. RedHat should consider including this init script to avoid this issue.
Created attachment 154889 [details]
irqbindall init script
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-21 17:03 EDT -------
From the discussion on the call, LTC is going to try to re-work irqbindall.sh
into the irqbalance one shot script.
Created attachment 155146 [details]
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-21 22:08 EDT -------
First pass changes to irqbalance init script
First pass change to the irqbalance init script to provide irqbindall.sh like
1) Add reset_mask function that sets the irqs smp_affinity mask back to all
cpus when irqbalance is stopped
2) Check the /etc/sysconfig/irqbalance config file for RT_BINDALL=yes and avoid
starting irqbalance in that case.
I'm sure this isn't completely optimal, but gives us something to start with.
Neil: Your thoughts?
I think the idea for the feature is a fine one, but I think mechanically its
just bit shy of the mark.
1) the use of RT_BINDALL messes with the way the service infrastrucuture
behaves. If I understand the purpose of the RT_BINDALL variable correctly, its
intent is to prevent irqbalance from actually running, and force it to not make
any changes to any irq's cpu affinity, presumably so that an external script can
configure the the cpu affinity for all the interrupts. If that is the case,
then I think the appropriate course of action is to use the ONESHOT environment
variable to indicate that irqbalance wil run once and exit (and fix up the the
service locks appropriately), and then use a second environment variable to
point to a set of files that define the irqbalance per-irq affinity.
2) The use of a 32 bit all f's mask isnt sufficient for the cpu count we
support. IIRC we support up to 128 cpus at this point, and that needs to be
reflected in the affinity mask.
I'll fix up these points and provide you a patch for testing
Created attachment 155155 [details]
new patch to handle one shot specific settings and reset of affinity
Here you are. This should handle what you are looking for. It corrects the
reset_mask function such that it can handle more than 32 cpuss, and it adds a
SPECFIC_MASK environment variable, which is used as follows:
1) If ONESHOT is _not_ set SPECIFIC_MASK is ignored
2) If ONESHOT is set but SPECIFIC_MASK is _not_ set, ONESHOT behavior is
consistent with previous versions of irqbalance
3) If ONESHOT and SPECIFIC_MASK is set, /etc/sysconfig/irqbalance_masks/ is
searched for files named <irqnum>.mask. For each file found, if <irqnum> is
present in /proc/irq/ then the contents of <irqnum>.mask are applied to
This should give us the flexibility to do oneshot specific mask setting without
needing to add an external program to the irqbalance package.
I've not tested this yet, so there might be some minor syntax bugs in this, but
nothing major. If you could test this out and let me know the results, and if
you are satisfied with this solution, I'll be happy to check this into
irqbalance (along with some documentation describing the new option)
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-22 20:28 EDT -------
So w/ the -rt patch, I'm getting lots of:
etc/init.d/irqbalance_new: line 62: echo: write error: Value too large for
defined data type
So the mask calculation doesn't quite look ok for the -rt kernel.
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-22 21:28 EDT -------
Oh, goofy, apparently just the high bit is invalid.
#echo 70000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f >
works fine. But 8.... and above fails.
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-22 21:46 EDT -------
I think replacing the for look in reset_mask() to the following fixes the first
for i in `ls /proc/irq`
| sed s/[0-f]/f/g
| sed s/f/7/
| sed s/
echo $i - $mask
echo $mask > /proc/irq/$i/smp_affinity
Now, with regards to the set_specific_masks() string, do you oppose having one
SPECIFIC_MASK=<long mask> string that we format as we do in the code snippit
above? I think that will simplify populating the masks dir as required in your
Regarding the reset_mask issue, you're right, I neglected the commas in the mask
specification completely. Thanks you. I'll roll your fix into the patch.
With regards to the SPECIFIC_MASK setting, I would prefer to offer customers the
option to individually set their irq masks so that they could set a variety if
they wanted to. I would be willing to add an all.mask file that we could check
for if you wanted to specify the same mask for all irq's. would that be agreeable?
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-22 21:25 EDT -------
Actually, even w/ mainline, on x86_64 I'm getting those
line 62: echo: write error: Value too large for defined data type
It seems the smp_affinity value on x86_64 looks like:
# cat /proc/irq/0/smp_affinity
And the following works fine:
echo 00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f >
However, just replacing the first zero w/ a f breaks it:
# echo f0000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f >
bash: echo: write error: Value too large for defined data type
So it seems you can't use the character count as your mask length. Hmm.
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-22 14:50 EDT -------
Neil: Thanks for the review. The RT_BINDALL in my patch was really just a "don't
touch it" flag. No other application was planned, but I agree that more advanced
features then what I was considering (like those found in your patch) could be nice.
So few comments on your patch:
1) I need to check it against -rt, as I think -rt causes the
/proc/irq/num/smp_affinity output (not input) to be more then a simple mask.
2) Not sure how the sysconfig/irqbalance_masks/ files handle the 32 bit or 64
bit gotcha you caught in my patch.
3) I worry it might be somewhat over complicated, as populating the
irqbalance_masks might be machine specific (or do we by default put 255 files
Would a simpler solution be instead of using the masks dir, could we just have
SPECIFIC_MASK be a mask string that is applied to all IRQs? Sort of like just
the inverse of the IRQ_AFFINITY_MASK value.
I'll give your patch a test and let you know how it goes.
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-22 21:47 EDT -------
Oh, you can probably yank the debug "echo $i - $mask" from the snippit above, I
forgot to pull that.
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-23 14:41 EDT -------
Neil: Your suggestion sounds fine by me! Let me know when the patch is ready and
I'll start testing it.
Created attachment 155291 [details]
new irabalance patch
New version of the patch.
I've redone the patch, and incorporated your fixes to reset_masks (as well as
consolidated the sed expressions a bit). Thank you for that work.
Regarding the use of SPECIFIC_MASK as a fixed mask, I understand your desire to
make that as simple as possible to do a one shot set, but I'm concerned that
others might want to have some extra flexibility in that feature (i.e. isolate
high volume interrupts to one cpu set, low latency interrupts to another single
cpu, and isolate the remaining cpus from interrupts entirely).
What I've done in this version of the patch is add logic for you to write a
default.mask file to /etc/sysconfig/irqbalance_masks. This file will be echoed
into any irq number for which there is no <irqnum>.mask file. This way you can
set a default mask for all irqs, and only override those irqs for which you
wish a different mask.
Give it a shot and let me know what you think. Thanks!
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-23 18:20 EDT -------
Looks good. Although two minor nits:
1) On irqbalance start, I don't see the [OK] message
2) Some interrupts that do not have devices bound (irq 2 in my case) print:
cat: write error: Input/output error
For #2, probably want to redirect error messages from cat to /dev/null.
Otherwise this patch provides sufficient infrastructure to do what we need.
The next question is default packaging for the RHEL5-rt distro. I think we'd
prefer if there was a default.mask file and it was: 7fff.... for robustness to
avoid irq starvation.
How does that vibe on the RH side?
Sounds reasonable to me. I'll fix up the patch with your suggestions/fixes and
attach a complete rpm for your testing.
Created attachment 155390 [details]
test srpm for irqbalance
Hey there, heres a src rpm of the new irqbalance package for your testing.
Please build it and give it a try. It should dynamically create a default.mask
file for you on install based on the size of the smp_affinity files on a given
system. Please confirm that the built rpm installs and uninstalls correctly
for you, and that starting and stopping the service results in expected
behavior. If you need me to build a binary rpm for you, let me know and I
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-24 17:39 EDT -------
The src rpm built and functioned properly.
Just a few last nits:
1) On irqbalance stop, I get:
/etc/init.d/irqbalance: line 50: echo: write error: Input/output error
/etc/init.d/irqbalance: line 50: echo: write error: Input/output error
sed: can't read /proc/irq/prof_cpu_mask/smp_affinity: Not a directory
/etc/init.d/irqbalance: line 50: /proc/irq/prof_cpu_mask/smp_affinity: Not a
2) Also /etc/sysconfig/irqbalance needs a comment about the SPECIFIC_MASKS
options. Something like:
# When ONESHOT=yes, this option use the mask files found in
# /etc/sysconfig/irqbalance_mask/ directory.
Ok, I've made your last fixups, and the new rpm with your included feature is
available as irqbalance-0.55-4.el5rt. Thanks!
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-05-25 16:19 EDT -------
Neil, Thanks so much for your help and expertise here! I really appreciate it!
Clark/Tim: is there an ETA for this update to land on the partner repo?
----- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-06-04 17:10 EDT -------
Just pulled new irqbalance from the yum repo. Verifying everything is good.
What |Removed |Added
------- Additional Comments From email@example.com (prefers email at firstname.lastname@example.org) 2007-06-04 17:15 EDT -------
Everything looks good. Marking accepted.