Bug 240392 - irq smp_affinity should be all cpus by default
irq smp_affinity should be all cpus by default
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel (Show other bugs)
1.0
x86_64 Linux
medium Severity high
: ---
: ---
Assigned To: Neil Horman
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2007-05-16 21:27 EDT by IBM Bug Proxy
Modified: 2008-02-27 14:58 EST (History)
1 user (show)

See Also:
Fixed In Version: irqbalance-0.55-4.el5rt
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-06-04 21:54:21 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
irqbindall init script (799 bytes, text/plain)
2007-05-16 21:27 EDT, IBM Bug Proxy
no flags Details
irqbalance.diff (720 bytes, text/plain)
2007-05-21 22:15 EDT, IBM Bug Proxy
no flags Details
new patch to handle one shot specific settings and reset of affinity (1.73 KB, patch)
2007-05-22 07:35 EDT, Neil Horman
no flags Details | Diff
new irabalance patch (1.57 KB, patch)
2007-05-23 15:42 EDT, Neil Horman
no flags Details | Diff
test srpm for irqbalance (28.15 KB, application/x-rpm)
2007-05-24 15:56 EDT, Neil Horman
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 34749 None None None Never

  None (edit)
Description IBM Bug Proxy 2007-05-16 21:27:40 EDT
LTC Owner is: jstultz@us.ibm.com
LTC Originator is: jstultz@us.ibm.com


Problem description:

If an irq's smp_affinity is not set to all cpus, then its possible for a single
badly behaving -rt thread to hang a system.

This occurs because the IRQ threads run w/ the smp affinity mask that the irq is
set for. So not only is the interrupt targetted to the smp_affinity mask, but
the irq process must run on those cpus as well.

By default, smp affinity masks are set to a single cpu, which means if a high
priority realtime process runs on that cpu and does not yield, it will block the
irq from being processed, even if there are idle cpus in the system. If the irq
being blocked is the SCSI disk, that can result in a system lock up.

By setting the irq smp_affinity mask to all cpus, you allow the IRQ threads to
run  on the idle cpus. This avoids the issue.

IBM has created a irqbindall init script as part of its realtime utilities
package. RedHat should consider including this init script to avoid this issue.
Comment 1 IBM Bug Proxy 2007-05-16 21:27:41 EDT
Created attachment 154889 [details]
irqbindall init script
Comment 4 IBM Bug Proxy 2007-05-21 17:05:41 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-21 17:03 EDT -------
From the discussion on the call, LTC is going to try to re-work irqbindall.sh
into the irqbalance one shot script. 
Comment 5 IBM Bug Proxy 2007-05-21 22:15:45 EDT
Created attachment 155146 [details]
irqbalance.diff
Comment 6 IBM Bug Proxy 2007-05-21 22:15:51 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-21 22:08 EDT -------
 
First pass changes to irqbalance init script

First pass change to the irqbalance init script to provide irqbindall.sh like
functionality.

Two changes:
1) Add reset_mask function that sets the irqs smp_affinity mask back to all
cpus when irqbalance is stopped
2) Check the /etc/sysconfig/irqbalance config file for RT_BINDALL=yes and avoid
starting irqbalance in that case.

I'm sure this isn't completely optimal, but gives us something to start with.

Neil: Your thoughts? 
Comment 7 Neil Horman 2007-05-22 07:02:05 EDT
I think the idea for the feature is a fine one, but I think mechanically its
just bit shy of the mark.

Specific concerns:

1) the use of RT_BINDALL messes with the way the service infrastrucuture
behaves.  If I understand the purpose of the RT_BINDALL variable correctly, its
intent is to prevent irqbalance from actually running, and force it to not make
any changes to any irq's cpu affinity, presumably so that an external script can
configure the the cpu affinity for all the interrupts.  If that is the case,
then I think the appropriate course of action is to use the ONESHOT environment
variable to indicate that irqbalance wil run once and exit (and fix up the the
service locks appropriately), and then use a second environment variable to
point to a set of files that define the irqbalance per-irq affinity.

2) The use of a 32 bit all f's mask isnt sufficient for the cpu count we
support.  IIRC we support up to 128 cpus at this point, and that needs to be
reflected in the affinity mask.

I'll fix up these points and provide you a patch for testing
Comment 8 Neil Horman 2007-05-22 07:35:59 EDT
Created attachment 155155 [details]
new patch to handle one shot specific settings and reset of affinity

Here you are.  This should handle what you are looking for.  It corrects the
reset_mask function such that it can handle more than 32 cpuss, and it adds a
SPECFIC_MASK environment variable, which is used as follows:

1) If ONESHOT is _not_ set SPECIFIC_MASK is ignored

2) If ONESHOT is set but SPECIFIC_MASK is _not_ set, ONESHOT behavior is 
consistent with previous versions of irqbalance

3) If ONESHOT and SPECIFIC_MASK is set, /etc/sysconfig/irqbalance_masks/ is
searched for files named <irqnum>.mask.  For each file found, if <irqnum> is
present in /proc/irq/ then the contents of <irqnum>.mask are applied to
/proc/irq/<irqnum>/smp_affinity

This should give us the flexibility to do oneshot specific mask setting without
needing to add an external program to the irqbalance package.

I've not tested this yet, so there might be some minor syntax bugs in this, but
nothing major.	If you could test this out and let me know the results, and if
you are satisfied with this solution, I'll be happy to check this into
irqbalance (along with some documentation describing the new option)
Comment 9 IBM Bug Proxy 2007-05-22 20:30:34 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-22 20:28 EDT -------
So w/ the -rt patch, I'm getting lots of:
etc/init.d/irqbalance_new: line 62: echo: write error: Value too large for
defined data type

So the mask calculation doesn't quite look ok for the -rt kernel. 
Comment 10 IBM Bug Proxy 2007-05-22 21:31:08 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-22 21:28 EDT -------
Oh, goofy, apparently just the high bit is invalid.

#echo 70000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f >
/proc/irq/0/smp_affinity 

works fine. But 8.... and above fails. 
Comment 11 IBM Bug Proxy 2007-05-22 21:50:36 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-22 21:46 EDT -------
I think replacing the for look in reset_mask() to the following fixes the first
issue:

for i in `ls /proc/irq`
	do
		mask=`cat /proc/irq/$i/smp_affinity 
			| sed s/[0-f]/f/g 
			| sed s/f/7/ 
			| sed s/
//`
		echo $i - $mask
		echo $mask > /proc/irq/$i/smp_affinity
	done


Now, with regards to the set_specific_masks() string, do you oppose having one
SPECIFIC_MASK=<long mask> string that we format as we do in the code snippit
above? I think that will simplify populating the masks dir as required in your
patch. 
Comment 12 Neil Horman 2007-05-23 08:54:22 EDT
Regarding the reset_mask issue, you're right, I neglected the commas in the mask
specification completely.  Thanks you.  I'll roll your fix into the patch.

With regards to the SPECIFIC_MASK setting, I would prefer to offer customers the
option to individually set their irq masks so that they could set a variety if
they wanted to.  I would be willing to add an all.mask file that we could check
for if you wanted to specify the same mask for all irq's.  would that be agreeable?
Comment 13 IBM Bug Proxy 2007-05-23 13:28:27 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-22 21:25 EDT -------
Actually, even w/ mainline, on x86_64 I'm getting those 
line 62: echo: write error: Value too large for defined data type
errors.

It seems the smp_affinity value on x86_64 looks like:
# cat /proc/irq/0/smp_affinity
00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f

And the following works fine:
echo 00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f >
/proc/irq/0/smp_affinity 

However, just replacing the first zero w/ a f breaks it:
# echo f0000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f >
/proc/irq/0/smp_affinity 
bash: echo: write error: Value too large for defined data type

So it seems you can't use the character count as your mask length. Hmm. 
Comment 14 IBM Bug Proxy 2007-05-23 13:30:51 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-22 14:50 EDT -------
Neil: Thanks for the review. The RT_BINDALL in my patch was really just a "don't
touch it" flag. No other application was planned, but I agree that more advanced
features then what I was considering (like those found in your patch) could be nice.

So few comments on your patch:
1) I need to check it against -rt, as I think -rt causes the
/proc/irq/num/smp_affinity output (not input) to be more then a simple mask.

2) Not sure how the sysconfig/irqbalance_masks/ files handle the 32 bit or 64
bit gotcha you caught in my patch.

3) I worry it might be somewhat over complicated, as populating the
irqbalance_masks might be machine specific (or do we by default put 255 files
there?)

Would a simpler solution be instead of using the masks dir, could we just have
SPECIFIC_MASK be a mask string that is applied to all IRQs?  Sort of like just
the inverse of the IRQ_AFFINITY_MASK value.

I'll give your patch a test and let you know how it goes. 
Comment 15 IBM Bug Proxy 2007-05-23 13:44:37 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-22 21:47 EDT -------
Oh, you can probably yank the debug "echo $i - $mask" from the snippit above, I
forgot to pull that. 
Comment 16 IBM Bug Proxy 2007-05-23 14:45:47 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-23 14:41 EDT -------
Neil: Your suggestion sounds fine by me! Let me know when the patch is ready and
I'll start testing it. 
Comment 17 Neil Horman 2007-05-23 15:42:42 EDT
Created attachment 155291 [details]
new irabalance patch

New version of the patch.

I've redone the patch, and incorporated your fixes to reset_masks (as well as
consolidated the sed expressions a bit).  Thank you for that work.

Regarding the use of SPECIFIC_MASK as a fixed mask, I understand your desire to
make that as simple as possible to do a one shot set, but I'm concerned that
others might want to have some extra flexibility in that feature (i.e. isolate
high volume interrupts to one cpu set, low latency interrupts to another single
cpu, and isolate the remaining cpus from interrupts entirely).	

What I've done in this version of the patch is add logic for you to write a
default.mask file to /etc/sysconfig/irqbalance_masks.  This file will be echoed
into any irq number for which there is no <irqnum>.mask file.  This way you can
set a default mask for all irqs, and only override those irqs for which you
wish a different mask.

Give it a shot and let me know what you think.	Thanks!
Comment 18 IBM Bug Proxy 2007-05-23 18:25:45 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-23 18:20 EDT -------
Looks good. Although two minor nits:

1) On irqbalance start, I don't see the [OK] message
2) Some interrupts that do not have devices bound (irq 2 in my case) print:
     cat: write error: Input/output error

For #2, probably want to redirect error messages from cat to /dev/null.

Otherwise this patch provides sufficient infrastructure to do what we need.

The next question is default packaging for the RHEL5-rt distro. I think we'd
prefer if there was a default.mask file and it was: 7fff.... for robustness to
avoid irq starvation.

How does that vibe on the RH side? 
Comment 19 Neil Horman 2007-05-23 19:45:29 EDT
Sounds reasonable to me.  I'll fix up the patch with your suggestions/fixes and
attach a complete rpm for your testing.
Comment 20 Neil Horman 2007-05-24 15:56:19 EDT
Created attachment 155390 [details]
test srpm for irqbalance

Hey there, heres a src rpm of the new irqbalance package for your testing. 
Please build it and give it a try.  It should dynamically create a default.mask
file for you on install based on the size of the smp_affinity files on a given
system.  Please confirm that the built rpm installs and uninstalls correctly
for you, and that starting and stopping the service results in expected
behavior.  If you need me to build a binary rpm for you, let me know and I
will.  Thanks!
Comment 21 IBM Bug Proxy 2007-05-24 17:45:38 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-24 17:39 EDT -------
The src rpm built and functioned properly.

Just a few last nits:
1) On irqbalance stop, I get:
/etc/init.d/irqbalance: line 50: echo: write error: Input/output error
/etc/init.d/irqbalance: line 50: echo: write error: Input/output error
sed: can't read /proc/irq/prof_cpu_mask/smp_affinity: Not a directory
/etc/init.d/irqbalance: line 50: /proc/irq/prof_cpu_mask/smp_affinity: Not a
directory

2) Also /etc/sysconfig/irqbalance needs a comment about the SPECIFIC_MASKS
options. Something like:
# SPECIFIC_MASKS=yes
#   When ONESHOT=yes, this option use the mask files found in 
#   /etc/sysconfig/irqbalance_mask/ directory. 
SPECIFIC_MASKS=yes 
Comment 22 Neil Horman 2007-05-25 15:34:45 EDT
Ok, I've made your last fixups, and the new rpm with your included feature is
available as irqbalance-0.55-4.el5rt.  Thanks!
Comment 23 IBM Bug Proxy 2007-05-25 16:25:36 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-25 16:19 EDT -------
Neil, Thanks so much for your help and expertise here! I really appreciate it!

Clark/Tim: is there an ETA for this update to land on the partner repo? 
Comment 24 IBM Bug Proxy 2007-06-04 17:16:16 EDT
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-06-04 17:10 EDT -------
Just pulled new irqbalance from the yum repo. Verifying everything is good. 
Comment 25 IBM Bug Proxy 2007-06-04 17:20:25 EDT
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|FIXEDAWAITINGTEST           |ACCEPTED




------- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-06-04 17:15 EDT -------
Everything looks good. Marking accepted. 

Note You need to log in before you can comment on or make changes to this bug.