Bug 112578

Summary:	softirq for rhel 3 update 2
Product:	Red Hat Enterprise Linux 2.1	Reporter:	sheryl sage <sheryl.sage>
Component:	kernel	Assignee:	Rik van Riel <riel>
Status:	CLOSED ERRATA	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	2.1	CC:	petrides, riel, summer, tao, tburke
Target Milestone:	---	Keywords:	FutureFeature
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2004-06-08 19:04:33 UTC	Type:	---
Bug Depends On:
Bug Blocks:	107563

Description sheryl sage 2003-12-23 15:22:00 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

Description of problem:
(Here's the RHEL 3.0 2.4.21-4.EL kernel version of our deepstack
softirq deferral patch.  This RHEL 3.0 version additionally excludes
s390 and s390x from the scheme, since they gained an interrupt stack
between 2.4.9 and 2.4.21.  The same arguments apply and are repeated
here for completeness.)

Some VERITAS users on RHAS2.1 have been hitting do_IRQ's stack
overflow messages, which fire when the stack is within 2048 bytes of
overflowing.  The threshold for that message is at a more reasonable
level (1024) in RHEL3.0, but we're still worried by the prospect of
overflow, and are working to reduce our stack usage.

There's a small change to Linux which would reduce the likelihood of
stack overflow for all.  It's far from being a complete solution,
but a small enough change to be worth making.

In many (but not all) drivers, the complex and stack-hungry part of
interrupt processing is done in the softirq rather than the hardirq.
do_softirq() already defers softirq work to its daemon when swamped
by more softirqs while it's working.  This patch adds a stack check,
deferring all softirq work to the daemon when the stack is too deep.

How deep is too deep?  Given the hardirq warning at 1k, we estimate
the threshold for softirq deferral should be between 2k and 3k, and
have set 2560 here.  Much lower than that would make it ineffective,
much higher than that would impact performance.
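
As a rough illustration of the check described above (not the patch
itself), the decision can be sketched in plain C.  The 1024 and 2560
figures come from this report; the 8192-byte stack area, the
TASK_STRUCT_SIZE value, and all function and type names below are
assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Figures quoted in this report. */
#define HARDIRQ_WARN_BYTES  1024u  /* hardirq overflow warning level */
#define SOFTIRQ_DEFER_BYTES 2560u  /* proposed softirq deferral level */

/* Assumptions for illustration only, not taken from the patch. */
#define THREAD_SIZE      8192u  /* assumed per-task kernel stack area */
#define TASK_STRUCT_SIZE 1792u  /* hypothetical size of the task
                                   structure at the bottom of the area */

typedef enum { RUN_INLINE, DEFER_TO_DAEMON } softirq_action;

/*
 * Bytes still free below the stack pointer.  The stack grows down
 * toward the task structure, which sits at the low end of the area.
 */
static uint32_t stack_bytes_left(uintptr_t sp)
{
    uint32_t offset;

    /* The mask trick only works for a power-of-two stack area. */
    assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0);

    offset = (uint32_t)(sp & (THREAD_SIZE - 1));
    return offset > TASK_STRUCT_SIZE ? offset - TASK_STRUCT_SIZE : 0;
}

/* True when the hardirq path would print its overflow warning. */
static bool hardirq_would_warn(uintptr_t sp)
{
    return stack_bytes_left(sp) < HARDIRQ_WARN_BYTES;
}

/* Run softirqs inline only when enough stack remains. */
static softirq_action softirq_decide(uintptr_t sp)
{
    return stack_bytes_left(sp) < SOFTIRQ_DEFER_BYTES
               ? DEFER_TO_DAEMON
               : RUN_INLINE;
}
```

Because the deferral level sits well above the warning level, softirq
work is handed to the daemon long before the hardirq warning would
fire, which is the margin the 2k to 3k estimate is meant to provide.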

This patch differs slightly from the patch we offered earlier for
RHEL3.0: extending it from i386 to other architectures (excepting
parisc, s390, s390x and

Comment 1 Rik van Riel 2004-01-05 23:07:13 UTC

Sheryl, I agree to this patch.  All we need to do now is get it past
the other developers at Red Hat, by proving that it doesn't have a
noticeable performance penalty in the kind of setup that shows stack
overflows with normal users.

In short, we would need a setup that:
1) sometimes results in stack overflows when the benchmark is run normally
2) works fine with the patch

If this setup produces pretty much identical performance with and
without the stack overflow patch, it'll be hard for other engineers
inside Red Hat to object to the patch.

Comment 2 Tim Burke 2004-01-06 20:57:34 UTC

Could Veritas come back with a proposed testing/characterization
scenario as Rik described above?  Would be useful for Veritas to first
describe how they can reproduce the scenario (hardware used, tests
run, etc), then bounce that off us.  The whole point of this exercise
is to allay concerns of potential negative performance implications. 
The more convincing the scenario, the better chance of acceptance.

Once you have proposed the testing scenarios, we can discuss whether
they cover the concerns.  If Veritas provides us this testing proposal
before investing the time to do it, we can provide feedback.  That way
you won't end up spending the time only for us to determine later that
the testing is insufficient.

(Separately, we are trying to drum up testing recommendations, but
since Veritas is most familiar with how to reproduce the problem, it
might be beneficial for you to pose the initial test descriptions. 
Then we can cooperatively kick it around from there.)

Comment 5 Rik van Riel 2004-06-08 14:18:46 UTC

This got applied to one of the RHEL3 updates.  Probably U2 ;)

Comment 6 Ernie Petrides 2004-06-08 19:04:33 UTC

The fix was committed to the RHEL3 U2 patch pool in kernel version
2.4.21-9.15.EL, and it was released in errata RHSA-2004:188.