Bug 191978

Summary: heavy network usage causes do_IRQ stack overflow
Product: Red Hat Enterprise Linux 4 Reporter: Jim King <jrk>
Component: kernel Assignee: Jason Baron <jbaron>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: urgent Docs Contact:
Priority: medium    
Version: 4.0 CC: knoel, mingo
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-06-20 15:57:43 UTC
Attachments:
console output from crash on machine 1 (flags: none)
console output from crash on machine 2 (flags: none)

Description Jim King 2006-05-16 17:27:23 UTC
Description of problem:

Kernel crashes under high network usage. Crash is almost immediate in some
situations. The message is: do_IRQ: stack overflow: 420
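
For context on the message: on 2.6-era i386 kernels built with stack-overflow debugging, do_IRQ prints this warning when the free space left on the current kernel stack drops below a threshold, and the number printed is the count of free bytes remaining. The sketch below models that check in Python. It is not kernel code, and the stack size, thread_info size, and warning threshold are assumed values chosen for illustration.

```python
# Simplified model of the do_IRQ stack check; all constants are assumptions.
THREAD_SIZE = 4096              # assumed 4 KB kernel stacks
THREAD_INFO_SIZE = 52           # hypothetical sizeof(struct thread_info)
STACK_WARN = THREAD_SIZE // 8   # assumed warning threshold


def check_stack(esp):
    """Return the warning text if the stack is nearly full, else None."""
    # Offset of the stack pointer within its THREAD_SIZE-aligned stack area.
    offset = esp & (THREAD_SIZE - 1)
    # thread_info sits at the bottom of the area; the stack grows down toward it.
    if offset < THREAD_INFO_SIZE + STACK_WARN:
        return "do_IRQ: stack overflow: %d" % (offset - THREAD_INFO_SIZE)
    return None


# A stack pointer 420 bytes above thread_info triggers the warning.
print(check_stack(0xC1234000 + THREAD_INFO_SIZE + 420))
```

Under these assumptions, a reported value of 420 would mean only 420 bytes of kernel stack were left when the interrupt arrived.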

Attached are console outputs from the crash on two identical machines.

Both machines were working fine under RHEL3, but as soon as we upgraded them to
RHEL 4 (this week, with the latest up2date-installed kernel), they started
experiencing this. You'll notice a 3rd party SAN driver installed in the module
list in these traces (svm, vsd). We uninstalled that to get back to a basic
machine, and got the exact same crash.



Version-Release number of selected component (if applicable):
Kernel 2.6.9-34.ELsmp
Kernel 2.6.9-36.ELsmp development kernel (from
   http://people.redhat.com/~jbaron/rhel4/)


How reproducible:
Every time.


Steps to Reproduce:
1. Start up machine
2. Run our software. 
3. Crash is within 2 seconds.
  
Actual results:
System crashes in do_IRQ with a stack overflow

Expected results:
Software works

Additional info:

System: Sun v65x
Memory: both 4 GB and 8 GB.
Tested with both the Red Hat e1000 driver and the latest from Intel (7.0.38).
The crash is identical in all cases.
The crash logs were captured with 'noapic' on the kernel command line, but the
crash occurs with or without it.


Will try to create a simple program that triggers the crash and upload it in a bit.

Comment 1 Jim King 2006-05-16 17:27:23 UTC
Created attachment 129240
console output from crash on machine 1

Comment 2 Jim King 2006-05-16 17:29:07 UTC
Created attachment 129241
console output from crash on machine 2

Comment 3 Jim King 2006-05-18 18:19:43 UTC
Updated status:

We were able to trace it down to some sort of library conflict. If we have
certain of our own libraries in LD_LIBRARY_PATH, this happens. My guess is that
we've got a library that's conflicting with some system library. If I add
/lib:/usr/lib to the front of LD_LIBRARY_PATH, the crash doesn't happen.
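
The workaround described above can be sketched as a small launcher that puts /lib and /usr/lib at the front of LD_LIBRARY_PATH before starting a program. This is only an illustration under stated assumptions: the helper names and the /opt/ourapp/lib path are hypothetical and do not come from the reporter's setup.

```python
import os
import subprocess

# System library directories named in the comment above.
SYSTEM_LIB_DIRS = ["/lib", "/usr/lib"]


def with_system_libs_first(env):
    """Return a copy of env whose LD_LIBRARY_PATH starts with the system dirs."""
    new_env = dict(env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    # Keep the existing entries in order, dropping empties and duplicates
    # of the system directories that are being moved to the front.
    rest = [p for p in current.split(":") if p and p not in SYSTEM_LIB_DIRS]
    new_env["LD_LIBRARY_PATH"] = ":".join(SYSTEM_LIB_DIRS + rest)
    return new_env


def run_app(argv):
    """Run argv (a hypothetical application command) with the adjusted path."""
    return subprocess.call(argv, env=with_system_libs_first(os.environ))


# Example with a hypothetical application library directory.
example = with_system_libs_first({"LD_LIBRARY_PATH": "/opt/ourapp/lib"})
print(example["LD_LIBRARY_PATH"])
```

Because the dynamic linker searches LD_LIBRARY_PATH entries in order, listing the system directories first means a same-named system library is found before the application's copy.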

Note that this is all done as an unprivileged user. So to me this makes it a
security issue... all a user has to do is drop a library in their home
directory, clear LD_LIBRARY_PATH except for that library location, and he can
make the system crash.

Still working on isolating a simple test case I can pass to you.

Comment 4 Jason Baron 2006-05-18 21:07:40 UTC
interesting, what do you mean by conflicting? i'd guess that the other libraries
are making different syscalls and thus causing different system dynamics. pretty
strange though...it'd be great if you could narrow this down further.

Comment 6 Jiri Pallich 2012-06-20 15:57:43 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.