Bug 191978 - heavy network usage causes do_IRQ stack overflow
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
Priority: medium
Severity: urgent
Assigned To: Jason Baron
QA Contact: Brian Brock
Reported: 2006-05-16 13:27 EDT by Jim King
Modified: 2013-03-06 00:59 EST
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2012-06-20 11:57:43 EDT


Attachments
console output from crash on machine 1 (9.58 KB, application/octet-stream)
2006-05-16 13:27 EDT, Jim King
console output from crash on machine 2 (136.84 KB, application/octet-stream)
2006-05-16 13:29 EDT, Jim King

Description Jim King 2006-05-16 13:27:23 EDT
Description of problem:

Kernel crashes under high network usage. Crash is almost immediate in some
situations. The message is: do_IRQ: stack overflow: 420
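For readers unfamiliar with the message: on i386 kernels built with CONFIG_DEBUG_STACKOVERFLOW, do_IRQ masks the stack pointer with THREAD_SIZE - 1 to get its offset within the current kernel stack, and warns when that offset falls below sizeof(struct thread_info) + STACK_WARN. The printed number is the space left above thread_info. The Python sketch below models only that arithmetic; the 4 KB stack size, STACK_WARN = THREAD_SIZE / 8, and the 52-byte thread_info size are assumptions for illustration, not values confirmed for the kernels in this report.

```python
from typing import Optional

# Model of the i386 do_IRQ stack-overflow check (arithmetic only, not kernel code).
# Assumed values: 4 KB kernel stacks, STACK_WARN = THREAD_SIZE // 8,
# and a hypothetical 52-byte struct thread_info.
THREAD_SIZE = 4096
STACK_WARN = THREAD_SIZE // 8
THREAD_INFO_SIZE = 52  # hypothetical; the real size depends on the kernel build


def check_irq_stack(esp: int) -> Optional[str]:
    """Return the warning this model of do_IRQ would print for esp, or None."""
    # Offset of esp within its THREAD_SIZE-aligned stack area.
    offset = esp & (THREAD_SIZE - 1)
    # The stack grows down toward thread_info at the bottom of the area.
    if offset < THREAD_INFO_SIZE + STACK_WARN:
        return f"do_IRQ: stack overflow: {offset - THREAD_INFO_SIZE}"
    return None


# A stack pointer 472 bytes into its stack leaves 420 bytes above thread_info.
print(check_irq_stack(0xC1234000 + 472))   # do_IRQ: stack overflow: 420
print(check_irq_stack(0xC1234000 + 2048))  # None
```

Under these assumptions the warning threshold is 52 + 512 = 564 bytes, so a reported value of 420 would mean the interrupt arrived with 420 bytes of stack left above thread_info.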

Attached are console outputs from the crash on two different, identical machines.

Both machines were working fine under RHEL 3, but as soon as we upgraded them to
RHEL 4 (this week, with the latest up2date-installed kernel), they started
experiencing this. You'll notice a third-party SAN driver (svm, vsd) in the
module list in these traces. We uninstalled it to get back to a basic machine
and got the exact same crash.



Version-Release number of selected component (if applicable):
Kernel 2.6.9-34.ELsmp
Kernel 2.6.9-36.ELsmp development kernel (from
   http://people.redhat.com/~jbaron/rhel4/)


How reproducible:
Every time.


Steps to Reproduce:
1. Start up machine
2. Run our software.
3. The system crashes within 2 seconds.
  
Actual results:
System crashes in do_IRQ with a stack overflow

Expected results:
Software works

Additional info:

System: Sun v65x
Memory: both 4 GB and 8 GB.
Tested with both the Red Hat e1000 driver and the latest from Intel (7.0.38).
The crash is identical in all cases.
The crash logs were captured with 'noapic' on the kernel command line, but the
crash occurs with or without it.


I will try to create a simple program that reproduces the crash and upload it shortly.
Comment 1 Jim King 2006-05-16 13:27:23 EDT
Created attachment 129240
console output from crash on machine 1
Comment 2 Jim King 2006-05-16 13:29:07 EDT
Created attachment 129241
console output from crash on machine 2
Comment 3 Jim King 2006-05-18 14:19:43 EDT
Updated status:

We were able to trace it down to some sort of library conflict. If we have
certain of our own libraries in LD_LIBRARY_PATH, this happens. My guess is that
we've got a library that's conflicting with some system library. If I add
/lib:/usr/lib to the front of LD_LIBRARY_PATH, the crash doesn't happen.
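If the reporter's library-conflict theory is right, the workaround would help because the dynamic linker searches LD_LIBRARY_PATH directories in order and takes the first one containing the requested library, so putting /lib:/usr/lib first lets the system copy win over a same-named library shipped with the application. The sketch below models only that first-match rule; real ld.so resolution also involves DT_RPATH/DT_RUNPATH, /etc/ld.so.cache and the default paths, which it ignores. The /opt/ourapp/lib path and libfoo.so.1 are hypothetical. It says nothing about why a user-space library choice would overflow a kernel interrupt stack, a question this report never resolved.

```python
from typing import Dict, Optional, Set

SYSTEM_DIRS = "/lib:/usr/lib"


def prepend_system_dirs(ld_library_path: str) -> str:
    """Return LD_LIBRARY_PATH with /lib:/usr/lib searched first."""
    if not ld_library_path:
        return SYSTEM_DIRS
    # Leave the original value intact; only search the system dirs before it.
    return f"{SYSTEM_DIRS}:{ld_library_path}"


def resolve(soname: str, ld_library_path: str,
            dir_contents: Dict[str, Set[str]]) -> Optional[str]:
    """Simplified first-match lookup over LD_LIBRARY_PATH only.

    Ignores DT_RPATH/DT_RUNPATH, /etc/ld.so.cache and the default paths,
    and skips empty entries (which ld.so would treat as the current directory).
    """
    for directory in ld_library_path.split(":"):
        if directory and soname in dir_contents.get(directory, set()):
            return f"{directory}/{soname}"
    return None


# Hypothetical layout: the application ships its own libfoo.so.1.
contents = {
    "/opt/ourapp/lib": {"libfoo.so.1"},
    "/lib": {"libfoo.so.1", "libc.so.6"},
    "/usr/lib": set(),
}

original = "/opt/ourapp/lib"
patched = prepend_system_dirs(original)
print(patched)                                     # /lib:/usr/lib:/opt/ourapp/lib
print(resolve("libfoo.so.1", original, contents))  # /opt/ourapp/lib/libfoo.so.1
print(resolve("libfoo.so.1", patched, contents))   # /lib/libfoo.so.1
```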

Note that this is all done as an unprivileged user, so to me this makes it a
security issue: all a user has to do is drop a library in their home directory,
set LD_LIBRARY_PATH to point only at that location, and they can crash the
system.

Still working on isolating it down to a simple case I can pass to you.
Comment 4 Jason Baron 2006-05-18 17:07:40 EDT
Interesting. What do you mean by conflicting? I'd guess that the other libraries
are making different syscalls and thus causing different system dynamics. Pretty
strange, though... it'd be great if you could narrow this down further.
Comment 6 Jiri Pallich 2012-06-20 11:57:43 EDT
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.
