Red Hat Bugzilla – Bug 191978
heavy network usage causes do_IRQ stack overflow
Last modified: 2013-03-06 00:59:24 EST
Description of problem:
Kernel crashes under high network usage. Crash is almost immediate in some
situations. The message is: do_IRQ: stack overflow: 420
Attached are console outputs from the crash on two different but identical machines.
Both machines were working fine under RHEL3, but as soon as we upgraded them to
RHEL 4 (this week, with the latest up2date-installed kernel), they started
experiencing this. You'll notice a 3rd party SAN driver installed in the module
list in these traces (svm, vsd). We uninstalled that to get back to a basic
machine, and got the exact same crash.
Version-Release number of selected component (if applicable):
Kernel 2.6.9-36.ELsmp development kernel (from
Steps to Reproduce:
1. Start up machine
2. Run our software.
3. Crash is within 2 seconds.
System crashes in do_IRQ with a stack overflow
System: Sun v65x
Memory: both 4 GB and 8 GB.
Tested with both the Red Hat e1000 driver and the latest from Intel (7.0.38).
Crash is identical in all cases.
Crash logs have 'noapic' on for the kernel... but the crash occurs with or without it.
Will try to create simple program to make it crash right now and upload it in a bit.
Created attachment 129240
console output from crash on machine 1
Created attachment 129241
console output from crash on machine 2
We were able to trace it down to some sort of library conflict. If we have
certain of our own libraries in LD_LIBRARY_PATH, this happens. My guess is that
we've got a library that's conflicting with some system library. If I add
/lib:/usr/lib to the front of LD_LIBRARY_PATH, the crash doesn't happen.
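For anyone trying to reproduce the workaround, here is a minimal sketch of putting /lib:/usr/lib at the front of LD_LIBRARY_PATH before launching a program. The helper name prepend_system_lib_dirs and the ./our_app binary are hypothetical and only for illustration. This only reorders the search path; it does not identify which library is conflicting.

```python
import os


def prepend_system_lib_dirs(env, system_dirs=("/lib", "/usr/lib")):
    """Return a copy of env with system_dirs at the front of LD_LIBRARY_PATH.

    Hypothetical helper for illustration; not part of any Red Hat tooling.
    """
    new_env = dict(env)
    # Split the existing value, skipping empty entries.
    existing = [p for p in new_env.get("LD_LIBRARY_PATH", "").split(":") if p]
    # Drop any later duplicates of the system directories so the order is unambiguous.
    rest = [p for p in existing if p not in system_dirs]
    new_env["LD_LIBRARY_PATH"] = ":".join(list(system_dirs) + rest)
    return new_env


if __name__ == "__main__":
    env = prepend_system_lib_dirs(os.environ)
    print(env["LD_LIBRARY_PATH"])
    # To launch a program under this environment (binary name is an assumption):
    #   import subprocess
    #   subprocess.run(["./our_app"], env=env)
```

Running the program once with the original environment and once with the modified one should show whether the search order alone changes the behavior.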
Note that this is all done as an unprivileged user. So to me this makes it a
security issue... all a user has to do is drop a library in their home
directory, clear LD_LIBRARY_PATH except for that library location, and he can
make the system crash.
Still working on isolating it down to a simple case I can pass to you.
interesting, what do you mean by conflicting? i'd guess that the other libraries
are making different syscalls and thus causing different system dynamics. pretty
strange though...it'd be great if you could narrow this down further.
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review has now reached End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/
If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.