Bug 101650
| Summary: | after heavy load - kernel stuck - networking still active but userland dead | ||
|---|---|---|---|
| Product: | [Retired] Red Hat Linux | Reporter: | yuval yeret <yuval> |
| Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
| Status: | CLOSED NOTABUG | QA Contact: | Brian Brock <bbrock> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 8.0 | CC: | riel, yuval |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | i686 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2003-09-29 20:52:22 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
You're using a way old kernel, and binary-only kernel modules... not a lot to do here. Can you reproduce this with the current erratum kernel and without binary-only kernel modules?

We are currently not utilizing any binary-only kernel modules. This is our lsmod:

ians 110128 2 ====> source taken from Intel
qla2300 231120 4 ====> source taken from Qlogic
sg 31184 0 (unused)
e1000_5.0.43 69152 4 ====> source taken from Intel
exastore_mod 912 0 (unused) =====> internal patch to allow panic from userspace via the procfs
e100 52144 2 (autoclean)
md 62944 0 (unused)

We will try to reproduce with the newest errata kernel from Red Hat, but can you point to fixes made between 2.4.18-24 and the latest errata kernel that could have addressed this?

I'm also attaching a more detailed trace we got, together with our ksyms dump. All drivers are built from source.

Created attachment 93422
Full magic-keys output after the hang occurs
Created attachment 93423
sorted ksyms for the relevant kernel
ians is very much a binary-only module (with a .c glue layer).

*** This bug has been marked as a duplicate of 78616 ***

Well, this happened even without iANS loaded. (P.S. I don't understand why iANS is considered a binary module. Looking at the sources, I can't find any reference to firmware or binaries there.)

Created attachment 93426
Magic-keys output without iANS loaded
Created attachment 93427
ksyms without iANS loaded
Created attachment 93428
Magic-keys output without iANS loaded, with symbols looked up
We are currently not utilizing any binary-only kernel modules:
ians 110128 2 ====> source taken from Intel
^^^^^^^^^^^^^^^^^^^^^^^
Unsupported. Red Hat does not support 3rd party kernel modules in any way,
whether they are proprietary/binary-only, open source, GPL, or otherwise. We
also do not support user compiled kernels or kernel modules. We only support
the binary kernel we ship, with the binary modules we ship.
qla2300 231120 4 ====> source taken from Qlogic
^^^^^^^^^^^^^^^^^^^^^^^^
Unsupported. Red Hat does not support 3rd party kernel modules in any way,
whether they are proprietary/binary-only, open source, GPL, or otherwise. We
also do not support user compiled kernels or kernel modules. We only support
the binary kernel we ship, with the binary modules we ship.
e1000_5.0.43 69152 4 ====> source taken from Intel
^^^^^^^^^^^^^^^^^^^^^^^
Unsupported. Red Hat does not support 3rd party kernel modules in any way,
whether they are proprietary/binary-only, open source, GPL, or otherwise. We
also do not support user compiled kernels or kernel modules. We only support
the binary kernel we ship, with the binary modules we ship.
exastore_mod 912 0 (unused) =====> internal patch to allow panic
from userspace via the procfs
Unsupported. Red Hat does not support 3rd party kernel modules in any way,
whether they are proprietary/binary-only, open source, GPL, or otherwise. We
also do not support user compiled kernels or kernel modules. We only support
the binary kernel we ship, with the binary modules we ship. This includes
patched sources as well as unmodified sources.
>We will try to reproduce with the newest errata kernel from Red Hat, but can you
>point to fixes made between 2.4.18-24 and the latest that could have addressed this?
You'll have to reproduce it with our supplied binary kernel, not recompiled,
and not using any 3rd party modules or recompiled sources.
>all drivers are built from source:
Which is never supported.
Description of problem:
We are running 2.4.18-24 on SMP machines with 2 CPUs and hyperthreading (SuperMicro Xeon servers), doing heavy I/O to disk and networking (QLogic/Emulex HBAs and Intel e1000 NICs are used). At some point the machine oopses/hangs.

Version-Release number of selected component (if applicable):
2.4.18-24 SMP kernel

How reproducible:
Not easily reproducible.

Steps to Reproduce:
Run a heavy I/O load on the system. (We have a proprietary application, but it mainly writes a lot of files into ext3 partitions and does a lot of networking to clients and other machines in the cluster.) A variant is to temporarily reduce the system's I/O capacity while it is under heavy load, so that I/O requests are left pending. Once they can be serviced, the problem is more likely to occur.

Actual results:
The system is stuck in a zombie-like state: pings are answered and the kernel seems to be functioning in a limited way, while some userland processes cannot run (e.g. ssh does not work). The oops doesn't appear in the logs or on the console, but I've been able to use the diagnostic keys (right Alt+Scroll Lock) to get the following information:

CPU 0: swapper
[<c0106f32>] (0xc0323fc4)) default_idle
[<c0105000>] (0xc0323fd4)) empty_zero_page
CPU 1: swapper
[<c0106f32>] (0xc6597fb0) default_idle
[<c011d29b>] (0xc6597fd0) out_of_line_bug
[<c011d449>] (0xc659ffc) printk
CPU 3: swapper
[<c0106f32>] (0xc8257fb0)) default_idle
[<c011d449>] (0xc8257fd0)) printk

Expected results:
The system continues to work without oopsing.

Additional info: