From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020606

Description of problem:
The Red Hat 7.3 kernel 2.4.18-5bigmem panics frequently, producing kernel Oops dumps. The stack appears to get overwritten with bad data. The servers are 4-way 1.6 GHz Xeons with Hyper-Threading turned on. The systems have 8 GB of physical RAM and 16 GB of swap.

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1. Boot the system.
2. Run the internal app.
3. Wait.

Actual Results:
The system hangs with a kernel Oops dump on the console.

Expected Results:
No kernel panic.

Additional info:
The internal code has run on Red Hat 6.2, 7.0, 7.1, and 7.2, the kernel.org 2.4.18 kernel, and SuSE 8.0 without this problem. I have numerous Oops dumps as examples.
Created attachment 75964 (console dump)
Created attachment 75965 (console dump)
What modules are you using?
Created attachment 75966 (console dump)
Here is the output of lsmod:

    nfs          89180  47  (autoclean)
    lockd        58080   1  (autoclean) [nfs]
    sunrpc       83444   1  (autoclean) [nfs lockd]
    pcmcia_core  55616   0
    eepro100     20848   1
    cpqasm      346016   2
    cpqevt        6148   0  [cpqasm]
    ext3         69824   2
    jbd          52896   2  [ext3]
    cciss        36672  10
    sd_mod       12864   0  (unused)
    scsi_mod    114148   1  [cciss sd_mod]
Created attachment 75967 (console dump)
Created attachment 75968 (console dump)
Created attachment 75969 (console dump)
Created attachment 75970 (console dump)
Created attachment 75971 (console dump)
Created attachment 75972 (console dump)
Created attachment 75973 (console dump)
Created attachment 75974 (console dump)
Created attachment 75975 (console dump)
Please try to reproduce this bug without binary-only kernel modules loaded, and then reopen the bug.
Created attachment 75976 (console dump)
It has failed without these kernel modules. It should also be noted that these modules are not binary-only; they are open source, written by HP and Compaq. I can post the code if you want. Also note that 7.2 works with these modules.
If you have a URL to the code, yes, please; I'd like to take a look and check their stack behavior, for one (assuming they are actually open source and not just a binary-only blob with some glue code).
Created attachment 76004 (module code)
Attached is the module code. For anything in it that is not open source, I can probably get HP to work with you offline.
It should be noted that this problem also occurs without the kernel module loaded.
This is a very HUGE binary-only module with a tiny bit of source code ;( Anyway, please try the 2.4.18-12.5 or 2.4.18-14 kernel from the rawhide portion of our FTP site; it has a stack overflow detector that might give a backtrace BEFORE the stack overflows (the trace AFTER it does is basically useless ;( ).
In order to get a reliable netdump "vmcore" on Red Hat 7.3, we need the "netconsole" module from Red Hat. Red Hat 7.3 bundles "netdump-server-0.6.4-1.i386.rpm" and "netdump-0.6.4-1.i386.rpm", but the bundled 2.4.18-3 kernel does _not_ include the "netconsole" module that the "netdump" client requires to function correctly. Where can we get the "netconsole" module that Red Hat 7.3 (2.4.18-3 or any errata kernel) needs to work with the bundled "netdump-0.6.4-1.i386.rpm"?
The supported 7.3 kernel does support netdump.
Why provide the client, then?
Because if you actually use the supported kernel, you DO have the netdump module. The HP guy just isn't using that....
Hi Red Hat,

> because if you actually use the supported kernel you DO have the
> netdump module. The HP guy just isn't using that....

I'm using the supported kernel from Red Hat 7.3, i.e. the bundled 2.4.18-3 kernel, and this bundled release does _not_ contain the 'netconsole' module. Do you mean 'supported' as in the 2.4.18-10 errata kernel?
Correct.
Thanks, that explains a lot: the supported kernel is 2.4.18-10. Questions: Does this supported 2.4.18-10 kernel contain the 'stack overflow detector' code? If not, is it, as stated earlier, only in the 2.4.18-12.5 rawhide kernel? We can locate only the 2.4.18-12.5 rawhide kernel, not 2.4.18-14.
Ye olde bug with no activity in well over a year. Closing.