Red Hat Bugzilla – Bug 73880
multiple kernel panics, kernel null pointers, page faults, etc...
Last modified: 2007-04-18 12:46:37 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020606
Description of problem:
Red Hat 7.3 kernel 2.4.18-5bigmem panics frequently, generating kernel
Oops dumps. The stack seems to get overwritten with bad data. Servers are 4-way
Xeon 1.6 GHz with hyperthreading turned on. Systems have 8 gig physical RAM and 16 gig
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. boot system
2. run internal app
Actual Results: System hangs with a kernel Oops dump to the console.
Expected Results: no kernel panic
The internal code has run on Red Hat 6.2, 7.0, 7.1, 7.2, the kernel.org 2.4.18
kernel, and SuSE 8.0 without this problem. I have numerous Oops dumps as examples.
Created attachment 75964 [details]
Created attachment 75965 [details]
What modules are you using?
Created attachment 75966 [details]
Here is the output of lsmod:
Module                 Size  Used by
nfs                   89180  47 (autoclean)
lockd                 58080   1 (autoclean) [nfs]
sunrpc                83444   1 (autoclean) [nfs lockd]
pcmcia_core           55616   0
eepro100              20848   1
cpqasm               346016   2
cpqevt                 6148   0 [cpqasm]
ext3                  69824   2
jbd                   52896   2 [ext3]
cciss                 36672  10
sd_mod                12864   0 (unused)
scsi_mod             114148   1 [cciss sd_mod]
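
For triage, a listing like the one above can be split into module name, size, use count, flags, and dependants. What follows is only a minimal sketch in Python, assuming the 2.4-era lsmod column layout shown in this comment; the function name parse_lsmod and the record fields are hypothetical and chosen for illustration.

```python
import re
from typing import List, NamedTuple


class Module(NamedTuple):
    name: str
    size: int            # module size as reported by lsmod
    used: int            # use count
    flags: List[str]     # parenthesised flags, e.g. ["autoclean"] or ["unused"]
    used_by: List[str]   # modules named inside [...], i.e. dependants


# name, size, use count, then an optional tail with flags and dependants
LINE = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s*(.*)$")


def parse_lsmod(text: str) -> List[Module]:
    """Parse 2.4-style lsmod output into Module records."""
    modules = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # header line or blank line: no numeric size column
        name, size, used, rest = m.groups()
        flags = re.findall(r"\(([^)]*)\)", rest)
        deps = re.search(r"\[([^\]]*)\]", rest)
        used_by = deps.group(1).split() if deps else []
        modules.append(Module(name, int(size), int(used), flags, used_by))
    return modules
```

With the listing from this comment as input, the records for cpqasm and cpqevt (the HP/Compaq modules discussed later in the thread) can be picked out by name and compared against a run with those modules unloaded.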
Created attachment 75967 [details]
Created attachment 75968 [details]
Created attachment 75969 [details]
Created attachment 75970 [details]
Created attachment 75971 [details]
Created attachment 75972 [details]
Created attachment 75973 [details]
Created attachment 75974 [details]
Created attachment 75975 [details]
Please try to reproduce this bug without binary-only kernel modules loaded and
then reopen the bug.
Created attachment 75976 [details]
It has failed without these kernel modules. It should also be noted that these
modules are not binary only; they are open source, written by HP and Compaq. I
can post the code if you want. Also note that 7.2 works with these modules.
If you have a URL to the code, yes please; I'd like to take a look and check
their stack behavior for one (assuming they are actually open source and not
just a binary-only blob with some glue code).
Created attachment 76004 [details]
Attached the module code. For anything in it that is not open source, I can
probably get HP to work with you off-line.
It should be noted that this problem also occurs without the kernel module loaded.
This is a very HUGE binary-only module with a tiny bit of source code ;(
Anyway, please try the 2.4.18-12.5 or 2.4.18-14 kernel from the rawhide portion
of our FTP site; it has a stack overflow detector that might actually give a
backtrace BEFORE the stack overflows (the trace AFTER it does is basically
useless ;( )
In order to get a reliable netdump "vmcore" on Red Hat 7.3 we need
the "netconsole" module from Red Hat. It seems that Red Hat 7.3
bundles "netdump-server-0.6.4-1.i386.rpm" and "netdump-0.6.4-1.i386.rpm", but
the bundled 2.4.18-3 kernel _does_not_ include the "netconsole" module
required for the "netdump" client to function correctly.
Where can we get the "netconsole" module required for Red Hat 7.3 (2.4.18-3 or
any errata kernels) to function with bundled "netdump-0.6.4-1.i386.rpm" ??
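
One way to confirm whether a given installed kernel ships the module is to search that kernel's module tree. The following is a minimal sketch in Python, assuming the conventional /lib/modules/<release> layout and that 2.4 kernels store modules as .o files (later kernels use .ko); the helper name find_module is hypothetical.

```python
import os


def find_module(name, release=None, root="/lib/modules"):
    """Return sorted paths of module files called `name` for one kernel release.

    `release` defaults to the running kernel (the value `uname -r` prints).
    An empty list means no such module file exists under that release's tree.
    """
    if release is None:
        release = os.uname().release
    tree = os.path.join(root, release)
    # 2.4 kernels use ".o"; newer kernels use ".ko", sometimes compressed.
    wanted = {name + ext for ext in (".o", ".o.gz", ".ko", ".ko.gz", ".ko.xz")}
    hits = []
    for dirpath, _dirs, files in os.walk(tree):
        for filename in files:
            if filename in wanted:
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)
```

Calling find_module("netconsole", "2.4.18-3") and find_module("netconsole", "2.4.18-10") on a machine with both kernels installed would show which of the two trees, if either, contains the module file.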
The supported 7.3 kernel does support netdump.
Why provide the client then?
because if you actually use the supported kernel you DO have the netdump module.
The HP guy just isn't using that....
> because if you actually use the supported kernel you DO have the
> netdump module. The HP guy just isn't using that....
I'm using the supported kernel from Red Hat 7.3, i.e. the bundled 2.4.18-3 kernel,
and this bundled release does _not_ contain the 'netconsole' module.
Do you mean 'supported' as in the 2.4.18-10 errata kernel?
Thanks, that explains a lot, i.e. the supported kernel is 2.4.18-10.
Does this 'supported' 2.4.18-10 kernel contain the 'stack overflow detector'
code? If not, is it, as stated earlier, only in the 2.4.18-12.5 rawhide kernel?
We can only locate the 2.4.18-12.5 rawhide kernel, not 2.4.18-14.
Ye olde bug with no activity in well over a year. Closing.