From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Description of problem:
After some time (never more than two weeks), one of our servers stops
responding (freezes completely), and a reboot is needed to get the system
back up and running.
The server in question is an IBM xSeries eServer 330 with dual PIII-800MHz
and 1 GB ECC ram. Both CPUs have the same family, model and stepping. No
bugs are reported in /proc/cpuinfo.
We are running a self-compiled 2.2.19 kernel, but with no special or
experimental support compiled into the kernel.
Steps to Reproduce:
1. Boot server
2. Wait for anywhere between a few hours and 10-12 days
3. System hangs
Actual Results: After a few days up and running, the server freezes. With
100GB of disk fsck takes quite some time, and our users get angry and
frustrated waiting for the server to get back online.
Expected Results: The server should keep on truck^H^H^H^H^Hrunning.
The error message is as follows (abbreviated - stacks and registers
warning: kfree_skb passed an skb still on a list (from c0267fbf)
Unable to handle kernel NULL pointer dereference at virtual address
current -> tss.cr3 = 00101000, %cr3=00101000
*pde = 00000000
<snipped registers and stack... available if wanted>
Code: 89 58 04 89 03 c7 02 00 00 00 00 c7 42 04 00 00 00 00 c7 42
Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapping task - not syncing
A ksymoopsed version of the oops would be very welcome.
Dave: does this look familiar ? (feel free to assign the bug back if not)
"We are running a self-compiled 2.2.19 kernel, but with no special or
experimental support compiled into the kernel."
Is that RH 2.2.19 or an "upstream" 2.2.19 ?
And are you willing to try out a patch ?
It's a vanilla "kernel.org" 2.2.19 kernel, not a RH kernel.
Before we try out any patches, it would be nice to know what the patch is
supposed to do. The server is in full production, and we don't want to risk any
instability in addition to the occasional halt described in this bugzilla
The server seems to be running OK with a reboot every night, BTW. Could this be
an indicator of a resource leak of some sort?
That or some timeout which happens to be > 24 hours ;)
The patch I propose is a change that will also be in 2.2.20 whenever that comes
Send me the patch, and we'll see what happens... Can't guarantee anything
today, but maybe tomorrow or Thursday.
Long time no update. Since the last one, we have tried several kernels:
- 2.2.16 self compiled
- 2.2.19 self compiled
- 2.4.12 self compiled
- 2.4.14 self compiled
The server is running 2.4.14 now, but the only way to get it 100% stable is to
remove one CPU. With two CPUs it freezes after anything from 1 day to 1 month.
BTW, it's a NetFinity 5600 with dual PIII-800, 1 GB ram and an Adapter 3200s
RAID controller (lost faith in IBM ServeRAIDs after a while).
What network driver are you using ?
(As you're the only one seeing this problem something must be different)
The network drivers are one "3c59x" (main) and one "dmfe" (crossover to webmail
server). We are using TCP/IP only, with an NFS mount from one server to the
other. No ipchains, iptables, netfilter or pppoe.
We are not all alone with this bug, btw. I have seen one more report in a
Norwegian usenet group (no.it.os.unix.linux.diverse - read by at least one
RedHat employee), and the reporter did get the error on both dual PIIIs and
dual Athlon MPs using different 2.4 kernels. This guy was using the following
libraries: libc-5.3.12-31, glibc-devel-2.2.4-18.7.0, glibc-2.2.4-18.7.0,
glibc-common-2.2.4-18.7.0, compat-libstdc++-6.2-220.127.116.11 on a RedHat 7.0 system (which
he has modified).
We are only running libraries supplied with the RedHat distributions we have
tried, to keep it as simple as possible.
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/