From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003
Description of problem:
We use FSC Primergy P250 boxes with two 2.4 GHz Xeon processors and an Adaptec
2100S RAID controller. The OS is Red Hat 7.2 with a 2.4.18-18-7.xsmp kernel.
We installed Squid and an Apache server. The normal system load is between 0.1
and 0.5 (/proc/loadavg), but sometimes the load goes up to 4-6 and we don't
know why. The CPUs are 98% idle and there is no I/O traffic.
After 2 or 3 hours the load goes back down to 0.1. The same problem appears with
a uniprocessor kernel.
As soon as we stop Squid, the load goes back down to a normal value, but
after restarting Squid the problem appears again.
Interesting: if we start bonnie++ while the load is up, the load goes up to
10-12. Killing the bonnie++ process 2 minutes later brings the load back
down to a normal value (0.2), and it stays down.
Sometimes top and sar show wrong values. I saw an idle time
of 234567.98, for example, or a CPU usage of 234567%.
The other values suggest that the system is mostly idle, and the Squid response
times look good :-)
We used 4 different setups: 4 boxes with SMP kernel and Red Hat Squid, 3 with UP
kernel and Red Hat Squid, 4 with SMP kernel and our own Squid with mod_gzip, and 2
with UP kernel and our Squid. The problem appears on all systems.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install an FSC Xeon box with RH 7.2
2. Install Squid and Apache
3. Simulate traffic (100 requests/s)
4. Wait 1 or 2 weeks and monitor the load
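For step 4, the monitoring could be scripted. On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so a high load with idle CPUs often points at blocked tasks. The following is only a minimal sketch, not something that was run on the affected boxes; the function names are made up for illustration, and it assumes /proc/loadavg and /proc/<pid>/stat have their usual layout.

```python
import os

def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    fields = text.split()
    return tuple(float(x) for x in fields[:3])

def parse_state(stat_text):
    """Return (command, state) from the content of /proc/<pid>/stat.

    The command is wrapped in parentheses and may itself contain spaces or
    parentheses, so the split is done at the last ')'.
    """
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    comm = stat_text[start + 1:end]
    state = stat_text[end + 1:].split()[0]
    return comm, state

def blocked_tasks(proc="/proc"):
    """List (pid, command) for tasks currently in state D."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                comm, state = parse_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited, or the stat content was not as expected
        if state == "D":
            found.append((int(name), comm))
    return found

def report(proc="/proc"):
    """Print the current load averages and any tasks in state D."""
    with open(os.path.join(proc, "loadavg")) as f:
        print("load:", parse_loadavg(f.read()))
    for pid, comm in blocked_tasks(proc):
        print("D state:", pid, comm)
```

Calling report() from cron every minute or so and keeping the output would show which tasks, if any, are blocked at the time the load climbs.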
If you can enable sysrq ("echo 1 > /proc/sys/kernel/sysrq"), then the
alt-sysrq-t key combination will spew a kernel debug stream to syslog. Based on
that it's possible to see why/where the load is so high; please attach such
output here (but only in the problem scenario; in the "healthy" case it's no use).
There are two separate bugs here: the obviously incorrect CPU usage (which may
be related to a missed timer tick or incorrect time accounting in the kernel),
and the high load triggering process death (likely a VM issue).
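To illustrate the time accounting point: a CPU percentage is normally derived from the difference between two samples of the counters on the "cpu" line of /proc/stat. If a counter ever moves backwards between samples, the difference is negative, and a tool that does not check for that could print nonsense. Whether that is what happens in top or sar here is not confirmed; this is only a sketch with hypothetical function names, assuming the 2.4 field order (user, nice, system, idle).

```python
def parse_cpu_line(line):
    """Return the counters from the aggregate 'cpu' line of /proc/stat as ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("not the aggregate cpu line")
    return [int(x) for x in fields[1:]]

def idle_percent(before, after, idle_index=3):
    """Idle share of the interval between two samples.

    Returns None when any counter went backwards or no time elapsed,
    instead of reporting a meaningless percentage.
    """
    deltas = [b - a for a, b in zip(before, after)]
    if any(d < 0 for d in deltas):
        return None  # counters are expected to only ever increase
    total = sum(deltas)
    if total == 0:
        return None
    return 100.0 * deltas[idle_index] / total
```

Logging the raw "cpu" line alongside the top or sar output would show whether the counters themselves jump when the bogus values appear.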
Created attachment 88791 [details]
here is the kernel debug stream
looks like it got stuck on NFS :(
We had the same problems without NFS. We can unmount the NFS devices and send
you a new debug stream.
Created attachment 88795 [details]
new debug stream
After unmounting all NFS shares, stopping the NFS services, and removing the nfs
kernel module with rmmod, the load is still 4.
FYI: we have the same problems as described in bug 64984. Perhaps our
load problems result from that bug.
OK, we have the same problems with Red Hat Advanced Server. We solved our NFS
problems and the load problem is still here.
Created attachment 89234 [details]
debug output advanced server
Created attachment 89235 [details]
the same file again / without auto-detect content type
Created attachment 89236 [details]
Created attachment 89237 [details]
Is there a way to get the kernel debug stream without pressing the sysrq keys?
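One possibility, on kernels that provide it, is /proc/sysrq-trigger: writing the letter "t" to that file as root requests the same task dump as alt-sysrq-t. That file may not be present on the 2.4.18 kernels discussed here, so the sketch below checks for it first. The function name is made up for illustration.

```python
import os

def trigger_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Request a sysrq 't' task dump by writing to the trigger file.

    Returns False if the trigger file does not exist (kernel without
    this interface), True once the request has been written.
    Writing to the real file needs root.
    """
    if not os.path.exists(trigger_path):
        return False
    with open(trigger_path, "w") as f:
        f.write("t")
    return True
```

The output still goes to the kernel log, so it ends up in syslog the same way as with the key combination.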
Created attachment 89243 [details]
interesting top output --> kswapd
Created attachment 89341 [details]
new rh 7.2 kernel debug stream
We would need some more information, related to your finding about kswapd.
We would need the data from:
when the problem occurs. It will give us more depth into the problem, now that
we know that kswapd might be a problem.
Created attachment 89356 [details]
top, vmstat, and ps output while the load is high
Let me know if this is still a problem.