From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
Hi, we use FSC Primary P250 boxes with 2 x 2.4 GHz Xeon processors and an Adaptec 2100S RAID controller. The OS is Red Hat 7.2 with a 2.4.18-18-7.xsmp kernel. We installed Squid and an Apache server.

The normal system load is between 0.1 and 0.5 (/proc/loadavg), but sometimes the load goes up to 4-6 and we don't know why: the CPUs are 98% idle and there is no I/O traffic. After 2 or 3 hours the load drops back to 0.1. The same problem appears with a uniprocessor kernel. As soon as we stop Squid, the load returns to a normal value, but after restarting Squid the problem appears again.

Interesting: starting a bonnie++ run while the load is up pushes the load to 10-12; killing the bonnie process 2 minutes later brings the load back down to a normal value (0.2), where it stays.

Sometimes the top output and the sar tool show wrong values; I saw, for example, an idle time of 234567.98 or a CPU usage of 234567%. All the other values suggest the system is bored, and the Squid response times look good :-)

We used 4 different setups: 4 boxes with the SMP kernel and the Red Hat Squid, 3 with the UP kernel and the Red Hat Squid, 4 with the SMP kernel and our own Squid build with mod_gzip, and 2 with the UP kernel and our Squid. The problem appears on all of them.

tobias

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Install an FSC Xeon box with RH 7.2.
2. Install Squid and Apache.
3. Simulate traffic (100 requests/s).
4. Wait 1 or 2 weeks and monitor the load.

Additional info:
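The reproduction steps end with monitoring the load; a minimal sketch of such a monitor, assuming a Linux /proc/loadavg (the log path and sampling interval are arbitrary choices, not from the report):

```shell
#!/bin/sh
# Append one timestamped sample of /proc/loadavg to a log file.
# /proc/loadavg format: "0.12 0.34 0.56 1/123 4567"
# (1-, 5-, 15-minute load averages, running/total tasks, last pid)
LOG=${LOG:-/tmp/loadavg.log}   # arbitrary log location

log_load() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(cat /proc/loadavg)" >> "$LOG"
}

# For continuous monitoring, sample once a minute (stop with Ctrl-C):
#   while :; do log_load; sleep 60; done
log_load
```

Plotting or grepping the resulting log makes it easy to see when a load spike starts and how long it lasts.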
If you can enable sysrq ("echo 1 > /proc/sys/kernel/sysrq"), then pressing the Alt-SysRq-T key combination will spew a kernel debug stream to syslog. Based on that it's possible to see why/where the load is so high; please attach such output here. (But only from the problem scenario; in the "healthy" case it's of no use.)
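As a concrete sketch of the procedure described above (must be run as root, and assumes the kernel was built with magic-SysRq support, i.e. CONFIG_MAGIC_SYSRQ):

```shell
# Enable the magic SysRq key; equivalent to "sysctl -w kernel.sysrq=1".
echo 1 > /proc/sys/kernel/sysrq

# Now press Alt-SysRq-T on the console. The task/stack dump goes to the
# kernel log; capture it afterwards, for example with:
dmesg > /tmp/sysrq-t.txt
# or pull it out of /var/log/messages, depending on the syslog setup.
```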
There are two separate bugs here: the obviously incorrect CPU usage (which may be related to a missed timer tick or incorrect time accounting in the kernel), and the high load triggering process death (likely a VM issue).
Created attachment 88791 [details] here is the kernel debug stream
looks like it got stuck on NFS :(
We had the same problems without NFS. We can unmount the NFS mounts and send you a new debug stream.
Created attachment 88795 [details] New debug stream after unmounting all NFS shares, stopping the NFS services, and rmmod'ing the NFS kernel module. The load is still 4.
FYI: we have the same problems as described in bug 64984. Perhaps our load problems result from that bug.
OK, we see the same problems with Red Hat Advanced Server. We solved our NFS problems and the load problem is still there.
Created attachment 89234 [details] debug output advanced server
Created attachment 89235 [details] the same file again / without auto-detect content type
Created attachment 89236 [details] lspci output
Created attachment 89237 [details] /etc/fstab
Is there a way to get the kernel debug stream without pressing the SysRq keys?
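For reference: on kernels that provide /proc/sysrq-trigger (later 2.4 kernels and all 2.6+; it may not exist on this exact 2.4.18 build), the same dump can be requested without the keyboard by writing the command character to that file as root:

```shell
# Equivalent to pressing Alt-SysRq-T, but usable from a shell
# (e.g. over ssh); requires root and kernel.sysrq enabled.
echo t > /proc/sysrq-trigger

# The dump lands in the kernel log as usual:
dmesg > /tmp/sysrq-t.txt
```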
Created attachment 89243 [details] interesting top output --> kswapd
Created attachment 89341 [details] new rh 7.2 kernel debug stream
Hello Tobias, we would need some more information related to your finding about kswapd: the output of readprofile, top, and vmstat captured while the problem occurs. That will give us more depth on the problem, now that we know kswapd might be involved. Cheers
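A sketch of a one-shot collection script for the data requested above. The output directory is an arbitrary choice, the `command -v` guards are there so missing tools don't abort the run, and note that readprofile only produces data if the kernel was booted with a `profile=2` boot parameter:

```shell
#!/bin/sh
# Collect readprofile/top/vmstat diagnostics while the load is high.
OUT=${OUT:-/tmp/bug-diag}   # arbitrary output directory
mkdir -p "$OUT"

# Always grab the load average itself.
cat /proc/loadavg > "$OUT/loadavg"

# Two top snapshots in batch mode.
command -v top >/dev/null && top -b -n 2 > "$OUT/top"

# Ten vmstat samples, one second apart.
command -v vmstat >/dev/null && vmstat 1 10 > "$OUT/vmstat"

# Kernel profile; needs the profile=2 boot parameter and a System.map.
command -v readprofile >/dev/null && \
    readprofile -m /boot/System.map > "$OUT/readprofile" 2>/dev/null

ls -l "$OUT"
```

Running this once during a spike and once during normal operation gives a before/after pair to attach to the bug.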
Created attachment 89356 [details] top vmstat and ps output while the load is high
Let me know if this is still a problem. Larry Woodman