79726 – strange system load on rh 7.2 web servers

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Enterprise Linux 2.1
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	2.1
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Larry Woodman
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-12-16 09:28 UTC by Tobias Meier
Modified:	2007-11-30 22:06 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-09-28 11:40:59 UTC
Target Upstream Version:
Embargoed:

Attachments
here is the kernel debug stream (201.99 KB, text/plain) 2002-12-18 10:04 UTC, Tobias Meier
new debug stream (98.81 KB, text/plain) 2002-12-18 15:06 UTC, Tobias Meier
debug output advanced server (147.86 KB, application/octet-stream) 2003-01-09 13:10 UTC, Tobias Meier
the same file again / without auto-detect content type (147.86 KB, text/plain) 2003-01-09 13:15 UTC, Tobias Meier
lspci output (8.35 KB, text/plain) 2003-01-09 14:00 UTC, Tobias Meier
/etc/fstab (1.04 KB, text/plain) 2003-01-09 14:04 UTC, Tobias Meier
interesting top output --> kswapd (2.09 KB, text/plain) 2003-01-09 15:38 UTC, Tobias Meier
new rh 7.2 kernel debug stream (543.30 KB, text/plain) 2003-01-13 21:56 UTC, Tobias Meier
top vmstat and ps output while the load is high (30.88 KB, text/plain) 2003-01-14 18:14 UTC, Tobias Meier

Description Tobias Meier 2002-12-16 09:28:51 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
hi,
we use FSC Primergy P250 boxes with 2 x 2.4 GHz xeon processors and an adaptec
2100S raid controller. the os is redhat 7.2 with a 2.4.18-18-7.xsmp kernel.
we installed squid and an apache server. the normal system load is between 0.1
and 0.5 (/proc/loadavg), but sometimes the load goes up to 4-6 and we don't
know why. the cpus are 98% idle and there is no io traffic.
after 2 or 3 hours the load goes back down to 0.1. the same problem appears with
a uniprocessor kernel.
as soon as we stop squid, the load goes back down to a normal value, but
after restarting squid the problem appears again.
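
On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable tasks, so a high load with idle cpus usually points at tasks blocked inside the kernel. The sketch below is one way to check for that; the function names and the optional file argument are illustrative, not part of the original report.

```sh
# Print the 1-minute load average from a loadavg-format file.
# The optional argument (hypothetical, for testing) overrides the path.
load_1min() {
    cut -d' ' -f1 "${1:-/proc/loadavg}"
}

# Read `ps -eo stat,pid,wchan,comm` style output on stdin and print only
# the tasks whose state starts with D (uninterruptible sleep).
blocked_tasks() {
    awk '$1 ~ /^D/'
}

# Usage on a live system while the load is high:
#   load_1min
#   ps -eo stat,pid,wchan,comm | blocked_tasks
```

If the number of D-state tasks roughly matches the load value, the wchan column hints at where in the kernel they are waiting.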

interesting: if we start bonnie++ while the load is up, the load rises to
10-12. killing the bonnie++ process 2 minutes later brings the load back
down to a normal value (0.2), and it stays down.

sometimes the top output and the sar tool show wrong values. i saw an idle time
of 234567.98, for example, or a cpu usage of 234567%.
the other values show that the system is bored, and the squid response
times look good :-)

we used 4 different setups: 4 boxes with smp kernel and redhat squid, 3 with up
kernel and redhat squid, 4 with smp kernel and our own squid with mod_gzip, and 2
with up kernel and our squid. the problem appears on all systems.

   tobias

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. install a fsc xeon box with rh 7.2
2. install squid and an apache
3. simulate traffic (100 requests/s)
4. wait 1 or 2 weeks and monitor the load
    

Additional info:

Comment 1 Arjan van de Ven 2002-12-16 09:52:18 UTC

if you can enable sysrq ("echo 1 > /proc/sys/kernel/sysrq") then using the
alt-sysrq-t key combination will spew a kernel debug stream to syslog. based on
that it's possible to see why/where the load is so high; please attach such
output here. (but only in the problem scenario; in the "healthy" case it's no use)
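
A minimal sketch of that step, wrapped in small functions so the setting can be checked before pressing the keys. The function names and the optional path argument are assumptions for illustration; only the /proc/sys/kernel/sysrq file comes from the comment above.

```sh
# Enable the magic SysRq key. The optional argument (hypothetical, for
# testing) overrides the control file path.
enable_sysrq() {
    echo 1 > "${1:-/proc/sys/kernel/sysrq}"
}

# Succeed if SysRq is currently enabled (any value other than 0).
sysrq_enabled() {
    [ "$(cat "${1:-/proc/sys/kernel/sysrq}")" != "0" ]
}

# Usage, as root, while the load is high:
#   enable_sysrq && sysrq_enabled && echo "press alt-sysrq-t now"
#   dmesg > /tmp/sysrq-t.txt   # the kernel ring buffer may truncate a
#                              # large dump; syslog usually has all of it
```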

Comment 2 Ben LaHaise 2002-12-16 16:31:50 UTC

There are two separate bugs here: the obviously incorrect cpu usage (which may
be related to a missed timer tick or incorrect time accounting in the kernel),
and the high load triggering process death (likely a vm issue).

Comment 3 Tobias Meier 2002-12-18 10:04:02 UTC

Created attachment 88791
here is the kernel debug stream

Comment 4 Arjan van de Ven 2002-12-18 10:19:08 UTC

looks like it got stuck on NFS :(

Comment 5 Tobias Meier 2002-12-18 10:32:32 UTC

we had the same problems without nfs. we can unmount the nfs devices and send
you a new debug stream.

Comment 6 Tobias Meier 2002-12-18 15:06:40 UTC

Created attachment 88795
new debug stream

this is after unmounting all nfs shares, stopping the nfs services, and
removing the nfs kernel module with rmmod. the load is still 4.

Comment 7 Tobias Meier 2002-12-18 15:45:18 UTC

fyi: we have the same problems as described in bug id 64984. perhaps our
load problems result from that bug.

Comment 8 Tobias Meier 2003-01-09 13:00:57 UTC

ok, we have the same problems with redhat advanced server. we solved our nfs
problems, but the load problem is still there.

Comment 9 Tobias Meier 2003-01-09 13:10:42 UTC

Created attachment 89234
debug output advanced server

Comment 10 Tobias Meier 2003-01-09 13:15:26 UTC

Created attachment 89235
the same file again / without auto-detect content type

Comment 11 Tobias Meier 2003-01-09 14:00:17 UTC

Created attachment 89236
lspci output

Comment 12 Tobias Meier 2003-01-09 14:04:48 UTC

Created attachment 89237
/etc/fstab

Comment 13 Tobias Meier 2003-01-09 14:36:38 UTC

is there a way to get the kernel debug stream without pressing the sysrq keys ?
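
One possible approach, sketched under a loud assumption: kernels that provide /proc/sysrq-trigger accept the SysRq command letter written to that file, so no keyboard is needed. Whether the 2.4.18 kernels in this report have that file is not confirmed anywhere in the thread, so the function checks first. The function name and the optional path argument are illustrative.

```sh
# Ask the kernel for a task-state dump, the same output as alt-sysrq-t.
# ASSUMPTION: the running kernel provides /proc/sysrq-trigger.
# The optional argument (hypothetical, for testing) overrides the path.
sysrq_dump_tasks() {
    trigger="${1:-/proc/sysrq-trigger}"
    if [ ! -w "$trigger" ]; then
        echo "no writable $trigger on this kernel" >&2
        return 1
    fi
    echo t > "$trigger"
}

# Usage, as root, while the load is high:
#   sysrq_dump_tasks && dmesg > /tmp/sysrq-t.txt
```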

Comment 14 Tobias Meier 2003-01-09 15:38:47 UTC

Created attachment 89243
interesting top output --> kswapd

Comment 15 Tobias Meier 2003-01-13 21:56:27 UTC

Created attachment 89341
new rh 7.2 kernel debug stream

Comment 16 Bastien Nocera 2003-01-14 17:02:28 UTC

Hello Tobias,

We need some more information related to your finding about kswapd.

Please collect the output of:
- readprofile
- top
- vmstat
while the problem is occurring. It will give us more insight into the problem,
now that we know that kswapd might be involved.
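
Since the load spike only shows up after days or weeks, one way to gather this data is to snapshot it automatically once the load crosses a threshold. The sketch below is an illustration only: the function names, the threshold, and the output directory are assumptions, and readprofile is left as a comment because it needs the kernel booted with profiling enabled and a matching System.map.

```sh
# Succeed if the 1-minute load average is above the given threshold.
# The optional second argument (hypothetical, for testing) overrides
# the loadavg file path.
load_exceeds() {
    awk -v t="$1" 'NR == 1 { found = ($1 > t) } END { exit !found }' \
        "${2:-/proc/loadavg}"
}

# Write one timestamped snapshot of top, vmstat and ps into a directory.
capture_snapshot() {
    outdir=$1
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$outdir"
    top -b -n 1                > "$outdir/top-$stamp.txt"
    vmstat 1 5                 > "$outdir/vmstat-$stamp.txt"
    ps -eo stat,pid,wchan,comm > "$outdir/ps-$stamp.txt"
    # readprofile -m /boot/System.map > "$outdir/readprofile-$stamp.txt"
}

# Usage: poll once a minute and capture while the load is above 3.
#   while sleep 60; do
#       load_exceeds 3 && capture_snapshot /var/tmp/loadcap
#   done
```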

Cheers

Comment 17 Tobias Meier 2003-01-14 18:14:53 UTC

Created attachment 89356
top vmstat and ps output while the load is high

Comment 18 Larry Woodman 2005-09-28 11:40:59 UTC

Let me know if this is still a problem.

Larry Woodman
