Bug 118397 - system needlessly thrashing swap partition
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686 Linux
Priority: medium  Severity: medium
Assigned To: Larry Woodman
Reported: 2004-03-16 07:11 EST by Need Real Name
Modified: 2007-11-30 17:07 EST

Doc Type: Bug Fix
Last Closed: 2004-04-27 08:53:07 EDT


Attachments
vmstat output (2.20 KB, text/plain), 2004-03-24 06:23 EST, Need Real Name
top output (1.15 KB, text/plain), 2004-03-24 06:23 EST, Need Real Name

Description Need Real Name 2004-03-16 07:11:52 EST
Since moving squid from a box running Red Hat 7.1 to a new box running
RHEL 3.0, web access through our squid proxy server has become
significantly slower and the load average has increased.

I can't see why this is.

I attach output from top and vmstat:

top:

 12:06:04  up 5 days, 16:29,  2 users,  load average: 1.04, 1.01, 0.94
59 processes: 58 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   11.3%    0.0%    0.9%   0.0%     0.0%   37.6%   50.0%
           cpu00    9.9%    0.0%    0.9%   0.0%     0.0%   10.8%   78.2%
           cpu01   12.8%    0.0%    0.9%   0.0%     0.0%   64.3%   21.7%
Mem:  1028484k av, 1012088k used,   16396k free,       0k shrd,  272336k buff
                    785112k actv,  187964k in_d,    5000k in_c
Swap: 2096472k av,  269308k used, 1827164k free                  589580k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
22153 squid     25   0  202M 103M   800 D    12.3 10.3 108:17   0 squid
23879 root      19   0  1180 1180   892 R     0.4  0.1   0:00   0 top
    1 root      15   0   488  456   436 S     0.0  0.0   0:09   0 init

vmstat:

procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  2 270220  16296 272624 589028   23   14    15    28   29    32  4  1  2  5
Comment 1 Need Real Name 2004-03-17 05:36:35 EST
We suspect ext3 is causing the slowdown: squid blocks while waiting
for its data to be written to disk.

We've switched to ext2 for the cache partition and will see if that 
improves things.
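As a minimal sketch, this switch could look like the following. The device name and mount point are assumptions for illustration; the report does not name them, and rebuilding the partition destroys the existing cache:

```
# Assumed device (/dev/sdb1) and mount point -- not named in the report.
service squid stop
umount /var/spool/squid
mkfs.ext2 /dev/sdb1                        # rebuild cache partition as ext2
mount -t ext2 /dev/sdb1 /var/spool/squid
squid -z                                   # re-create the cache directories
service squid start

# Matching /etc/fstab entry (noatime avoids extra metadata writes):
# /dev/sdb1  /var/spool/squid  ext2  defaults,noatime  1 2
```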
Comment 2 Need Real Name 2004-03-18 07:27:47 EST
Switching to ext2 fixed the problem.
Closing.
Comment 3 Need Real Name 2004-03-22 08:25:45 EST
Switching to ext2 only delayed the problem's recurrence. Now the
problem is back.

Since RHEL 3 has NPTL, should aufs or ufs be used?
Does squid work well with NPTL?
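For context, aufs and ufs here are squid cache_dir storage schemes, not filesystems: ufs does disk I/O synchronously in the main squid process, while aufs hands it off to a POSIX-thread worker pool, which is what NPTL provides. A hedged squid.conf sketch; the path and size values are illustrative assumptions, not from this report:

```
# squid.conf fragment -- path and sizes are illustrative assumptions.
# ufs: synchronous disk I/O in the main process (can stall on slow writes)
#cache_dir ufs /var/spool/squid 10000 16 256

# aufs: asynchronous I/O via a worker-thread pool (can benefit from NPTL)
cache_dir aufs /var/spool/squid 10000 16 256
```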

Comment 4 Need Real Name 2004-03-24 06:23:32 EST
Created attachment 98822 [details]
vmstat output
Comment 5 Need Real Name 2004-03-24 06:23:55 EST
Created attachment 98823 [details]
top output
Comment 6 Need Real Name 2004-03-24 06:24:53 EST
This seems to be a kernel bug: the swap partition is thrashing
needlessly despite free RAM.
Comment 7 Need Real Name 2004-03-24 08:58:23 EST
To confirm that the high iowait was due to the kernel needlessly 
swapping, the machine was rebooted with zero swap space.

iowait is low, squid is running happily, and Internet access is back
to Red Hat 7.1 speeds.
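A sketch of what running "with zero swap space" amounts to; the device name is an assumption, since the report does not give one:

```
# Disable all swap at runtime (as root; lasts until next boot):
swapoff -a

# Make it persistent by commenting out the swap entry in /etc/fstab,
# e.g. (assumed device name):
#   /dev/sda2  swap  swap  defaults  0 0
# becomes
#   #/dev/sda2  swap  swap  defaults  0 0
```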

Comment 8 Need Real Name 2004-03-26 11:11:51 EST
kswapd hogged the machine for five minutes today, at ~45% CPU the
whole time.

The machine was unusable as a web proxy.


Comment 9 Need Real Name 2004-04-01 12:51:10 EST
Is there any progress on this?
There are now nine comments here, no comments from Red Hat.
Comment 10 Need Real Name 2004-04-05 06:15:39 EDT
Weeks after reporting this bug, I now have a box that serves squid
requests most of the time, but with kswapd going mad the rest of the
time despite there being no swap space.

Running without swap also means grep has become slow: the kernel
isn't caching disk access.

Despite being encouraged to submit bug reports, Red Hat *still*
doesn't appear to be doing anything at all, and I'm seriously
considering switching to SuSE.
Comment 11 Arjan van de Ven 2004-04-05 06:20:47 EDT
I think you have the wrong impression of what Bugzilla is.
Bugzilla is *not* support; it has no SLA.
If your production server has a problem, you really should contact
Red Hat support.
Comment 12 Ernie Petrides 2004-04-08 13:18:16 EDT
Reassigning to Larry.  -ernie
Comment 13 Larry Woodman 2004-04-20 15:13:57 EDT
Please try to reproduce this problem with the RHEL3 Update 2 kernel
and let me know how it goes.  We did make changes that reduce swap
aggression in U2.

Larry Woodman
Comment 14 Need Real Name 2004-04-20 15:29:07 EDT
We made our own independent change: echoing a value into the /proc
virtual filesystem, reducing it from 100 to 30. It seems to have
worked.

I'll post what we did once I've confirmed it keeps working.
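The exact /proc file is never named in this report, and the promised follow-up never appeared. RHEL 3's 2.4.21-based kernels did expose a page-cache pressure control whose third (max) field defaults to 100, which matches the "100 to 30" description, so a plausible but unconfirmed sketch is:

```
# ASSUMPTION: the tunable was the RHEL 3 page-cache limit. The report
# never names the file, so this is a guess consistent with "100 to 30".
cat /proc/sys/vm/pagecache          # typically "1 15 100" (min borrow max %)
echo "1 15 30" > /proc/sys/vm/pagecache   # cap page cache at 30% of RAM

# Persistent form in /etc/sysctl.conf:
# vm.pagecache = 1 15 30
```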
Comment 15 Need Real Name 2004-04-27 08:53:07 EDT
It's still working, so I'll close the bug. One question, though:

Is RHEL3 Update 2 identical to RHEL3 Update 1 + updates from up2date?
Comment 16 Ernie Petrides 2004-05-03 16:22:41 EDT
Yes, assuming you're subscribed to the RHEL3 beta channel.  After
RHEL3 U2 is officially released (expected sometime next week),
then the answer is "yes" in any case.
Comment 17 John Flanagan 2004-05-11 21:08:38 EDT
An erratum has been issued which should help with the problem described
in this bug report. This report is therefore being closed with a
resolution of ERRATA. For more information on the solution and/or where
to find the updated files, please follow the link below. You may reopen
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-188.html
