Bug 118397 - system needlessly thrashing swap partition
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686 Linux
Priority: medium  Severity: medium
Assigned To: Larry Woodman
Reported: 2004-03-16 07:11 EST by Need Real Name
Modified: 2007-11-30 17:07 EST

Doc Type: Bug Fix
Last Closed: 2004-04-27 08:53:07 EDT


Attachments
vmstat output (2.20 KB, text/plain), 2004-03-24 06:23 EST, Need Real Name
top output (1.15 KB, text/plain), 2004-03-24 06:23 EST, Need Real Name

Description Need Real Name 2004-03-16 07:11:52 EST
Since moving squid from a box running Red Hat 7.1 to a new box running
RHEL 3.0, web access through our squid proxy server has become
significantly slower and the load average has increased.

I can't see why this is.

I attach output from top and vmstat:

top:

 12:06:04  up 5 days, 16:29,  2 users,  load average: 1.04, 1.01, 0.94
59 processes: 58 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   11.3%    0.0%    0.9%   0.0%     0.0%   37.6%   50.0%
           cpu00    9.9%    0.0%    0.9%   0.0%     0.0%   10.8%   78.2%
           cpu01   12.8%    0.0%    0.9%   0.0%     0.0%   64.3%   21.7%
Mem:  1028484k av, 1012088k used,   16396k free,       0k shrd,  272336k buff
                    785112k actv,  187964k in_d,    5000k in_c
Swap: 2096472k av,  269308k used, 1827164k free                  589580k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
22153 squid     25   0  202M 103M   800 D    12.3 10.3 108:17   0 squid
23879 root      19   0  1180 1180   892 R     0.4  0.1   0:00   0 top
    1 root      15   0   488  456   436 S     0.0  0.0   0:09   0 init

vmstat:

procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  2 270220  16296 272624 589028   23   14    15    28   29    32  4  1  2  5
Comment 1 Need Real Name 2004-03-17 05:36:35 EST
We suspect ext3 is causing the slowdown: squid blocks while waiting
for its data to be written to disk.

We've switched to ext2 for the cache partition and will see if that 
improves things.
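As a minimal sketch, this switch could look like the following. The device name and mount point are assumptions for illustration; the report does not name them, and rebuilding the partition destroys the existing cache:

```
# Assumed device (/dev/sdb1) and mount point -- not named in the report.
service squid stop
umount /var/spool/squid
mkfs.ext2 /dev/sdb1                        # rebuild cache partition as ext2
mount -t ext2 /dev/sdb1 /var/spool/squid
squid -z                                   # re-create the cache directories
service squid start

# Matching /etc/fstab entry (noatime avoids extra metadata writes):
# /dev/sdb1  /var/spool/squid  ext2  defaults,noatime  1 2
```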
Comment 2 Need Real Name 2004-03-18 07:27:47 EST
Switching to ext2 fixed the problem.
Closing.
Comment 3 Need Real Name 2004-03-22 08:25:45 EST
Switching to ext2 only delayed the problem's recurrence. Now the
problem is back.

Since RHEL 3 has NPTL, should aufs or ufs be used?
Does squid work well with NPTL?
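For context, aufs and ufs here are squid cache_dir storage schemes, not filesystems: ufs does disk I/O synchronously in the main squid process, while aufs hands it off to a POSIX-thread worker pool, which is what NPTL provides. A hedged squid.conf sketch; the path and size values are illustrative assumptions, not from this report:

```
# squid.conf fragment -- path and sizes are illustrative assumptions.
# ufs: synchronous disk I/O in the main process (can stall on slow writes)
#cache_dir ufs /var/spool/squid 10000 16 256

# aufs: asynchronous I/O via a worker-thread pool (can benefit from NPTL)
cache_dir aufs /var/spool/squid 10000 16 256
```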

Comment 4 Need Real Name 2004-03-24 06:23:32 EST
Created attachment 98822 [details]
vmstat output
Comment 5 Need Real Name 2004-03-24 06:23:55 EST
Created attachment 98823 [details]
top output
Comment 6 Need Real Name 2004-03-24 06:24:53 EST
This seems to be a kernel bug: the swap partition is thrashing
needlessly despite free RAM.
Comment 7 Need Real Name 2004-03-24 08:58:23 EST
To confirm that the high iowait was due to the kernel needlessly 
swapping, the machine was rebooted with zero swap space.

iowait is low, squid is running happily, and Internet access is back
to Red Hat 7.1 speeds.
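A sketch of what running "with zero swap space" amounts to; the device name is an assumption, since the report does not give one:

```
# Disable all swap at runtime (as root; lasts until next boot):
swapoff -a

# Make it persistent by commenting out the swap entry in /etc/fstab,
# e.g. (assumed device name):
#   /dev/sda2  swap  swap  defaults  0 0
# becomes
#   #/dev/sda2  swap  swap  defaults  0 0
```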

Comment 8 Need Real Name 2004-03-26 11:11:51 EST
kswapd hogged the machine for five minutes today, at ~45% CPU the
whole time.

The machine was unusable as a web proxy.


Comment 9 Need Real Name 2004-04-01 12:51:10 EST
Is there any progress on this?
There are now nine comments here, no comments from Red Hat.
Comment 10 Need Real Name 2004-04-05 06:15:39 EDT
Weeks after reporting this bug, I now have a box that serves squid
requests most of the time, but with kswapd going mad the rest of the
time despite there being no swap space.

Running without swap also means grep has become slow: the kernel
isn't caching disk access.

Despite being encouraged to submit bug reports, Red Hat *still*
doesn't appear to be doing anything at all, and I'm seriously
considering switching to SuSE.
Comment 11 Arjan van de Ven 2004-04-05 06:20:47 EDT
I think you have the wrong impression of what Bugzilla is.
Bugzilla is *not* support; it has no SLA.
If your production server has a problem, you really should contact
Red Hat support.
Comment 12 Ernie Petrides 2004-04-08 13:18:16 EDT
Reassigning to Larry.  -ernie
Comment 13 Larry Woodman 2004-04-20 15:13:57 EDT
Please try to reproduce this problem with the RHEL3 Update 2 kernel
and let me know how it goes.  We did make changes that reduce swap
aggression in U2.

Larry Woodman
Comment 14 Need Real Name 2004-04-20 15:29:07 EDT
We made our own independent change: echoing a value into the /proc
virtual filesystem, reducing it from 100 to 30. It seems to have
worked.

I'll post what we did once I've confirmed it keeps working.
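The exact /proc file is never named in this report, and the promised follow-up never appeared. RHEL 3's 2.4.21-based kernels did expose a page-cache pressure control whose third (max) field defaults to 100, which matches the "100 to 30" description, so a plausible but unconfirmed sketch is:

```
# ASSUMPTION: the tunable was the RHEL 3 page-cache limit. The report
# never names the file, so this is a guess consistent with "100 to 30".
cat /proc/sys/vm/pagecache          # typically "1 15 100" (min borrow max %)
echo "1 15 30" > /proc/sys/vm/pagecache   # cap page cache at 30% of RAM

# Persistent form in /etc/sysctl.conf:
# vm.pagecache = 1 15 30
```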
Comment 15 Need Real Name 2004-04-27 08:53:07 EDT
It's still working, so I'll close the bug. One question, though:

Is RHEL3 Update 2 identical to RHEL3 Update 1 + updates from up2date?
Comment 16 Ernie Petrides 2004-05-03 16:22:41 EDT
Yes, assuming you're subscribed to the RHEL3 beta channel.  After
RHEL3 U2 is officially released (expected sometime next week),
then the answer is "yes" in any case.
Comment 17 John Flanagan 2004-05-11 21:08:38 EDT
An erratum has been issued which should help with the problem described
in this bug report. This report is therefore being closed with a
resolution of ERRATA. For more information on the solution and/or where
to find the updated files, please follow the link below. You may reopen
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-188.html
