Bug 124058

Summary: Frequent lockups with high kswapd cpu time usage
Product: Red Hat Enterprise Linux 2.1
Reporter: ITPlatformsGroup-staff <tao>
Component: kernel
Assignee: Larry Woodman <lwoodman>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 2.1
CC: jbaron, riel
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-04-04 16:03:19 UTC
Attachments:
1. SysRq memory and tasks during kswapd race
2. readprofile and meminfo before kswapd race
3. readprofile and meminfo during kswapd race
4. "meminfo" output
5. mem stats and ps output, every 10 seconds

Description ITPlatformsGroup-staff 2004-05-23 16:20:57 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

Description of problem:
The system locks up/freezes from time to time. When this happens,
kswapd is often seen in the R state using about 100% CPU time, and
commands like ps and netstat take several seconds to complete.

The usual pattern is: Lowfree decreases (constantly - why?), kswapd goes
off at pages.low, and when it finally recovers, Buffers (Inact_dirty)
drops by about 5000 kB and Lowfree goes up by the same amount. Captures
(with slabinfo/meminfo, done every minute) show that kswapd can burn away
for more than 60 minutes in one stretch!

It seems that the system is in a state where no shortage is registered
via free_high/low or inactive_shortage/high/low, hence no calls are made
to page_launder() or refill_inactive(). Is the shortage hidden behind
plenty of inactive and free highmem? Is the laundromat broken?

Will attach some SysRq M/T, meminfo and profiling next...

Version-Release number of selected component (if applicable):
2.4.9-e.40smp

How reproducible:
Always

Steps to Reproduce:
1. have the system up and running for a few days...
2. including running backups to file then to tape

Additional info:

Comment 1 ITPlatformsGroup-staff 2004-05-23 16:30:18 UTC
useful commands when tracing:

readprofile -m /boot/System.map | sort -k3nr | head -20
ps ; ps 10    (10 is PID of kswapd)
ps 10; ps 10; ps 10
alias mi="egrep '[0-9]{3}    1 : ' /proc/slabinfo; egrep '^[ICA]|^Buf|^LowF|^HighF' /proc/meminfo"
echo t > /proc/sysrq-trigger
echo m > /proc/sysrq-trigger ; dmesg | tail -26
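
A periodic capture built from the same commands might look like the
sketch below (a hypothetical wrapper; the log path and the one-minute
interval are assumptions, not part of the original report):

  while true; do
      date                                      # timestamp each sample
      egrep '^Buf|^LowF|^HighF' /proc/meminfo   # buffer and low/high free stats
      ps 10                                     # 10 is the kswapd PID here
      readprofile -m /boot/System.map | sort -k3nr | head -20
      sleep 60
  done >> /var/tmp/kswapd-trace.log             # log path is an assumption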


Comment 2 ITPlatformsGroup-staff 2004-05-23 16:31:36 UTC
Created attachment 100477 [details]
SysRq memory and tasks during kswapd race

Comment 3 ITPlatformsGroup-staff 2004-05-23 16:32:48 UTC
Created attachment 100478 [details]
readprofile and meminfo before kswapd race

Comment 4 ITPlatformsGroup-staff 2004-05-23 16:33:32 UTC
Created attachment 100479 [details]
readprofile and meminfo during kswapd race

Comment 5 Larry Woodman 2004-05-24 17:31:44 UTC
The problem you are running into here is that the majority of lowmem is
being consumed by buffermem caching filesystem meta-data.

******************** AltSysrq M *************************************
SysRq : Show Memory
Mem-info:

active: 96135, inactive_dirty: 74744, inactive_clean: 0, free: 404
Buffer mem: 173410
*********************************************************************

The AS2.1 kernel is very aggressive in terms of holding onto buffermem
because it contains in-memory copies of the on-disk filesystem data
structures.  We have made a change to the kernel to be less aggressive
in terms of holding on to buffermem pages when buffermem consumes too
much lowmem.  

Please try the appropriate kernel in:

http://people.redhat.com/~lwoodman/.AS21kernels/


Also, please "echo 2 10 25 > /proc/sys/vm/buffermem" to force your
system to give up buffermem more easily when more than 25% of
lowmem is consumed by buffermem.
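
For reference, applying and verifying the setting might look like this
sketch (the sysctl.conf step for persistence across reboots is an
assumption, not part of the instructions above):

  echo 2 10 25 > /proc/sys/vm/buffermem               # apply immediately
  sysctl vm.buffermem                                 # verify the new values
  echo 'vm.buffermem = 2 10 25' >> /etc/sysctl.conf   # persist (assumption)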


Larry Woodman

Comment 6 Jason Baron 2004-05-24 17:41:12 UTC
Actually, i just spoke with Larry, and we'd like you to try the latest
test kernel, at the link below. Same tuning for buffermem applies.

http://people.redhat.com/~jbaron/.private/testing/2.4.9-e.40.8

Comment 7 ITPlatformsGroup-staff 2004-06-09 07:51:09 UTC
Test kernel 2.4.9-e.40.8smp still shows the symptoms
(even if the system behaves somewhat differently with this kernel).

The machine has been running (in a sort of idle prod state) on e.40.8
for a couple of days now. Yesterday kswapd went crazy again, going on
for several *hours* without producing any lowmem. See some "meminfo"
output in the next attachment. It begins at about 21:00, and at 22:00 it
gets even worse, with kswapd in the Running state most of the time. Note
that at this time, after working hours, the system should be rather idle.

Is the buffermem parameter a % of lowmem or of total mem? The
documentation says "percentage amount of total system memory to be used
for buffer memory". If the doc is correct, 25% is 1GB already.
And isn't the middle percentage value what controls when to "give up
buffermem more easily"?

Comment 8 ITPlatformsGroup-staff 2004-06-09 07:52:54 UTC
Created attachment 100986 [details]
"meminfo" output

Comment 9 Larry Woodman 2004-06-09 17:26:18 UTC
First question: did you set /proc/sys/vm/buffermem to "2 10 25"?
The reason I ask is that buffermem is all the way up to 650MB,
and that is what one of the patches in the 40.8 kernel is supposed
to fix.

Second, if you are running with the correct buffermem values, can you
get me several "AltSysrq W" outputs and a kernel profile while the
system is being consumed in kswapd?
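
A capture sequence along these lines would cover both requests (a
sketch; it assumes the box was booted with kernel profiling enabled via
the profile= boot option, so readprofile has data to read):

  echo w > /proc/sysrq-trigger      # AltSysrq W: dump task stack traces
  dmesg | tail -40                  # grab the dump (tail count is a guess)
  readprofile -r                    # reset the profiling counters
  sleep 60                          # let the kswapd spin accumulate samples
  readprofile -m /boot/System.map | sort -k3nr | head -20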

Larry Woodman


Comment 10 ITPlatformsGroup-staff 2004-06-09 17:57:46 UTC
No, vm.buffermem is at the default. (650MB does not seem like much in a
4GB system?!)

Setting this parameter has never worked before, despite being
supposedly fixed in Update 1 (e.12) or later, and it seems easier to
try one thing at a time.

Please clarify:
Is the buffermem parameter a % of lowmem or of total mem?
The documentation says "percentage amount of total system memory to be
used for buffer memory". If the doc is correct, 25% is 1GB already.
And is it not the middle percentage value that controls when to "give
up buffermem more easily"?

Comment 11 Larry Woodman 2004-06-09 18:05:16 UTC
The buffermem.maxpercent is a percent of lowmem.  The reason we did
this is that lowmem is always the same size, so 25% of it is a
constant.  If we made it a percent of total memory, every memory
configuration would need different tuning.  So, setting it to 25%
means that buffermem will be reclaimed very aggressively when there is
more than ~250MB of buffermem and the system is out of memory.
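
To see what that threshold works out to on a given box, 25% of LowTotal
can be read straight from /proc/meminfo (a sketch; it assumes the
running kernel exposes the LowTotal line, as the Red Hat 2.4.9 kernels
do):

  awk '/^LowTotal:/ { printf "buffermem cap: %d kB (25%% of %d kB lowmem)\n", $2/4, $2 }' /proc/meminfo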

Please "echo 2 10 25 > /proc/sys/vm/buffermem" before you start the test.

Also, please use the very latest RHEL2.1 U5 beta kernel located in:

http://people.redhat.com/~jbaron/.private/u5/2.4.9-e.40.10




Comment 12 ITPlatformsGroup-staff 2004-06-11 15:25:38 UTC
What if the main memory user is buffermem?
Buffers are at over 700MB and nothing else is in much need of
lowmem (when the system is not actively serving users). Currently it
looks like the system runs in a loop: allocate buffers, then deallocate
a lot of buffers.

We will be watching the system now for a few days, having this setup:
[root@hydra root]# uname -r ; sysctl vm.buffermem
2.4.9-e.40.8smp
vm.buffermem = 2        10      25

It seems as if the borrow (middle) percentage sets the target for how
much (of total mem) should be left to buffers - is this true? (seeing
Buffers dropping to ~400MB when lowmem is freed)

I don't know when we can schedule another kernel upgrade
(e.40.10). Maybe after the vacation period, i.e. in September.

Comment 13 ITPlatformsGroup-staff 2004-06-14 11:55:10 UTC
Is some memory excluded from the LRU list?
I noticed the normal zone's free amount go 69MB -> 61MB -> 76MB (other
zones unchanged), but the corresponding change did not show up in active
or inactive?!

  active: 62713, inactive_dirty: 76355, inactive_clean: 0, free: 17896 (255 510 765)

  active: 62739, inactive_dirty: 76367, inactive_clean: 0, free: 15722 (255 510 765)

  active: 62849, inactive_dirty: 76379, inactive_clean: 0, free: 19486 (255 510 765)

Other than LowFree, no change is recorded in meminfo / slabinfo.
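
One way to correlate the per-zone free counts with the LRU totals is to
sample both from the same SysRq-M dump (a sketch; the tail count and the
grep pattern are guesses at what the 2.4 dump format needs):

  while true; do
      echo m > /proc/sysrq-trigger
      dmesg | tail -26 | egrep 'active|free|zone'   # LRU totals + per-zone free
      sleep 10
  done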

Comment 14 ITPlatformsGroup-staff 2004-06-17 11:36:19 UTC
Kswapd strikes again!

The system has been up for a week now and kswapd has gone crazy again.
Yesterday it ran for about two hours trying to free memory, consuming as
much as 10 CPU seconds per 10 seconds of wall time. See the next
attachment (a log of output from: egrep 'Buffer|LowFree' /proc/meminfo ;
ps 10; sleep 10;).
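
Reconstructed as a loop, the logger was presumably something like the
sketch below (the redirect target is an assumption; the commands are the
ones quoted above):

  while true; do
      egrep 'Buffer|LowFree' /proc/meminfo   # buffer size and lowmem free
      ps 10                                  # kswapd CPU time
      sleep 10
  done >> memstats.log                       # log name is an assumption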

Was the memory management system rewritten for EL 3?

Comment 15 ITPlatformsGroup-staff 2004-06-17 11:37:57 UTC
Created attachment 101216 [details]
mem stats and ps output, every 10 seconds

Comment 16 Larry Woodman 2004-08-13 18:56:47 UTC
Have you tried the latest AS2.1 kernel?  Please grab this kernel and,
if you still see kswapd performance problems, collect some "AltSysrq M"
outputs.

http://people.redhat.com/~jbaron/.private/u5/2.4.9-e.49/



Comment 17 Larry Woodman 2004-11-29 19:57:23 UTC
This problem has been fixed and we would like to close this bug.  Is
this OK with everyone?

Larry Woodman