111711 – performance regression in swap behavior

Summary: performance regression in swap behavior

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i686
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	---
Assignee:	Dave Anderson
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-12-09 05:33 UTC by Ben Woodard
Modified:	2007-11-30 22:06 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2003-12-16 06:04:32 UTC
Target Upstream Version:
Embargoed:

Attachments
test program that illustrates the problem. (1.23 KB, text/plain) 2003-12-09 05:34 UTC, Ben Woodard
new version of program that reproduces the problem (1.47 KB, text/plain) 2003-12-09 22:13 UTC, Ben Woodard

Description Ben Woodard 2003-12-09 05:33:35 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1)
Gecko/20030225

Description of problem:
Our application guys are complaining about some performance issues
with the latest RHEL3 kernel, 2.4.21-5EL. They first noticed the
problem about a week ago with the 2.4.21-4.0.1 kernel. I hoped that
the U1 kernel, with all its VM fixes for Oracle, might help, so I
gave them the U1 kernel.

They have simplified the problem down to two separate issues and
provided me with a simple reproducer which illustrates the problems
some of the simulation codes are running into. This test program
simply makes a very long doubly linked list. In creating the list it
does not ever traverse the list while building it. It always appends
to the tail. Theoretically this should make the beginning of the list
the oldest pages in the machine. The list is specifically designed to
be bigger than the available RAM of the machine and so it does hit
swap. Therefore, the pages holding the head of the list should be
swapped out.

Once the list reaches the specified length, the program traverses it
from tail to head. Therefore, the pages it accesses first should be
the newest pages.
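
For readers without access to the attachment, the behavior described above can be sketched roughly as follows. This is not the attached program: the node layout, the function names (build_list, traverse_backward, free_list, run_once) and the 4 KiB node size are assumptions made for illustration, and the output line only imitates the format quoted later in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical node layout; padded so one node is roughly one 4 KiB page. */
struct node {
    struct node *prev;
    struct node *next;
    char payload[4096 - 2 * sizeof(struct node *)];
};

/* Append n nodes at the tail without ever walking the list.
 * Returns the tail (a shorter list if malloc fails, NULL if n == 0). */
struct node *build_list(size_t n)
{
    struct node *tail = NULL;
    for (size_t i = 0; i < n; i++) {
        struct node *nd = malloc(sizeof *nd);
        if (nd == NULL) {
            perror("malloc");
            break;
        }
        nd->prev = tail;
        nd->next = NULL;
        nd->payload[0] = (char)i;   /* touch the node so its page is really used */
        if (tail != NULL)
            tail->next = nd;
        tail = nd;
    }
    return tail;
}

/* Walk from tail to head, reading each node; returns the number visited. */
size_t traverse_backward(const struct node *tail)
{
    size_t count = 0;
    volatile char sink = 0;         /* keeps the reads from being optimized away */
    for (const struct node *p = tail; p != NULL; p = p->prev) {
        assert(p->prev == NULL || p->prev->next == p);  /* links are consistent */
        sink ^= p->payload[0];
        count++;
    }
    (void)sink;
    return count;
}

/* Release the list, starting from the tail. */
void free_list(struct node *tail)
{
    while (tail != NULL) {
        struct node *prev = tail->prev;
        free(tail);
        tail = prev;
    }
}

/* Build then traverse n nodes, printing per-phase wall-clock seconds. */
size_t run_once(size_t n)
{
    time_t t0 = time(NULL);
    struct node *tail = build_list(n);
    time_t t1 = time(NULL);
    size_t visited = traverse_backward(tail);
    time_t t2 = time(NULL);
    printf("build=%ld sec\ttraverse %ld sec\ttotal=%ld sec\n",
           (long)(t1 - t0), (long)(t2 - t1), (long)(t2 - t0));
    free_list(tail);
    return visited;
}
```

Calling run_once() from main() with a node count whose total size exceeds physical RAM should force the head of the list out to swap before the backward traversal begins; the right count depends on the machine.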

#1
There seems to be a problem with the performance for the run. On a
7.3-based distro still using a 2.4.18-based kernel they see consistent
performance of around 2 minutes and 45 seconds on a 2.2 GHz box. On a
RHEL3 machine running a 2.4.21-5EL kernel, in the best case the runs
take 5 minutes and 45 seconds.

#2
With the old 2.4.18 based kernel they could run the program over and
over and get about the same performance. However, with the RHEL3
kernel the performance on the second and subsequent runs drops to
around 8 minutes.

The reason why they caught this problem in early testing and why we
are still running the last 2.4.18 errata kernel in production is
because we first saw this problem when we were testing out RHL9. Then
when the errata kernels for 7.3 moved to the 2.4.20 series we saw the
problem appear there. This was some of the leverage I used in
convincing them to move to RHEL3. The fact that I'm now seeing the
same problem with RHEL3's kernel is not making me look very good and
it is putting our plans to move to RHEL3 in production on hold.

The reason why this 2nd problem is such an important issue is that the
people running the simulation software tend to write their codes in
such a way that they seldom if ever touch swap. They want to use every
last bit of memory available without touching swap and having to pay
the performance penalty. If the amount of memory they can use before
swapping begins changes from run to run, then they are rather upset.
Also, it appears that on the second and
subsequent runs, it hits swap much sooner. On the first run it would
get to about 1.8 GB before it started swapping. This is comparable to
what we have been seeing with 2.4.18. However, the subsequent runs
seem to begin swapping at around 1.2GB. This upsets the application
developers who feel that they have lost 600 MB of available RAM.
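
One way to put a number on where swapping begins is to watch the process's major page fault count while allocating. The sketch below is only an illustration and is not part of the attached test program: major_faults and bytes_until_first_major_fault are hypothetical helpers, and a rising major fault count is used as a rough proxy for swap activity (it can also rise for other reasons, such as program text being paged in from disk).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Major (requiring I/O) page faults taken by this process so far,
 * or -1 if getrusage() fails. */
long major_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_majflt;
}

/* Allocate and touch memory in chunk_bytes steps, up to limit_bytes.
 * Returns the number of bytes allocated when the major fault count first
 * rises above its starting value, or the total allocated if it never does. */
size_t bytes_until_first_major_fault(size_t chunk_bytes, size_t limit_bytes)
{
    long start = major_faults();
    void *chain = NULL;             /* most recent chunk; each stores the previous */
    size_t total = 0;

    assert(chunk_bytes >= sizeof(void *));

    while (total + chunk_bytes <= limit_bytes) {
        void *chunk = malloc(chunk_bytes);
        if (chunk == NULL)
            break;
        memset(chunk, 0xA5, chunk_bytes);       /* touch every page */
        memcpy(chunk, &chain, sizeof chain);    /* remember the previous chunk */
        chain = chunk;
        total += chunk_bytes;
        if (major_faults() > start)
            break;                              /* first sign of paging I/O */
    }

    /* Release everything that was allocated. */
    while (chain != NULL) {
        void *prev;
        memcpy(&prev, chain, sizeof prev);
        free(chain);
        chain = prev;
    }
    return total;
}
```

Running such a probe before and after a run of the reproducer, with a limit above physical RAM, would give a figure comparable to the 1.8 GB and 1.2 GB thresholds described above.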

Currently, it appears that this problem only happens on ia32. ia64
seems to either not have the problem or it takes more to provoke it.

I have attached the little test program. Could someone with a deeper
understanding of the kernel's VM subsystem please run this little
test program? You may have to comment out the printing and tweak
how many nodes you create on the linked list based upon how much
memory is in your box.





Version-Release number of selected component (if applicable):
2.4.21-5EL

How reproducible:
Always

Steps to Reproduce:
1. compile the attached program (the program is designed to trigger
the bug on a 2GB machine. If you have more or less RAM you may need to
tweak the number of items it puts on the linked list. I also found
removing the printf's helpful.)
2. time ./a.out
3. repeat several times
    

Actual Results:  1) in the best case, performance about half as good as
seen on the last 2.4.18 errata kernel for 7.3
2) inconsistent performance between runs.
3) A sharp decrease in the amount of memory that can be used before
there is a notable performance degradation between runs.

Expected Results:  1) performance on par with 7.3
2) consistent performance between runs of the program
3) the same amount of memory available between runs before a
performance degradation kicks in.

Additional info:

Comment 1 Ben Woodard 2003-12-09 05:34:50 UTC

Created attachment 96415 [details]
test program that illustrates the problem.

Comment 2 Ben Woodard 2003-12-09 05:46:18 UTC

Additionally, under Expected Results, we would like to add:
4) roughly the same amount of available RAM before performance is
impacted as we see with the later 2.4.18 errata kernels.

Comment 3 Ben Woodard 2003-12-09 22:13:21 UTC

Created attachment 96436 [details]
new version of program that reproduces the problem

Comment 4 Ben Woodard 2003-12-09 22:21:22 UTC

These are results from 2.4.21-4.0.1EL

toad5@ben:/usr/bin/time -v ./a.out
build 1071007855
traverse 1071007883
done 1071008120
build=28 sec    traverse 237 sec        total=265 sec
        Command being timed: "./a.out"
        User time (seconds): 1.74
        System time (seconds): 16.43
        Percent of CPU this job got: 6%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:26.82
<snip>
        Major (requiring I/O) page faults: 43291
        Minor (reclaiming a frame) page faults: 1042095
<snip>

These are results from 2.4.21-5EL

toad6@ben:/usr/bin/time -v ./a.out
build 1071007394
traverse 1071007423
done 1071007653
build=29 sec    traverse 230 sec        total=259 sec
        Command being timed: "./a.out"
        User time (seconds): 1.40
        System time (seconds): 18.79
        Percent of CPU this job got: 7%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:20.44
<snip>
        Major (requiring I/O) page faults: 264199
        Minor (reclaiming a frame) page faults: 822589
<snip>

These are the results with a 2.4.18-27 kernel
        Command being timed: "./a.out"
        User time (seconds): 1.73
        System time (seconds): 25.35
        Percent of CPU this job got: 16%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:48.36
<snip>
        Major (requiring I/O) page faults: 309483
        Minor (reclaiming a frame) page faults: 819566
<snip>

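To compare the elapsed times quoted above on one scale, the "m:ss" values can be converted to seconds and expressed as a slowdown relative to a baseline. The helpers below are a sketch for that arithmetic only; elapsed_to_seconds and slowdown_percent are hypothetical names, and the parser deliberately accepts only the "m:ss" form, not "h:mm:ss".

```c
#include <assert.h>
#include <stdio.h>

/* Parse an elapsed time of the form "m:ss" or "m:ss.ff" into seconds.
 * Returns -1.0 for anything else, including the "h:mm:ss" form. */
double elapsed_to_seconds(const char *s)
{
    int minutes;
    double seconds;
    char extra;

    /* Exactly two fields must match; a third means trailing input. */
    if (sscanf(s, "%d:%lf%c", &minutes, &seconds, &extra) != 2)
        return -1.0;
    if (minutes < 0 || seconds < 0.0)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Percentage by which `measured` exceeds `baseline` (both in seconds). */
double slowdown_percent(double baseline, double measured)
{
    assert(baseline > 0.0);
    return (measured / baseline - 1.0) * 100.0;
}
```

Taking the 2.4.18-27 run (2:48.36) as the baseline, this arithmetic puts the 2.4.21-4.0.1EL run (4:26.82) at roughly 58% slower and the 2.4.21-5EL run (4:20.44) at roughly 55% slower. These figures compare runs on different machines, so they describe the numbers above rather than the kernels in isolation.
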
Comment 5 Ben Woodard 2003-12-16 06:04:03 UTC

I have to eat crow on this one. We determined that a non-obvious
difference in the node configuration (different drive speeds and
partitioning) led to the vast majority of the performance differences
between the two runs. Once we corrected for this problem the
performance difference dropped from a 120% slowdown to a 10% slowdown
which may or may not be caused by the VM.

However, the customer is still spooked that there may be
gremlins hiding in the VM subsystem and that our reproducer may have
just failed to replicate the issue. We have seen some anomalous
performance variations between the two kernels but we have yet to
isolate them down to a reproducible state.
