Red Hat Bugzilla – Bug 111711
performance regression in swap behavior
Last modified: 2007-11-30 17:06:59 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1)
Description of problem:
Our application guys are complaining about some performance issues
with the latest RHEL3 kernel 2.4.21-5EL. They first noticed the
problem about a week ago with the 2.4.21-4.0.1 kernel and I hoped that
the U1 kernel with all its VM fixes for Oracle might help us out with
them and so I gave them the U1 kernel.
They have simplified the problem down to two separate issues and
provided me with a simple reproducer which illustrates the problems
some of the simulation codes are running into. This test program
simply makes a very long doubly linked list. It never traverses the
list while building it; it always appends to the tail. In theory this
should make the pages at the beginning of the list the oldest pages in
the machine. The list is deliberately sized to be bigger than the
available RAM of the machine, so it does hit swap. Therefore, the pages
holding the head of the list should be the first ones swapped out.
Once the list reaches the specified length, the program traverses it
from tail to head. Therefore, the pages it accesses first should be the
pages which are the newest.
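For readers without access to the attachment, the access pattern described above can be sketched roughly as follows. This is not the attached program; the node layout, the payload size, and the function names (`build_list`, `traverse_backwards`, `free_list`, `run_list_test`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One list element. The payload is only padding so that a long list
 * occupies a lot of memory; its size is an arbitrary illustrative choice. */
struct node {
    struct node *prev;
    struct node *next;
    char payload[112];
};

/* Free every node, walking from the tail back to the head. */
static void free_list(struct node *tail)
{
    while (tail != NULL) {
        struct node *prev = tail->prev;
        free(tail);
        tail = prev;
    }
}

/* Append `count` nodes at the tail without ever walking the list.
 * Returns the tail, or NULL if count is 0 or an allocation fails. */
static struct node *build_list(size_t count)
{
    struct node *tail = NULL;
    for (size_t i = 0; i < count; i++) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL) {
            free_list(tail);
            return NULL;
        }
        n->prev = tail;   /* writing the links touches the new node's page */
        n->next = NULL;
        if (tail != NULL)
            tail->next = n;
        tail = n;
    }
    return tail;
}

/* Walk from tail to head, so the newest nodes are visited first.
 * Returns the number of nodes visited. */
static size_t traverse_backwards(const struct node *tail)
{
    size_t visited = 0;
    for (const struct node *n = tail; n != NULL; n = n->prev)
        visited++;
    return visited;
}

/* Build, traverse, and free. Returns 0 on success, -1 if allocation failed. */
int run_list_test(size_t count)
{
    struct node *tail = build_list(count);
    if (tail == NULL && count != 0)
        return -1;
    size_t visited = traverse_backwards(tail);
    assert(visited == count);
    free_list(tail);
    return 0;
}
```

Calling `run_list_test` from `main` with a node count large enough to exceed physical RAM should give the build-then-reverse-traverse pattern described; the count would need tuning per machine.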
There seems to be a problem with the performance for the run. On a 7.3
based distro still using a 2.4.18 based kernel they see consistent
performance of around 2 minutes and 45 seconds on a 2.2 GHz box. On a
RHEL3 machine running a 2.4.21-5EL kernel, in the best case the runs
take 5 minutes and 45 seconds.
With the old 2.4.18 based kernel they could run the program over and
over and get about the same performance. However, with the RHEL3
kernel the performance on the second and subsequent runs drops to
around 8 minutes.
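To put the timings above on one scale, the slowdown relative to the 2.4.18 baseline can be computed as (measured - baseline) / baseline. A minimal sketch; the function name `slowdown_percent` is illustrative and not part of the attached test program.

```c
#include <assert.h>

/* Percent slowdown of `measured_sec` relative to `baseline_sec`.
 * Both arguments are wall-clock times in seconds. */
double slowdown_percent(double baseline_sec, double measured_sec)
{
    assert(baseline_sec > 0.0);
    return (measured_sec - baseline_sec) / baseline_sec * 100.0;
}
```

With the 165 second baseline (2:45), the 345 second best case (5:45) works out to roughly a 109% slowdown, and a repeat run of about 480 seconds (8 minutes) to roughly 191%.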
The reason why they caught this problem in early testing and why we
are still running the last 2.4.18 errata kernel in production is
because we first saw this problem when we were testing out RHL9. Then
when the errata kernels for 7.3 moved to the 2.4.20 series we saw the
problem appear there. This was some of the leverage I used in
convincing them to move to RHEL3. The fact that I'm now seeing the
same problem with RHEL3's kernel is not making me look very good and
it is putting our plans to move to RHEL3 in production on hold.
The reason why this 2nd problem is such an important issue is that the
people running the simulation software tend to write their codes in
such a way that they seldom if ever touch swap. They want to use every
last bit of memory available without touching swap and having to pay
the performance penalty. If the amount of memory they can use before
swapping changes from run to run, then they are rather upset. Also, it
appears that on the second and
subsequent runs, it hits swap much sooner. On the first run it would
get to about 1.8 GB before it started swapping. This is comparable to
what we have been seeing with 2.4.18. However, the subsequent runs
seem to begin swapping at around 1.2 GB. This upsets the application
developers who feel that they have lost 600 MB of available RAM.
Currently, it appears that this problem only happens on ia32. ia64
seems to either not have the problem or it takes more to provoke it.
I have attached the little test program. Could someone with a deeper
understanding of the kernel's VM subsystem please run it? You may have
to comment out the printing and tweak how many nodes you create on the
linked list based upon how much memory is in your box.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. compile the attached program (the program is designed to trigger
the bug on a 2GB machine. If you have more or less RAM you may need to
tweak the number of items it puts on the linked list. I also found
removing the printf's helpful.)
2. time ./a.out
3. repeat several times
Actual Results:
1) in the best case, performance about half as good as seen on the
last 2.4.18 errata kernel for 7.3
2) inconsistent performance between runs
3) a sharp decrease between runs in the amount of memory that can be
used before there is a notable performance degradation
Expected Results:
1) performance on par with 7.3
2) consistent performance between runs of the program
3) the same amount of memory available between runs before a
performance degradation kicks in
Created attachment 96415
test program that illustrates the problem.
Additionally under Expected results, we would like to add:
4) roughly the same amount of available RAM before performance is
impacted as we see with the later 2.4.18 errata kernels.
Created attachment 96436
new version of program that reproduces the problem
These are results from 2.4.21-4.0.1EL
toad5@ben:/usr/bin/time -v ./a.out
build=28 sec traverse 237 sec total=265 sec
Command being timed: "./a.out"
User time (seconds): 1.74
System time (seconds): 16.43
Percent of CPU this job got: 6%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:26.82
Major (requiring I/O) page faults: 43291
Minor (reclaiming a frame) page faults: 1042095
These are results from 2.4.21-5EL
toad6@ben:/usr/bin/time -v ./a.out
build=29 sec traverse 230 sec total=259 sec
Command being timed: "./a.out"
User time (seconds): 1.40
System time (seconds): 18.79
Percent of CPU this job got: 7%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:20.44
Major (requiring I/O) page faults: 264199
Minor (reclaiming a frame) page faults: 822589
These are the results with a 2.4.18-27 kernel
Command being timed: "./a.out"
User time (seconds): 1.73
System time (seconds): 25.35
Percent of CPU this job got: 16%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:48.36
Major (requiring I/O) page faults: 309483
Minor (reclaiming a frame) page faults: 819566
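The major and minor page fault counts that /usr/bin/time -v reports above could also be sampled from inside the test program, for example once after the build phase and once after the traverse phase. A minimal sketch using the POSIX getrusage() call; the struct `fault_counts` and the function `read_fault_counts` are hypothetical names, not part of the attached program.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Page fault totals for the calling process so far. */
struct fault_counts {
    long major;  /* faults that required I/O */
    long minor;  /* faults serviced without I/O */
};

/* Fill `out` with the current process's fault counts.
 * Returns 0 on success, -1 if getrusage() fails. */
int read_fault_counts(struct fault_counts *out)
{
    struct rusage ru;

    assert(out != NULL);
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->major = ru.ru_majflt;
    out->minor = ru.ru_minflt;
    return 0;
}
```

Subtracting a sample taken before a phase from one taken after it would give the faults attributable to that phase, which might help separate build-time swapping from traverse-time swapping.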
I have to eat crow on this one. We determined that a non-obvious
difference in the node configuration (different drive speeds and
partitioning) led to the vast majority of the performance differences
between the two runs. Once we corrected for this problem, the
performance difference dropped from a 120% slowdown to a 10% slowdown,
which may or may not be caused by the VM.
However, the customer is still spooked that there may be gremlins
hiding in the VM subsystem and that our reproducer may simply have
failed to replicate the issue. We have seen some anomalous performance
variations between the two kernels, but we have yet to narrow them
down to a reproducible case.