Bug 439043 - Swap Token issue with RHEL4
Summary: Swap Token issue with RHEL4
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.8
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Michal Schmidt
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: RHEL4u8_relnotes 461297
Reported: 2008-03-26 17:26 UTC by Tomen Tse
Modified: 2009-05-20 15:31 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
With this update, the "swap_token_timeout" parameter has been added to /proc/sys/vm. This file contains valid hold time of swap out protection token. The Linux Virtual Memory (VM) subsystem has a token based thrashing control mechanism and uses the token to prevent unnecessary page faults in thrashing situation. The unit of the value is in `second`. The value would be useful to tune thrashing behavior. Setting it to 0 will disable the swap token mechanism.
Clone Of:
Environment:
Last Closed: 2009-05-18 19:21:48 UTC
Target Upstream Version:
Embargoed:


Attachments
RHEL4's 2.6.9-67.0.1.ELsmp demonstrates the dramatic dropoff in query throughput when the kernel reports nearly all the physical pages as being "active" (28.68 KB, image/png)
2008-03-28 17:14 UTC, Tomen Tse
This graph demonstrates the effect of changing the kernel so it will not respect the Swap Token and will consider reclaiming pages from the process that holds the token (24.54 KB, image/png)
2008-03-28 17:14 UTC, Tomen Tse
This graph demonstrates the effect of running the same previous experiment on RHEL5 (24.05 KB, image/png)
2008-03-28 17:15 UTC, Tomen Tse
RHEL4 patch (3.73 KB, patch)
2008-09-04 14:48 UTC, Michal Schmidt
RHEL4: add /proc/sys/vm/swap_token_timeout (3.71 KB, patch)
2008-12-16 13:55 UTC, Michal Schmidt


Links
System: Red Hat Product Errata
ID: RHSA-2009:1024
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 4.8 kernel security and bug fix update
Last Updated: 2009-05-18 14:57:26 UTC

Description Tomen Tse 2008-03-26 17:26:02 UTC
Description of problem:
Here is the description of the problem as explained by an ISV partner's IT team:
********
The problem we are encountering is that we have a single process which
mmaps more than the size of physical memory. Its working set is also
larger than memory, so it causes major page faults frequently. As a
result, it ends up owning almost all of the memory in the system.
Because it page faults often, it also ends up getting and holding the
swap token most of the time. When it has the swap token, no other
process, including kswapd, can make any progress in reclaiming pages,
because the swap token exempts our process from having its pages
reclaimed (unless the other processes become *really* desperate), and
there are very few pages we don't own.  Because kswapd can't keep pace,
we eventually run out of pages that are available to be reclaimed (all
pages are considered "active"), and any process which wants a page has
to synchronously reclaim one. But they can't, because we have the swap
token. The result is that kswapd and all processes except our process
end up spinning on the cpu trying to reclaim pages which are essentially
pegged because we own the swap token. This causes horrible performance
for all processes until our hold on the swap token times out. Then
things recover to a reasonable state until we reclaim the swap token
(which will happen as soon as another SWAP_TOKEN_TIMEOUT period of time
elapses).

A reasonably conservative proposal for dealing with this in an update to
RHEL4 would be to simply include a couple subsequent kernel patches: one
which adds a sysctl knob for tuning the swap token timeout, and then
another patch which makes it so the swap token is totally disabled if
swap_token_timeout is set to zero. This seems reasonable, as you could
default the knob to its old setting (300 seconds on RHEL4) and thus
avoid changing default behavior, while allowing people for whom the swap
token is pathological to mitigate the problem. In addition, this knob
already exists in RHEL5, and provides us with the desired behavior.
Below are references for the two patches. If you would like, I can
create a  patch that includes both of these and applies against RHEL4
update 6.

Adding a sysctl for swap_token_timeout:
http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=146f46fa1ec0b76fa76bced34b4849934791532c
http://lwn.net/Articles/105136/

Allow turning the swap token off. Note this patch also sets it to be off
by default, but it probably makes more sense to have it default to its
old setting to avoid changing behavior on people:
http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=0fdce62a8fe7c8beac20f59d67ea4438075bcab6

We can also produce a sample program that demonstrates the bad behavior if that
is helpful for you.
********
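
For readers who want the mechanism in the quoted report spelled out, here is a minimal toy model of it, written as an illustration only. It is not the kernel's code: the names SwapToken, may_reclaim_from and the process name "server" are hypothetical, and the model ignores the "really desperate" override mentioned above. It only shows the two ideas from the report: pages owned by the token holder are skipped by reclaim until the hold time expires, and a timeout of 0 turns the protection off.

```python
from dataclasses import dataclass
from typing import Optional

# Seconds; the RHEL4 default cited in the report.
DEFAULT_SWAP_TOKEN_TIMEOUT = 300


@dataclass
class SwapToken:
    """Toy stand-in for the swap token; not the kernel's data structure."""
    holder: Optional[str] = None  # hypothetical process name
    acquired_at: float = 0.0      # time the token was taken, in seconds

    def is_held(self, now: float, timeout: int) -> bool:
        # A timeout of 0 models the proposed "swap token disabled" setting.
        if timeout == 0 or self.holder is None:
            return False
        return (now - self.acquired_at) < timeout


def may_reclaim_from(owner: str, token: SwapToken, now: float,
                     timeout: int = DEFAULT_SWAP_TOKEN_TIMEOUT) -> bool:
    """Return True if pages owned by `owner` are eligible for reclaim."""
    return not (token.is_held(now, timeout) and token.holder == owner)
```

With token = SwapToken(holder="server", acquired_at=0.0), may_reclaim_from("server", token, now=10.0) returns False because the holder is protected, while the same call with timeout=0, or with now=301.0 after the default hold time has passed, returns True.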


Comment 1 Tomen Tse 2008-03-28 17:14:02 UTC
Created attachment 299501 [details]
RHEL4's 2.6.9-67.0.1.ELsmp demonstrates the dramatic dropoff in query throughput when the kernel reports nearly all the physical pages as being "active"

Comment 2 Tomen Tse 2008-03-28 17:14:59 UTC
Created attachment 299502 [details]
This graph demonstrates the effect of changing the kernel so it will not respect the Swap Token and will consider reclaiming pages from the process that holds the token

Comment 3 Tomen Tse 2008-03-28 17:15:39 UTC
Created attachment 299503 [details]
This graph demonstrates the effect of running the same previous experiment on RHEL5

Comment 4 Tomen Tse 2008-03-28 17:16:13 UTC
An ISV partner is experiencing a severe performance issue on RHEL4 when using the server product with our customers' datasets that are larger than physical memory. This ISV partner develops and sells an Information Access Platform that includes a server component that allows our customers to answer queries that involve search and navigation. The issue that we have encountered will impact all customers using RHEL4 with datasets that don't fit into the available physical memory.

As part of our internal performance testing using customer datasets, we have
discovered serious issues with the way that the Swap Token is handled in RHEL4
and therefore the way in which memory is reclaimed. In particular, we have
found that when our server process is using a majority of the memory on a
machine the kernel does not properly reclaim pages from our process to make
room for future allocations or page-ins. The graphs that I have attached show
the extreme negative impact that the swap token behavior is having on the
performance of our server, as measured in queries per second. These graphs also
demonstrate that the problem is mostly alleviated on RHEL5. From our analysis
and comparison of the 2.6.9 and 2.6.18 kernel source and as confirmed by Red
Hat's engineers, the issue was addressed in RHEL5 through changes to how the
Swap Token is managed as well as how it can be controlled through Virtual
Memory configuration options.

We have done extensive research into this problem in an effort to find an
acceptable workaround that could be implemented in our code or that involved
RHEL4 configuration options that we could recommend to our customers. In short,
we have not found a reasonable answer. It appears that the only viable option
for our customers today is to consider running a different kernel. In
particular, please consider the three attached graphs that chart the throughput
(measured in queries answered per second) as well as various Virtual Memory
statistics as gathered from /proc/meminfo and vmstat. The three graphs show the
same version of the ISV partner's software being run on three different kernels (2.6.9-67.0.1.ELsmp with and without a custom patch, as well as 2.6.18-8.el5). The first graph, for RHEL4's 2.6.9-67.0.1.ELsmp, demonstrates the dramatic dropoff in query throughput when the kernel reports nearly all the physical pages as being "active". The second graph demonstrates the effect of changing the kernel so it will not respect the Swap Token and will consider reclaiming pages from the process that holds the token. The last graph demonstrates the effect of running the same experiment on RHEL5; as you can see, the changes that have been made to the kernel greatly alleviate this problem. More importantly, however, RHEL5 introduced new configuration options that permit our customers to avoid this problem altogether by effectively disabling the Swap Token on their system if they know that our server component will be the primary process running on the machine.
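
As an illustration of the kind of statistic plotted in these graphs, the following is a minimal sketch of how the share of memory reported as "active" could be derived from /proc/meminfo. The helper names parse_meminfo and active_fraction are hypothetical and are not taken from the partner's tooling; the sketch assumes the usual "Name:   value kB" layout of that file.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; lines without a numeric value are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def active_fraction(meminfo: dict) -> float:
    """Fraction of MemTotal that the kernel currently counts as Active."""
    return meminfo["Active"] / meminfo["MemTotal"]
```

On a live system the input would come from open("/proc/meminfo").read(). Sampling active_fraction periodically alongside query throughput would give a rough view of the "nearly all pages active" condition described above.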

Our engineers found these results particularly surprising because most of
them consider Linux to be their primary Operating System and did not
expect that our scaling characteristics would be significantly worse on a RHEL
platform as compared to other operating systems on the same hardware. As currently released, RHEL4 does not behave reasonably for large-scale processes that dominate physical memory and moreover is not configurable in a way that allows our customers to get good performance out of their system as they scale. We believe that resolving this issue is critical to making our customers successful and that a fix will also positively affect many other RHEL4 users, based on the discussions and posts that we have seen on kernel discussion lists.

Comment 5 Michal Schmidt 2008-04-07 14:05:53 UTC
Thanks for the detailed description.
Granting devel ACK.

Comment 6 RHEL Program Management 2008-04-07 14:19:05 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 RHEL Program Management 2008-09-03 12:50:54 UTC
Updating PM score.

Comment 8 Michal Schmidt 2008-09-04 14:48:46 UTC
Created attachment 315760 [details]
RHEL4 patch

This patch adds /proc/sys/vm/swap_token_timeout to RHEL4. The default is 300 s. The value 0 disables the swap token.
It's similar to the two upstream patches mentioned, but does not change the default behavior.

A scratch Brew build is here:
https://brewweb.devel.redhat.com/taskinfo?taskID=1456231
Could you test it?
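
For anyone testing the build, a minimal sketch of reading and setting the new tunable from a script follows. The path /proc/sys/vm/swap_token_timeout, the 300 s default and the meaning of 0 come from this comment; the helper names read_timeout and write_timeout are hypothetical, and rejecting negative values is an assumption of this sketch rather than documented kernel behavior. Writing the file requires root.

```python
from pathlib import Path

# Tunable added by the patch; present only on kernels that carry it.
SWAP_TOKEN_TIMEOUT = Path("/proc/sys/vm/swap_token_timeout")


def read_timeout(path: Path = SWAP_TOKEN_TIMEOUT) -> int:
    """Return the current swap token hold time in seconds."""
    return int(path.read_text().strip())


def write_timeout(seconds: int, path: Path = SWAP_TOKEN_TIMEOUT) -> None:
    """Set the hold time in seconds; 0 disables the swap token."""
    if seconds < 0:
        # Assumption of this sketch: negative hold times are not meaningful.
        raise ValueError("timeout must be >= 0 seconds")
    path.write_text(f"{seconds}\n")
```

On a patched kernel, read_timeout() should report 300 until changed, and write_timeout(0) would disable the swap token for the workload described in this bug.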

Comment 9 Tomen Tse 2008-09-19 17:45:27 UTC
Here is the response from Mike Tucker of the ISV partner who is having this issue:

"Thanks for your work on this. We haven't had a chance to verify this yet as we are in the middle of a very busy period for our Dev team and have been using a work-around to date. I will let you know as soon as we have results that verify your change."

Comment 10 Michal Schmidt 2008-12-16 13:55:32 UTC
Created attachment 327101 [details]
RHEL4: add /proc/sys/vm/swap_token_timeout

This patch was posted to rhkernel-list.

Brew scratch build: https://brewweb.devel.redhat.com/taskinfo?taskID=1615076

Comment 11 Vivek Goyal 2009-01-05 14:17:56 UTC
Committed in 78.23.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 12 Linda Wang 2009-01-23 20:28:01 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
RHEL4.8 adds the "swap_token_timeout" parameter in /proc/sys/vm.

This file contains valid hold time of swap out protection token. The Linux
VM has token based thrashing control mechanism and uses the token to prevent
unnecessary page faults in thrashing situation. The unit of the value is in 
`second`. The value would be useful to tune thrashing behavior.

Comment 13 Michael Tucker 2009-01-23 20:37:23 UTC
Hi - I noticed there was no note on here about verification. In fact, we took the build from Michal and determined that the changes resolved our issue. Thanks for your help with this.

Comment 15 Michal Schmidt 2009-01-29 10:07:31 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -3,4 +3,4 @@
 This file contains valid hold time of swap out protection token. The Linux
 VM has token based thrashing control mechanism and uses the token to prevent
 unnecessary page faults in thrashing situation. The unit of the value is in 
-`second`. The value would be useful to tune thrashing behavior.
+`second`. The value would be useful to tune thrashing behavior. Setting it to 0 will disable the swap token mechanism.

Comment 17 Ryan Lerch 2009-02-19 00:27:23 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,6 +1,3 @@
-RHEL4.8 adds the "swap_token_timeout" parameter in /proc/sys/vm.
+With this update, the "swap_token_timeout" parameter has been added to /proc/sys/vm.
 
-This file contains valid hold time of swap out protection token. The Linux
+This file contains valid hold time of swap out protection token. The Linux Virtual Memory (VM) subsystem has a token based thrashing control mechanism and uses the token to prevent unnecessary page faults in thrashing situation. The unit of the value is in `second`. The value would be useful to tune thrashing behavior. Setting it to 0 will disable the swap token mechanism.
-VM has token based thrashing control mechanism and uses the token to prevent
-unnecessary page faults in thrashing situation. The unit of the value is in 
-`second`. The value would be useful to tune thrashing behavior. Setting it to 0 will disable the swap token mechanism.

Comment 19 Jan Tluka 2009-04-17 15:01:40 UTC
Patch is in -88.EL.

Comment 22 errata-xmlrpc 2009-05-18 19:21:48 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html

Comment 23 Michael Tucker 2009-05-18 19:57:37 UTC
The link in the previous comment doesn't work. Can you fix the link and/or post its contents in the bug report?

Comment 24 Michal Schmidt 2009-05-20 15:31:17 UTC
Michael, the link works for me. Can you retry? Maybe there was a temporary glitch.

