Bug 670063 - pages stuck in ksm pages_volatile
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Andrea Arcangeli
QA Contact: Caspar Zhang
Keywords: Reopened
Depends On:
Blocks:
Reported: 2011-01-16 22:14 EST by Qian Cai
Modified: 2013-07-03 03:27 EDT
CC List: 4 users

See Also:
Fixed In Version: kernel-2.6.32-121.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-23 16:37:48 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments


External Trackers
Red Hat Product Errata RHSA-2011:0542 (Priority: normal, Status: SHIPPED_LIVE, Last Updated: 2011-05-19 07:58:07 EDT)
  Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update

Description Qian Cai 2011-01-16 22:14:31 EST
Description of problem:
LTP ksm01 failed because KSM reported unexpected pages_volatile values (pages_shared and pages_sharing were off as well).

# ./ksm01 -s 512
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  KSM merging...
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 0 allocates 512 MB filled with 'c'.
ksm01       0  TINFO  :  child 1 allocates 512 MB filled with 'a'.
ksm01       0  TINFO  :  child 2 allocates 512 MB filled with 'a'.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 2.
ksm01       0  TINFO  :  pages_sharing is 393214.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume child 1.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 1 changes memory content to 'b'.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 3.
ksm01       0  TINFO  :  pages_sharing is 393213.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 2 changes memory content to 'd'
ksm01       0  TINFO  :  child 0 verifies memory content.
ksm01       0  TINFO  :  child 0 changes memory content to 'd'.
ksm01       0  TINFO  :  child 1 changes memory content to 'd'
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 3.
ksm01       1  TFAIL  :  pages_shared is not 1.
ksm01       0  TINFO  :  pages_sharing is 393192.
ksm01       2  TFAIL  :  pages_sharing is not 393215.
ksm01       0  TINFO  :  pages_volatile is 22.
ksm01       3  TFAIL  :  pages_volatile is not 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  resume child 1.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 1 changes one page to 'e'.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 1.
ksm01       0  TINFO  :  pages_sharing is 393214.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 1.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  KSM unmerging...
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 0 verifies memory content.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 2.
ksm01       0  TINFO  :  pages_shared is 0.
ksm01       0  TINFO  :  pages_sharing is 0.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  stop KSM.
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 0.
ksm01       0  TINFO  :  pages_shared is 0.
ksm01       0  TINFO  :  pages_sharing is 0.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.
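Every expected value in this log can be derived by hand. With three children each allocating 512 MB and 4 KiB pages, KSM sees 3 * 512 * 1024 / 4 = 393216 pages, the same number the test writes to pages_to_scan. Once ksmd has merged everything, each content value held by two or more pages costs one KSM page (pages_shared), every other copy counts toward pages_sharing, a page with unique content counts toward pages_unshared, and pages_volatile should be 0. The C sketch below encodes that bookkeeping; the 4 KiB page size and the names ksm_expected and struct ksm_expect are assumptions for illustration, not ksm01's own code.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages, as on x86_64. */
#define PAGE_KB 4UL

/* Hypothetical helper type: the KSM counters one would expect once ksmd
 * has scanned every page and merged all duplicates. */
struct ksm_expect {
    unsigned long total;    /* pages under KSM, cf. pages_to_scan in the log */
    unsigned long shared;   /* one KSM page per content value held by 2+ pages */
    unsigned long sharing;  /* duplicate pages mapped onto those KSM pages */
    unsigned long unshared; /* pages whose content appears nowhere else */
};

/* children:     number of child processes (3 in ksm01's output)
 * size_mb:      the -s argument, allocated by each child
 * dup_values:   distinct contents that occur on more than one page
 * unique_pages: pages whose content is unique */
static struct ksm_expect ksm_expected(unsigned long children, unsigned long size_mb,
                                      unsigned long dup_values, unsigned long unique_pages)
{
    struct ksm_expect e;

    e.total = children * size_mb * 1024UL / PAGE_KB;
    /* each duplicated value needs at least two pages */
    assert(2 * dup_values + unique_pages <= e.total);
    e.shared = dup_values;
    e.unshared = unique_pages;
    /* every page not backing a KSM page and not unique is a sharer */
    e.sharing = e.total - e.shared - e.unshared;
    return e;
}
```

For the failing check, where all three children hold 'd', ksm_expected(3, 512, 1, 0) yields pages_shared 1 and pages_sharing 393215, the values named in the TFAIL lines. The first check ('c', 'a', 'a') corresponds to ksm_expected(3, 512, 2, 0), and the 'e' check to ksm_expected(3, 512, 1, 1).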

# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5
node 0 size: 2047 MB
node 0 free: 1652 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 2045 MB
node 1 free: 1791 MB
node 2 cpus: 18 19 20 21 22 23
node 2 size: 2048 MB
node 2 free: 1929 MB
node 3 cpus: 12 13 14 15 16 17
node 3 size: 2048 MB
node 3 free: 1743 MB
node distances:
node   0   1   2   3 
  0:  10  16  16  16 
  1:  16  10  16  16 
  2:  16  16  10  16 
  3:  16  16  16  10 

Version-Release number of selected component (if applicable):
The kernel build from bug 647334, comment 7.

How reproducible:
Always.

Steps to Reproduce:
1. git clone git://git.engineering.redhat.com/users/qcai/ltp.git
2. cd ltp; make autotools; ./configure; make
3. cd testcases/kernel/mem/ksm/
4. ./ksm01 -s 512 (adjust the size to the system's memory)
  
Actual results:
The test failed.

Expected results:
The test should pass.

Additional info:
The test starts to fail once the per-child allocation size exceeds a certain threshold. On the AMD Magny-Cours system above, it fails with 512M and beyond and passes with 384M and below.

On an Intel Nehalem-EX system, it fails with 640M and beyond.

# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
node 0 size: 16265 MB
node 0 free: 15234 MB
node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
node 1 size: 16384 MB
node 1 free: 15718 MB
node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
node 2 size: 16384 MB
node 2 free: 15781 MB
node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
node 3 size: 16384 MB
node 3 free: 15726 MB
node distances:
node   0   1   2   3 
  0:  10  21  21  21 
  1:  21  10  21  21 
  2:  21  21  10  21 
  3:  21  21  21  10 

# ./ksm01 -s 640
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  KSM merging...
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 allocates 640 MB filled with 'a'.
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 2 allocates 640 MB filled with 'a'.
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 0 allocates 640 MB filled with 'c'.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 2.
ksm01       0  TINFO  :  pages_sharing is 491518.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume child 1.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 1 changes memory content to 'b'.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 3.
ksm01       0  TINFO  :  pages_sharing is 491517.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 2 changes memory content to 'd'
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 0 verifies memory content.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  child 0 changes memory content to 'd'.
ksm01       0  TINFO  :  child 1 changes memory content to 'd'
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 1.
ksm01       0  TINFO  :  pages_sharing is 438415.
ksm01       1  TFAIL  :  pages_sharing is not 491519.
ksm01       0  TINFO  :  pages_volatile is 53099.
ksm01       2  TFAIL  :  pages_volatile is not 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  resume child 1.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 1 changes one page to 'e'.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 1.
ksm01       0  TINFO  :  pages_sharing is 491518.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 1.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  KSM unmerging...
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 0 verifies memory content.
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 2.
ksm01       0  TINFO  :  pages_shared is 0.
ksm01       0  TINFO  :  pages_sharing is 0.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  stop KSM.
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 0.
ksm01       0  TINFO  :  pages_shared is 0.
ksm01       0  TINFO  :  pages_sharing is 0.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.
Comment 1 Andrea Arcangeli 2011-01-19 20:34:34 EST
I assume you're using my latest kernel build, which should include the fix for this.

lru_add_drain_all() runs only at the end of a full KSM scan. That means the more memory there is to scan, the less frequently the flush happens.

Before reading pages_volatile, can you wait for /sys/kernel/mm/ksm/full_scans to increase twice? That may fix it if this is a timing issue, because we only drain the LRU at the end of the scan.
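The suggestion above can be sketched in C, the language of the LTP test: poll /sys/kernel/mm/ksm/full_scans and read pages_volatile only after the counter has advanced twice. Presumably two increments are needed because the first may merely complete a scan that was already in progress when the children finished writing, whereas the second covers a full pass, with its trailing LRU drain, over the final memory contents. The helper names read_counter and wait_full_scans, the polling interval, and the timeout are assumptions made for this sketch; it is not the change actually made to ksm01 or the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Path from the comment above; present only on kernels built with KSM. */
#define KSM_FULL_SCANS "/sys/kernel/mm/ksm/full_scans"

/* Hypothetical helper: read one decimal counter from a sysfs-style file.
 * Returns -1 if the file cannot be opened or parsed. */
static long read_counter(const char *path)
{
    long val = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &val) != 1)
        val = -1;
    fclose(f);
    return val;
}

/* Hypothetical helper: poll `path` until its value has grown by at least
 * `increments`, checking up to `max_polls` times, `poll_ms` ms apart.
 * Returns 0 once the target is reached, -1 on read error or timeout. */
static int wait_full_scans(const char *path, long increments, int max_polls, long poll_ms)
{
    struct timespec delay = { .tv_sec = poll_ms / 1000,
                              .tv_nsec = (poll_ms % 1000) * 1000000L };
    long start, now;
    int i;

    assert(increments >= 0 && max_polls > 0);
    start = read_counter(path);
    if (start < 0)
        return -1;
    for (i = 0; i < max_polls; i++) {
        now = read_counter(path);
        if (now < 0)
            return -1;
        if (now - start >= increments)
            return 0;
        nanosleep(&delay, NULL);
    }
    return -1; /* counter did not advance far enough in time */
}
```

In a test, a call such as wait_full_scans(KSM_FULL_SCANS, 2, 6000, 10) placed right before each check would wait up to about a minute (6000 polls at 10 ms). The timeout is arbitrary and would need to grow with the amount of memory being scanned.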
Comment 2 Qian Cai 2011-01-19 22:42:01 EST
Yes, you are right. I'll fix the test cases.
Comment 3 Andrea Arcangeli 2011-02-14 11:39:06 EST
Fix posted to rhkernel-list, Message-ID: <20110214163829.GF6494@random.random>
Comment 4 RHEL Product and Program Management 2011-02-14 15:59:55 EST
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.
Comment 5 Aristeu Rozanski 2011-03-10 12:57:24 EST
Patch(es) available on kernel-2.6.32-121.el6
Comment 9 errata-xmlrpc 2011-05-23 16:37:48 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
