Bug 670063 - pages stuck in ksm pages_volatile
Summary: pages stuck in ksm pages_volatile
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Andrea Arcangeli
QA Contact: Caspar Zhang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-01-17 03:14 UTC by Qian Cai
Modified: 2013-07-03 07:27 UTC (History)
CC List: 4 users

Fixed In Version: kernel-2.6.32-121.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-23 20:37:48 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2011:0542
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 11:58:07 UTC

Description Qian Cai 2011-01-17 03:14:31 UTC
Description of problem:
LTP ksm01 failed due to incorrect pages_volatile values.

# ./ksm01 -s 512
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  KSM merging...
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 0 allocates 512 MB filled with 'c'.
ksm01       0  TINFO  :  child 1 allocates 512 MB filled with 'a'.
ksm01       0  TINFO  :  child 2 allocates 512 MB filled with 'a'.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 2.
ksm01       0  TINFO  :  pages_sharing is 393214.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume child 1.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 1 changes memory content to 'b'.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 3.
ksm01       0  TINFO  :  pages_sharing is 393213.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 2 changes memory content to 'd'
ksm01       0  TINFO  :  child 0 verifies memory content.
ksm01       0  TINFO  :  child 0 changes memory content to 'd'.
ksm01       0  TINFO  :  child 1 changes memory content to 'd'
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 3.
ksm01       1  TFAIL  :  pages_shared is not 1.
ksm01       0  TINFO  :  pages_sharing is 393192.
ksm01       2  TFAIL  :  pages_sharing is not 393215.
ksm01       0  TINFO  :  pages_volatile is 22.
ksm01       3  TFAIL  :  pages_volatile is not 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  resume child 1.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 1 changes one page to 'e'.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 1.
ksm01       0  TINFO  :  pages_sharing is 393214.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 1.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  KSM unmerging...
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 0 verifies memory content.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 2.
ksm01       0  TINFO  :  pages_shared is 0.
ksm01       0  TINFO  :  pages_sharing is 0.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  stop KSM.
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 0.
ksm01       0  TINFO  :  pages_shared is 0.
ksm01       0  TINFO  :  pages_sharing is 0.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 393216.

# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5
node 0 size: 2047 MB
node 0 free: 1652 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 2045 MB
node 1 free: 1791 MB
node 2 cpus: 18 19 20 21 22 23
node 2 size: 2048 MB
node 2 free: 1929 MB
node 3 cpus: 12 13 14 15 16 17
node 3 size: 2048 MB
node 3 free: 1743 MB
node distances:
node   0   1   2   3 
  0:  10  16  16  16 
  1:  16  10  16  16 
  2:  16  16  10  16 
  3:  16  16  16  10 

Version-Release number of selected component (if applicable):
kernel in bug 647334#c7

How reproducible:
always

Steps to Reproduce:
1. git clone git://git.engineering.redhat.com/users/qcai/ltp.git
2. cd ltp; make autotools; ./configure; make
3. cd testcases/kernel/mem/ksm/
4. ./ksm01 -s 512 (depends on memory size)
  
Actual results:
The test failed

Expected results:
The test passed

Additional info:
The test starts to fail once the memory allocation size exceeds a system-dependent threshold. On the AMD Magny-Cours system above, it failed at 512M and beyond, and passed at 384M and below.

On an Intel Nehalem-EX system, it failed at 640M and beyond.
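
For reference, the values the test expects in the logs in this report can be reproduced with simple arithmetic. The sketch below is illustrative only and is not taken from the LTP source: the function name is made up, and it assumes 4 KiB base pages (typical for x86_64) and that each child fills its whole allocation with one character. Under those assumptions, every distinct fill character collapses into one KSM page (pages_shared), and all remaining pages are counted in pages_sharing.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as is typical on x86_64


def expected_ksm_counters(size_mb, fill_chars, page_size=PAGE_SIZE):
    """Return the counters expected once KSM has fully merged the memory.

    size_mb    -- per-child allocation size, as passed to "ksm01 -s"
    fill_chars -- one fill character per child, e.g. "caa" for three children
    """
    pages_per_child = size_mb * 1024 * 1024 // page_size
    total_pages = pages_per_child * len(fill_chars)
    # Each distinct fill character ends up as a single shared KSM page.
    pages_shared = len(set(fill_chars))
    # Every other page is a duplicate that points at one of those shared pages.
    pages_sharing = total_pages - pages_shared
    return {
        "total_pages": total_pages,
        "pages_shared": pages_shared,
        "pages_sharing": pages_sharing,
    }


if __name__ == "__main__":
    # First check of the 512 MB run: child 0 holds 'c', children 1 and 2 hold 'a'.
    print(expected_ksm_counters(512, "caa"))
    # Failing check: all three children were rewritten to 'd'.
    print(expected_ksm_counters(512, "ddd"))
    print(expected_ksm_counters(640, "ddd"))
```

With three children, the total matches the pages_to_scan value the test sets (393216 for 512 MB, 491520 for 640 MB), and the all-'d' case gives the 393215 and 491519 that the TFAIL lines compare against.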

# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
node 0 size: 16265 MB
node 0 free: 15234 MB
node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
node 1 size: 16384 MB
node 1 free: 15718 MB
node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
node 2 size: 16384 MB
node 2 free: 15781 MB
node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
node 3 size: 16384 MB
node 3 free: 15726 MB
node distances:
node   0   1   2   3 
  0:  10  21  21  21 
  1:  21  10  21  21 
  2:  21  21  10  21 
  3:  21  21  21  10 

# ./ksm01 -s 640
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  KSM merging...
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 allocates 640 MB filled with 'a'.
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 2 allocates 640 MB filled with 'a'.
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 0 allocates 640 MB filled with 'c'.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 2.
ksm01       0  TINFO  :  pages_sharing is 491518.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume child 1.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 1 changes memory content to 'b'.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 3.
ksm01       0  TINFO  :  pages_sharing is 491517.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 2 changes memory content to 'd'
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 0 verifies memory content.
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  child 0 changes memory content to 'd'.
ksm01       0  TINFO  :  child 1 changes memory content to 'd'
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 1.
ksm01       0  TINFO  :  pages_sharing is 438415.
ksm01       1  TFAIL  :  pages_sharing is not 491519.
ksm01       0  TINFO  :  pages_volatile is 53099.
ksm01       2  TFAIL  :  pages_volatile is not 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  resume child 1.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 1 changes one page to 'e'.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 1.
ksm01       0  TINFO  :  pages_shared is 1.
ksm01       0  TINFO  :  pages_sharing is 491518.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 1.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.
ksm01       0  TINFO  :  wait for child 1 to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  KSM unmerging...
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 2 stops.
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 1 verifies memory content.
ksm01       0  TINFO  :  child 0 verifies memory content.
ksm01       0  TINFO  :  child 0 stops.
ksm01       0  TINFO  :  child 1 stops.
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 2.
ksm01       0  TINFO  :  pages_shared is 0.
ksm01       0  TINFO  :  pages_sharing is 0.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.
ksm01       0  TINFO  :  wait for all children to stop.
ksm01       0  TINFO  :  resume all children.
ksm01       0  TINFO  :  stop KSM.
ksm01       0  TINFO  :  child 2 continues...
ksm01       0  TINFO  :  child 0 continues...
ksm01       0  TINFO  :  child 1 continues...
ksm01       0  TINFO  :  check!
ksm01       0  TINFO  :  run is 0.
ksm01       0  TINFO  :  pages_shared is 0.
ksm01       0  TINFO  :  pages_sharing is 0.
ksm01       0  TINFO  :  pages_volatile is 0.
ksm01       0  TINFO  :  pages_unshared is 0.
ksm01       0  TINFO  :  sleep_millisecs is 0.
ksm01       0  TINFO  :  pages_to_scan is 491520.

Comment 1 Andrea Arcangeli 2011-01-20 01:34:34 UTC
I assume you're using my last kernel build that should include the fix for this.

The lru_add_drain_all happens only at the end of a full ksm scan. That means the bigger the memory size to scan, the less frequently the flush will happen.

Before reading pages_volatile, can you wait for /sys/kernel/mm/ksm/full_scans to increase twice? That may fix it if this was a timing issue, because we only drain the LRU at the end of a full scan.
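
The suggested wait could be sketched roughly as follows. This is not the actual LTP change. It is a minimal illustration in Python, with hypothetical helper names, an assumed polling interval, and an assumed timeout. Only the sysfs paths under /sys/kernel/mm/ksm come from this report. The directory and the sleep function are parameters so the logic can be exercised without a KSM-enabled kernel.

```python
import time
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")


def read_ksm_value(name, ksm_dir=KSM_DIR):
    """Read one integer counter (e.g. full_scans, pages_volatile) from sysfs."""
    return int((Path(ksm_dir) / name).read_text().strip())


def wait_for_full_scans(increase=2, ksm_dir=KSM_DIR, poll_secs=0.1,
                        timeout_secs=60.0, sleep=time.sleep):
    """Block until full_scans has grown by at least `increase`.

    Returns the final full_scans value, or raises TimeoutError if the
    counter has not advanced enough within timeout_secs (an assumed limit).
    """
    start = read_ksm_value("full_scans", ksm_dir)
    deadline = time.monotonic() + timeout_secs
    while True:
        current = read_ksm_value("full_scans", ksm_dir)
        if current - start >= increase:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"full_scans only advanced from {start} to {current}")
        sleep(poll_secs)


if __name__ == "__main__":
    # Let at least two complete scans (and their LRU drains) finish
    # before sampling the counter that the test checks.
    wait_for_full_scans(increase=2)
    print("pages_volatile is", read_ksm_value("pages_volatile"))
```

Waiting for two increments rather than one means that at least one complete scan both started and finished after the memory contents last changed.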

Comment 2 Qian Cai 2011-01-20 03:42:01 UTC
Yes, you are right. I'll fix the test cases.

Comment 3 Andrea Arcangeli 2011-02-14 16:39:06 UTC
fix posted to rhkernel-list Message-ID: <20110214163829.GF6494>

Comment 4 RHEL Program Management 2011-02-14 20:59:55 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 5 Aristeu Rozanski 2011-03-10 17:57:24 UTC
Patch(es) available on kernel-2.6.32-121.el6

Comment 9 errata-xmlrpc 2011-05-23 20:37:48 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

