Description of problem:
LTP ksm01 failed due to incorrect pages_volatile values.

# ./ksm01 -s 512
ksm01 0 TINFO : child 0 stops.
ksm01 0 TINFO : KSM merging...
ksm01 0 TINFO : child 1 stops.
ksm01 0 TINFO : wait for all children to stop.
ksm01 0 TINFO : child 2 stops.
ksm01 0 TINFO : resume all children.
ksm01 0 TINFO : child 0 continues...
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : child 2 continues...
ksm01 0 TINFO : child 0 allocates 512 MB filled with 'c'.
ksm01 0 TINFO : child 1 allocates 512 MB filled with 'a'.
ksm01 0 TINFO : child 2 allocates 512 MB filled with 'a'.
ksm01 0 TINFO : child 1 stops.
ksm01 0 TINFO : child 2 stops.
ksm01 0 TINFO : child 0 stops.
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 1.
ksm01 0 TINFO : pages_shared is 2.
ksm01 0 TINFO : pages_sharing is 393214.
ksm01 0 TINFO : pages_volatile is 0.
ksm01 0 TINFO : pages_unshared is 0.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 393216.
ksm01 0 TINFO : wait for child 1 to stop.
ksm01 0 TINFO : resume child 1.
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : child 1 verifies memory content.
ksm01 0 TINFO : child 1 changes memory content to 'b'.
ksm01 0 TINFO : child 1 stops.
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 1.
ksm01 0 TINFO : pages_shared is 3.
ksm01 0 TINFO : pages_sharing is 393213.
ksm01 0 TINFO : pages_volatile is 0.
ksm01 0 TINFO : pages_unshared is 0.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 393216.
ksm01 0 TINFO : wait for child 1 to stop.
ksm01 0 TINFO : resume all children.
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : child 0 continues...
ksm01 0 TINFO : child 2 continues...
ksm01 0 TINFO : child 1 verifies memory content.
ksm01 0 TINFO : child 2 changes memory content to 'd'
ksm01 0 TINFO : child 0 verifies memory content.
ksm01 0 TINFO : child 0 changes memory content to 'd'.
ksm01 0 TINFO : child 1 changes memory content to 'd'
ksm01 0 TINFO : child 2 stops.
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 1.
ksm01 0 TINFO : pages_shared is 3.
ksm01 1 TFAIL : pages_shared is not 1.
ksm01 0 TINFO : pages_sharing is 393192.
ksm01 2 TFAIL : pages_sharing is not 393215.
ksm01 0 TINFO : pages_volatile is 22.
ksm01 3 TFAIL : pages_volatile is not 0.
ksm01 0 TINFO : pages_unshared is 0.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 393216.
ksm01 0 TINFO : wait for all children to stop.
ksm01 0 TINFO : child 0 stops.
ksm01 0 TINFO : resume child 1.
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : child 1 verifies memory content.
ksm01 0 TINFO : child 1 changes one page to 'e'.
ksm01 0 TINFO : child 1 stops.
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 1.
ksm01 0 TINFO : pages_shared is 1.
ksm01 0 TINFO : pages_sharing is 393214.
ksm01 0 TINFO : pages_volatile is 0.
ksm01 0 TINFO : pages_unshared is 1.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 393216.
ksm01 0 TINFO : wait for child 1 to stop.
ksm01 0 TINFO : resume all children.
ksm01 0 TINFO : KSM unmerging...
ksm01 0 TINFO : child 0 continues...
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : child 2 continues...
ksm01 0 TINFO : child 1 verifies memory content.
ksm01 0 TINFO : child 2 stops.
ksm01 0 TINFO : child 1 verifies memory content.
ksm01 0 TINFO : child 0 verifies memory content.
ksm01 0 TINFO : child 1 stops.
ksm01 0 TINFO : child 0 stops.
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 2.
ksm01 0 TINFO : pages_shared is 0.
ksm01 0 TINFO : pages_sharing is 0.
ksm01 0 TINFO : pages_volatile is 0.
ksm01 0 TINFO : pages_unshared is 0.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 393216.
ksm01 0 TINFO : wait for all children to stop.
ksm01 0 TINFO : resume all children.
ksm01 0 TINFO : stop KSM.
ksm01 0 TINFO : child 0 continues...
ksm01 0 TINFO : child 2 continues...
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 0.
ksm01 0 TINFO : pages_shared is 0.
ksm01 0 TINFO : pages_sharing is 0.
ksm01 0 TINFO : pages_volatile is 0.
ksm01 0 TINFO : pages_unshared is 0.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 393216.

# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5
node 0 size: 2047 MB
node 0 free: 1652 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 2045 MB
node 1 free: 1791 MB
node 2 cpus: 18 19 20 21 22 23
node 2 size: 2048 MB
node 2 free: 1929 MB
node 3 cpus: 12 13 14 15 16 17
node 3 size: 2048 MB
node 3 free: 1743 MB
node distances:
node   0   1   2   3
  0:  10  16  16  16
  1:  16  10  16  16
  2:  16  16  10  16
  3:  16  16  16  10

Version-Release number of selected component (if applicable):
kernel in bug 647334#c7

How reproducible:
always

Steps to Reproduce:
1. git clone git://git.engineering.redhat.com/users/qcai/ltp.git
2. cd ltp; make autotools; ./configure; make
3. cd testcases/kernel/mem/ksm/
4. ./ksm01 -s 512 (depends on memory size)

Actual results:
The test failed

Expected results:
The test passed

Additional info:
If the memory allocation size in the test went up above a certain size, the test then started to fail. For example, in the above example on the AMD Magny-Cours system, it failed when memory allocation size was 512M and beyond. When it was 384M and below, it passed. On an Intel Nehalem-EX system, when it was 640M and beyond, it failed.

# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
node 0 size: 16265 MB
node 0 free: 15234 MB
node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
node 1 size: 16384 MB
node 1 free: 15718 MB
node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
node 2 size: 16384 MB
node 2 free: 15781 MB
node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
node 3 size: 16384 MB
node 3 free: 15726 MB
node distances:
node   0   1   2   3
  0:  10  21  21  21
  1:  21  10  21  21
  2:  21  21  10  21
  3:  21  21  21  10

# ./ksm01 -s 640
ksm01 0 TINFO : child 0 stops.
ksm01 0 TINFO : KSM merging...
ksm01 0 TINFO : wait for all children to stop.
ksm01 0 TINFO : child 1 stops.
ksm01 0 TINFO : child 2 stops.
ksm01 0 TINFO : resume all children.
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : child 1 allocates 640 MB filled with 'a'.
ksm01 0 TINFO : child 2 continues...
ksm01 0 TINFO : child 2 allocates 640 MB filled with 'a'.
ksm01 0 TINFO : child 0 continues...
ksm01 0 TINFO : child 0 allocates 640 MB filled with 'c'.
ksm01 0 TINFO : child 2 stops.
ksm01 0 TINFO : child 0 stops.
ksm01 0 TINFO : child 1 stops.
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 1.
ksm01 0 TINFO : pages_shared is 2.
ksm01 0 TINFO : pages_sharing is 491518.
ksm01 0 TINFO : pages_volatile is 0.
ksm01 0 TINFO : pages_unshared is 0.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 491520.
ksm01 0 TINFO : wait for child 1 to stop.
ksm01 0 TINFO : resume child 1.
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : child 1 verifies memory content.
ksm01 0 TINFO : child 1 changes memory content to 'b'.
ksm01 0 TINFO : child 1 stops.
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 1.
ksm01 0 TINFO : pages_shared is 3.
ksm01 0 TINFO : pages_sharing is 491517.
ksm01 0 TINFO : pages_volatile is 0.
ksm01 0 TINFO : pages_unshared is 0.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 491520.
ksm01 0 TINFO : wait for child 1 to stop.
ksm01 0 TINFO : resume all children.
ksm01 0 TINFO : child 2 continues...
ksm01 0 TINFO : child 2 changes memory content to 'd'
ksm01 0 TINFO : child 0 continues...
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : child 1 verifies memory content.
ksm01 0 TINFO : child 0 verifies memory content.
ksm01 0 TINFO : child 2 stops.
ksm01 0 TINFO : child 0 changes memory content to 'd'.
ksm01 0 TINFO : child 1 changes memory content to 'd'
ksm01 0 TINFO : child 0 stops.
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 1.
ksm01 0 TINFO : pages_shared is 1.
ksm01 0 TINFO : pages_sharing is 438415.
ksm01 1 TFAIL : pages_sharing is not 491519.
ksm01 0 TINFO : pages_volatile is 53099.
ksm01 2 TFAIL : pages_volatile is not 0.
ksm01 0 TINFO : pages_unshared is 0.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 491520.
ksm01 0 TINFO : wait for all children to stop.
ksm01 0 TINFO : resume child 1.
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : child 1 verifies memory content.
ksm01 0 TINFO : child 1 changes one page to 'e'.
ksm01 0 TINFO : child 1 stops.
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 1.
ksm01 0 TINFO : pages_shared is 1.
ksm01 0 TINFO : pages_sharing is 491518.
ksm01 0 TINFO : pages_volatile is 0.
ksm01 0 TINFO : pages_unshared is 1.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 491520.
ksm01 0 TINFO : wait for child 1 to stop.
ksm01 0 TINFO : resume all children.
ksm01 0 TINFO : KSM unmerging...
ksm01 0 TINFO : child 2 continues...
ksm01 0 TINFO : child 0 continues...
ksm01 0 TINFO : child 2 stops.
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : child 1 verifies memory content.
ksm01 0 TINFO : child 1 verifies memory content.
ksm01 0 TINFO : child 0 verifies memory content.
ksm01 0 TINFO : child 0 stops.
ksm01 0 TINFO : child 1 stops.
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 2.
ksm01 0 TINFO : pages_shared is 0.
ksm01 0 TINFO : pages_sharing is 0.
ksm01 0 TINFO : pages_volatile is 0.
ksm01 0 TINFO : pages_unshared is 0.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 491520.
ksm01 0 TINFO : wait for all children to stop.
ksm01 0 TINFO : resume all children.
ksm01 0 TINFO : stop KSM.
ksm01 0 TINFO : child 2 continues...
ksm01 0 TINFO : child 0 continues...
ksm01 0 TINFO : child 1 continues...
ksm01 0 TINFO : check!
ksm01 0 TINFO : run is 0.
ksm01 0 TINFO : pages_shared is 0.
ksm01 0 TINFO : pages_sharing is 0.
ksm01 0 TINFO : pages_volatile is 0.
ksm01 0 TINFO : pages_unshared is 0.
ksm01 0 TINFO : sleep_millisecs is 0.
ksm01 0 TINFO : pages_to_scan is 491520.
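For reference, the values the test expects in the logs above can be reproduced with a little arithmetic. The following is only a sketch, not code from ksm01 itself: the helper name expected_ksm_counters is hypothetical, and it assumes a 4 KiB page size, which is consistent with the logged pages_to_scan (3 children x 512 MB = 393216 pages). It also assumes KSM has fully merged everything, so pages_shared equals the number of distinct fill patterns.

```python
def expected_ksm_counters(num_children, size_mb, distinct_contents,
                          unshared_pages=0, page_size=4096):
    """Return the counters expected once KSM has merged everything.

    Hypothetical helper for illustration; page_size=4096 is an assumption.
    """
    # Total number of pages allocated by all children.
    total_pages = num_children * size_mb * 1024 * 1024 // page_size
    # One KSM page is kept per distinct fill pattern.
    pages_shared = distinct_contents
    # Every other merged page maps onto one of those KSM pages.
    pages_sharing = total_pages - pages_shared - unshared_pages
    return {
        "pages_shared": pages_shared,
        "pages_sharing": pages_sharing,
        "pages_unshared": unshared_pages,
        "pages_volatile": 0,  # nothing should remain unscanned or changing
    }


# All three 512 MB children filled with 'd': the step that failed above.
print(expected_ksm_counters(3, 512, distinct_contents=1))
```

With three 640 MB children the same calculation gives pages_sharing 491519, the value quoted in the second TFAIL message.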
I assume you're using my last kernel build, which should include the fix for this. The lru_add_drain_all happens only at the end of a full ksm scan. That means the bigger the memory size to scan, the less frequently the flush will happen. Before reading pages_volatile, can you wait for /sys/kernel/mm/ksm/full_scans to increase twice? That may fix it if it is a timing issue, because we only drain the LRU at the end of the scan.
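The wait suggested above could look roughly like the following. This is a sketch only, not the change that went into the LTP test: the function names, the polling interval, and the timeout are all assumptions made for illustration. Only the sysfs path /sys/kernel/mm/ksm/full_scans comes from the comment above.

```python
import time

FULL_SCANS_PATH = "/sys/kernel/mm/ksm/full_scans"


def read_counter(path):
    """Read a single integer from a sysfs-style file."""
    with open(path) as f:
        return int(f.read().strip())


def wait_for_full_scans(increments=2, path=FULL_SCANS_PATH,
                        poll_interval=0.1, timeout=60.0, read=read_counter):
    """Block until the full_scans counter has grown by `increments`.

    Hypothetical helper: `read` is injectable so the loop can be
    exercised without a real KSM sysfs tree.
    """
    start = read(path)
    deadline = time.monotonic() + timeout
    while True:
        current = read(path)
        if current - start >= increments:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"full_scans advanced by {current - start}, "
                f"wanted {increments}"
            )
        time.sleep(poll_interval)
```

A test would call wait_for_full_scans() before each "check!" step and read pages_volatile only afterwards, so that at least one complete scan (and the LRU drain at its end) has happened since the children last changed their memory.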
Yes, you are right. I'll fix the test cases.
fix posted to rhkernel-list Message-ID: <20110214163829.GF6494>
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available on kernel-2.6.32-121.el6
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html