Description of problem:

AIM7 runs 2X faster on community 2.6.32.y than RHEL6.1. Testing was done on a 48p system using 8 RAM-backed tmpfs file systems. The performance degradation in RHEL6.1 appears to be related to handling of anon_vma locks.

The problem affects workloads that do frequent forks w/o execs, i.e. they build large trees of shared ANON space. AIM7 is a prime example of this type of workload. Other similar workloads are some file servers, mail servers, web servers, etc.

The problem is not unique to SGI UV. Any multi-socket system will likely show a performance regression on RHEL6.1 running fork-intensive workloads. SuSE SLES11SP1 performs similarly to community 2.6.33.

Version-Release number of selected component (if applicable):
RHEL 6.1

How reproducible:
100%

Steps to Reproduce:
1. Run AIM7.

Actual results:

AIM7 was run on a 48p system on both 2.6.32.y & RHEL6.1. Results show that 2.6.32.y is ~2X faster for jobs/sec at higher loads.

            Jobs/Min*   Jobs/Min*
  Tasks     2.6.32.y    RHEL6.1
  -----     --------    --------
      1          526         521
      5         2678        2658
     10         5352        5332
     20        10704       10644
     50        26340       24338
    100        48387       36655
    200        86429       61990
    500       162393       98992
   1000       227091      123617
   2000       279274      153225
   4000       321625      164978
   8000       340756      171081
  16000       354395      185592
  32000       356535      192537

  * Higher is better

Expected results:
Performance comparable to community 2.6.32.y.

Additional info:

AFAICT, the performance degradation in RHEL6.1 is caused by a very hot anon_vma spinlock. Running with a user load of 1000, kernel profiling (perf top) shows long periods of 50%-70% of the time in _spin_lock. The same run on 2.6.33 rarely shows more than ~10% in _spin_lock.

An NMI on RHEL6 shows numerous tasks in _spin_lock with the following backtraces:

  [<ffffffff814ddae1>] ? _spin_lock+0x21/0x30
  [<ffffffff811407f4>] ? unlink_anon_vmas+0x94/0xd0
  [<ffffffff81133f94>] ? free_pgtables+0x44/0x120
  [<ffffffff8113cc3d>] ? unmap_region+0xcd/0x130
  [<ffffffff8113d2c6>] ? do_munmap+0x2b6/0x3a0
  [<ffffffff8113d8a1>] ? sys_brk+0x121/0x130
  [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b

  [<ffffffff814ddae1>] ? _spin_lock+0x21/0x30
  [<ffffffff811402d5>] ? anon_vma_chain_link+0x35/0x60
  [<ffffffff8114087d>] ? anon_vma_clone+0x4d/0x90
  [<ffffffff8113be7d>] ? __split_vma+0xcd/0x280
  [<ffffffff8113d197>] ? do_munmap+0x187/0x3a0
  [<ffffffff810d1b62>] ? audit_syscall_entry+0x272/0x2a0
  [<ffffffff8113d8a1>] ? sys_brk+0x121/0x130
  [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
  [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b

A google search for "linux aim7 anon" shows numerous references to what appears to be this same problem:

  http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.27.36
  http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-09/msg06653.html
  http://linuxkernelpanic.blogspot.com/2010/05/while-getting-in-touch-recently-with-ex.html
  http://comments.gmane.org/gmane.linux.kernel.mm/62645

The code in 2.6.32.y & RHEL6.1 for handling anon VMAs is different. Neither matches recent upstream kernels. The above links include a patch that is supposed to address a scaling issue with anon locks. Both SLES & RH appear to have a variant of the patch, but neither matches exactly. However, more analysis is required.

To get further data on the difference, I ran several different kernels. The runs were on a different hardware system (uvmid1) & the results are close but cannot be directly compared to the results above. The following is a single datapoint at 1000 users. Full AIM7 curves were not run.

  OS-VERSION   Task/sec   Wall    CPU
  ----------   --------   -----   -------
  RHEL6.1        134751    42.3    1535.8
  SLES11SP1      210099    27.1     712.6
  2.6.32.y       212845    26.8     772.0
  2.6.39         155398    36.6    1101.0
  3.0.0-rc2      127831    44.6    1103.4

Looks like a definite regression in RHEL6.1. Both SLES11SP1 & RHEL6.1 are based on the 2.6.32 kernel. The community 2.6.32.y kernel (the base for both distros) has performance similar to SLES11SP1. More recent upstream kernels have changed the code related to ANON VMAs & show regressions.
In fact, there was mail yesterday about a very recent regression in this area for 3.0.0:

  Subject: REGRESSION: Performance regressions from switching anon_vma->lock to mutex

  It seems that the recent changes to make the anon_vma->lock into a mutex
  (commit 2b575eb6) cause a 52% regression in throughput (2.6.39 vs 3.0-rc2)
  on the exim mail server workload in the MOSBENCH test suite. Our test setup
  is a 4 socket Westmere EX system, with 10 cores per socket. 40 clients are
  created on the test machine which send email to the exim server residing on
  the same test machine.

There is an ongoing community discussion about this 3.0.0 regression. FWIW, upstream folks claim that the locking issues are seen on 4-socket whiteboxes, but not 2-socket whiteboxes.
So is this a 6.1 regression compared to 6.0 or is 6.x slower than RHEL5 in general??? Larry Woodman
I'm not sure offhand about the performance on RHEL 6.0 or RHEL 5. I'll see if I can get those numbers.
Created attachment 512274 [details]
NMI stack trace for AIM7 on medium size UV system

Partial all-cpu stack trace while running AIM7 at 2000 tasks on 512 cpus. This list is representative; I eliminated idle cpus and didn't include a significant number of duplicates.

George
This patch looks like it's related to the NMI stacktraces George posted in comment #4:

commit c35a56a090eacefca07afeb994029b57d8dd8025
Author: Theodore Ts'o <tytso>
Date:   Sun May 16 05:00:00 2010 -0400

    jbd2: Improve scalability by not taking j_state_lock in jbd2_journal_stop()

    One of the most contended locks in the jbd2 layer is j_state_lock when
    running dbench.  This is especially true if using the real-time kernel
    with its "sleeping spinlocks" patch that replaces spinlocks with
    priority inheriting mutexes --- but it also shows up on large SMP
    benchmarks.

    Thanks to John Stultz for pointing this out.  Reviewed by Mingming Cao
    and Jan Kara.

    Signed-off-by: "Theodore Ts'o" <tytso>

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index bfc70f5..e214d68 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1311,7 +1311,6 @@ int jbd2_journal_stop(handle_t *handle)
 	if (handle->h_sync)
 		transaction->t_synchronous_commit = 1;
 	current->journal_info = NULL;
-	spin_lock(&journal->j_state_lock);
 	spin_lock(&transaction->t_handle_lock);
 	transaction->t_outstanding_credits -= handle->h_buffer_credits;
 	transaction->t_updates--;
@@ -1340,8 +1339,7 @@ int jbd2_journal_stop(handle_t *handle)
 		jbd_debug(2, "transaction too old, requesting commit for "
 					"handle %p\n", handle);
 		/* This is non-blocking */
-		__jbd2_log_start_commit(journal, transaction->t_tid);
-		spin_unlock(&journal->j_state_lock);
+		jbd2_log_start_commit(journal, transaction->t_tid);

 		/*
 		 * Special case: JBD2_SYNC synchronous updates require us
@@ -1351,7 +1349,6 @@ int jbd2_journal_stop(handle_t *handle)
 			err = jbd2_log_wait_commit(journal, tid);
 	} else {
 		spin_unlock(&transaction->t_handle_lock);
-		spin_unlock(&journal->j_state_lock);
 	}

 	lock_map_release(&handle->h_lockdep_map);
The commit in the previous comment is not in 2.6.32.y, so I'm not sure it explains the performance difference. It'd be worth testing, though.
The patch actually seems to have made things significantly worse. (NOTE: the -165 kernel is the base because there is a UV regression in -166.)

[root@uvsw-sys aim7]# /usr/bin/time --verbose ./runt "2.6.32-165 unpatched"
---------------------------------------------------------------------------
Linux version 2.6.32-165.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Fri Jul 1 13:16:59 EDT 2011
DATE = Mon Jul 18 08:43:21 CDT 2011
ARGS = -f -nl -y -D3600 2.6.32-165 unpatched
HOST = uvsw-sys
CPUS = 256
DIRS = 2
DISKS= 0
FS = ext4
CMDLINE = ro root=LABEL=uvsw-sysR14 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=512M virtefi selinux=0 nmi_watchdog=0 add_efi_memmap nortsched processor.max_cstate=1 log_buf_len=8M pci=hpiosize=0,hpmemsize=0,nobar nohz=off cgroup_disable=memory earlyprintk=ttyS0,115200n8 pcie_aspm=on nosoftlockup console=ttyS0,115200n8
ID = 2.6.32-165 unpatched
Run 1 of 1

AIM Multiuser Benchmark - Suite VII v1.1, January 22, 1996
Copyright (C) 1996 AIM Technology
All Rights Reserved

Datapoint file :
HZ is <100>

AIM Multiuser Benchmark - Suite VII Run Beginning

Tasks   jobs/min   jti   jobs/min/task    real        cpu
    1     510.08   100        510.0789   11.41       1.39   Mon Jul 18 08:43:33 2011
    2    1030.09    99        515.0442   11.30       2.53   Mon Jul 18 08:43:45 2011
    3    1526.22    99        508.7413   11.44       4.18   Mon Jul 18 08:43:57 2011
    4    2034.97    99        508.7413   11.44       5.58   Mon Jul 18 08:44:09 2011
    5    2528.24    99        505.6473   11.51       7.17   Mon Jul 18 08:44:20 2011
   10    4511.63    98        451.1628   12.90      25.16   Mon Jul 18 08:44:33 2011
   20    7320.75    96        366.0377   15.90      89.20   Mon Jul 18 08:44:50 2011
   50    8749.25    92        174.9850   33.26    1055.27   Mon Jul 18 08:45:23 2011
  100   10878.50    92        108.7850   53.50    3985.79   Mon Jul 18 08:46:17 2011
  150   11308.29    92         75.3886   77.20    8875.31   Mon Jul 18 08:47:34 2011
  200   11808.87    91         59.0443   98.57   14908.60   Mon Jul 18 08:49:13 2011
  500   12690.24    81         25.3805  229.31   47362.00   Mon Jul 18 08:53:03 2011
 1000   13056.65    78         13.0566  445.75  101236.98   Mon Jul 18 09:00:29 2011
 2000   13244.13    75          6.6221  878.88  197502.93   Mon Jul 18 09:15:09 2011
============================================================================

[root@uvsw-sys aim7]# /usr/bin/time --verbose ./runt "2.6.32-165.bz713953"
---------------------------------------------------------------------------
Linux version 2.6.32-165.el6.bz713953.x86_64 (root.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Mon Jul 18 08:49:11 EDT 2011
DATE = Mon Jul 18 11:26:03 CDT 2011
ARGS = -f -nl -y -D3600 2.6.32-165.bz713953
HOST = uvsw-sys
CPUS = 256
DIRS = 2
DISKS= 0
FS = ext4
CMDLINE = ro root=LABEL=uvsw-sysR14 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=512M virtefi selinux=0 nmi_watchdog=0 add_efi_memmap nortsched processor.max_cstate=1 log_buf_len=8M pci=hpiosize=0,hpmemsize=0,nobar nohz=off cgroup_disable=memory earlyprintk=ttyS0,115200n8 pcie_aspm=on nosoftlockup console=ttyS0,115200n8
ID = 2.6.32-165.bz713953
Run 1 of 1

AIM Multiuser Benchmark - Suite VII v1.1, January 22, 1996
Copyright (C) 1996 AIM Technology
All Rights Reserved

Datapoint file :
HZ is <100>

AIM Multiuser Benchmark - Suite VII Run Beginning

Tasks   jobs/min   jti   jobs/min/task    real        cpu
    1     512.78   100        512.7753   11.35       1.35   Mon Jul 18 11:26:16 2011
    2    1038.36    99        519.1793   11.21       2.37   Mon Jul 18 11:26:27 2011
    3    1511.69    99        503.8961   11.55       4.49   Mon Jul 18 11:26:39 2011
    4    1943.24    99        485.8097   11.98       7.49   Mon Jul 18 11:26:52 2011
    5    2333.60    99        466.7201   12.47      12.13   Mon Jul 18 11:27:04 2011
   10    3798.96    99        379.8956   15.32      50.38   Mon Jul 18 11:27:20 2011
   20    4752.96    99        237.6480   24.49     287.17   Mon Jul 18 11:27:45 2011
   50    4990.57    99         99.8114   58.31    2397.91   Mon Jul 18 11:28:43 2011
  100    5860.44    99         58.6044   99.31    8775.86   Mon Jul 18 11:30:23 2011
  150    6158.30    99         41.0553  141.76   19220.22   Mon Jul 18 11:32:45 2011
  200    6293.59    98         31.4680  184.95   33049.52   Mon Jul 18 11:35:50 2011
  500    6602.38    84         13.2048  440.75  100128.64   Mon Jul 18 11:43:12 2011
 1000    6702.52    81          6.7025  868.33  207472.07   Mon Jul 18 11:57:41 2011
Created attachment 513655 [details]
Aim7 used in testing

I used a simple ./runt as shown in the posted results.
Created attachment 513656 [details] NMI stack trace of 2.6.32-165 with proposed patch applied.
I posted the 2 patches that Shak verified fix this problem:

1.) commit c35a56a090eacefca07afeb994029b57d8dd8025
    Author: Theodore Ts'o <tytso>
    Date:   Sun May 16 05:00:00 2010 -0400

        jbd2: Improve scalability by not taking j_state_lock in jbd2_journal_stop()

        One of the most contended locks in the jbd2 layer is j_state_lock when
        running dbench.  This is especially true if using the real-time kernel
        with its "sleeping spinlocks" patch that replaces spinlocks with
        priority inheriting mutexes --- but it also shows up on large SMP
        benchmarks.

        Thanks to John Stultz for pointing this out.  Reviewed by Mingming Cao
        and Jan Kara.

        Signed-off-by: "Theodore Ts'o" <tytso>

2.) commit 965f55dea0e331152fa53941a51e4e16f9f06fae
    Author: Shaohua Li <shaohua.li>
    Date:   Tue May 24 17:11:20 2011 -0700

        mmap: avoid merging cloned VMAs

        Avoid merging a VMA with another VMA which is cloned from the parent
        process.  The cloned VMA shares the anon_vma lock with the parent
        process's VMA.  If we do the merge, more vmas (even when the new range
        is only for the current process) use the parent process's anon_vma
        lock.  This introduces scalability issues.  find_mergeable_anon_vma()
        already considers this case.

        Signed-off-by: Shaohua Li <shaohua.li>
        Cc: Rik van Riel <riel>
        Cc: Hugh Dickins <hughd>
        Cc: Andi Kleen <andi>
        Signed-off-by: Andrew Morton <akpm>
        Signed-off-by: Linus Torvalds <torvalds>

AIM7 runs 2X faster on 2.6.32.y than RHEL6.1
https://bugzilla.redhat.com/show_bug.cgi?id=713953

Summary: 1.9x speedup with AIM7 on FusionIO

          Tasks    jobs/min   jti   jobs/min/task    real       cpu
RHEL6.1   10000   265289.15    64         26.5289  228.43  12142.19  Sat Jun 12 22:14:34 2010
Larry6.2  10000   504117.79    69         50.4118  120.21   7697.32  Thu Jul 21 14:28:36 2011

Larry6.2 == 2.6.32-169.el6.andi.x86_64

I will post to the BZ but wanted to give quick feedback to Larry/all.

Details:

RHEL6.1 2.6.32-131.0.15.el6.x86_64

AIM Multiuser Benchmark - Suite VII Run Beginning

Tasks    jobs/min   jti   jobs/min/task    real       cpu
    1      397.64   100        397.6378   15.24      5.24  Sat Jun 12 21:52:06 2010
  101    21355.90    94        211.4445   28.66   1131.42  Sat Jun 12 21:52:35 2010
  201    42530.03    94        211.5922   28.64   1132.11  Sat Jun 12 21:53:04 2010
  301    65874.32    91        218.8516   27.69   1019.74  Sat Jun 12 21:53:32 2010
  401    85205.47    90        212.4825   28.52   1084.43  Sat Jun 12 21:54:01 2010
  501   111907.85    88        223.3690   27.13    981.28  Sat Jun 12 21:54:29 2010
  601   119844.03    87        199.4077   30.39   1155.42  Sat Jun 12 21:54:59 2010
  701   133460.89    85        190.3864   31.83   1211.88  Sat Jun 12 21:55:31 2010
  905   149885.21    84        165.6190   36.59   1473.14  Sat Jun 12 21:56:08 2010
 1343   191360.92    77        142.4877   42.53   1797.61  Sat Jun 12 21:56:51 2010
 2290   218472.92    68         95.4030   63.52   2927.11  Sat Jun 12 21:57:55 2010
 2669   222968.57    70         83.5401   72.54   3360.75  Sat Jun 12 21:59:09 2010
 3480   237406.28    67         68.2202   88.83   4342.04  Sat Jun 12 22:00:39 2010
 4291   243455.29    66         56.7363  106.81   5299.78  Sat Jun 12 22:02:27 2010
 5102   249782.84    66         48.9578  123.78   6263.25  Sat Jun 12 22:04:32 2010
 6852   258582.14    65         37.7382  160.58   8388.74  Sat Jun 12 22:07:14 2010
 8602   252509.78    64         29.3548  206.44  10774.35  Sat Jun 12 22:10:43 2010
10000   265289.15    64         26.5289  228.43  12142.19  Sat Jun 12 22:14:34 2010

With 2.6.32-169.el6.andi.x86_64 (4 of the 5 patches)

AIM Multiuser Benchmark - Suite VII Run Beginning

Tasks    jobs/min   jti   jobs/min/task    real       cpu
    1      483.25   100        483.2536   12.54      2.53  Thu Jul 21 14:15:22 2011
  101    53548.56    98        530.1837   11.43     78.82  Thu Jul 21 14:15:34 2011
  201    97056.57    96        482.8685   12.55    152.97  Thu Jul 21 14:15:47 2011
  301   134716.40    94        447.5628   13.54    233.41  Thu Jul 21 14:16:00 2011
  401   164082.38    92        409.1830   14.81    304.59  Thu Jul 21 14:16:15 2011
  501   193256.52    91        385.7416   15.71    384.27  Thu Jul 21 14:16:31 2011
  601   217436.42    90        361.7910   16.75    463.08  Thu Jul 21 14:16:48 2011
  701   237853.30    88        339.3057   17.86    539.16  Thu Jul 21 14:17:06 2011
  907   272910.63    86        300.8937   20.14    695.99  Thu Jul 21 14:17:26 2011
 1345   327864.04    83        243.7651   24.86   1032.19  Thu Jul 21 14:17:51 2011
 1783   362704.93    80        203.4240   29.79   1366.86  Thu Jul 21 14:18:21 2011
 2745   412976.66    77        150.4469   40.28   2104.72  Thu Jul 21 14:19:02 2011
 3153   427358.09    76        135.5401   44.71   2417.29  Thu Jul 21 14:19:47 2011
 4013   448685.98    74        111.8081   54.20   3080.88  Thu Jul 21 14:20:41 2011
 4873   464972.13    73         95.4180   63.51   3735.04  Thu Jul 21 14:21:45 2011
 6740   484110.47    71         71.8265   84.37   5165.47  Thu Jul 21 14:23:10 2011
 7541   492652.65    70         65.3299   92.76   5801.03  Thu Jul 21 14:24:44 2011
 9222   503471.35    69         54.5946  111.00   7072.96  Thu Jul 21 14:26:35 2011
10000   504117.79    69         50.4118  120.21   7697.32  Thu Jul 21 14:28:36 2011
(Thanks a lot, Larry!) Russ -- Could you please test and verify those patches as well? Thanks!
The results look good.

---------------------------------------------------------------------------
Linux version 2.6.32-131.0.15.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #1 SMP Tue May 10 15:42:40 EDT 2011
DATE = Mon Sep 26 10:21:57 CDT 2011
ARGS = -f -nl -y -D3600 2.6.32-131.0.15.el6.x86_64
HOST = uvmid5-sys
CPUS = 48
DIRS = 1
DISKS= 0
FS = ext4
CMDLINE = ro root=LABEL=mid5-sysR14 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=512M virtefi selinux=0 nmi_watchdog=0 add_efi_memmap nortsched processor.max_cstate=1 log_buf_len=8M pci=hpiosize=0,hpmemsize=0,nobar nohz=off cgroup_disable=memory earlyprintk=ttyS0,115200n8 pcie_aspm=on console=ttyS0,115200n8
ID = 2.6.32-131.0.15.el6.x86_64
Run 1 of 1

AIM Multiuser Benchmark - Suite VII v1.1, January 22, 1996
Copyright (C) 1996 AIM Technology
All Rights Reserved

Datapoint file :
HZ is <100>

AIM Multiuser Benchmark - Suite VII Run Beginning

Tasks   jobs/min   jti   jobs/min/task     real        cpu
    1     523.85   100        523.8524    11.11       1.12  Mon Sep 26 10:22:08 2011
    2    1057.22    99        528.6104    11.01       2.02  Mon Sep 26 10:22:20 2011
    3    1516.94    99        505.6473    11.51       4.49  Mon Sep 26 10:22:31 2011
    4    1989.74    99        497.4359    11.70       6.49  Mon Sep 26 10:22:43 2011
    5    2393.09    99        478.6184    12.16      10.51  Mon Sep 26 10:22:55 2011
   10    3801.44    99        380.1437    15.31      37.79  Mon Sep 26 10:23:11 2011
   20    6388.58    99        319.4292    18.22     159.16  Mon Sep 26 10:23:30 2011
   50    7987.92    99        159.7584    36.43    1251.69  Mon Sep 26 10:24:06 2011
  100    9953.82    97         99.5382    58.47    2308.86  Mon Sep 26 10:25:05 2011
  150   10846.07    96         72.3071    80.49    3360.18  Mon Sep 26 10:26:26 2011
  200   11373.85    94         56.8693   102.34    4416.56  Mon Sep 26 10:28:08 2011
  500   12459.86    90         24.9197   233.55   10707.00  Mon Sep 26 10:32:02 2011
 1000   12882.38    87         12.8824   451.78   21166.39  Mon Sep 26 10:39:34 2011
 2000   13070.13    82          6.5351   890.58   42207.51  Mon Sep 26 10:54:25 2011
 4000   13097.86    79          3.2745  1777.39   84693.22  Mon Sep 26 11:24:04 2011
 8000   13153.32    76          1.6442  3539.79  169012.14  Mon Sep 26 12:23:06 2011

RHEL6.2 (development) kernel
---------------------------------------------------------------------------
[root@uvmid5-sys aim7]# /usr/bin/time --verbose ./runt "2.6.32-71.el6.x86_64.uv"
---------------------------------------------------------------------------
Linux version 2.6.32-71.el6.x86_64.uv (abuild@alcatraz) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #1 SMP Fri Sep 23 01:12:00 EDT 2011
DATE = Mon Sep 26 17:47:51 CDT 2011
ARGS = -f -nl -y -D3600 2.6.32-71.el6.x86_64.uv
HOST = uvmid5-sys
CPUS = 48
DIRS = 1
DISKS= 0
FS = ext4
CMDLINE = ro root=LABEL=mid5-sysR14 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=512M virtefi selinux=0 nmi_watchdog=0 add_efi_memmap nortsched processor.max_cstate=1 log_buf_len=8M pci=hpiosize=0,hpmemsize=0,nobar nohz=off cgroup_disable=memory earlyprintk=ttyS0,115200n8 pcie_aspm=on console=ttyS0,115200n8
ID = 2.6.32-71.el6.x86_64.uv
Run 1 of 1

AIM Multiuser Benchmark - Suite VII v1.1, January 22, 1996
Copyright (C) 1996 AIM Technology
All Rights Reserved

Datapoint file :
HZ is <100>

AIM Multiuser Benchmark - Suite VII Run Beginning

Tasks   jobs/min   jti   jobs/min/task    real       cpu
    1     522.44   100        522.4417   11.14      1.12  Mon Sep 26 17:48:02 2011
    2    1058.18    99        529.0909   11.00      1.98  Mon Sep 26 17:48:13 2011
    3    1567.32    99        522.4417   11.14      3.37  Mon Sep 26 17:48:25 2011
    4    2056.54    99        514.1343   11.32      4.88  Mon Sep 26 17:48:36 2011
    5    2554.87    99        510.9745   11.39      6.78  Mon Sep 26 17:48:48 2011
   10    4667.20    97        466.7201   12.47     19.21  Mon Sep 26 17:49:00 2011
   20    8214.54    96        410.7269   14.17     40.80  Mon Sep 26 17:49:14 2011
   50   13049.33    99        260.9865   22.30    574.54  Mon Sep 26 17:49:37 2011
  100   17946.35    97        179.4635   32.43   1063.28  Mon Sep 26 17:50:09 2011
  150   19800.41    94        132.0027   44.09   1620.03  Mon Sep 26 17:50:54 2011
  200   21393.13    90        106.9656   54.41   2111.85  Mon Sep 26 17:51:48 2011
  500   24982.83    86         49.9657  116.48   5087.39  Mon Sep 26 17:53:44 2011
 1000   26419.72    80         26.4197  220.29  10049.33  Mon Sep 26 17:57:25 2011
 2000   27218.52    77         13.6093  427.65  19991.90  Mon Sep 26 18:04:33 2011
---------------------------------------------------------------------------
Great! Hey Larry -- I believe you can take it from here. Thanks everyone!
What else do I need to do? The patches are in 6.2. Larry
I guess this is a Q for Aris then. This BZ is still on POST, which would indicate they have not been committed to the RHEL 6.2 tree yet. I'll send him a note. Thanks!
Sorry for the confusion here. I opened 2 separate BZs (721044 & 725855) for these 2 separate patches, since they also fixed other problems reported by Intel on 2 separate occasions. Together they also fixed BZ713953. These are the 2 commits:

commit b562ced54d5e23f86ef8523f2e6e87d8e4a8e5d7
Author: Larry Woodman <lwoodman>
Date:   Tue Jul 26 18:37:51 2011 -0400

    [fs] jbd2: Improve scalability by not taking j_state_lock in jbd2_journal_stop()

    Message-id: <4E2F097F.3070700>
    Patchwork-id: 39103
    O-Subject: [RHEL6.2 V2 Patch] jbd2: Improve scalability by not taking j_state_lock in jbd2_journal_stop()
    Bugzilla: 721044
    RH-Acked-by: Eric Sandeen <sandeen>
    RH-Acked-by: Rik van Riel <riel>

    fixes BZ721044

    commit c35a56a090eacefca07afeb994029b57d8dd8025
    Author: Theodore Ts'o <tytso>
    Date:   Sun May 16 05:00:00 2010 -0400

commit 347d4e7a137ff704bb8fa0ec66155f2ef068e869
Author: Larry Woodman <lwoodman>
Date:   Tue Jul 12 20:38:26 2011 -0400

    [mm] Avoid merging a VMA with another VMA which is cloned from the parent process.

    Message-id: <4E1CB0C2.5050004>
    Patchwork-id: 37430
    O-Subject: [RHEL6.2 Patch] Avoid merging a VMA with another VMA which is cloned from the parent process.
    Bugzilla: 725855
    RH-Acked-by: Rik van Riel <riel>
    RH-Acked-by: Johannes Weiner <jweiner>

    During weekly partner performance meetings it was brought to our attention
    that RHEL6 is missing this upstream performance optimization that avoids
    merging some VMAs which are cloned from the parent process:

    commit 965f55dea0e331152fa53941a51e4e16f9f06fae
    Author: Shaohua Li <shaohua.li>
    Date:   Tue May 24 17:11:20 2011 -0700

        mmap: avoid merging cloned VMAs
No worries! Closing this item as a DUP then. *** This bug has been marked as a duplicate of bug 725855 ***