1. Please describe the problem:

Starting with kernel 6.5, Libmicro's mmap and unmap tests show a performance drop compared to the RHEL-9 kernel 5.14.0-339.el9.

First detected in 6.5 rc3:
http://reports.perfqe.tpb.lab.eng.brq.redhat.com/testing/sched/reports/Libmicro-test/ibm-p10-01-lp13.build.eng.rdu2.redhat.com/RHEL-9.3.0-20230718.0vsRHEL-9.3.0-20230521.45/2023-07-20T14:03:11.500000vs2023-07-26T10:07:47.801664/511c51ff-b149-5134-8085-aa34639d040a/index.html

mmap_a128k: -141%
unmap_z128k: -208%
mprot_z128k: -159%

All tested arches are affected: x86_64, aarch64, ppc64le.
http://reports.perfqe.tpb.lab.eng.brq.redhat.com/testing/sched/reports/Libmicro-test/intel-icelake-platinum-8351n-1s.lab.eng.brq2.redhat.com/RHEL-9.3.0-20230718.0vsRHEL-9.3.0-20230718.0/2023-07-20T14:03:11.500000vs2023-09-13T08:52:32.238798/13242074-ed62-50b0-b211-15b64f64c1c1/index.html

The last tested kernel is 6.6 rc2:
http://reports.perfqe.tpb.lab.eng.brq.redhat.com/testing/sched/reports/Libmicro-test/gold-4s-b.tpb.lab.eng.brq.redhat.com/RHEL-9.3.0-20230718.0vsRHEL-9.3.0-20230718.0/2023-07-20T14:03:11.500000vs2023-08-31T12:36:27.700000/3a9ead53-077a-564a-9d20-441a071c5498/index.html

2. What is the Version-Release number of the kernel:

Confirmed with
6.5.0-0.rc3.23.eln128
6.5.0-57.eln130
6.6.0-0.rc2.20.eln130

Reproducible: Always
Hello,

I discovered that the performance drop was introduced already in kernel 6.1 rc1 with the introduction of maple trees. I'm going to mark it as a duplicate of

BZ2149636 - Performance drop visible on mmapmany test starting from kernel-6.1.0-0.rc1.15.eln122

I have also created a simple .c reproducer that I'm going to attach here:

gcc -Wall -Wextra -O1 mmap_munmap.c -o mmap_munmap
./run_mmap_munmap.sh

Here are the results:

$ tail -v -n+1 *mmap_munmap.log
==> 5.14.0-339.el9.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 2862804 K-cycles. Avg: 2 K-cycles/call

==> 6.0.0-54.eln121.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 2853526 K-cycles. Avg: 2 K-cycles/call

==> 6.1.0-0.rc1.15.eln122.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 5328092 K-cycles. Avg: 5 K-cycles/call

==> 6.6.0-0.rc2.20.eln130.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 5024177 K-cycles. Avg: 4 K-cycles/call

As you can see, the drop has started in the 6.1 rc1 kernel. The latest kernel, 6.6, shows a minor improvement.

Here is the perf record/report result for kernel 6.6:

==============================================================================================
$ head -30 6.6.0-0.rc2.20.eln130.x86_64_mmap_munmap.perf
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 20K of event 'cycles'
# Event count (approx.): 18298403215
#
# Overhead  Command      Shared Object      Symbol
# ........  ...........  .................  ...........................................
#
     7.51%  mmap_munmap  mmap_munmap        [.] main
     3.91%  mmap_munmap  [kernel.kallsyms]  [k] sync_regs
     3.87%  mmap_munmap  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm
     3.60%  mmap_munmap  [kernel.kallsyms]  [k] perf_iterate_ctx
     3.53%  mmap_munmap  [kernel.kallsyms]  [k] __folio_throttle_swaprate
     3.07%  mmap_munmap  [kernel.kallsyms]  [k] native_flush_tlb_one_user
     3.03%  mmap_munmap  [kernel.kallsyms]  [k] native_irq_return_iret
     2.17%  mmap_munmap  [kernel.kallsyms]  [k] mas_wr_node_store
     2.01%  mmap_munmap  [kernel.kallsyms]  [k] __slab_free
     1.77%  mmap_munmap  [kernel.kallsyms]  [k] kmem_cache_alloc
     1.53%  mmap_munmap  [kernel.kallsyms]  [k] charge_memcg
     1.38%  mmap_munmap  [kernel.kallsyms]  [k] __count_memcg_events
     1.33%  mmap_munmap  [kernel.kallsyms]  [k] kmem_cache_free
     1.16%  mmap_munmap  [kernel.kallsyms]  [k] mtree_range_walk
     1.12%  mmap_munmap  [kernel.kallsyms]  [k] up_write
     1.09%  mmap_munmap  [kernel.kallsyms]  [k] __rcu_read_lock
     1.07%  mmap_munmap  [kernel.kallsyms]  [k] __rcu_read_unlock
     1.07%  mmap_munmap  [kernel.kallsyms]  [k] mas_rev_awalk
     1.01%  mmap_munmap  [kernel.kallsyms]  [k] memcg_slab_post_alloc_hook
==============================================================================================

Please note maple trees related functions (mtree_*, mas_*) - see also
https://docs.kernel.org/core-api/maple_tree.html#:~:text=The%20Maple%20Tree%20is%20a,a%20user%20written%20search%20method.

Thanks
Jirka
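
The "Avg" column in the logs above is truncated to an integer, which hides part of the difference between kernels. A small sketch of the same arithmetic with the fractional part kept; the function names are hypothetical and are not taken from the reproducer:

```c
#include <stdint.h>

/* Average cost per call in K-cycles, keeping the fractional part
 * that the log's integer "Avg" value drops. */
double avg_kcycles_per_call(uint64_t total_kcycles, uint64_t calls)
{
    return (double)total_kcycles / (double)calls;
}

/* Slowdown factor of the "after" kernel relative to the "before" kernel,
 * computed from the two K-cycle totals of equally sized runs. */
double slowdown(uint64_t before_kcycles, uint64_t after_kcycles)
{
    return (double)after_kcycles / (double)before_kcycles;
}
```

Fed with the totals from the logs (1048576 calls each), this gives roughly 2.72 K-cycles/call for 6.0.0-54.eln121, 5.08 for 6.1.0-0.rc1.15.eln122 and 4.79 for 6.6.0-0.rc2.20.eln130, i.e. about 1.87x and 1.76x slower than 6.0.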
Created attachment 1991089
Simple reproducer written in C

This is a minimal reproducer in C that shows the performance degradation of munmap.

gcc -Wall -Wextra -O1 mmap_munmap.c -o mmap_munmap
./run_mmap_munmap.sh
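
The attachment itself is not reproduced in this thread. For readers without access to it, the following is only a rough sketch of what such a measurement loop could look like, not the attached mmap_munmap.c. It is written under several assumptions: the helper names are hypothetical, the first page of each mapping is touched before unmapping (the page-fault symbols in the perf profile suggest the original does something similar, but that is a guess), and clock_gettime() is used instead of the TSC so that the code also builds on aarch64 and ppc64le.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds.
 * Used here in place of the TSC reads of the original reproducer. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Map and unmap an anonymous private region of `len` bytes `calls` times
 * and accumulate the time spent in munmap() only.
 * Returns the total in nanoseconds, or -1 if mmap() or munmap() fails.
 */
int64_t time_munmap_ns(size_t calls, size_t len)
{
    uint64_t total = 0;

    for (size_t i = 0; i < calls; i++) {
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1;

        p[0] = 1; /* assumption: fault in one page so munmap has work to do */

        uint64_t t0 = now_ns();
        if (munmap(p, len) != 0)
            return -1;
        total += now_ns() - t0;
    }
    return (int64_t)total;
}
```

Calling time_munmap_ns(1048576, 8192) from a main() and dividing the result by the call count would give a per-call figure comparable in spirit to the log lines above, although in nanoseconds rather than K-cycles. The two clock reads per iteration add some overhead of their own, so absolute numbers will differ from the TSC-based ones.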
*** This bug has been marked as a duplicate of bug 2149636 ***