Bug 2239808 - Starting from kernel-6.5, performance drop in Libmicro's mmap and unmap
Summary: Starting from kernel-6.5, performance drop in Libmicro's mmap and unmap
Keywords:
Status: CLOSED DUPLICATE of bug 2149636
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 38
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-09-20 10:08 UTC by Jiri Hladky
Modified: 2023-09-29 13:23 UTC
CC List: 21 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-09-29 13:23:21 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Simple reproducer written in .c (10.94 KB, application/x-xz)
2023-09-29 13:22 UTC, Jiri Hladky

Comment 1 Jiri Hladky 2023-09-29 10:41:02 UTC
Hello,

I discovered that the performance drop was already introduced in kernel 6.1-rc1, together with the introduction of maple trees. I'm going to mark this bug as a duplicate of:

BZ2149636 - Performance drop visible on mmapmany test starting from kernel-6.1.0-0.rc1.15.eln122

I have also created a simple C reproducer, which I'm going to attach to this bug. To build and run it:

gcc -Wall -Wextra -O1 mmap_munmap.c -o mmap_munmap
./run_mmap_munmap.sh

Here are the results:

$ tail -v -n+1 *mmap_munmap.log
==> 5.14.0-339.el9.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 2862804 K-cycles.  Avg: 2 K-cycles/call

==> 6.0.0-54.eln121.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 2853526 K-cycles.  Avg: 2 K-cycles/call

==> 6.1.0-0.rc1.15.eln122.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 5328092 K-cycles.  Avg: 5 K-cycles/call

==> 6.6.0-0.rc2.20.eln130.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 5024177 K-cycles.  Avg: 4 K-cycles/call

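As you can see, the drop started with the 6.1-rc1 kernel, where the munmap loop takes about 1.9x as many cycles as on 6.0. The latest kernel, 6.6-rc2, shows a minor improvement (about 6% fewer cycles than 6.1-rc1) but is still about 1.8x slower than 6.0.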

Here is the perf record/report output (top symbols) for kernel 6.6-rc2:
==============================================================================================
$ head -30 6.6.0-0.rc2.20.eln130.x86_64_mmap_munmap.perf
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 20K of event 'cycles'
# Event count (approx.): 18298403215
#
# Overhead  Command      Shared Object      Symbol                                     
# ........  ...........  .................  ...........................................
#
     7.51%  mmap_munmap  mmap_munmap        [.] main
     3.91%  mmap_munmap  [kernel.kallsyms]  [k] sync_regs
     3.87%  mmap_munmap  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm
     3.60%  mmap_munmap  [kernel.kallsyms]  [k] perf_iterate_ctx
     3.53%  mmap_munmap  [kernel.kallsyms]  [k] __folio_throttle_swaprate
     3.07%  mmap_munmap  [kernel.kallsyms]  [k] native_flush_tlb_one_user
     3.03%  mmap_munmap  [kernel.kallsyms]  [k] native_irq_return_iret
     2.17%  mmap_munmap  [kernel.kallsyms]  [k] mas_wr_node_store
     2.01%  mmap_munmap  [kernel.kallsyms]  [k] __slab_free
     1.77%  mmap_munmap  [kernel.kallsyms]  [k] kmem_cache_alloc
     1.53%  mmap_munmap  [kernel.kallsyms]  [k] charge_memcg
     1.38%  mmap_munmap  [kernel.kallsyms]  [k] __count_memcg_events
     1.33%  mmap_munmap  [kernel.kallsyms]  [k] kmem_cache_free
     1.16%  mmap_munmap  [kernel.kallsyms]  [k] mtree_range_walk
     1.12%  mmap_munmap  [kernel.kallsyms]  [k] up_write
     1.09%  mmap_munmap  [kernel.kallsyms]  [k] __rcu_read_lock
     1.07%  mmap_munmap  [kernel.kallsyms]  [k] __rcu_read_unlock
     1.07%  mmap_munmap  [kernel.kallsyms]  [k] mas_rev_awalk
     1.01%  mmap_munmap  [kernel.kallsyms]  [k] memcg_slab_post_alloc_hook
==============================================================================================

Please note the maple tree-related functions (mtree_*, mas_*) in the profile; see also https://docs.kernel.org/core-api/maple_tree.html#:~:text=The%20Maple%20Tree%20is%20a,a%20user%20written%20search%20method.

Thanks
Jirka

Comment 2 Jiri Hladky 2023-09-29 13:22:47 UTC
Created attachment 1991089
Simple reproducer written in .c

This is a minimal C reproducer that shows the performance degradation of munmap.

gcc -Wall -Wextra -O1 mmap_munmap.c -o mmap_munmap
./run_mmap_munmap.sh

Comment 3 Jiri Hladky 2023-09-29 13:23:21 UTC

*** This bug has been marked as a duplicate of bug 2149636 ***

