Bug 2239808

Summary: Starting from kernel-6.5, performance drop in Libmicro's mmap and unmap
Product: Fedora
Reporter: Jiri Hladky <jhladky>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 38
CC: acaringi, adscvr, airlied, alciregi, bskeggs, dzickus, fweimer, hdegoede, hpa, jarod, jhladky, josef, jvozar, kernel-maint, kkolakow, lgoncalv, linville, masami256, mchehab, ptalbert, steved
Keywords: Performance, Regression
Hardware: Unspecified
OS: Linux
Last Closed: 2023-09-29 13:23:21 UTC
Attachments:
  Simple reproducer written in .c (flags: none)

Comment 1 Jiri Hladky 2023-09-29 10:41:02 UTC
Hello,

I discovered that the performance drop had already been introduced in kernel 6.1-rc1, together with the introduction of maple trees, so it predates 6.5. I'm going to mark this bug as a duplicate of:

BZ2149636 - Performance drop visible on mmapmany test starting from kernel-6.1.0-0.rc1.15.eln122

I have also created a simple .c reproducer that I'm going to attach here:

gcc -Wall -Wextra -O1 mmap_munmap.c -o mmap_munmap
./run_mmap_munmap.sh

Here are the results:

$ tail -v -n+1 *mmap_munmap.log
==> 5.14.0-339.el9.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 2862804 K-cycles.  Avg: 2 K-cycles/call

==> 6.0.0-54.eln121.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 2853526 K-cycles.  Avg: 2 K-cycles/call

==> 6.1.0-0.rc1.15.eln122.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 5328092 K-cycles.  Avg: 5 K-cycles/call

==> 6.6.0-0.rc2.20.eln130.x86_64_mmap_munmap.log <==
TSC for 1048576 munmap calls with len of 8kiB: 5024177 K-cycles.  Avg: 4 K-cycles/call

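As you can see, the drop started with the 6.1-rc1 kernel. The total for the munmap loop rose from 2853526 K-cycles on 6.0 to 5328092 K-cycles on 6.1-rc1, about 1.87x. The latest kernel tested, 6.6-rc2, shows a minor improvement (5024177 K-cycles, about 1.76x the 6.0 figure) but remains well above the 6.0 level.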

Here is the perf record/report output (first 30 lines) for kernel 6.6-rc2:
==============================================================================================
$ head -30 6.6.0-0.rc2.20.eln130.x86_64_mmap_munmap.perf
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 20K of event 'cycles'
# Event count (approx.): 18298403215
#
# Overhead  Command      Shared Object      Symbol                                     
# ........  ...........  .................  ...........................................
#
     7.51%  mmap_munmap  mmap_munmap        [.] main
     3.91%  mmap_munmap  [kernel.kallsyms]  [k] sync_regs
     3.87%  mmap_munmap  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm
     3.60%  mmap_munmap  [kernel.kallsyms]  [k] perf_iterate_ctx
     3.53%  mmap_munmap  [kernel.kallsyms]  [k] __folio_throttle_swaprate
     3.07%  mmap_munmap  [kernel.kallsyms]  [k] native_flush_tlb_one_user
     3.03%  mmap_munmap  [kernel.kallsyms]  [k] native_irq_return_iret
     2.17%  mmap_munmap  [kernel.kallsyms]  [k] mas_wr_node_store
     2.01%  mmap_munmap  [kernel.kallsyms]  [k] __slab_free
     1.77%  mmap_munmap  [kernel.kallsyms]  [k] kmem_cache_alloc
     1.53%  mmap_munmap  [kernel.kallsyms]  [k] charge_memcg
     1.38%  mmap_munmap  [kernel.kallsyms]  [k] __count_memcg_events
     1.33%  mmap_munmap  [kernel.kallsyms]  [k] kmem_cache_free
     1.16%  mmap_munmap  [kernel.kallsyms]  [k] mtree_range_walk
     1.12%  mmap_munmap  [kernel.kallsyms]  [k] up_write
     1.09%  mmap_munmap  [kernel.kallsyms]  [k] __rcu_read_lock
     1.07%  mmap_munmap  [kernel.kallsyms]  [k] __rcu_read_unlock
     1.07%  mmap_munmap  [kernel.kallsyms]  [k] mas_rev_awalk
     1.01%  mmap_munmap  [kernel.kallsyms]  [k] memcg_slab_post_alloc_hook
==============================================================================================

Please note the maple-tree-related functions (mtree_*, mas_*) in the profile; see also https://docs.kernel.org/core-api/maple_tree.html#:~:text=The%20Maple%20Tree%20is%20a,a%20user%20written%20search%20method.

Thanks
Jirka

Comment 2 Jiri Hladky 2023-09-29 13:22:47 UTC
Created attachment 1991089 [details]
Simple reproducer written in .c

This is a minimal C reproducer that demonstrates the performance degradation of munmap().

gcc -Wall -Wextra -O1 mmap_munmap.c -o mmap_munmap
./run_mmap_munmap.sh

Comment 3 Jiri Hladky 2023-09-29 13:23:21 UTC

*** This bug has been marked as a duplicate of bug 2149636 ***