Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 2180328

Summary: Pass 2 for a relatively simple script excessively slow
Product: Red Hat Enterprise Linux 9 Reporter: Martin Cermak <mcermak>
Component: systemtapAssignee: Frank Ch. Eigler <fche>
systemtap sub component: system-version QA Contact: qe-baseos-tools-bugs
Status: CLOSED DUPLICATE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: lberk, mcermak, mjw
Version: 9.2Keywords: Triaged
Target Milestone: rcFlags: pm-rhel: mirror+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-05-11 12:12:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Martin Cermak 2023-03-21 08:05:06 UTC
With fresh rhel-9.3 nightlies, following simple stap script started to elaborate (pass 2) very slowly:

9 x86_64 # uname -r
5.14.0-285.el9.x86_64
9 x86_64 # time stap -gp4 -v -e 'probe kernel.function("kernel_clone").return { $return = -1 }' -vvvvv

[ ... stuff deleted ... ]

Skipping tapset "/usr/share/systemtap/tapset/tzinfo.stp", duplicate inode.
Skipping tapset "/usr/share/systemtap/tapset/uconversions-guru.stp", duplicate inode.
Skipping tapset "/usr/share/systemtap/tapset/uconversions.stp", duplicate inode.
Searched: "/usr/share/systemtap/tapset", found: 41, processed: 0
Pass 1: parsed user script and 501 library scripts using 132076virt/104328res/15232shr/88444data kb, in 310usr/100sys/476real ms.
derive-probes (location #0): kernel.function("kernel_clone").return of keyword at <input>:1:1
checking pattern '' vs. module 'kernel' file '/usr/lib/debug/lib/modules/5.14.0-285.el9.x86_64/vmlinux'
Attempting to extract kernel debuginfo build ID from /lib/modules/5.14.0-285.el9.x86_64/build/vmlinux.id
blocklist regexps:
blfn: ^(atomic_notifier_call_chain|default_do_nmi|__die|die_nmi|do_debug|do_general_protection|do_int3|do_IRQ|do_page_fault|do_sparc64_fault|do_trap|dummy_nmi_callback|flush_icache_range|ia64_bad_break|ia64_do_page_fault|ia64_fault|io_check_error|mem_parity_error|nmi_watchdog_tick|notifier_call_chain|oops_begin|oops_end|program_check_exception|single_step_exception|sync_regs|unhandled_fault|unknown_nmi_error|xen_[gs]et_debugreg|xen_irq_.*|xen_.*_fl_direct.*|check_events|xen_adjust_exception_frame|xen_iret.*|xen_sysret64.*|test_ti_thread_flag.*|inat_get_opcode_attribute|system_call_after_swapgs|HYPERVISOR_[gs]et_debugreg|HYPERVISOR_event_channel_op|hash_64|hash_ptr|native_set_pte|.*raw_.*_lock.*|.*raw_.*_unlock.*|.*raw_.*_trylock.*|.*read_lock.*|.*read_unlock.*|.*read_trylock.*|.*write_lock.*|.*write_unlock.*|.*write_trylock.*|.*write_seqlock.*|.*write_sequnlock.*|.*spin_lock.*|.*spin_unlock.*|.*spin_trylock.*|.*spin_is_locked.*|rwsem_.*lock.*|.*mutex_.*lock.*|atomic_.*|atomic64_.*|get_bh|put_bh|.*apic.*|.*APIC.*|.*softirq.*|.*IRQ.*|.*_intr.*|__delay|.*kernel_text.*|get_current|current_.*|.*exception_tables.*|.*setup_rt_frame.*|.*preempt_count.*|preempt_schedule|__switch_to|special_mapping_.*|.*_pte_.*)$
blfn_ret: ^(do_exit|sys_exit|sys_exit_group)$
blfile: ^(kernel/kprobes\.c|arch/.*/kernel/kprobes\.c|.*/include/asm/io\.h|.*/include/asm/io-defs\.h|.*/include/asm/io_64\.h|.*/include/asm/bitops\.h|drivers/ide/ide-iops\.c|arch/.*/kernel/paravirt\.c|.*/include/asm/paravirt\.h|fs/seq_file\.c)$
blsection: ^(\.init\.|\.exit\.|\.devinit\.|\.devexit\.|\.cpuinit\.|\.cpuexit\.|\.meminit\.|\.memexit\.)
dwarf_builder::build for 
parse 'kernel_clone', func 'kernel_clone'
pattern 'kernel' matches module 'kernel'
focused on module 'kernel' = [0xffffffff81000000-0xffffffff84400000, bias 0 file /usr/lib/debug/lib/modules/5.14.0-285.el9.x86_64/vmlinux ELF machine |x86_64 (code 62)
focused on module 'kernel'
module function cache kernel size 169214

module function cache kernel hit kernel_clone
querying dwarf func in file kernel/fork.c count 0 (func dieoffset: 0xe58ddb)
selected function kernel_clone
kernel_clone = dwfl_module_addrinfo(entrypc=0xffffffff810f6250 + (nil))
added to filtered_functions (dieoffset: 0xe58ddb)
function cache kernel:kernel/fork.c size 431
function cache kernel:kernel/fork.c hit kernel_clone
querying dwarf func in file kernel/fork.c count 0 (func dieoffset: 0xe58ddb)
selected function kernel_clone
kernel_clone = dwfl_module_addrinfo(entrypc=0xffffffff810f6250 + (nil))
added to filtered_functions (dieoffset: 0xe58ddb)
probe kernel_clone@kernel/fork.c:2607 kernel reloc=.dynamic pc=0xffffffff810f6250
literal_stmt_for_return: finding return value for kernel_clone (kernel/fork.c)


[ ... stall for apprx 25 mins ... ]


replaced $return with __private__input___dwarf_tvar_set_return_0()
replaced (__private__input___dwarf_tvar_set_return_0(-(1))) = (-(1)) with __private__input___dwarf_tvar_set_return_0(-(1))
Collapsing constant unary operator '-' at <input>:1:58
replaced -(1) with -1
Rerunning the code filters.
symbol resolution for derived-probe probe kernel.function("kernel_clone@kernel/fork.c:2607").return /* pc=_stext+0xf6250 */ /* <- kernel.function("kernel_clone@kernel/fork.c:2607").return */{
__private__input___dwarf_tvar_set_return_0(-1);
}
symbol resolution for function function __private__input___dwarf_tvar_set_return_0 (rvalue)
{
return (@kregister(0)) = (rvalue);
}
      local rvalue is formal parameter
number of probes with global-variable conditions: 0
Eliding side-effect-free singleton block operator '{' at <input>:1:46
replaced {
__private__input___dwarf_tvar_set_return_0(-1);
} with __private__input___dwarf_tvar_set_return_0(-1)
Eliding side-effect-free singleton block identifier '$return' at <input>:1:48
replaced {
return (@kregister(0)) = (rvalue);
} with return (@kregister(0)) = (rvalue)
resolved type long to identifier '$return' at <input>:1:48
resolved type long to identifier '$return' at <input>:1:48
deleting module_cache
Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 2634060virt/1861208res/81504shr/2479040data kb, in 1924630usr/4980sys/1472297real ms.
/root/.systemtap/cache/6c/stap_6c2ecbbca07f472cb787c2b35c9ee51a_1139.ko
Pass 3: using cached /root/.systemtap/cache/6c/stap_6c2ecbbca07f472cb787c2b35c9ee51a_1139.c
Pass 4: using cached /root/.systemtap/cache/6c/stap_6c2ecbbca07f472cb787c2b35c9ee51a_1139.ko
Running rm -rf /tmp/staphsC0mK
Spawn waitpid result (0x0): 0
Removed temporary directory "/tmp/staphsC0mK"

real    24m37.166s
user    32m6.321s
sys     0m7.637s
9 x86_64 #

Comment 1 Martin Cermak 2023-03-21 08:06:48 UTC
It isn't a --runtime=dyninst use-case, although dyninst appears to play some role in it.  Perf says:

Samples: 207K of event 'cycles', Event count (approx.): 4061997803739
  Children      Self  Command  Shared Object                Symbol
+   32.54%    32.21%  stap     libstdc++.so.6.0.29          [.] std::_Rb_tree_increment
+   24.03%     0.00%  stap     [unknown]                    [.] 0x00007fbfd3ffdc80
+   14.74%    14.55%  stap     libc.so.6                    [.] _int_malloc
+   12.26%    12.14%  stap     libparseAPI.so.12.1.0        [.] std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, bool>, std::allocator<std::pair<unsigned long const,
+    8.50%     8.39%  stap     libparseAPI.so.12.1.0        [.] Dyninst::ParseAPI::Block::hasCallSource
+    5.90%     0.00%  stap     [unknown]                    [.] 0x8948555441002685
+    5.90%     0.00%  stap     libparseAPI.so.12.1.0        [.] Dyninst::ParseAPI::Block::~Block
     5.86%     5.80%  stap     libparseAPI.so.12.1.0        [.] std::_Rb_tree<Dyninst::ParseAPI::Edge*, Dyninst::ParseAPI::Edge*, std::_Identity<Dyninst::ParseAPI::Edge*>, std::less<Dynins
+    4.68%     4.64%  stap     libparseAPI.so.12.1.0        [.] std::_Rb_tree<rose::BinaryAnalysis::InstructionSemantics2::BaseSemantics::InputOutputProperty, rose::BinaryAnalysis::Instruc
+    4.68%     0.00%  stap     [unknown]                    [k] 0000000000000000
+    4.41%     4.37%  stap     libc.so.6                    [.] malloc_consolidate
+    3.96%     0.00%  stap     [unknown]                    [k] 0x0000000000000021
+    3.90%     0.00%  stap     [unknown]                    [.] 0xf000000000000072
+    3.42%     3.38%  stap     libparseAPI.so.12.1.0        [.] std::_Hashtable<unsigned long, std::pair<unsigned long const, bool>, std::allocator<std::pair<unsigned long const, bool> >,
+    3.01%     0.00%  stap     libparseAPI.so.12.1.0        [.] Dyninst::ParseAPI::Function::finalize
+    2.65%     0.00%  stap     [unknown]                    [.] 0x0000000000841f0f
+    2.63%     0.00%  stap     libparseAPI.so.12.1.0        [.] Dyninst::ParseAPI::Edge::~Edge
+    2.34%     0.00%  stap     [unknown]                    [.] 0x0000000000007274
     2.31%     2.28%  stap     libparseAPI.so.12.1.0        [.] Dyninst::ParseAPI::Function::delayed_link_return
+    2.23%     0.00%  stap     [unknown]                    [.] 0x0000000000014ced
+    2.22%     0.00%  stap     [unknown]                    [.] 0x523d968600000000
+    2.04%     2.02%  stap     libc.so.6                    [.] _int_free
+    1.70%     0.00%  stap     [unknown]                    [.] 0x0000000000000001
+    1.69%     1.68%  stap     libparseAPI.so.12.1.0        [.] Dyninst::ParseAPI::Edge::trg
+    1.66%     0.00%  stap     [unknown]                    [.] 0xffffffff812d27c9
     1.50%     1.47%  stap     libparseAPI.so.12.1.0        [.] Dyninst::ParseAPI::Parser::finalize
+    1.25%     1.24%  stap     libc.so.6                    [.] malloc
+    1.20%     0.00%  stap     [unknown]                    [.] 0x8141e8c02b000202
+    1.19%     0.00%  stap     [unknown]                    [.] 0x000000000000a43b
+    1.18%     0.00%  stap     [unknown]                    [.] 0xffffffff81075c8a
     0.86%     0.85%  stap     libc.so.6                    [.] cfree.5
+    0.85%     0.00%  stap     [unknown]                    [.] 0x00007fbfcbcadf40
+    0.84%     0.08%  stap     [kernel.vmlinux]             [k] asm_sysvec_apic_timer_interrupt
+    0.79%     0.00%  stap     [unknown]                    [.] 0x41e8c02b000202d3
+    0.74%     0.00%  stap     [unknown]                    [.] 0xffffffff81010736
+    0.71%     0.00%  stap     [unknown]                    [.] 0xffffffff815d7fa9
+    0.71%     0.00%  stap     [unknown]                    [.] 0xffffffff815f0dfe

Comment 2 Martin Cermak 2023-03-21 08:08:10 UTC
From #systemtap chat:

<@fche> eww 25 mins for that probe is crazy yea
<@fche> hmmmm oh dyninst doing things
<@fche> wcohen, was there going to be something on the stap/guru-mode/dyninst verification that could focus dyninst's attention on the functions in question
<@fche> rather than analyze the entire kernel image?
< wcohen> mcermak, fche the dynsinst analysis use to be much faster.  The kernel changed everything to return through a thunk in the kernel and that made the dyninst binary analysis incredibly slow.
<@fche> was there any upstream dyninst status on that, or on analysis focus on specific functions only?
 * wcohen looking up the dyninst issue discussion getting dyninst to focus on a specific function.
< wcohen> https://github.com/dyninst/dyninst/issues/1305  mentioned in rhbz2126805
< wcohen> mcermak, fche ^^^
<@fche> thanks
< wcohen> the sept 27,2022 comment in https://github.com/dyninst/dyninst/issues/1305 shows how the kernel config change caused the dynsinst analysis to require dramatically more time.
<@fche> yes
<@fche> was there a separate PR for focusing dyninst on given target functions only?
< wcohen> all the returns at the end of functions being converted in the jumps to the thunk code that actually does the return causes the analysis to see a huge connected blob of code rather than and many smaller individual functions.

Comment 3 Frank Ch. Eigler 2023-04-19 15:38:37 UTC
Upstream commit 71f89cccaf46 should offer a paper-over workaround for this.

Comment 4 Martin Cermak 2023-05-11 12:12:59 UTC

*** This bug has been marked as a duplicate of bug 2126805 ***