Bug 2180328
| Summary: | Pass 2 for a relatively simple script excessively slow | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Martin Cermak <mcermak> |
| Component: | systemtap | Assignee: | Frank Ch. Eigler <fche> |
| systemtap sub component: | system-version | QA Contact: | qe-baseos-tools-bugs |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | lberk, mcermak, mjw |
| Version: | 9.2 | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel:
mirror+
|
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-05-11 12:12:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
It isn't a --runtime=dyninst use-case, although dyninst appears to play some role in it. Perf says:
Samples: 207K of event 'cycles', Event count (approx.): 4061997803739
Children Self Command Shared Object Symbol
+ 32.54% 32.21% stap libstdc++.so.6.0.29 [.] std::_Rb_tree_increment
+ 24.03% 0.00% stap [unknown] [.] 0x00007fbfd3ffdc80
+ 14.74% 14.55% stap libc.so.6 [.] _int_malloc
+ 12.26% 12.14% stap libparseAPI.so.12.1.0 [.] std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, bool>, std::allocator<std::pair<unsigned long const,
+ 8.50% 8.39% stap libparseAPI.so.12.1.0 [.] Dyninst::ParseAPI::Block::hasCallSource
+ 5.90% 0.00% stap [unknown] [.] 0x8948555441002685
+ 5.90% 0.00% stap libparseAPI.so.12.1.0 [.] Dyninst::ParseAPI::Block::~Block
5.86% 5.80% stap libparseAPI.so.12.1.0 [.] std::_Rb_tree<Dyninst::ParseAPI::Edge*, Dyninst::ParseAPI::Edge*, std::_Identity<Dyninst::ParseAPI::Edge*>, std::less<Dynins
+ 4.68% 4.64% stap libparseAPI.so.12.1.0 [.] std::_Rb_tree<rose::BinaryAnalysis::InstructionSemantics2::BaseSemantics::InputOutputProperty, rose::BinaryAnalysis::Instruc
+ 4.68% 0.00% stap [unknown] [k] 0000000000000000
+ 4.41% 4.37% stap libc.so.6 [.] malloc_consolidate
+ 3.96% 0.00% stap [unknown] [k] 0x0000000000000021
+ 3.90% 0.00% stap [unknown] [.] 0xf000000000000072
+ 3.42% 3.38% stap libparseAPI.so.12.1.0 [.] std::_Hashtable<unsigned long, std::pair<unsigned long const, bool>, std::allocator<std::pair<unsigned long const, bool> >,
+ 3.01% 0.00% stap libparseAPI.so.12.1.0 [.] Dyninst::ParseAPI::Function::finalize
+ 2.65% 0.00% stap [unknown] [.] 0x0000000000841f0f
+ 2.63% 0.00% stap libparseAPI.so.12.1.0 [.] Dyninst::ParseAPI::Edge::~Edge
+ 2.34% 0.00% stap [unknown] [.] 0x0000000000007274
2.31% 2.28% stap libparseAPI.so.12.1.0 [.] Dyninst::ParseAPI::Function::delayed_link_return
+ 2.23% 0.00% stap [unknown] [.] 0x0000000000014ced
+ 2.22% 0.00% stap [unknown] [.] 0x523d968600000000
+ 2.04% 2.02% stap libc.so.6 [.] _int_free
+ 1.70% 0.00% stap [unknown] [.] 0x0000000000000001
+ 1.69% 1.68% stap libparseAPI.so.12.1.0 [.] Dyninst::ParseAPI::Edge::trg
+ 1.66% 0.00% stap [unknown] [.] 0xffffffff812d27c9
1.50% 1.47% stap libparseAPI.so.12.1.0 [.] Dyninst::ParseAPI::Parser::finalize
+ 1.25% 1.24% stap libc.so.6 [.] malloc
+ 1.20% 0.00% stap [unknown] [.] 0x8141e8c02b000202
+ 1.19% 0.00% stap [unknown] [.] 0x000000000000a43b
+ 1.18% 0.00% stap [unknown] [.] 0xffffffff81075c8a
0.86% 0.85% stap libc.so.6 [.] cfree.5
+ 0.85% 0.00% stap [unknown] [.] 0x00007fbfcbcadf40
+ 0.84% 0.08% stap [kernel.vmlinux] [k] asm_sysvec_apic_timer_interrupt
+ 0.79% 0.00% stap [unknown] [.] 0x41e8c02b000202d3
+ 0.74% 0.00% stap [unknown] [.] 0xffffffff81010736
+ 0.71% 0.00% stap [unknown] [.] 0xffffffff815d7fa9
+ 0.71% 0.00% stap [unknown] [.] 0xffffffff815f0dfe
From #systemtap chat: <@fche> eww 25 mins for that probe is crazy yea <@fche> hmmmm oh dyninst doing things <@fche> wcohen, was there going to be something on the stap/guru-mode/dyninst verification that could focus dyninst's attention on the functions in question <@fche> rather than analyze the entire kernel image? < wcohen> mcermak, fche the dynsinst analysis use to be much faster. The kernel changed everything to return through a thunk in the kernel and that made the dyninst binary analysis incredibly slow. <@fche> was there any upstream dyninst status on that, or on analysis focus on specific functions only? * wcohen looking up the dyninst issue discussion getting dyninst to focus on a specific function. < wcohen> https://github.com/dyninst/dyninst/issues/1305 mentioned in rhbz2126805 < wcohen> mcermak, fche ^^^ <@fche> thanks < wcohen> the sept 27,2022 comment in https://github.com/dyninst/dyninst/issues/1305 shows how the kernel config change caused the dynsinst analysis to require dramatically more time. <@fche> yes <@fche> was there a separate PR for focusing dyninst on given target functions only? < wcohen> all the returns at the end of functions being converted in the jumps to the thunk code that actually does the return causes the analysis to see a huge connected blob of code rather than and many smaller individual functions. Upstream commit 71f89cccaf46 should offer a paper-over workaround for this. *** This bug has been marked as a duplicate of bug 2126805 *** |
With fresh rhel-9.3 nightlies, following simple stap script started to elaborate (pass 2) very slowly: 9 x86_64 # uname -r 5.14.0-285.el9.x86_64 9 x86_64 # time stap -gp4 -v -e 'probe kernel.function("kernel_clone").return { $return = -1 }' -vvvvv [ ... stuff deleted ... ] Skipping tapset "/usr/share/systemtap/tapset/tzinfo.stp", duplicate inode. Skipping tapset "/usr/share/systemtap/tapset/uconversions-guru.stp", duplicate inode. Skipping tapset "/usr/share/systemtap/tapset/uconversions.stp", duplicate inode. Searched: "/usr/share/systemtap/tapset", found: 41, processed: 0 Pass 1: parsed user script and 501 library scripts using 132076virt/104328res/15232shr/88444data kb, in 310usr/100sys/476real ms. derive-probes (location #0): kernel.function("kernel_clone").return of keyword at <input>:1:1 checking pattern '' vs. module 'kernel' file '/usr/lib/debug/lib/modules/5.14.0-285.el9.x86_64/vmlinux' Attempting to extract kernel debuginfo build ID from /lib/modules/5.14.0-285.el9.x86_64/build/vmlinux.id blocklist regexps: blfn: ^(atomic_notifier_call_chain|default_do_nmi|__die|die_nmi|do_debug|do_general_protection|do_int3|do_IRQ|do_page_fault|do_sparc64_fault|do_trap|dummy_nmi_callback|flush_icache_range|ia64_bad_break|ia64_do_page_fault|ia64_fault|io_check_error|mem_parity_error|nmi_watchdog_tick|notifier_call_chain|oops_begin|oops_end|program_check_exception|single_step_exception|sync_regs|unhandled_fault|unknown_nmi_error|xen_[gs]et_debugreg|xen_irq_.*|xen_.*_fl_direct.*|check_events|xen_adjust_exception_frame|xen_iret.*|xen_sysret64.*|test_ti_thread_flag.*|inat_get_opcode_attribute|system_call_after_swapgs|HYPERVISOR_[gs]et_debugreg|HYPERVISOR_event_channel_op|hash_64|hash_ptr|native_set_pte|.*raw_.*_lock.*|.*raw_.*_unlock.*|.*raw_.*_trylock.*|.*read_lock.*|.*read_unlock.*|.*read_trylock.*|.*write_lock.*|.*write_unlock.*|.*write_trylock.*|.*write_seqlock.*|.*write_sequnlock.*|.*spin_lock.*|.*spin_unlock.*|.*spin_trylock.*|.*spin_is_locked.*|rwsem_.*lock.*|.*mutex_.*lock.*|atomic_.*|atomic64_.*|get_bh|put_bh|.*apic.*|.*APIC.*|.*softirq.*|.*IRQ.*|.*_intr.*|__delay|.*kernel_text.*|get_current|current_.*|.*exception_tables.*|.*setup_rt_frame.*|.*preempt_count.*|preempt_schedule|__switch_to|special_mapping_.*|.*_pte_.*)$ blfn_ret: ^(do_exit|sys_exit|sys_exit_group)$ blfile: ^(kernel/kprobes\.c|arch/.*/kernel/kprobes\.c|.*/include/asm/io\.h|.*/include/asm/io-defs\.h|.*/include/asm/io_64\.h|.*/include/asm/bitops\.h|drivers/ide/ide-iops\.c|arch/.*/kernel/paravirt\.c|.*/include/asm/paravirt\.h|fs/seq_file\.c)$ blsection: ^(\.init\.|\.exit\.|\.devinit\.|\.devexit\.|\.cpuinit\.|\.cpuexit\.|\.meminit\.|\.memexit\.) dwarf_builder::build for parse 'kernel_clone', func 'kernel_clone' pattern 'kernel' matches module 'kernel' focused on module 'kernel' = [0xffffffff81000000-0xffffffff84400000, bias 0 file /usr/lib/debug/lib/modules/5.14.0-285.el9.x86_64/vmlinux ELF machine |x86_64 (code 62) focused on module 'kernel' module function cache kernel size 169214 module function cache kernel hit kernel_clone querying dwarf func in file kernel/fork.c count 0 (func dieoffset: 0xe58ddb) selected function kernel_clone kernel_clone = dwfl_module_addrinfo(entrypc=0xffffffff810f6250 + (nil)) added to filtered_functions (dieoffset: 0xe58ddb) function cache kernel:kernel/fork.c size 431 function cache kernel:kernel/fork.c hit kernel_clone querying dwarf func in file kernel/fork.c count 0 (func dieoffset: 0xe58ddb) selected function kernel_clone kernel_clone = dwfl_module_addrinfo(entrypc=0xffffffff810f6250 + (nil)) added to filtered_functions (dieoffset: 0xe58ddb) probe kernel_clone@kernel/fork.c:2607 kernel reloc=.dynamic pc=0xffffffff810f6250 literal_stmt_for_return: finding return value for kernel_clone (kernel/fork.c) [ ... stall for apprx 25 mins ... ] replaced $return with __private__input___dwarf_tvar_set_return_0() replaced (__private__input___dwarf_tvar_set_return_0(-(1))) = (-(1)) with __private__input___dwarf_tvar_set_return_0(-(1)) Collapsing constant unary operator '-' at <input>:1:58 replaced -(1) with -1 Rerunning the code filters. symbol resolution for derived-probe probe kernel.function("kernel_clone@kernel/fork.c:2607").return /* pc=_stext+0xf6250 */ /* <- kernel.function("kernel_clone@kernel/fork.c:2607").return */{ __private__input___dwarf_tvar_set_return_0(-1); } symbol resolution for function function __private__input___dwarf_tvar_set_return_0 (rvalue) { return (@kregister(0)) = (rvalue); } local rvalue is formal parameter number of probes with global-variable conditions: 0 Eliding side-effect-free singleton block operator '{' at <input>:1:46 replaced { __private__input___dwarf_tvar_set_return_0(-1); } with __private__input___dwarf_tvar_set_return_0(-1) Eliding side-effect-free singleton block identifier '$return' at <input>:1:48 replaced { return (@kregister(0)) = (rvalue); } with return (@kregister(0)) = (rvalue) resolved type long to identifier '$return' at <input>:1:48 resolved type long to identifier '$return' at <input>:1:48 deleting module_cache Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 2634060virt/1861208res/81504shr/2479040data kb, in 1924630usr/4980sys/1472297real ms. /root/.systemtap/cache/6c/stap_6c2ecbbca07f472cb787c2b35c9ee51a_1139.ko Pass 3: using cached /root/.systemtap/cache/6c/stap_6c2ecbbca07f472cb787c2b35c9ee51a_1139.c Pass 4: using cached /root/.systemtap/cache/6c/stap_6c2ecbbca07f472cb787c2b35c9ee51a_1139.ko Running rm -rf /tmp/staphsC0mK Spawn waitpid result (0x0): 0 Removed temporary directory "/tmp/staphsC0mK" real 24m37.166s user 32m6.321s sys 0m7.637s 9 x86_64 #