Bug 1853751
| Summary: | symbol_search() extreme performance degradation if symbol not found | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | John Pittman <jpittman> |
| Component: | crash | Assignee: | ltao |
| Status: | CLOSED ERRATA | QA Contact: | xiaoying yan <yiyan> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 8.1 | CC: | cye, djeffery, kdump-bugs, lijiang, ruyang, xiawu, yiyan |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 8.0 | Flags: | pm-rhel:
mirror+
|
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | crash-7.3.1-1.el8 | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-05-10 13:55:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1674330, 1888160 | ||
Hi John, I cannot reproduce this issue at my end with RHEL-8.3 crash tool 'crash version: 7.2.8-7.el8' at my end. I tried using an intel-shumway machine for the same available in beaker. Here are the steps I followed (Hi Emma - the QE team can use the same as a reproducer): 1. Download latest 'mpykdump64' from sourceforge: https://sourceforge.net/projects/pykdump/files/mpykdump-x86_64/mpykdump-3.5.1-x86_64.tar.gz/download 1a. $ tar -zxvf mpykdump-3.5.1-x86_64.tar.gz This will install 'mpykdump' into locations: /usr/local/bin/crash64 /usr/local/lib/mpykdump64.so 2. Now launch crash tool: $ crash vmlinux vmcore <..snip..> crash> extend usr/local/lib/mpykdump64.so Setting scroll off while initializing PyKdump ./usr/local/lib/mpykdump64.so: shared object loaded 2a. Now I launch the 'test.py' script which I am placing at <http://people.redhat.com/~bhsharma/bz1853751/> crash> epython test.py Start: 4357.883254755 Run1 Complete: 4357.885866299 Run2 Complete: 4361.104522618 However I cannot reproduce the huge time gap b/w Run1 and Run2. Run2 took about 4 seconds as per the above data. Can you please share your thoughts / reproduction environment details? Thanks, Bhupesh The performance hit looks to come from having to iterate through all the symbol entities for each module every time symbol_search is called.
struct syment *
symbol_search(char *s)
{
int i;
struct syment *sp_hashed, *sp, *sp_end;
struct load_module *lm;
....
for (i = 0; i < st->mods_installed; i++) {
lm = &st->load_modules[i];
if (lm->mod_flags & MOD_INIT)
search_init = TRUE;
sp = lm->mod_symtable;
sp_end = lm->mod_symend;
for ( ; sp <= sp_end; sp++) {
if (!pseudos && MODULE_PSEUDO_SYMBOL(sp))
continue;
if (STREQ(s, sp->name))
return(sp);
}
}
In this core, the total number of symbol entities is 92426. So, calling it 2000 times as I did in the script results in iterating over 184,852,000 symbol entities.
Hello John,
The sequence for crash to find a symbol is: 1) sym hash table -> 2) iterate all kernel symbols -> 3) iterate all kernel modules symbols.
sym hash table is like a high speed cache for kernel symbols, if one symbol has been cached, it can finish searching in a short time. And the sym hash table can cover most cases.
If one symbol is not cached by hash table, crash will iterate over all kernel symbols, then iterate over all modules symbols. Then report if a symbol is not exist.
We can see if a symbol only exist in a kernel module, the searching routing can be as long as the symbol-not-exist routing. The following python script can illustrate that:
import time
from pykdump.API import *
print("Start: {}".format(time.monotonic()))
for x in range(2000):
y = sym2addr("use_blk_mq")
print("Run1 Complete: {}".format(time.monotonic()))
print(y)
for x in range(2000):
y = sym2addr("amd_pmu_reset")
print("Run2 Complete: {}".format(time.monotonic()))
print(y)
for x in range(2000):
y = sym2addr("blah")
print("Run3 Complete: {}".format(time.monotonic()))
print(y)
use_blk_mq is a hash table cached symbol, amd_pmu_reset is a kernel module(kvm_amd) only symbol, and the kvm_amd module is the last module to be searched:
$ crash /cores/retrace/tasks/156657913/crash/vmcore /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-1062.18.1.el7.x86_64/vmlinux
....
crash> mod -S <---- Load all modules to vmcore
....
crash> modinfo
struct module * MODULE_NAME SIZE
0xffffffffc031f880 dm_mod 124501
0xffffffffc032c000 dm_region_hash 20813
....
0xffffffffc28af7e0 kvm_amd 2177304 <---- Will be the last module to be searched
===========================================================================
crash> extend mpykdump64.so
crash> epython test.py
Start: 2712374.351170071
Run1 Complete: 2712375.19024274 <---- 1s
18446744072639087520 <---- symbol "use_blk_mq" found
Run2 Complete: 2712382.229403971 <---- 7s
18446744072678450272 <---- symbol "amd_pmu_reset" found
Run3 Complete: 2712389.406426623 <---- 7s
0 <---- symbol "blah" found
At this level, the performance of symbol-not-found case and symbol-for-kernel-module-only case are the same. From my point, I think the performance degradation for symbol-not-found case when comparing with symbol-cached-in-hash-table case is expected.
Any thoughts?
Thanks,
Tao Liu
Sorry, I will correct line Run3 Complete: 2712389.406426623 <---- 7s 0 <---- symbol "blah" found to be Run3 Complete: 2712389.406426623 <---- 7s 0 <---- symbol "blah" not found The first version of patch [1] has been sent to upstream. [1]: https://listman.redhat.com/archives/crash-utility/2021-August/msg00021.html Thanks ltao; tested script using patch
# cat symcheck.py
import time
from pykdump.API import *
print("Start: {}".format(time.monotonic()))
for x in range(2000):
y = sym2addr("use_blk_mq")
print("Run1 Complete: {}".format(time.monotonic()))
for x in range(2000):
y = sym2addr("amd_pmu_reset")
print("Run2 Complete: {}".format(time.monotonic()))
for x in range(2000):
y = sym2addr("blah")
print("Run3 Complete: {}".format(time.monotonic()))
===================================================
crash> epython /home/rdu/jpittman/symcheck.py
Start: 4061781.363723004
Run1 Complete: 4061782.285744557
Run2 Complete: 4061783.403973823
Run3 Complete: 4061784.069274155
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (crash bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:1864 |
Description of problem: symbol_search() extreme performance degradation if symbol not found Version-Release number of selected component (if applicable): crash-7.2.7-3.el8.x86_64 How reproducible: Every time Actual results: Performance hit 20x at 2k iterations Expected results: Ideally no perf hit in the error path, but if a perf hit is unavoidable, it shouldn't be this drastic. Additional info: The issue can be seen easist through a pykdump script. The pykdump tool has function sym2addr() which is just a wrapper for symbol_search(). static PyObject * py_sym2addr(PyObject *self, PyObject *args) { char *symbol; unsigned long long addr; struct syment *se; if (!PyArg_ParseTuple(args, "s", &symbol)) { PyErr_SetString(crashError, "invalid parameter type"); \ return NULL; } se = symbol_search(symbol); <===== if (se) addr = se->value; else addr = 0; // ASID //printf("addr=%lx\n", addr); //return Py_BuildValue("K", (unsigned long long) addr); return PyLong_FromUnsignedLong(addr); } static PyMethodDef crashMethods[] = { .... {"sym2addr", py_sym2addr, METH_VARARGS}, .... If we pass in a symbol that is present in the vmcore, the performance is fine, but if we pass in a symbol that is not found in the core, the performance degrades drastically. Below is the reproduction pykdump script: print("Start: {}".format(time.monotonic())) for x in range(2000): y = sym2addr("use_blk_mq") print("Run1 Complete: {}".format(time.monotonic())) for x in range(2000): y = sym2addr("blah") print("Run2 Complete: {}".format(time.monotonic())) And the results: Start: 2185649.555258775 Run1 Complete: 2185650.32046814 <=== 1 sec Run2 Complete: 2185671.415442096 <=== 21 secs