1853751 – symbol_search() extreme performance degradation if symbol not found

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1853751 - symbol_search() extreme performance degradation if symbol not found

Summary: symbol_search() extreme performance degradation if symbol not found

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	crash
Sub Component:
Version:	8.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	low
Target Milestone:	rc
Target Release:	8.0
Assignee:	ltao
QA Contact:	xiaoying yan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1674330 84KernelDebugPostPlanning
TreeView+	depends on / blocked

Reported:	2020-07-03 19:48 UTC by John Pittman
Modified:	2022-06-28 08:33 UTC (History)
CC List:	7 users (show)
Fixed In Version:	crash-7.3.1-1.el8
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-05-10 13:55:49 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2022:1864	0	None	None	None	2022-05-10 13:56:04 UTC

Description John Pittman 2020-07-03 19:48:30 UTC

Description of problem:

symbol_search() extreme performance degradation if symbol not found

Version-Release number of selected component (if applicable):

crash-7.2.7-3.el8.x86_64

How reproducible:

Every time

Actual results:

Performance hit 20x at 2k iterations

Expected results:

Ideally no perf hit in the error path, but if a perf hit is unavoidable, it shouldn't be this drastic.

Additional info:

The issue can be seen easist through a pykdump script.  The pykdump tool has function sym2addr() which is just a wrapper for symbol_search().

static PyObject *
py_sym2addr(PyObject *self, PyObject *args) {
  char *symbol;
  unsigned long long addr;
  struct syment *se;

  if (!PyArg_ParseTuple(args, "s", &symbol)) {
    PyErr_SetString(crashError, "invalid parameter type"); \
    return NULL;
  }

  se = symbol_search(symbol);  <=====

  if (se)
    addr = se->value;
  else
    addr = 0;

  // ASID
  //printf("addr=%lx\n", addr);
  //return Py_BuildValue("K", (unsigned long long) addr);
  return PyLong_FromUnsignedLong(addr);
}

static PyMethodDef crashMethods[] = {
....
  {"sym2addr",  py_sym2addr, METH_VARARGS},
....

If we pass in a symbol that is present in the vmcore, the performance is fine, but if we pass in a symbol that is not found in the core, the performance degrades drastically.

Below is the reproduction pykdump script:

print("Start: {}".format(time.monotonic()))
for x in range(2000):
    y = sym2addr("use_blk_mq")

print("Run1 Complete: {}".format(time.monotonic()))
for x in range(2000):
    y = sym2addr("blah")

print("Run2 Complete: {}".format(time.monotonic()))

And the results:

Start: 2185649.555258775 
Run1 Complete: 2185650.32046814  <=== 1 sec
Run2 Complete: 2185671.415442096  <=== 21 secs

Comment 3 Bhupesh Sharma 2020-10-28 05:00:27 UTC

Hi John,

I cannot reproduce this issue at my end with RHEL-8.3 crash tool 'crash version: 7.2.8-7.el8' 
at my end. I tried using an intel-shumway machine for the same available in beaker.

Here are the steps I followed (Hi Emma - the QE team can use the same as a reproducer):

1. Download latest 'mpykdump64' from sourceforge: https://sourceforge.net/projects/pykdump/files/mpykdump-x86_64/mpykdump-3.5.1-x86_64.tar.gz/download

1a. $ tar -zxvf mpykdump-3.5.1-x86_64.tar.gz 

This will install 'mpykdump' into locations:
/usr/local/bin/crash64
/usr/local/lib/mpykdump64.so

2. Now launch crash tool:
$ crash vmlinux vmcore

<..snip..>

crash> extend usr/local/lib/mpykdump64.so
Setting scroll off while initializing PyKdump
./usr/local/lib/mpykdump64.so: shared object loaded

2a. Now I launch the 'test.py' script which I am placing at <http://people.redhat.com/~bhsharma/bz1853751/>

crash> epython test.py
Start: 4357.883254755
Run1 Complete: 4357.885866299
Run2 Complete: 4361.104522618

However I cannot reproduce the huge time gap b/w Run1 and Run2. Run2 took about 4 seconds as per the above data.

Can you please share your thoughts / reproduction environment details?

Thanks,
Bhupesh

Comment 9 John Pittman 2021-01-01 16:20:03 UTC

The performance hit looks to come from having to iterate through all the symbol entities for each module every time symbol_search is called.

struct syment *
symbol_search(char *s)
{
    int i;
    struct syment *sp_hashed, *sp, *sp_end;
    struct load_module *lm;
....
        for (i = 0; i < st->mods_installed; i++) {
                lm = &st->load_modules[i];
                if (lm->mod_flags & MOD_INIT)
                        search_init = TRUE;
                sp = lm->mod_symtable;
                sp_end = lm->mod_symend;

                for ( ; sp <= sp_end; sp++) {
                        if (!pseudos && MODULE_PSEUDO_SYMBOL(sp))
                                continue;
                        if (STREQ(s, sp->name))
                                return(sp);
                }
        }

In this core, the total number of symbol entities is 92426.  So, calling it 2000 times as I did in the script results in iterating over 184,852,000 symbol entities.

Comment 13 ltao 2021-08-05 07:20:18 UTC

Hello John,

The sequence for crash to find a symbol is: 1) sym hash table -> 2) iterate all kernel symbols -> 3) iterate all kernel modules symbols.

sym hash table is like a high speed cache for kernel symbols, if one symbol has been cached, it can finish searching in a short time. And the sym hash table can cover most cases.

If one symbol is not cached by hash table, crash will iterate over all kernel symbols, then iterate over all modules symbols. Then report if a symbol is not exist.

We can see if a symbol only exist in a kernel module, the searching routing can be as long as the symbol-not-exist routing. The following python script can illustrate that:

import time
from pykdump.API import *

print("Start: {}".format(time.monotonic()))
for x in range(2000):
    y = sym2addr("use_blk_mq")
print("Run1 Complete: {}".format(time.monotonic()))
print(y)

for x in range(2000):
    y = sym2addr("amd_pmu_reset")
print("Run2 Complete: {}".format(time.monotonic()))
print(y)

for x in range(2000):
    y = sym2addr("blah")
print("Run3 Complete: {}".format(time.monotonic()))
print(y)

use_blk_mq is a hash table cached symbol, amd_pmu_reset is a kernel module(kvm_amd) only symbol, and the kvm_amd module is the last module to be searched:

$ crash /cores/retrace/tasks/156657913/crash/vmcore /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-1062.18.1.el7.x86_64/vmlinux
....
crash> mod -S                                             <---- Load all modules to vmcore
....
crash> modinfo
struct module *    MODULE_NAME                     SIZE 
0xffffffffc031f880 dm_mod                        124501 
0xffffffffc032c000 dm_region_hash                 20813 
....
0xffffffffc28af7e0 kvm_amd                      2177304   <---- Will be the last module to be searched
===========================================================================

crash> extend mpykdump64.so
crash> epython test.py
Start: 2712374.351170071
Run1 Complete: 2712375.19024274  <---- 1s
18446744072639087520             <---- symbol "use_blk_mq" found
Run2 Complete: 2712382.229403971 <---- 7s
18446744072678450272             <---- symbol "amd_pmu_reset" found
Run3 Complete: 2712389.406426623 <---- 7s
0                                <---- symbol "blah" found

At this level, the performance of symbol-not-found case and symbol-for-kernel-module-only case are the same. From my point, I think the performance degradation for symbol-not-found case when comparing with symbol-cached-in-hash-table case is expected.

Any thoughts?

Thanks,
Tao Liu

Comment 14 ltao 2021-08-05 07:27:57 UTC

Sorry, I will correct line

Run3 Complete: 2712389.406426623 <---- 7s
0                                <---- symbol "blah" found

to be 

Run3 Complete: 2712389.406426623 <---- 7s
0                                <---- symbol "blah" not found

Comment 15 ltao 2021-08-20 07:00:33 UTC

The first version of patch [1] has been sent to upstream.

[1]: https://listman.redhat.com/archives/crash-utility/2021-August/msg00021.html

Comment 16 John Pittman 2021-08-20 18:09:05 UTC

Thanks ltao; tested script using patch

# cat symcheck.py 
import time
from pykdump.API import *

print("Start: {}".format(time.monotonic()))

for x in range(2000):
    y = sym2addr("use_blk_mq")
print("Run1 Complete: {}".format(time.monotonic()))

for x in range(2000):
    y = sym2addr("amd_pmu_reset")
print("Run2 Complete: {}".format(time.monotonic()))

for x in range(2000):
    y = sym2addr("blah")
print("Run3 Complete: {}".format(time.monotonic()))

===================================================

crash> epython /home/rdu/jpittman/symcheck.py
Start: 4061781.363723004
Run1 Complete: 4061782.285744557
Run2 Complete: 4061783.403973823
Run3 Complete: 4061784.069274155

Comment 24 errata-xmlrpc 2022-05-10 13:55:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (crash bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1864

Note You need to log in before you can comment on or make changes to this bug.