This is a confirmed regression in the latest upstream: 2c7310a Replace lseek/read into pread for kcore and vmcore reading vs RHEL8's crash-7.3.1-5.el8.x86_64 I have not Reproducible: Always Steps to Reproduce: 1. Start crash on affected vmcore 2. Run 'bt PID' where PID is one that is slow Actual Results: bt time is: real 27m58.509s user 27m54.682s sys 0m1.218s Vs rhel8's crash: real 0m23.950s user 0m23.308s sys 0m0.590s Expected Results: Similar 'bt' time to execute as RHEL8's crash for the same vmcore and same pid I can share the specific vmcore to any Red Hat employee but cannot share publicly due to customer data concerns. I think this affects only certain vmcores, but it is a problem as it's showing up on our production vmcore system. Though I don't know how many vmcores it affects there, my guess is it's a small percentage (less than 10%) or someone would have reported it more widely. I'll see if I can determine how far back the problem reproduces, if this is a 8.x branch problem (since it's not seen in RHEL8's 7.x build) or if it was introduced recently.
From comment #0, I meant to finish this: "I have not ..." I have not narrowed this down further or run gdb on it to understand why it takes so long. Simple tests I'm running now is to see if it reproduces if I build 8.0.1 or 8.0.2 and go from there. Not sure if someone has some ideas on it or if it's been reported before.
GOOD: crash-8.0.0-ec568e2 crash-8.0.0-ec568e2> quit real 0m9.500s user 0m8.748s sys 0m0.695s
BAD: crash-8.0.2-f1cd581 crash-8.0.2-f1cd581> quit real 28m1.614s user 27m57.931s sys 0m1.006s
Looks like this is a regression sometime after the 8.0.1 release (first regression seen in 8.0.2 release). PASS: 8.0.1 FAIL: 8.0.2 and above crash-8.0.1-2d19346> quit real 0m6.416s user 0m5.882s sys 0m0.477s
I am bisecting this now.
$ git bisect bad 2145b2bb79c59aa25c5155a8f9851554d1813fb9 is the first bad commit commit 2145b2bb79c59aa25c5155a8f9851554d1813fb9 Author: Tao Liu <ltao> Date: Wed Aug 31 11:54:13 2022 +0800 Let gdb get kernel module symbols info from crash Gdb will try to resolve an address to its corresponding symbol name such as when printing a structure. It works fine for kernel symbols, because gdb can find them through vmlinux. However as for kernel modules symbols, crash resolves them by dig into "struct module", which gdb don't know. As a result, gdb fails to translate a kernel module address to its symbol name without "mod -s|-S" options. For example we can reproduce the issue as follows. crash> timer .... 4331308176 336 ffff94ea24240860 ffffffffc03762c0 <estimation_timer> .... crash> sym 0xffffffffc03762c0 ffffffffc03762c0 (t) estimation_timer [ip_vs] Before patch: crash> timer_list ffff94ea24240860 struct timer_list { .... function = 0xffffffffc03762c0, .... } After patch: crash> timer_list ffff94ea24240860 struct timer_list { .... function = 0xffffffffc03762c0 <estimation_timer>, .... } In this patch, we add an interface for gdb, when gdb trying to build kernel module's address symbolic, the info can be get from crash. Signed-off-by: Tao Liu <ltao> defs.h | 2 ++ gdb-10.2.patch | 35 +++++++++++++++++++++++++++++++++++ gdb_interface.c | 12 ++++++++++++ 3 files changed, 49 insertions(+) $ git bisect log git bisect start # good: [2d193468e5fe7ee1c6be4c73083cc5ef8d922b74] crash-8.0.0 -> crash-8.0.1 git bisect good 2d193468e5fe7ee1c6be4c73083cc5ef8d922b74 # bad: [f1cd581d1c4afa5b8ffdfaa6a3ea9f545fe4ec91] crash-8.0.1 -> crash-8.0.2 git bisect bad f1cd581d1c4afa5b8ffdfaa6a3ea9f545fe4ec91 # bad: [f1cd581d1c4afa5b8ffdfaa6a3ea9f545fe4ec91] crash-8.0.1 -> crash-8.0.2 git bisect bad f1cd581d1c4afa5b8ffdfaa6a3ea9f545fe4ec91 # good: [cdd57e8b16aba2f5714673368d6dbc7565d59841] ppc64: handle backtrace when CPU is in an emergency stack git bisect good cdd57e8b16aba2f5714673368d6dbc7565d59841 # good: [cdd57e8b16aba2f5714673368d6dbc7565d59841] ppc64: handle backtrace when CPU is in an emergency stack git bisect good cdd57e8b16aba2f5714673368d6dbc7565d59841 # good: [9cbfea67eb4f094d47cd841b73ddbbdbe6b58696] Fix "task -R" by adding end identifier for union in task_struct git bisect good 9cbfea67eb4f094d47cd841b73ddbbdbe6b58696 # bad: [4ea3a806d11f000f2eb1ddc72c2b7a543e319f64] Fix for the invalid linux_banner pointer issue git bisect bad 4ea3a806d11f000f2eb1ddc72c2b7a543e319f64 # bad: [5b9d3e98cda9d99f3277aabec30d076e62cc5e71] Add debian/ubuntu vmlinux location to default search dirs git bisect bad 5b9d3e98cda9d99f3277aabec30d076e62cc5e71 # bad: [3ed9ec5c8d09cffac9772abbf54214125ade9127] x86_64: Correct the identifier when locating the call instruction git bisect bad 3ed9ec5c8d09cffac9772abbf54214125ade9127 # bad: [2145b2bb79c59aa25c5155a8f9851554d1813fb9] Let gdb get kernel module symbols info from crash git bisect bad 2145b2bb79c59aa25c5155a8f9851554d1813fb9 # first bad commit: [2145b2bb79c59aa25c5155a8f9851554d1813fb9] Let gdb get kernel module symbols info from crash
Tao, can you take a look? Note the vmcore in question has out of tree modules. So at a high-level, there may be something wrong there and the reason this corner case may need a follow-on patch to your original one.
Hi Dave, Sure, I will take a look into this. Thanks, Tao Liu
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle. Changing version to 39.
FEDORA-2024-190d76044a has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2024-190d76044a
FEDORA-2024-190d76044a has been pushed to the Fedora 39 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-190d76044a` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-190d76044a See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-190d76044a (crash-8.0.4-2.fc39) has been pushed to the Fedora 39 stable repository. If problem still persists, please make note of it in this bug report.