Bug 1811208
Summary: | unable to set scope: set: gdb cannot find text block for address: st_create | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | John Pittman <jpittman> |
Component: | crash | Assignee: | Dave Anderson <anderson> |
Status: | CLOSED NOTABUG | QA Contact: | Emma Wu <xiawu> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 8.1 | CC: | cye, djeffery, dwysocha, ruyang |
Target Milestone: | rc | ||
Target Release: | 8.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-03-26 14:53:24 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1674330 |
Description
John Pittman
2020-03-06 20:36:10 UTC
As I recall, that's always been an issue with gdb and module symbols, some work, some (most?) don't... > As I recall, that's always been an issue with gdb and module symbols, some work, some (most?) don't...
Ah, yes, it's coming back...
When a module object file is loaded into the crash session, its data structures that
that contain its "text blocks" are searched to find the one that fits for the target address.
But the data structure text block values are not kernel virtual addresses, but rather contain
the offset values of the symbols in the object. (i.e. if you do a "nm -Bn object.ko.debug")
But there was this fix that went into crash-6.0.1, back in 2012:
- Implemented a new "mod -r" option, which will pass the "-readnow"
flag to the embedded gdb module, which will override the two-stage
strategy that it uses for reading symbol tables from module object
files. If the crash session was invoked with the "--readnow" flag,
then the same override will occur automatically. It should be noted
that doing will increase the virtual and resident memory set size.
(anderson)
With respect to the "two-stage strategy" referenced above, which updates the offset values
into kernel virtual addresses, I don't know how that happens in the normal course of events.
I believe I may have been able to force the second stage occur for a module, but I don't
recall what I did.
Anyway, the only foolproof way I could force transition was by using the more
time-consuming --readnow flag, either by:
(1) invoke crash with the --readnow flag (which uses --readnow on the vmlinux file and any modules that get loaded)
(2) use the "mod -r" flag (which uses --readnow on modules)
For example, there's a queue_io() function in both the kernel proper and
the dm_mod module:
crash> sym queue_io
ffffffffbc4721a0 (t) queue_io /usr/src/debug/kernel-3.10.0-957.27.2.el7/linux-3.10.0-957.27.2.el7.x86_64/fs/fs-writeback.c: 341
ffffffffc0235600 (t) queue_io [dm_mod]
crash>
If I try to set the module's version, it fails:
crash> set scope ffffffffc0235600
set: attempting to find/load "dm_mod" module debuginfo
MODULE NAME SIZE OBJECT FILE
ffffffffc024b880 dm_mod 124461 /usr/lib/debug/usr/lib/modules/3.10.0-957.27.2.el7.x86_64/kernel/drivers/md/dm-mod.ko.debug
set: gdb cannot find text block for address: ffffffffc0235600
crash>
If I unload the module, and then reload it adding the -r option:
crash> mod -D
crash> mod -sr dm_mod /usr/lib/debug/usr/lib/modules/3.10.0-957.27.2.el7.x86_64/kernel/drivers/md/dm-mod.ko.debug
MODULE NAME SIZE OBJECT FILE
ffffffffc024b880 dm_mod 124461 /usr/lib/debug/usr/lib/modules/3.10.0-957.27.2.el7.x86_64/kernel/drivers/md/dm-mod.ko.debug
crash> set scope ffffffffc0235600
scope: ffffffffc0235600 (queue_io)
crash>
You can also load them all with --readnow with "mod -rS".
Does that work for you?
Given that this extension module requires loading with --readnow in order to set scope, using "mod -r" is preferable: $ retrace-server-interact 929057675 crash ... [ cut ] ... crash> set scope st_create set: gdb cannot find text block for address: st_create crash> mod -d dm_service_time crash> mod -sr dm_service_time /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-862.14.4.el7.x86_64/kernel/drivers/md/dm-service-time.ko.debug MODULE NAME SIZE OBJECT FILE ffffffffc0810080 dm_service_time 12889 /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-862.14.4.el7.x86_64/kernel/drivers/md/dm-service-time.ko.debug crash> set scope st_create scope: ffffffffc080e270 (st_create) crash> Loading all modules with --readnow by default results in a huge waste of memory, so I'd like to close this BZ as NOTABUG. Sorry for the wait; I've been meaning to come back to this. Thanks a lot for looking. Unloading and reloading --readnow did work for me. Just curious, how much memory does it waste to load everything --readnow? Assuming it's way too much to do by default? It depends upon the module set, but just check the RSS of a crash utility session with "mod -S" vs. "mod -rS". In any case, there's nothing preventing your retrace-server script from using "mod -rS" by default on galvatron. I just prefer that we don't force it on everybody. David, sorry to interrupt. Could I solicit your opinion on this bz? Do you see this causing any issues? Is it overkill to load with 'mod -rS'? I see these failures from time to time when we have to set scope in the pykdump scripts. Thanks for any info. FWIW, here's a quick example on an 8GB x86_64 beaker machine with the following module set: crash> mod MODULE NAME SIZE OBJECT FILE ffffffffc0331bc0 dm_mod 151552 (not loaded) [CONFIG_KALLSYMS] ffffffffc033e180 dm_log 20480 (not loaded) [CONFIG_KALLSYMS] ffffffffc0348040 dm_region_hash 20480 (not loaded) [CONFIG_KALLSYMS] ffffffffc0353280 dm_mirror 28672 (not loaded) [CONFIG_KALLSYMS] ffffffffc035b280 ip_tables 28672 (not loaded) [CONFIG_KALLSYMS] ffffffffc037c280 video 45056 (not loaded) [CONFIG_KALLSYMS] ffffffffc0385600 wmi 32768 (not loaded) [CONFIG_KALLSYMS] ffffffffc0391100 sg 40960 (not loaded) [CONFIG_KALLSYMS] ffffffffc0396040 libcrc32c 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc039c200 crc32c_intel 24576 (not loaded) [CONFIG_KALLSYMS] ffffffffc03a7580 libahci 40960 (not loaded) [CONFIG_KALLSYMS] ffffffffc03b7900 ahci 40960 (not loaded) [CONFIG_KALLSYMS] ffffffffc03c3000 fb_sys_fops 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc03c8000 sysimgblt 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc03cd000 syscopyarea 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc03d6040 sysfillrect 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc03ff6c0 drm_kms_helper 212992 (not loaded) [CONFIG_KALLSYMS] ffffffffc0410000 i2c_algo_bit 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0416000 intel_gtt 20480 (not loaded) [CONFIG_KALLSYMS] ffffffffc0455840 e1000e 282624 (not loaded) [CONFIG_KALLSYMS] ffffffffc0466140 sr_mod 28672 (not loaded) [CONFIG_KALLSYMS] ffffffffc049f9c0 libata 270336 (not loaded) [CONFIG_KALLSYMS] ffffffffc04b1000 mxm_wmi 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc04b92c0 lpc_ich 28672 (not loaded) [CONFIG_KALLSYMS] ffffffffc04be540 dell_smm_hwmon 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc052c980 drm 536576 (not loaded) [CONFIG_KALLSYMS] ffffffffc055be00 ttm 114688 (not loaded) [CONFIG_KALLSYMS] ffffffffc0744940 nouveau 2220032 (not loaded) [CONFIG_KALLSYMS] ffffffffc0788080 snd_intel_dspcfg 24576 (not loaded) [CONFIG_KALLSYMS] ffffffffc078d540 dcdbas 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0794080 snd_hwdep 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc07a9380 snd 94208 (not loaded) [CONFIG_KALLSYMS] ffffffffc07b3000 iTCO_vendor_support 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc07be000 mei_me 45056 (not loaded) [CONFIG_KALLSYMS] ffffffffc07c4400 ghash_clmulni_intel 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc07c9040 soundcore 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc09c4f80 i915 2330624 (not loaded) [CONFIG_KALLSYMS] ffffffffc0a1d600 snd_pcm 110592 (not loaded) [CONFIG_KALLSYMS] ffffffffc0a34380 snd_hda_core 90112 (not loaded) [CONFIG_KALLSYMS] ffffffffc0a49680 snd_hda_codec_hdmi 69632 (not loaded) [CONFIG_KALLSYMS] ffffffffc0a59000 snd_hda_intel 40960 (not loaded) [CONFIG_KALLSYMS] ffffffffc0a694c0 sd_mod 53248 (not loaded) [CONFIG_KALLSYMS] ffffffffc0a6f000 ledtrig_audio 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0a7e280 cdrom 65536 (not loaded) [CONFIG_KALLSYMS] ffffffffc0ba5800 xfs 1519616 (not loaded) [CONFIG_KALLSYMS] ffffffffc0bfc500 i2c_i801 28672 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c01140 mei_wdt 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c08880 intel_rapl_perf 20480 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c0eb80 intel_cstate 20480 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c2bc40 mei 110592 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c38100 snd_timer 36864 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c576c0 snd_hda_codec 151552 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c64100 iTCO_wdt 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c69100 pcspkr 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c6e200 crct10dif_pclmul 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c73200 crc32_pclmul 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c78300 intel_rapl_msr 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c7d1c0 intel_powerclamp 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c82180 coretemp 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0c8e140 intel_wmi_thunderbolt 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0ca4580 snd_hda_codec_generic 86016 (not loaded) [CONFIG_KALLSYMS] ffffffffc0cab040 irqbypass 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0cb2640 intel_rapl_common 24576 (not loaded) [CONFIG_KALLSYMS] ffffffffc0cde800 intel_uncore 143360 (not loaded) [CONFIG_KALLSYMS] ffffffffc0cfe5c0 snd_hda_codec_realtek 122880 (not loaded) [CONFIG_KALLSYMS] ffffffffc0d9ebc0 kvm 786432 (not loaded) [CONFIG_KALLSYMS] ffffffffc0e06880 kvm_intel 294912 (not loaded) [CONFIG_KALLSYMS] ffffffffc0e181c0 x86_pkg_temp_thermal 16384 (not loaded) [CONFIG_KALLSYMS] ffffffffc0ebea80 sunrpc 454656 (not loaded) [CONFIG_KALLSYMS] crash> After "mod -S", here's the "ps aux" output: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 50598 19.5 5.9 501020 472644 pts/0 S+ 15:14 0:12 crash ... After "mod -rS", note the increase in %MEM and RSS: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 50598 16.4 27.9 2232508 2205552 pts/0 S+ 15:14 0:28 crash ... If my math is correct, the RSS goes from ~461MB to well over 2GB. (In reply to John Pittman from comment #8) > David, sorry to interrupt. Could I solicit your opinion on this bz? Do you > see this causing any issues? Is it overkill to load with 'mod -rS'? I see > these failures from time to time when we have to set scope in the pykdump > scripts. Thanks for any info. Yes the memory usage would be an issue as Dave A points out and especially on our production system. In just the one example he indicates a 4X's increase in RSS and given the workaround, I see a fairly large cost to benefit ratio especially considering there's easy workarounds. The problem on our retrace-server is people often leave crash sessions open for a very long time and forget about them and this change looks like it would consume a lot of RAM. Unless there is a way to identify such module conflicts easily in which case we could modify retrace-server to have a more surgical "symbol load" crashrc file and so the impact would not be as great. Today we have this: mod -S /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-957.21.3.el7.x86_64 > /dev/null We could potentially do: mod -s <non-problem module> ... mod -rs <problem module> Dave W. thanks for confirming. Dave A. thanks for your time and assistance on this. I'll try and configure the scripts to unload and reload if they see an issue. Thanks again, you can close. Thanks John and Dave. I'm definitely rusty with respect to actually analyzing real-life vmcores, but I wonder what percentage of crash dumps have panic backtraces that actually pass through module code, and by extension require the loading of that module's debuginfo data? (not to mention the full slate of *all* modules) You could parse the panic task backtrace for any entries that contain a [module] name string tagged on to the end of a stack frame: crash> bt | grep "#" | grep "]$" #7 [ffff88000904ee88] scsi_io_completion at ffffffffa005f556 [scsi_mod] crash> But then extend the bt line to do some kind of awk extraction of the "scsi_mod" string, and dump it into a file, which could then be used as an argument input file: crash> bt | grep "#" | grep "]$" | awk magic-extraction > input crash> mod -rs < input Just a thought... |