Bug 1709831
| Summary: | gcc 8.2.1->8.3.1 change related to systemtap debuginfo regression in rhel-8.1.0, kernel -84 -> -85 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Martin Cermak <mcermak> |
| Component: | gcc | Assignee: | Marek Polacek <mpolacek> |
| gcc sub component: | system-version | QA Contact: | qe-baseos-tools-bugs |
| Status: | CLOSED UPSTREAM | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | ahajkova, fche, fweimer, jakub, law, lberk, mcermak, mjw, mnewsome, ohudlick, sipoyare, vkadlcik, vmukhame, wcohen |
| Version: | 8.0 | Keywords: | Bugfix, Triaged |
| Target Milestone: | rc | ||
| Target Release: | 8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-01-12 14:17:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1785287 | ||
| Bug Blocks: | 1734097 | ||
|
Description
Martin Cermak
2019-05-14 11:44:44 UTC
Historically this testcase was passing with systemtap-4.0-1 on top of 4.18.0-0.rc7.3.el8+7. Upstream systemtap-4.2 (6494d4c3c) doesn't have a fix for this. Checked the kernel -84 -> -85 diffs. Nothing relevant to NFS or compiler flags. Next thing to check may be the compiler version that built those rpms. If you have any old RHEL8 vm, just installing a pre- and post -85 kernel, with their -devel and -debuginfo, should allow you to do:
stap -r 4.18.0-86.el8.x86_64 -p4 -e 'probe module("nfsv3").function("nfs3_commit_done") { println($task) }'
stap -r 4.18.0-83.el8.x86_64 -p4 -e 'probe module("nfsv3").function("nfs3_commit_done") { println($task) }'
You will not need to reboot.
Alternately, unpack the debuginfo RPMs for pre and post 85 kernels, find the nfsv3.ko.debug files, and compare debuginfo for that inlined function via eu-readelf -w etc.
Thanks. I can kind-of see the differences in the debug records for nfs3_commit_done. Not sure what they all mean yet, but I've at least got a place where I can start looking. There's a couple of relevant BZs for changes in the debug symbols we generate. Anyway, will continue poking tomorrow. This looks like an intentional change. In gcc-8.2.x and earlier we have something like this:
[ 2d11e] subprogram abbrev: 187
abstract_origin (ref4) [ 2a3b4]
low_pc (addr) .text+0x00000000000002e0 <nfs3_commit_done>
high_pc (data8) 92 (.text+0x000000000000033c)
frame_base (exprloc)
[ 0] call_frame_cfa
sibling (ref4) [ 2d1ab]
[ 2d13a] formal_parameter abbrev: 95
abstract_origin (ref4) [ 2a3c6]
location (sec_offset) location list [ 1ed0]
[ 2d143] formal_parameter abbrev: 95
abstract_origin (ref4) [ 2a3d3]
location (sec_offset) location list [ 1f2c]
[ 2d14c] inlined_subroutine abbrev: 50
abstract_origin (ref4) [ 2c856]
entry_pc (addr) .text+0x0000000000000304 <nfs3_commit_done+0x24>
ranges (sec_offset) range list [ 200]
call_file (data1) nfs3proc.c (1)
call_line (data2) 845
call_column (data1) 6
sibling (ref4) [ 2d17d]
[ 2d165] formal_parameter abbrev: 25
abstract_origin (ref4) [ 2c873]
[ 2d16a] formal_parameter abbrev: 25
abstract_origin (ref4) [ 2c867]
[ 2d16f] GNU_call_site abbrev: 21
low_pc (addr) .text+0x0000000000000328 <nfs3_commit_done+0x48>
abstract_origin (ref4) [ 2d064]
[ 2d17d] lexical_block abbrev: 43
abstract_origin (ref4) [ 2a3b4]
low_pc (addr) .text+0x000000000000030d <nfs3_commit_done+0x2d>
high_pc (data8) 14 (.text+0x000000000000031b <nfs3_commit_done+0x3b>)
[ 2d192] formal_parameter abbrev: 25
abstract_origin (ref4) [ 2a3c6]
[ 2d197] formal_parameter abbrev: 25
abstract_origin (ref4) [ 2a3d3]
[ 2d19c] GNU_call_site abbrev: 21
low_pc (addr) .text+0x0000000000000319 <nfs3_commit_done+0x39>
abstract_origin (ref4) [ 2d6cf]
Of particular interest is the lexical block at 2d17d which then has two parameters, one of which is the task parameter we're interested in. I can't immediately think of a case where a lexical block should have parameters, Richi found it odd as well, but couldn't find anything which would make that strictly invalid.
After this upstream change form Richi:
commit c0bce4a7a8657012b377a576d3a1b6dbd5d5ef99
Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Mon Oct 29 14:13:56 2018 +0000
2018-10-29 Richard Biener <rguenther>
Backport from mainline
2018-09-26 Richard Biener <rguenther>
PR debug/87428
PR debug/87362
* tree-inline.c (expand_call_inline): When the location
of the call is UNKNOWN_LOCATION use DECL_SOURCE_LOCATION
or BUILTINS_LOCATION for the BLOCK_SOURCE_LOCATION of
the inserted BLOCK to make inlined_function_outer_scope_p
recognize it.
* dwarf2out.c (add_call_src_coords_attributes): Do not add
coords for reserved locations.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-8-branch@265590 138bc75d-0d04-0410-961f-82ee72b054a4
The records for nfs3_commit_done looks like this:
[ 2d11e] subprogram abbrev: 187
abstract_origin (ref4) [ 2a3b4]
low_pc (addr) .text+0x00000000000002e0 <nfs3_commit_done>
high_pc (data8) 92 (.text+0x000000000000033c)
frame_base (exprloc)
[ 0] call_frame_cfa
sibling (ref4) [ 2d1b0]
[ 2d13a] formal_parameter abbrev: 95
abstract_origin (ref4) [ 2a3c6]
location (sec_offset) location list [ 1ed0]
[ 2d143] formal_parameter abbrev: 95
abstract_origin (ref4) [ 2a3d3]
location (sec_offset) location list [ 1f2c]
[ 2d14c] inlined_subroutine abbrev: 50
abstract_origin (ref4) [ 2c856]
entry_pc (addr) .text+0x0000000000000304 <nfs3_commit_done+0x24>
ranges (sec_offset) range list [ 200]
call_file (data1) nfs3proc.c (1)
call_line (data2) 845
call_column (data1) 6
sibling (ref4) [ 2d17d]
[ 2d165] formal_parameter abbrev: 25
abstract_origin (ref4) [ 2c873]
[ 2d16a] formal_parameter abbrev: 25
abstract_origin (ref4) [ 2c867]
[ 2d16f] GNU_call_site abbrev: 21
low_pc (addr) .text+0x0000000000000328 <nfs3_commit_done+0x48>
abstract_origin (ref4) [ 2d064]
[ 2d17d] inlined_subroutine abbrev: 141
abstract_origin (ref4) [ 2a3b4]
low_pc (addr) .text+0x000000000000030d <nfs3_commit_done+0x2d>
high_pc (data8) 14 (.text+0x000000000000031b <nfs3_commit_done+0x3b>)
call_file (data1) nfs3proc.c (1)
call_line (data2) 840
call_column (data1) 12
[ 2d197] formal_parameter abbrev: 25
abstract_origin (ref4) [ 2a3c6]
[ 2d19c] formal_parameter abbrev: 25
abstract_origin (ref4) [ 2a3d3]
[ 2d1a1] GNU_call_site abbrev: 21
low_pc (addr) .text+0x0000000000000319 <nfs3_commit_done+0x39>
abstract_origin (ref4) [ 2d6e4]
In particular note how the lexical block at 2d17d has changed into an inlined_subroutine with two formal parameters, one of which is the "task" parameter we want. That seems more sensible, but seems to be running afoul of stap.
I'm not familiar enough with the stap code and its dwarf reader, but I think the next step is to dig into stap and see how it handles this change. Note this seems to relate to partial inlining of nfs3_async_handle_jukebox, so it might all change again with the introduction of IEPM.
Anyway, I think we need to coordinate with someone from the stap team here. While the updated debug records seem more sensible to me, I'm working from several assumptions about what the stap code is trying to do and how it might be responding to changes in the debug info.
Thanks for looking into it, Jeff!
I believe your reference to the lexical block is not on the mark because the systemtap probes aim for the function entry point (the low_pc), not some internal point. And in the 8.2.x case, the parameter list, 0x2d13a and 0x2d143 have location.
In the 8.3.x case, this function (also) got inlined, and the inlined copy of the subroutine does not have a location for these parameters. Your second readelf didn't include the inlined copy which the stap error message's die# would identify.
For example, running 4.18.0-105.el8 (the freshest at the moment): running the same stap sript:
% stap -p4 -e 'probe module("nfsv3").function("nfs3_commit_done") { println($task) }'
semantic error: failed to retrieve location attribute for 'task' [man error::dwarf]: identifier '$task' at <input>:1:62
dieoffset: 0x8068a from /usr/lib/debug/lib/modules/4.18.0-105.el8.x86_64/kernel/fs/nfs/nfsv3.ko.debug
function: nfs3_commit_done at fs/nfs/nfs3proc.c:847:2 inlined by nfs3_commit_done at fs/nfs/nfs3proc.c:840:12
source: probe module("nfsv3").function("nfs3_commit_done") { println($task) }
% eu-readelf /usr/lib/debug/lib/modules/4.18.0-105.el8.x86_64/kernel/fs/nfs/nfsv3.ko.debug
....
[ 80667] inlined_subroutine abbrev: 166
abstract_origin (ref4) [ 7d6db]
entry_pc (addr) .text+0x00000000000004dd <nfs3_commit_done+0x2d>
GNU_entry_view (data1) 3
low_pc (addr) .text+0x00000000000004dd <nfs3_commit_done+0x2d>
high_pc (data8) 14 (.text+0x00000000000004eb <nfs3_commit_done+0x3b>)
call_file (data1) nfs3proc.c (1)
call_line (data2) 840
call_column (data1) 12
[ 8068a] formal_parameter abbrev: 26
abstract_origin (ref4) [ 7d6ed]
[ 8068f] formal_parameter abbrev: 26
abstract_origin (ref4) [ 7d6fa]
[ 80694] GNU_call_site abbrev: 21
low_pc (addr) .text+0x00000000000004e9 <nfs3_commit_done+0x39>
abstract_origin (ref4) [ 80c0d]
....
So it appears as though gcc's inlining preferences have changed, and its debuginfo generation has not kept up, just between 8.2.1 and 8.3.1.
I'm going to have to do another dive into this, meaning it's going to miss RHEL 8.1. This is function splitting in action, nfs3_commit_done is split by fnsplit pass into the small nfs3_commit_done and nfs3_commit_done.part.0 which does just the (cold) nfs_refresh_inode call. Before fnsplit, we already have # DEBUG task => NULL # DEBUG inode => NULL before that call, will need to track down exactly where that comes from. Anyway, later on we inline nfs3_commit_done.part.0 back into nfs3_commit_done, and before Richi's change we emitted clearly invalid DWARF (DW_TAG_lexical_block shouldn't have DW_TAG_formal_parameter children), but from systemtap POV perhaps a better thing. As I have been saying for years, we should have some way to express in DWARF through some extension the fact that we have partially outlined functions and have a way to mark partial inlining parts, OpenMP parallel/task/target/teams outlined regions, OpenACC regions etc., so tools like systemtap or gdb could figure out that if a user wants to stop or insert something at the start of a function, the start of the rest of partial inlinined function is certainly not that, that is somewhere deep in the middle of a function. And/or, if we are inlining the cold part of a partially inlined function back into itself, we should perhaps try harder to undo the mess fnsplit created and make sure we don't represent it as another inlined function. And the rest is just kernel's fault, KBUILD_CFLAGS += $(call cc-option, -fno-var-tracking-assignments) is ridiculous, if the kernel wants to have a workaround for each gcc wrong-code bug that ever existed by disabling the corresponding option, it would need to disable pretty much all options, including -O2 or -O0. I see https://lore.kernel.org/patchwork/patch/563362/ but don't find that in the latest kernel, nor in the RHEL8 kernel. I mean, if kernel wants to work-around a GCC 4.9 bug, it could perhaps do that for GCC 4.9 only, or say GCC 4.5-4.9 where 5/6/7/8 minor releases could be affected, but GCC5+ doesn't have this bug and the debug info quality loss is significant. (In reply to Jakub Jelinek from comment #22) > And the rest is just kernel's fault, > KBUILD_CFLAGS += $(call cc-option, -fno-var-tracking-assignments) > is ridiculous, if the kernel wants to have a workaround for each gcc > wrong-code bug that ever existed by disabling the corresponding option, it > would need to disable pretty much all options, including -O2 or -O0. > I see https://lore.kernel.org/patchwork/patch/563362/ but don't find that in > the latest kernel, nor in the RHEL8 kernel. This being tracked (for the RHEL8 kernel) as bug #1785287 This is being tracked in PR92983. |