RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1709831 - gcc 8.2.1->8.3.1 change related to systemtap debuginfo regression in rhel-8.1.0, kernel -84 -> -85
Summary: gcc 8.2.1->8.3.1 change related to systemtap debuginfo regression in rhel-8.1...
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: gcc
Version: 8.0
Hardware: Unspecified
OS: Linux
medium
medium
Target Milestone: rc
: 8.0
Assignee: Marek Polacek
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On: 1785287
Blocks: 1734097
TreeView+ depends on / blocked
 
Reported: 2019-05-14 11:44 UTC by Martin Cermak
Modified: 2023-07-18 14:19 UTC (History)
14 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-12 14:17:05 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
GNU Compiler Collection 92983 0 P3 NEW [8/9/10/11 Regression] Debug info regression since PR87428 changes 2021-01-12 14:14:40 UTC

Description Martin Cermak 2019-05-14 11:44:44 UTC
Testcase for Bug 920444 - [FJ6.4 Bug] regressed in rhel-8.1.0:

=======
 8.1 Server x86_64 # cat nfs.proc.commit_done.stp 
probe nfs.proc.commit_done{
        printf("nfs.proc.commit_done server_ip: %s\n", ip_ntop(server_ip));
}
 8.1 Server x86_64 # stap -p4 nfs.proc.commit_done.stp 
semantic error: failed to retrieve location attribute for 'task' [man error::dwarf]: identifier '$task' at /usr/share/systemtap/tapset/linux/nfs_proc.stp:1393:36
        dieoffset: 0x7fd56 from /usr/lib/debug/lib/modules/4.18.0-86.el8.x86_64/kernel/fs/nfs/nfsv3.ko.debug
        function: nfs3_commit_done at fs/nfs/nfs3proc.c:847:2 inlined by nfs3_commit_done at fs/nfs/nfs3proc.c:840:12
        source:                 server_ip = @_nfs_data_server_ip($task->tk_calldata)
                                                                 ^
        in expansion of macro: operator '@nfs_data' at /usr/share/systemtap/tapset/linux/nfs_proc.stpm:16:12
        source:         ( get_ip(&@nfs_data->task) )
                                  ^
        in expansion of macro: operator '@_nfs_data_server_ip' at /usr/share/systemtap/tapset/linux/nfs_proc.stp:1393:15
        source:                 server_ip = @_nfs_data_server_ip($task->tk_calldata)
                                            ^

semantic error: failed to retrieve location attribute for 'task' [man error::dwarf]: identifier '$task' at :1393:36
        dieoffset: 0x7fd56 from /usr/lib/debug/lib/modules/4.18.0-86.el8.x86_64/kernel/fs/nfs/nfsv3.ko.debug
        function: nfs3_commit_done at fs/nfs/nfs3proc.c:847:2 inlined by nfs3_commit_done at fs/nfs/nfs3proc.c:840:12
        source:                 server_ip = @_nfs_data_server_ip($task->tk_calldata)
                                                                 ^
        in expansion of macro: operator '@nfs_data' at /usr/share/systemtap/tapset/linux/nfs_proc.stpm:16:12
        source:         ( get_ip(&@nfs_data->task) )
                                  ^
        in expansion of macro: operator '@_nfs_data_server_ip' at /usr/share/systemtap/tapset/linux/nfs_proc.stp:1393:15
        source:                 server_ip = @_nfs_data_server_ip($task->tk_calldata)
                                            ^

semantic error: unresolved type : identifier '$task' at :1393:36
        source:                 server_ip = @_nfs_data_server_ip($task->tk_calldata)
                                                                 ^
        in expansion of macro: operator '@nfs_data' at /usr/share/systemtap/tapset/linux/nfs_proc.stpm:16:12
        source:         ( get_ip(&@nfs_data->task) )
                                  ^
        in expansion of macro: operator '@_nfs_data_server_ip' at /usr/share/systemtap/tapset/linux/nfs_proc.stp:1393:15
        source:                 server_ip = @_nfs_data_server_ip($task->tk_calldata)
                                            ^

Pass 2: analysis failed.  [man error::pass2]
(1)  8.1 Server x86_64 # rpm -q systemtap
systemtap-4.1-1.el8.x86_64
 8.1 Server x86_64 # 
 8.1 Server x86_64 # uname -r
4.18.0-86.el8.x86_64
 8.1 Server x86_64 # 

=======

Comment 1 Martin Cermak 2019-05-14 11:47:18 UTC
Historically this testcase was passing with systemtap-4.0-1 on top of 4.18.0-0.rc7.3.el8+7.

Upstream systemtap-4.2 (6494d4c3c) doesn't have a fix for this.

Comment 3 Frank Ch. Eigler 2019-05-17 17:28:53 UTC
Checked the kernel -84 -> -85 diffs.  Nothing relevant to NFS or compiler flags.  Next thing to check may be the compiler version that built those rpms.

Comment 12 Frank Ch. Eigler 2019-06-04 21:56:57 UTC
If you have any old RHEL8 vm, just installing a pre- and post -85 kernel, with their -devel and -debuginfo, should allow you to do:

stap -r 4.18.0-86.el8.x86_64 -p4 -e 'probe module("nfsv3").function("nfs3_commit_done") { println($task) }'

stap -r 4.18.0-83.el8.x86_64 -p4 -e 'probe module("nfsv3").function("nfs3_commit_done") { println($task) }'

You will not need to reboot.

Alternately, unpack the debuginfo RPMs for pre and post 85 kernels, find the nfsv3.ko.debug files, and compare debuginfo for that inlined function via eu-readelf -w etc.

Comment 13 Jeff Law 2019-06-04 23:43:11 UTC
Thanks.  I can kind-of see the differences in the debug records for nfs3_commit_done.  Not sure what they all mean yet, but I've at least got a place where I can start looking.   There's a couple of relevant BZs for changes in the debug symbols we generate.  Anyway, will continue poking tomorrow.

Comment 14 Jeff Law 2019-06-05 16:30:05 UTC
This looks like an intentional change.  In gcc-8.2.x and earlier we have something like this:

[ 2d11e]    subprogram           abbrev: 187
             abstract_origin      (ref4) [ 2a3b4]
             low_pc               (addr) .text+0x00000000000002e0 <nfs3_commit_done>
             high_pc              (data8) 92 (.text+0x000000000000033c)
             frame_base           (exprloc)
              [ 0] call_frame_cfa
             sibling              (ref4) [ 2d1ab]
 [ 2d13a]      formal_parameter     abbrev: 95
               abstract_origin      (ref4) [ 2a3c6]
               location             (sec_offset) location list [  1ed0]
 [ 2d143]      formal_parameter     abbrev: 95
               abstract_origin      (ref4) [ 2a3d3]
               location             (sec_offset) location list [  1f2c]
 [ 2d14c]      inlined_subroutine   abbrev: 50
               abstract_origin      (ref4) [ 2c856]
               entry_pc             (addr) .text+0x0000000000000304 <nfs3_commit_done+0x24>
               ranges               (sec_offset) range list [   200]
               call_file            (data1) nfs3proc.c (1)
               call_line            (data2) 845
               call_column          (data1) 6
               sibling              (ref4) [ 2d17d]
 [ 2d165]        formal_parameter     abbrev: 25
                 abstract_origin      (ref4) [ 2c873]
 [ 2d16a]        formal_parameter     abbrev: 25
                 abstract_origin      (ref4) [ 2c867]
 [ 2d16f]        GNU_call_site        abbrev: 21
                 low_pc               (addr) .text+0x0000000000000328 <nfs3_commit_done+0x48>
                 abstract_origin      (ref4) [ 2d064]
 [ 2d17d]      lexical_block        abbrev: 43
               abstract_origin      (ref4) [ 2a3b4]
               low_pc               (addr) .text+0x000000000000030d <nfs3_commit_done+0x2d>
               high_pc              (data8) 14 (.text+0x000000000000031b <nfs3_commit_done+0x3b>)
 [ 2d192]        formal_parameter     abbrev: 25
                 abstract_origin      (ref4) [ 2a3c6]
 [ 2d197]        formal_parameter     abbrev: 25
                 abstract_origin      (ref4) [ 2a3d3]
 [ 2d19c]        GNU_call_site        abbrev: 21
                 low_pc               (addr) .text+0x0000000000000319 <nfs3_commit_done+0x39>
                 abstract_origin      (ref4) [ 2d6cf]

Of particular interest is the lexical block at 2d17d which then has two parameters, one of which is the task parameter we're interested in.  I can't immediately think of a case where a lexical block should have parameters, Richi found it odd as well, but couldn't find anything which would make that strictly invalid.

After this upstream change form Richi:

commit c0bce4a7a8657012b377a576d3a1b6dbd5d5ef99
Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Oct 29 14:13:56 2018 +0000

    2018-10-29  Richard Biener  <rguenther>
    
            Backport from mainline
            2018-09-26  Richard Biener  <rguenther>
    
            PR debug/87428
            PR debug/87362
            * tree-inline.c (expand_call_inline): When the location
            of the call is UNKNOWN_LOCATION use DECL_SOURCE_LOCATION
            or BUILTINS_LOCATION for the BLOCK_SOURCE_LOCATION of
            the inserted BLOCK to make inlined_function_outer_scope_p
            recognize it.
            * dwarf2out.c (add_call_src_coords_attributes): Do not add
            coords for reserved locations.
    
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-8-branch@265590 138bc75d-0d04-0410-961f-82ee72b054a4

The records for nfs3_commit_done looks like this:

 [ 2d11e]    subprogram           abbrev: 187
             abstract_origin      (ref4) [ 2a3b4]
             low_pc               (addr) .text+0x00000000000002e0 <nfs3_commit_done>
             high_pc              (data8) 92 (.text+0x000000000000033c)
             frame_base           (exprloc)
              [ 0] call_frame_cfa
             sibling              (ref4) [ 2d1b0]
 [ 2d13a]      formal_parameter     abbrev: 95
               abstract_origin      (ref4) [ 2a3c6]
               location             (sec_offset) location list [  1ed0]
 [ 2d143]      formal_parameter     abbrev: 95
               abstract_origin      (ref4) [ 2a3d3]
               location             (sec_offset) location list [  1f2c]
 [ 2d14c]      inlined_subroutine   abbrev: 50
               abstract_origin      (ref4) [ 2c856]
               entry_pc             (addr) .text+0x0000000000000304 <nfs3_commit_done+0x24>
               ranges               (sec_offset) range list [   200]
               call_file            (data1) nfs3proc.c (1)
               call_line            (data2) 845
               call_column          (data1) 6
               sibling              (ref4) [ 2d17d]
 [ 2d165]        formal_parameter     abbrev: 25
                 abstract_origin      (ref4) [ 2c873]
 [ 2d16a]        formal_parameter     abbrev: 25
                 abstract_origin      (ref4) [ 2c867]
 [ 2d16f]        GNU_call_site        abbrev: 21
                 low_pc               (addr) .text+0x0000000000000328 <nfs3_commit_done+0x48>
                 abstract_origin      (ref4) [ 2d064]
 [ 2d17d]      inlined_subroutine   abbrev: 141
               abstract_origin      (ref4) [ 2a3b4]
               low_pc               (addr) .text+0x000000000000030d <nfs3_commit_done+0x2d>
               high_pc              (data8) 14 (.text+0x000000000000031b <nfs3_commit_done+0x3b>)
               call_file            (data1) nfs3proc.c (1)
               call_line            (data2) 840
               call_column          (data1) 12
 [ 2d197]        formal_parameter     abbrev: 25
                 abstract_origin      (ref4) [ 2a3c6]
 [ 2d19c]        formal_parameter     abbrev: 25
                 abstract_origin      (ref4) [ 2a3d3]
 [ 2d1a1]        GNU_call_site        abbrev: 21
                 low_pc               (addr) .text+0x0000000000000319 <nfs3_commit_done+0x39>
                 abstract_origin      (ref4) [ 2d6e4]

In particular note how the lexical block at 2d17d has changed into an inlined_subroutine with two formal parameters, one of which is the "task" parameter we want.  That seems more sensible, but seems to be running afoul of stap.

I'm not familiar enough with the stap code and its dwarf reader, but I think the next step is to dig into stap and see how it handles this change.  Note this seems to relate to partial inlining of nfs3_async_handle_jukebox, so it might all change again with the introduction of IEPM.

Anyway, I think we need to coordinate with someone from the stap team here.  While the updated debug records seem more sensible to me, I'm working from several assumptions about what the stap code is trying to do and how it might be responding to changes in the debug info.

Comment 15 Frank Ch. Eigler 2019-06-13 18:14:21 UTC
Thanks for looking into it, Jeff!

I believe your reference to the lexical block is not on the mark because the systemtap probes aim for the function entry point (the low_pc), not some internal point.  And in the 8.2.x case, the parameter list, 0x2d13a and 0x2d143 have location.

In the 8.3.x case, this function (also) got inlined, and the inlined copy of the subroutine does not have a location for these parameters.  Your second readelf didn't include the inlined copy which the stap error message's die# would identify.

For example, running 4.18.0-105.el8 (the freshest at the moment): running the same stap sript:

% stap -p4 -e 'probe module("nfsv3").function("nfs3_commit_done") { println($task) }'

semantic error: failed to retrieve location attribute for 'task' [man error::dwarf]: identifier '$task' at <input>:1:62
        dieoffset: 0x8068a from /usr/lib/debug/lib/modules/4.18.0-105.el8.x86_64/kernel/fs/nfs/nfsv3.ko.debug
        function: nfs3_commit_done at fs/nfs/nfs3proc.c:847:2 inlined by nfs3_commit_done at fs/nfs/nfs3proc.c:840:12
        source: probe module("nfsv3").function("nfs3_commit_done") { println($task) }

% eu-readelf /usr/lib/debug/lib/modules/4.18.0-105.el8.x86_64/kernel/fs/nfs/nfsv3.ko.debug

....

 [ 80667]      inlined_subroutine   abbrev: 166
               abstract_origin      (ref4) [ 7d6db]
               entry_pc             (addr) .text+0x00000000000004dd <nfs3_commit_done+0x2d>
               GNU_entry_view       (data1) 3
               low_pc               (addr) .text+0x00000000000004dd <nfs3_commit_done+0x2d>
               high_pc              (data8) 14 (.text+0x00000000000004eb <nfs3_commit_done+0x3b>)
               call_file            (data1) nfs3proc.c (1)
               call_line            (data2) 840
               call_column          (data1) 12
 [ 8068a]        formal_parameter     abbrev: 26
                 abstract_origin      (ref4) [ 7d6ed]
 [ 8068f]        formal_parameter     abbrev: 26
                 abstract_origin      (ref4) [ 7d6fa]
 [ 80694]        GNU_call_site        abbrev: 21
                 low_pc               (addr) .text+0x00000000000004e9 <nfs3_commit_done+0x39>
                 abstract_origin      (ref4) [ 80c0d]

....


So it appears as though gcc's inlining preferences have changed, and its debuginfo generation has not kept up, just between 8.2.1 and 8.3.1.

Comment 16 Jeff Law 2019-06-14 17:44:34 UTC
I'm going to have to do another dive into this, meaning it's going to miss RHEL 8.1.

Comment 21 Jakub Jelinek 2019-12-17 21:34:49 UTC
This is function splitting in action, nfs3_commit_done is split by fnsplit pass into the small nfs3_commit_done and nfs3_commit_done.part.0 which does just the (cold) nfs_refresh_inode call.
Before fnsplit, we already have
  # DEBUG task => NULL
  # DEBUG inode => NULL
before that call, will need to track down exactly where that comes from.  Anyway, later on we inline nfs3_commit_done.part.0 back into nfs3_commit_done, and before Richi's change we emitted clearly invalid DWARF (DW_TAG_lexical_block shouldn't have DW_TAG_formal_parameter children), but from systemtap POV perhaps a better thing.
As I have been saying for years, we should have some way to express in DWARF through some extension the fact that we have partially outlined functions and have a way to mark partial inlining parts, OpenMP parallel/task/target/teams outlined regions, OpenACC regions etc., so tools like systemtap or gdb could figure out that if a user wants to stop or insert something at the start of a function, the start of the rest of partial inlinined function is certainly not that, that is somewhere deep in the middle of a function.
And/or, if we are inlining the cold part of a partially inlined function back into itself, we should perhaps try harder to undo the mess fnsplit created and make sure we don't represent it as another inlined function.

Comment 22 Jakub Jelinek 2019-12-18 12:28:09 UTC
And the rest is just kernel's fault,
KBUILD_CFLAGS   += $(call cc-option, -fno-var-tracking-assignments)
is ridiculous, if the kernel wants to have a workaround for each gcc wrong-code bug that ever existed by disabling the corresponding option, it would need to disable pretty much all options, including -O2 or -O0.
I see https://lore.kernel.org/patchwork/patch/563362/ but don't find that in the latest kernel, nor in the RHEL8 kernel.
I mean, if kernel wants to work-around a GCC 4.9 bug, it could perhaps do that for GCC 4.9 only, or say GCC 4.5-4.9 where 5/6/7/8 minor releases could be affected, but GCC5+ doesn't have this bug and the debug info quality loss is significant.

Comment 23 Mark Wielaard 2020-03-17 10:36:36 UTC
(In reply to Jakub Jelinek from comment #22)
> And the rest is just kernel's fault,
> KBUILD_CFLAGS   += $(call cc-option, -fno-var-tracking-assignments)
> is ridiculous, if the kernel wants to have a workaround for each gcc
> wrong-code bug that ever existed by disabling the corresponding option, it
> would need to disable pretty much all options, including -O2 or -O0.
> I see https://lore.kernel.org/patchwork/patch/563362/ but don't find that in
> the latest kernel, nor in the RHEL8 kernel.

This being tracked (for the RHEL8 kernel) as bug #1785287

Comment 29 Marek Polacek 2021-01-12 14:17:05 UTC
This is being tracked in PR92983.


Note You need to log in before you can comment on or make changes to this bug.