Bug 1919965 - dwarves does not correctly handle the dwarf 5 debug format now used by GCC 11
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: dwarves
Version: 34
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Arnaldo Carvalho de Melo
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-25 13:33 UTC by Daniel Berrangé
Modified: 2022-05-15 18:58 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-05-15 18:58:52 UTC
Type: Bug
Embargoed:



Description Daniel Berrangé 2021-01-25 13:33:15 UTC
Description of problem:
libvirt uses the "pdwtags" program from the dwarves RPM to validate struct ABI info in our test suite.  

Since the upgrade to gcc-11.0.0-0.15.fc34.x86_64 we've seen bad information produced, claiming that many entries of bogus padding are being added to certain structs.

Using "sizeof" does not show this padding, only the debug symbols claim it exists.

Since the libvirt code here is quite complex, I've narrowed it down to the following standalone reproducer:


# cat admin.x

const THING_MAX = 16384;
typedef string thing<2>;

struct this {
  thing things<THING_MAX>;
};

struct that {
  thing things<THING_MAX>;
};


$ rpcgen  -c admin.x > admin.c
$ rpcgen  -h admin.x > admin.h
$ gcc -g -c `pkg-config --cflags --libs libtirpc`  -o admin.o admin.c
$ pdwtags --verbose admin.o


This will print out

struct {
        u_int                      things_len;           /*     0     4 */

        /* XXX 4 bytes hole, try to pack */

        thing *                    things_val;           /*     8     8 */

        /* Force padding: */
        thing *                    :64;
        thing *                    :64;
        thing *                    :64;
        thing *                    :64;
        thing *                    :64;
        thing *                    :64;
         ... repeated 16384 times...
        /* size: 0, cachelines: 0, members: 2 */
        /* sum members: 12, holes: 1, sum holes: 4 */
        /* padding: 65520 */

        /* BRAIN FART ALERT! 0 bytes != 12 (member bytes) + 0 (member bits) + 4 (byte holes) + 0 (bit holes), diff = -524288 bits */
}; /* size: 0 */


We can see there's an error message included there from pdwtags indicating there was something in the debuginfo data that it didn't like.

The "pahole" tool prints similar bogus data about padding.


If we ask "objdump -D -S admin.o" for a disassembly of the admin.o file, it reports "(bad)" against various asm instructions too, e.g.

Disassembly of section .debug_info:

0000000000000000 <.debug_info>:
{
   0:   d5                      (bad)  
   1:   04 00                   add    $0x0,%al
   3:   00 05 00 01 08 00       add    %al,0x80100(%rip)        # 80109 <xdr_that+0x8007c>
   9:   00 00                   add    %al,(%rax)
   b:   00 0f                   add    %cl,(%rdi)
   d:   00 00                   add    %al,(%rax)
   f:   00 00                   add    %al,(%rax)
         if (!xdr_string (xdrs, objp, 2))
  11:   1d 00 00 00 00          sbb    $0x0,%eax
        ...
  22:   e0 00                   loopne 24 <.debug_info+0x24>
        ...
                 return FALSE;
  2c:   00 00                   add    %al,(%rax)
  2e:   04 01                   add    $0x1,%al
  30:   08 00                   or     %al,(%rax)
  32:   00 00                   add    %al,(%rax)
        return TRUE;
  34:   00 04 02                add    %al,(%rdx,%rax,1)
  37:   07                      (bad)  
}

In both cases, downgrading to gcc-11.0.0-0.14.fc34.x86_64 fixes it.

gcc-11.0.0-0.16 is broken in the same way as 0.15.


What is also peculiar is that I can't make the reproducer any smaller than this admin.x example.

E.g. if I remove 'struct this', then that fixes the debuginfo for "struct that", and vice versa. Both of them have to be present, and it causes bad debuginfo for both of them.

Version-Release number of selected component (if applicable):
gcc-11.0.0-0.15.fc34.x86_64 
dwarves-1.19-1.fc34.x86_64
binutils-2.35.1-16.fc34.x86_64

Comment 1 Jakub Jelinek 2021-01-25 13:41:19 UTC
That most likely means dwarves doesn't handle DWARF5 properly.
I think Mark has posted a patch last year: https://www.spinics.net/lists/dwarves/msg00503.html
As for objdump -S, I believe binutils-2.35.1-25.fc34 should have fixed that.

Comment 2 Daniel Berrangé 2021-01-25 14:01:25 UTC
(In reply to Jakub Jelinek from comment #1)
> As for objdump -S, I believe binutils-2.35.1-25.fc34 should have fixed that.

FWIW, I still see the same problem with that binutils version

Comment 3 Nick Clifton 2021-01-25 14:09:52 UTC
(In reply to Daniel Berrangé from comment #0)

> If we ask  "objdump -D -S admin.o"  for a disassembly of the admin.o file
> it reports "(bad)" against various asm instructions too.  eg.
> 
> Disassembly of section .debug_info:
> 
> 0000000000000000 <.debug_info>:
> {
>    0:   d5                      (bad)  
>    1:   04 00                   add    $0x0,%al

Umm - you are disassembling the contents of a debug section.  It is not at
all surprising that the disassembly shows (bad) instructions - the bytes
are not machine code at all.

If you want to see sections like this decoded then the -W option is the one
that you want.

Comment 4 Daniel Berrangé 2021-01-25 15:18:44 UTC
(In reply to Nick Clifton from comment #3)
> (In reply to Daniel Berrangé from comment #0)
> Umm - you are disassembling the contents of a debug section.  It is not at
> all surprising that the disassembly shows (bad) instructions - the bytes
> are not machine code at all.

Oops, yes. I guess it was just luck that I didn't see this in builds with the previous gcc version.

> If you want to see sections like this decoded then the -W option is the one
> that you want.

(In reply to Jakub Jelinek from comment #1)
> That most likely means dwarves doesn't handle DWARF5 properly.
> I think Mark has posted a patch last year:
> https://www.spinics.net/lists/dwarves/msg00503.html

I tested the latest code from https://github.com/acmel/dwarves and it still fails, so I guess more work is still outstanding.

Comment 5 Arnaldo Carvalho de Melo 2021-01-28 19:55:52 UTC
Fix available at:

https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=DW_AT_data_bit_offset&id=b91b19840b0062b873ef6b7bdf38ddcdd8c105da

will be in the upcoming 1.20 release, to be released this week.

Comment 6 Ben Cotton 2021-02-09 16:13:44 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

Comment 7 Andrea Bolognani 2021-05-06 09:40:31 UTC
With commit

  https://gitlab.com/libvirt/libvirt/-/commit/12dda05b7d386f10dedd5b4eb6b51c4dfce5deaf

libvirt has dropped the workaround it had added specifically for this
bug and the CI pipeline passes just fine, so it would appear to me
that this can now be CLOSED CURRENTRELEASE.

Comment 8 Ben Cotton 2022-05-12 14:50:49 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 9 Mark Wielaard 2022-05-15 18:58:52 UTC
Closed current release, see comment #7

