Bug 1609577 - eu-strip damages c++ binary, making it crash, the unstripped binary works fine
Summary: eu-strip damages c++ binary, making it crash, the unstripped binary works fine
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: elfutils
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Mark Wielaard
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-07-29 19:45 UTC by Hans de Goede
Modified: 2018-08-02 22:49 UTC (History)
9 users (show)

Fixed In Version: elfutils-0.173-7.fc29
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 08:10:19 UTC


Attachments (Terms of Use)
"eu-readelf -S funguloids/src/funguloids" output (2.80 KB, text/plain)
2018-07-29 20:44 UTC, Hans de Goede
no flags Details


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1610964 None None None Never

Internal Links: 1610964

Description Hans de Goede 2018-07-29 19:45:04 UTC
Ok, this is a weird one. I noticed this while fixing the Fedora 29 FTBFS for funguloids (a game). I spend a lot of time debugging this before coming to the conclusion that this seems to be an issue with eu-strip

To reproduce take a gully up2date F29/rawhide x86_64 install and do:

# To get the game data files:
dnf install -y funguloids 

# To build funguloids
fedpkg clone funguloids
cd funguloids
fedpkg local

# To reproduce:
# First run the just build funguloids to verify it works:
funguloids/src/funguloids
# Now to reproduce the issue:
eu-strip funguloids/src/funguloids
funguloids/src/funguloids
# (core dump)

After a bunch of debugging I've pin-pointed the crash to:
funguloids/src/menu.cpp line 169:

MaterialPtr mat = MaterialManager::getSingleton().getByName("BlockCenter");

The problem is that getByName takes 2 parameters and the eu-strip seems to somehow mess up the passing of the second parameter when the default value is used for the second parameter. The declaration of getByName lives here:

/usr/include/OGRE/OgreMaterialManager.h line 162:

MaterialPtr getByName(const String& name,
   const String& groupName = ResourceGroupManager::AUTODETECT_RESOURCE_GROUP_NAME);

And ResourceGroupManager::AUTODETECT_RESOURCE_GROUP_NAME is declared here:

/usr/include/OGRE/OgreResourceGroupManager.h line 261 and further:

class _OgreExport ResourceGroupManager : public Singleton<ResourceGroupManager>, public ResourceAlloc
{
   ...
   static String AUTODETECT_RESOURCE_GROUP_NAME;


This string is implemented in ogre-1.9.0/OgreMain/src/OgreResourceGroupManager.cpp (part of the ogre package sources):

String ResourceGroupManager::AUTODETECT_RESOURCE_GROUP_NAME = "Autodetect";

If I change the line of code on which funguloids crashes after running eu-strip on it from its original:

MaterialPtr mat = MaterialManager::getSingleton().getByName("BlockCenter");

To:

MaterialPtr mat = MaterialManager::getSingleton().getByName("BlockCenter", "Generic");

Thus explicitly specifying the resource-group from which to load the resource, then the crash moves on to the next getByName() call.

Note that although the app crashes the real problem is an exception being raised when getting the resource, which then does not get handled very gracefully.

The exception error (found in ~/.funguloids/Ogre.log) is:

OGRE EXCEPTION(5:ItemIdentityException): Cannot find a group named  in ResourceGroupManager::isResourceGroupInitialised at /builddir/build/BUILD/ogre-1.9.0/OgreMain/src/OgreResourceGroupManager.cpp (line 1918)

And ogre-1.9.0/OgreMain/src/OgreResourceGroupManager.cpp (line 1918) is:


bool ResourceGroupManager::isResourceGroupInGlobalPool(const String& name)
{
   ...
   ResourceGroup* grp = getResourceGroup(name);
   if (!grp)
   {
      OGRE_EXCEPT(Exception::ERR_ITEM_NOT_FOUND, 
                  "Cannot find a group named " + name,
                  "ResourceGroupManager::isResourceGroupInitialised");
   }

Notice that this means that the passed name is "" while it should be the "Autodetect" value which is stored in ResourceGroupManager::AUTODETECT_RESOURCE_GROUP_NAME so it seems that eu-strip causes the symbol to point to zero initialized memory instead of to the actual symbol.

One last item of note, after running "eu-strip funguloids/src/funguloids" is 354552 bytes on my system, where as running "strip -g --strip-unneeded funguloids/src/funguloids" results in a binary of 383344 bytes, which does work properly, so it seems that eu-strip is stripping too much.

Comment 1 Hans de Goede 2018-07-29 19:46:18 UTC
Note that this is a somewhat big problem since /usr/lib/rpm/find-debuginfo.sh runs eu-strip on all binaries. For now I'm disabling debuginfo generation for funguloids with a reference to this bug to work around this.

Comment 2 Mark Wielaard 2018-07-29 20:03:01 UTC
(In reply to Hans de Goede from comment #1)
> Note that this is a somewhat big problem since
> /usr/lib/rpm/find-debuginfo.sh runs eu-strip on all binaries. For now I'm
> disabling debuginfo generation for funguloids with a reference to this bug
> to work around this.

For you know when this start happening?
And could you post an section list ([eu-readelf] -S binary) of the unstripped binary?

I just noticed (it started happening this weekend, it seemed fine last Friday) that annobin started putting unallocated sections in between allocated sections (and have sections listed in non-monotonically file offset order). eu-strip indeed doesn't like that, it expects all non-allocated sections (that can be stripped away) to come after the allocated sections (that are needed during runtime). I am trying to work around it. It isn't against the ELF spec to list sections in random order, but it does make the job of any tool adjusting ELF files much harder.

An example is:
$ echo "int main() { return 1; }" | gcc -s -xc -

$ eu-readelf -S ./a.out | more
There are 26 section headers, starting at offset 0x3688:

Section Headers:
[Nr] Name                 Type         Addr             Off      Size     ES Flags Lk Inf Al
[ 0]                      NULL         0000000000000000 00000000 00000000  0        0   0  0
[ 1] .interp              PROGBITS     00000000004002a8 000002a8 0000001c  0 A      0   0  1
[ 2] .note.ABI-tag        NOTE         00000000004002c4 000002c4 00000020  0 A      0   0  4
[ 3] .note.gnu.build-id   NOTE         00000000004002e4 000002e4 00000024  0 A      0   0  4
[ 4] .gnu.hash            GNU_HASH     0000000000400308 00000308 0000001c  0 A      5   0  8
[ 5] .dynsym              DYNSYM       0000000000400328 00000328 00000048 24 A      6   1  8
[ 6] .dynstr              STRTAB       0000000000400370 00000370 00000038  0 A      0   0  1
[ 7] .gnu.version         GNU_versym   00000000004003a8 000003a8 00000006  2 A      5   0  2
[ 8] .gnu.version_r       GNU_verneed  00000000004003b0 000003b0 00000020  0 A      6   1  8
[ 9] .rela.dyn            RELA         00000000004003d0 000003d0 00000030 24 A      5   0  8
[10] .init                PROGBITS     0000000000401000 00001000 00000017  0 AX     0   0  4
[11] .text                PROGBITS     0000000000401020 00001020 00000175  0 AX     0   0 16
[12] .fini                PROGBITS     0000000000401198 00001198 00000009  0 AX     0   0  4
[13] .rodata              PROGBITS     0000000000402000 00002000 00000010  0 A      0   0  8
[14] .gnu.build.attributes NOTE         0000000000402010 0000301c 00000510  0        0   0  4

^^^^ This ELF NOTE trips up eu-strip (note not Allocated, and file offset much later in the file than the previous and next section).

[15] .eh_frame_hdr        PROGBITS     0000000000402010 00002010 00000034  0 A      0   0  4
[16] .eh_frame            PROGBITS     0000000000402048 00002048 000000d8  0 A      0   0  8
[17] .init_array          INIT_ARRAY   0000000000403e50 00002e50 00000008  8 WA     0   0  8
[18] .fini_array          FINI_ARRAY   0000000000403e58 00002e58 00000008  8 WA     0   0  8
[19] .dynamic             DYNAMIC      0000000000403e60 00002e60 00000190 16 WA     6   0  8
[20] .got                 PROGBITS     0000000000403ff0 00002ff0 00000010  8 WA     0   0  8
[21] .got.plt             PROGBITS     0000000000404000 00003000 00000018  8 WA     0   0  8
[22] .data                PROGBITS     0000000000404018 00003018 00000004  0 WA     0   0  1
[23] .bss                 NOBITS       000000000040401c 0000301c 00000004  0 WA     0   0  1
[24] .comment             PROGBITS     0000000000000000 0000352c 00000058  1 MS     0   0  1
[25] .shstrtab            STRTAB       0000000000000000 00003584 000000ff  0        0   0  1

Comment 3 Hans de Goede 2018-07-29 20:44:37 UTC
Created attachment 1471386 [details]
"eu-readelf -S funguloids/src/funguloids" output

(In reply to Mark Wielaard from comment #2)
> Do you know when this start happening?

No I just noticed this today after upgrading my F28 install to rawhide this morning so that I can easily reproduce various FTBFS issues.

> And could you post an section list ([eu-readelf] -S binary) of the
> unstripped binary?

Attached. Note as said in the original description I expect that you can reproduce this easily (given a rawhide install inside say a vm).

> [14] .gnu.build.attributes NOTE         0000000000402010 0000301c 00000510 
> 0        0   0  4
> 
> ^^^^ This ELF NOTE trips up eu-strip (note not Allocated, and file offset
> much later in the file than the previous and next section).

I see a similar NOTE section inside the attached eu-readelf_-S_funguloids.log which is also at/near the end of the binary.

Comment 4 Mark Wielaard 2018-07-29 20:49:56 UTC
(In reply to Hans de Goede from comment #3)
> > And could you post an section list ([eu-readelf] -S binary) of the
> > unstripped binary?
> 
> Attached. Note as said in the original description I expect that you can
> reproduce this easily (given a rawhide install inside say a vm).
> 
> > [14] .gnu.build.attributes NOTE         0000000000402010 0000301c 00000510 
> > 0        0   0  4
> > 
> > ^^^^ This ELF NOTE trips up eu-strip (note not Allocated, and file offset
> > much later in the file than the previous and next section).
> 
> I see a similar NOTE section inside the attached
> eu-readelf_-S_funguloids.log which is also at/near the end of the binary.

Thanks. It is indeed the same problem.

I have CCed nickc who can maybe make it so that annobin doesn't produce these "out of order" .gnu.build.attributes ELF NOTES. I am trying to get eu-strip (and other elfutils tools) to deal with these, but it is a bit of a pain because there are multiple assumptions that non-allocated sections follow the allocated ones and that the section file offsets are monotonically increasing and don't jump around.

Comment 5 Hans de Goede 2018-07-29 20:54:51 UTC
Question, is funguloids hitting this because it somehow is doing something special or could anything hit this?

Underlying question, do we perhaps need to redo the mass-rebuild once this is fixed ... ?

Comment 6 Mark Wielaard 2018-07-29 20:56:52 UTC
(In reply to Hans de Goede from comment #5)
> Question, is funguloids hitting this because it somehow is doing something
> special or could anything hit this?

No. It seems to hit any native ELF file.
 
> Underlying question, do we perhaps need to redo the mass-rebuild once this
> is fixed ... ?

It is a fairly obvious bug that I haven't seen before this weekend. So it probably doesn't deserve a full mass-rebuild. But probably does need a rebuild of any (native) package since Friday.

Comment 7 Mark Wielaard 2018-07-29 22:45:18 UTC
I just pushed a build elfutils-0.173-7.fc29 with a somewhat ugly fix/workaround:
https://sourceware.org/ml/elfutils-devel/2018-q3/msg00053.html

Could you please test a build agains this new build to see if it fixes the issue?

Comment 8 Hans de Goede 2018-07-30 07:31:34 UTC
(In reply to Mark Wielaard from comment #7)
> I just pushed a build elfutils-0.173-7.fc29 with a somewhat ugly
> fix/workaround:
> https://sourceware.org/ml/elfutils-devel/2018-q3/msg00053.html
> 
> Could you please test a build agains this new build to see if it fixes the
> issue?

I can confirm that elfutils-0.173-7.fc29 fixes the funguloids issue, thank you for the quick fix.

Comment 9 Mark Wielaard 2018-07-30 08:10:19 UTC
Thanks for testing. I'll clean up the patch upstream. But lets consider it fixed in Fedora.

Comment 10 Hans de Goede 2018-07-30 08:50:40 UTC
(In reply to Mark Wielaard from comment #9)
> Thanks for testing. I'll clean up the patch upstream. But lets consider it
> fixed in Fedora.

I agree that this is fixed for Fedora, but I think we should figure out when the problem was introduced and rebuild all packages build after the problem was introduced.

Looking at the elfutils changelog I think the problem may have been introduced by this change? :

* Sat Jul 21 2018 Mark Wielaard <mjw@fedoraproject.org> - 0.173-5 - Add BuildRequires gcc-c++ for demangle support. - Add elfutils-0.173-annobingroup.patch. 

And the first successful build with that change is:

elfutils-0.173-6.fc29 	mjw 	2018-07-24 22:34:22

So we would need to rebuild everything build after say 2018-07-24 22:00 ?

Note I think this is the cause, but you probably know better. Can you file a ticket with rel-eng to get them to automatically rebuild the affected packages ? :

https://pagure.io/releng/new_issue

Comment 11 Jakub Jelinek 2018-07-30 08:52:54 UTC
The bug is really in binutils and should be fixed there, section offsets should be monotonically increasing, everything else is a bug.

Comment 12 Hans de Goede 2018-07-30 08:55:25 UTC
(In reply to Jakub Jelinek from comment #11)
> The bug is really in binutils and should be fixed there, section offsets
> should be monotonically increasing, everything else is a bug.

Ok, so was there some change elsewhere to trigger this binutils bug recently, or ... ?   I think we need to figure out what is causing the wrong section offsets, so that we can figure out which builds are affected by this and need to be rebuild (either with elfutils with the workaround, or with a fixed binutils).

Comment 13 Jakub Jelinek 2018-07-30 09:01:45 UTC
It is likely the annobin plugin that triggered this.

Comment 14 Mark Wielaard 2018-07-30 09:20:30 UTC
I think it is one of these in binutils:

* Fri Jul 27 2018 Nick Clifton  <nickc@redhat.com> - 2.31.1-7
- Fix a thinko in the merge patch.

* Fri Jul 27 2018 Nick Clifton  <nickc@redhat.com> - 2.31.1-6
- Fix a typo in the merge patch.

* Thu Jul 26 2018 Nick Clifton  <nickc@redhat.com> - 2.31.1-5
- Merge .gnu.build.attribute sections together.  (#1608390)

Which introduces/"fixed":

# Purpose:  Merge .gnu.build.attribute sections into a single section.
# Lifetime: Fixed in 2.32 
Patch14: binutils-merge-attribute-sections.patch

Which seems to force the unallocated .gnu.build.attributes section right in the middle of a couple of allocated sections.

Comment 15 Nick Clifton 2018-07-30 09:28:23 UTC
(In reply to Mark Wielaard from comment #14)
Hi Mark,

  Yes, it was a bit of sloppy thinking on my part.  I created a patch for
  the linker to combine all sections called .gnu.build.attributes.* into
  one section called .gnu.build.attributes.  But I put the combined section
  into the wrong place in the output file causing all of the problems that
  have been reported. :-(

  I have now fixed this bug (with binutils-2.31.1-7), putting the merged
  section into the correct place alongside other, similar read-only
  sections.

  The bug was present in binutils-2.31.1-4 -5 and -6, and probably the 
  easiest way to detect a broken binary is too look for a section called
  .gnu-.build.attributes.  (Note the dash character after .gnu).  This
  was a typo in the original -4 and -5 versions of the patch.  The typo
  was fixed in the -6 release, but that release was only in the buildroot
  for a few minutes before RM untagged it, so it is unlikely that there
  will be broken builds with the correct section name.

Cheers
  Nick

Comment 16 Mark Wielaard 2018-07-30 09:36:10 UTC
I don't believe this is fixed with binutils-2.31.1-7:

# rpm -q binutils gcc annobin
binutils-2.31.1-7.fc29.x86_64
gcc-8.1.1-5.fc29.x86_64
annobin-8.19-1.fc29.x86_64

# echo "int main() { return 1; }" | gcc -s -xc -

# eu-readelf -S ./a.out | grep -2 build.attribute
[12] .fini                PROGBITS     0000000000401198 00001198 00000009  0 AX     0   0  4
[13] .rodata              PROGBITS     0000000000402000 00002000 00000010  0 A      0   0  8
[14] .gnu.build.attributes NOTE         0000000000402010 0000301c 00000510  0        0   0  4
[15] .eh_frame_hdr        PROGBITS     0000000000402010 00002010 00000034  0 A      0   0  4
[16] .eh_frame            PROGBITS     0000000000402048 00002048 000000d8  0 A      0   0  8

Comment 17 Nick Clifton 2018-07-30 11:31:06 UTC
Yeah - true.  But it should be fixed in binutils-2.31.1-8....  For real this time.

Comment 18 Jan Kratochvil 2018-07-30 20:02:59 UTC
It should be checked with Fedora releng for rebuild of all of the affected packages, we have been now troubleshooting GDB crash due to it (Bug 1609504 + Bug 1609714). The GDB build got fixed by elfutils update.

Comment 19 Sergio Durigan Junior 2018-07-30 20:06:04 UTC
Thanks, Jan.  FWIW, I had to rebuild GDB for Rawhide as well:

https://koji.fedoraproject.org/koji/taskinfo?taskID=28722195

And yes, affected packages should be rebuilt.  Thanks.

Comment 20 Hans de Goede 2018-07-31 08:37:04 UTC
So since no-one seems to feel responsible to coordinate with rel-eng to get affected packages rebuild I've filed a ticket for this myself:

https://pagure.io/releng/issue/7670

I hope I got all the details right, if not feel free to add a correction to that ticket.

I will also send a mail to devel@lists.fdo about this so that if people see weird crashes because of this they don't waste a lot time debugging them.

Comment 21 Mark Wielaard 2018-07-31 09:02:01 UTC
Sorry, there was some discussion on irc about this, but nobody did think of filing a ticket with releng. Thanks for doing that.


Note You need to log in before you can comment on or make changes to this bug.