Ok, this is a weird one. I noticed this while fixing the Fedora 29 FTBFS for funguloids (a game). I spend a lot of time debugging this before coming to the conclusion that this seems to be an issue with eu-strip To reproduce take a gully up2date F29/rawhide x86_64 install and do: # To get the game data files: dnf install -y funguloids # To build funguloids fedpkg clone funguloids cd funguloids fedpkg local # To reproduce: # First run the just build funguloids to verify it works: funguloids/src/funguloids # Now to reproduce the issue: eu-strip funguloids/src/funguloids funguloids/src/funguloids # (core dump) After a bunch of debugging I've pin-pointed the crash to: funguloids/src/menu.cpp line 169: MaterialPtr mat = MaterialManager::getSingleton().getByName("BlockCenter"); The problem is that getByName takes 2 parameters and the eu-strip seems to somehow mess up the passing of the second parameter when the default value is used for the second parameter. The declaration of getByName lives here: /usr/include/OGRE/OgreMaterialManager.h line 162: MaterialPtr getByName(const String& name, const String& groupName = ResourceGroupManager::AUTODETECT_RESOURCE_GROUP_NAME); And ResourceGroupManager::AUTODETECT_RESOURCE_GROUP_NAME is declared here: /usr/include/OGRE/OgreResourceGroupManager.h line 261 and further: class _OgreExport ResourceGroupManager : public Singleton<ResourceGroupManager>, public ResourceAlloc { ... static String AUTODETECT_RESOURCE_GROUP_NAME; This string is implemented in ogre-1.9.0/OgreMain/src/OgreResourceGroupManager.cpp (part of the ogre package sources): String ResourceGroupManager::AUTODETECT_RESOURCE_GROUP_NAME = "Autodetect"; If I change the line of code on which funguloids crashes after running eu-strip on it from its original: MaterialPtr mat = MaterialManager::getSingleton().getByName("BlockCenter"); To: MaterialPtr mat = MaterialManager::getSingleton().getByName("BlockCenter", "Generic"); Thus explicitly specifying the resource-group from which to load the resource, then the crash moves on to the next getByName() call. Note that although the app crashes the real problem is an exception being raised when getting the resource, which then does not get handled very gracefully. The exception error (found in ~/.funguloids/Ogre.log) is: OGRE EXCEPTION(5:ItemIdentityException): Cannot find a group named in ResourceGroupManager::isResourceGroupInitialised at /builddir/build/BUILD/ogre-1.9.0/OgreMain/src/OgreResourceGroupManager.cpp (line 1918) And ogre-1.9.0/OgreMain/src/OgreResourceGroupManager.cpp (line 1918) is: bool ResourceGroupManager::isResourceGroupInGlobalPool(const String& name) { ... ResourceGroup* grp = getResourceGroup(name); if (!grp) { OGRE_EXCEPT(Exception::ERR_ITEM_NOT_FOUND, "Cannot find a group named " + name, "ResourceGroupManager::isResourceGroupInitialised"); } Notice that this means that the passed name is "" while it should be the "Autodetect" value which is stored in ResourceGroupManager::AUTODETECT_RESOURCE_GROUP_NAME so it seems that eu-strip causes the symbol to point to zero initialized memory instead of to the actual symbol. One last item of note, after running "eu-strip funguloids/src/funguloids" is 354552 bytes on my system, where as running "strip -g --strip-unneeded funguloids/src/funguloids" results in a binary of 383344 bytes, which does work properly, so it seems that eu-strip is stripping too much.
Note that this is a somewhat big problem since /usr/lib/rpm/find-debuginfo.sh runs eu-strip on all binaries. For now I'm disabling debuginfo generation for funguloids with a reference to this bug to work around this.
(In reply to Hans de Goede from comment #1) > Note that this is a somewhat big problem since > /usr/lib/rpm/find-debuginfo.sh runs eu-strip on all binaries. For now I'm > disabling debuginfo generation for funguloids with a reference to this bug > to work around this. For you know when this start happening? And could you post an section list ([eu-readelf] -S binary) of the unstripped binary? I just noticed (it started happening this weekend, it seemed fine last Friday) that annobin started putting unallocated sections in between allocated sections (and have sections listed in non-monotonically file offset order). eu-strip indeed doesn't like that, it expects all non-allocated sections (that can be stripped away) to come after the allocated sections (that are needed during runtime). I am trying to work around it. It isn't against the ELF spec to list sections in random order, but it does make the job of any tool adjusting ELF files much harder. An example is: $ echo "int main() { return 1; }" | gcc -s -xc - $ eu-readelf -S ./a.out | more There are 26 section headers, starting at offset 0x3688: Section Headers: [Nr] Name Type Addr Off Size ES Flags Lk Inf Al [ 0] NULL 0000000000000000 00000000 00000000 0 0 0 0 [ 1] .interp PROGBITS 00000000004002a8 000002a8 0000001c 0 A 0 0 1 [ 2] .note.ABI-tag NOTE 00000000004002c4 000002c4 00000020 0 A 0 0 4 [ 3] .note.gnu.build-id NOTE 00000000004002e4 000002e4 00000024 0 A 0 0 4 [ 4] .gnu.hash GNU_HASH 0000000000400308 00000308 0000001c 0 A 5 0 8 [ 5] .dynsym DYNSYM 0000000000400328 00000328 00000048 24 A 6 1 8 [ 6] .dynstr STRTAB 0000000000400370 00000370 00000038 0 A 0 0 1 [ 7] .gnu.version GNU_versym 00000000004003a8 000003a8 00000006 2 A 5 0 2 [ 8] .gnu.version_r GNU_verneed 00000000004003b0 000003b0 00000020 0 A 6 1 8 [ 9] .rela.dyn RELA 00000000004003d0 000003d0 00000030 24 A 5 0 8 [10] .init PROGBITS 0000000000401000 00001000 00000017 0 AX 0 0 4 [11] .text PROGBITS 0000000000401020 00001020 00000175 0 AX 0 0 16 [12] .fini PROGBITS 0000000000401198 00001198 00000009 0 AX 0 0 4 [13] .rodata PROGBITS 0000000000402000 00002000 00000010 0 A 0 0 8 [14] .gnu.build.attributes NOTE 0000000000402010 0000301c 00000510 0 0 0 4 ^^^^ This ELF NOTE trips up eu-strip (note not Allocated, and file offset much later in the file than the previous and next section). [15] .eh_frame_hdr PROGBITS 0000000000402010 00002010 00000034 0 A 0 0 4 [16] .eh_frame PROGBITS 0000000000402048 00002048 000000d8 0 A 0 0 8 [17] .init_array INIT_ARRAY 0000000000403e50 00002e50 00000008 8 WA 0 0 8 [18] .fini_array FINI_ARRAY 0000000000403e58 00002e58 00000008 8 WA 0 0 8 [19] .dynamic DYNAMIC 0000000000403e60 00002e60 00000190 16 WA 6 0 8 [20] .got PROGBITS 0000000000403ff0 00002ff0 00000010 8 WA 0 0 8 [21] .got.plt PROGBITS 0000000000404000 00003000 00000018 8 WA 0 0 8 [22] .data PROGBITS 0000000000404018 00003018 00000004 0 WA 0 0 1 [23] .bss NOBITS 000000000040401c 0000301c 00000004 0 WA 0 0 1 [24] .comment PROGBITS 0000000000000000 0000352c 00000058 1 MS 0 0 1 [25] .shstrtab STRTAB 0000000000000000 00003584 000000ff 0 0 0 1
Created attachment 1471386 [details] "eu-readelf -S funguloids/src/funguloids" output (In reply to Mark Wielaard from comment #2) > Do you know when this start happening? No I just noticed this today after upgrading my F28 install to rawhide this morning so that I can easily reproduce various FTBFS issues. > And could you post an section list ([eu-readelf] -S binary) of the > unstripped binary? Attached. Note as said in the original description I expect that you can reproduce this easily (given a rawhide install inside say a vm). > [14] .gnu.build.attributes NOTE 0000000000402010 0000301c 00000510 > 0 0 0 4 > > ^^^^ This ELF NOTE trips up eu-strip (note not Allocated, and file offset > much later in the file than the previous and next section). I see a similar NOTE section inside the attached eu-readelf_-S_funguloids.log which is also at/near the end of the binary.
(In reply to Hans de Goede from comment #3) > > And could you post an section list ([eu-readelf] -S binary) of the > > unstripped binary? > > Attached. Note as said in the original description I expect that you can > reproduce this easily (given a rawhide install inside say a vm). > > > [14] .gnu.build.attributes NOTE 0000000000402010 0000301c 00000510 > > 0 0 0 4 > > > > ^^^^ This ELF NOTE trips up eu-strip (note not Allocated, and file offset > > much later in the file than the previous and next section). > > I see a similar NOTE section inside the attached > eu-readelf_-S_funguloids.log which is also at/near the end of the binary. Thanks. It is indeed the same problem. I have CCed nickc who can maybe make it so that annobin doesn't produce these "out of order" .gnu.build.attributes ELF NOTES. I am trying to get eu-strip (and other elfutils tools) to deal with these, but it is a bit of a pain because there are multiple assumptions that non-allocated sections follow the allocated ones and that the section file offsets are monotonically increasing and don't jump around.
Question, is funguloids hitting this because it somehow is doing something special or could anything hit this? Underlying question, do we perhaps need to redo the mass-rebuild once this is fixed ... ?
(In reply to Hans de Goede from comment #5) > Question, is funguloids hitting this because it somehow is doing something > special or could anything hit this? No. It seems to hit any native ELF file. > Underlying question, do we perhaps need to redo the mass-rebuild once this > is fixed ... ? It is a fairly obvious bug that I haven't seen before this weekend. So it probably doesn't deserve a full mass-rebuild. But probably does need a rebuild of any (native) package since Friday.
I just pushed a build elfutils-0.173-7.fc29 with a somewhat ugly fix/workaround: https://sourceware.org/ml/elfutils-devel/2018-q3/msg00053.html Could you please test a build agains this new build to see if it fixes the issue?
(In reply to Mark Wielaard from comment #7) > I just pushed a build elfutils-0.173-7.fc29 with a somewhat ugly > fix/workaround: > https://sourceware.org/ml/elfutils-devel/2018-q3/msg00053.html > > Could you please test a build agains this new build to see if it fixes the > issue? I can confirm that elfutils-0.173-7.fc29 fixes the funguloids issue, thank you for the quick fix.
Thanks for testing. I'll clean up the patch upstream. But lets consider it fixed in Fedora.
(In reply to Mark Wielaard from comment #9) > Thanks for testing. I'll clean up the patch upstream. But lets consider it > fixed in Fedora. I agree that this is fixed for Fedora, but I think we should figure out when the problem was introduced and rebuild all packages build after the problem was introduced. Looking at the elfutils changelog I think the problem may have been introduced by this change? : * Sat Jul 21 2018 Mark Wielaard <mjw> - 0.173-5 - Add BuildRequires gcc-c++ for demangle support. - Add elfutils-0.173-annobingroup.patch. And the first successful build with that change is: elfutils-0.173-6.fc29 mjw 2018-07-24 22:34:22 So we would need to rebuild everything build after say 2018-07-24 22:00 ? Note I think this is the cause, but you probably know better. Can you file a ticket with rel-eng to get them to automatically rebuild the affected packages ? : https://pagure.io/releng/new_issue
The bug is really in binutils and should be fixed there, section offsets should be monotonically increasing, everything else is a bug.
(In reply to Jakub Jelinek from comment #11) > The bug is really in binutils and should be fixed there, section offsets > should be monotonically increasing, everything else is a bug. Ok, so was there some change elsewhere to trigger this binutils bug recently, or ... ? I think we need to figure out what is causing the wrong section offsets, so that we can figure out which builds are affected by this and need to be rebuild (either with elfutils with the workaround, or with a fixed binutils).
It is likely the annobin plugin that triggered this.
I think it is one of these in binutils: * Fri Jul 27 2018 Nick Clifton <nickc> - 2.31.1-7 - Fix a thinko in the merge patch. * Fri Jul 27 2018 Nick Clifton <nickc> - 2.31.1-6 - Fix a typo in the merge patch. * Thu Jul 26 2018 Nick Clifton <nickc> - 2.31.1-5 - Merge .gnu.build.attribute sections together. (#1608390) Which introduces/"fixed": # Purpose: Merge .gnu.build.attribute sections into a single section. # Lifetime: Fixed in 2.32 Patch14: binutils-merge-attribute-sections.patch Which seems to force the unallocated .gnu.build.attributes section right in the middle of a couple of allocated sections.
(In reply to Mark Wielaard from comment #14) Hi Mark, Yes, it was a bit of sloppy thinking on my part. I created a patch for the linker to combine all sections called .gnu.build.attributes.* into one section called .gnu.build.attributes. But I put the combined section into the wrong place in the output file causing all of the problems that have been reported. :-( I have now fixed this bug (with binutils-2.31.1-7), putting the merged section into the correct place alongside other, similar read-only sections. The bug was present in binutils-2.31.1-4 -5 and -6, and probably the easiest way to detect a broken binary is too look for a section called .gnu-.build.attributes. (Note the dash character after .gnu). This was a typo in the original -4 and -5 versions of the patch. The typo was fixed in the -6 release, but that release was only in the buildroot for a few minutes before RM untagged it, so it is unlikely that there will be broken builds with the correct section name. Cheers Nick
I don't believe this is fixed with binutils-2.31.1-7: # rpm -q binutils gcc annobin binutils-2.31.1-7.fc29.x86_64 gcc-8.1.1-5.fc29.x86_64 annobin-8.19-1.fc29.x86_64 # echo "int main() { return 1; }" | gcc -s -xc - # eu-readelf -S ./a.out | grep -2 build.attribute [12] .fini PROGBITS 0000000000401198 00001198 00000009 0 AX 0 0 4 [13] .rodata PROGBITS 0000000000402000 00002000 00000010 0 A 0 0 8 [14] .gnu.build.attributes NOTE 0000000000402010 0000301c 00000510 0 0 0 4 [15] .eh_frame_hdr PROGBITS 0000000000402010 00002010 00000034 0 A 0 0 4 [16] .eh_frame PROGBITS 0000000000402048 00002048 000000d8 0 A 0 0 8
Yeah - true. But it should be fixed in binutils-2.31.1-8.... For real this time.
It should be checked with Fedora releng for rebuild of all of the affected packages, we have been now troubleshooting GDB crash due to it (Bug 1609504 + Bug 1609714). The GDB build got fixed by elfutils update.
Thanks, Jan. FWIW, I had to rebuild GDB for Rawhide as well: https://koji.fedoraproject.org/koji/taskinfo?taskID=28722195 And yes, affected packages should be rebuilt. Thanks.
So since no-one seems to feel responsible to coordinate with rel-eng to get affected packages rebuild I've filed a ticket for this myself: https://pagure.io/releng/issue/7670 I hope I got all the details right, if not feel free to add a correction to that ticket. I will also send a mail to devel about this so that if people see weird crashes because of this they don't waste a lot time debugging them.
Sorry, there was some discussion on irc about this, but nobody did think of filing a ticket with releng. Thanks for doing that.