Bug 1584711

Summary: ld runs out of memory when linking mame-0.198-1
Product: [Fedora] Fedora Reporter: Julian Sikorski <belegdol>
Component: mame Assignee: Julian Sikorski <belegdol>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: belegdol, nickc, pbrobinson
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 0.200-1.fc29 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-19 11:44:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 485251    

Description Julian Sikorski 2018-05-31 13:29:13 UTC
Description of problem:


Version-Release number of selected component (if applicable):
0.198-1

How reproducible:
always

Steps to Reproduce:
1. Get 0.198-1 from git (any branch will do)
2. Run fedpkg build

Actual results:
Build succeeds on x86_64, i686, s390x and aarch64 but fails on armv7hl due to insufficient memory. On f27 and f28 the error says:
/usr/bin/ld: cannot size stub section: Memory exhausted
whereas on rawhide:
/usr/bin/ld: failed to set dynamic section sizes: Memory exhausted

Expected results:
Build succeeds on all architectures

Additional info:
Relevant koji tasks:
https://koji.fedoraproject.org/koji/taskinfo?taskID=27319209
https://koji.fedoraproject.org/koji/taskinfo?taskID=27319211
https://koji.fedoraproject.org/koji/taskinfo?taskID=27319237

Comment 1 Peter Robinson 2018-06-04 17:23:12 UTC
The ARMv7 builders have 24 GB of RAM, but I suspect this is a per-process limit, which on 32-bit would be about 3 GB. That makes me wonder why 32-bit x86 is fine, what differs between the two, and why this has regressed in F-28.

Comment 2 Julian Sikorski 2018-06-04 17:33:46 UTC
Hi,
it didn't regress in F28, it regressed in mame-0.198. 0.197 builds fine on armv7hl on all branches:
https://koji.fedoraproject.org/koji/packageinfo?packageID=22597

Comment 3 Peter Robinson 2018-06-05 02:07:39 UTC
> it didn't regress in F28, it regressed in mame-0.198. 0.197 builds fine on
> armv7hl on all branches:

Has it been reported upstream?

Comment 4 Julian Sikorski 2018-06-06 05:42:54 UTC
It has now: https://github.com/mamedev/mame/issues/3639

Comment 5 Julian Sikorski 2018-06-07 16:34:29 UTC
FYI, adding -Wl,--no-keep-memory -Wl,--reduce-memory-overheads to LDFLAGS did not help:
https://koji.fedoraproject.org/koji/taskinfo?taskID=27460043
I am being told the warnings of type:
../../../../../scripts/mame_mame/liboptional.a(coco_gmc.o):(.rodata+0x6c): multiple definition of `typeinfo name for device_finder<device_cococart_interface, false>'
../../../../../scripts/mame_mame/liboptional.a(coco_dcmodem.o):(.rodata+0x6c): first defined here
could be a bug in binutils - Nick, could you please advise? Could this be related to the OOM issue?
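
For reference, a minimal sketch of how the two linker options named above can be appended to LDFLAGS. The helper name append_ld_memory_flags is hypothetical, and where LDFLAGS is actually set in the mame spec file is not shown in this report; only the two GNU ld options come from the comment.

```shell
#!/bin/bash
# Hypothetical helper: append GNU ld memory-saving options to a flag string.
append_ld_memory_flags() {
    # --no-keep-memory: re-read symbol tables instead of caching them in memory
    # --reduce-memory-overheads: lower memory use at the cost of link speed
    # "${1:+$1 }" adds a separating space only when existing flags were passed.
    printf '%s%s\n' "${1:+$1 }" "-Wl,--no-keep-memory -Wl,--reduce-memory-overheads"
}

LDFLAGS="$(append_ld_memory_flags "${LDFLAGS:-}")"
export LDFLAGS
echo "LDFLAGS=$LDFLAGS"
```

Both options trade link time for lower peak memory use, so they affect only resource usage and not how duplicate symbols are resolved.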

Comment 6 Julian Sikorski 2018-06-07 16:42:58 UTC
Please note that on f26 and

Comment 7 Julian Sikorski 2018-06-07 16:44:14 UTC
Please note that on f26 and f27 ld actually crashes and the following is written into the log:
/usr/bin/ld: BFD version 2.29.1-23.fc28 assertion fail elf32-arm.c:4812

Comment 8 Nick Clifton 2018-06-08 11:36:51 UTC
Hi Julian,

> I am being told the warnings of type:
> ../../../../../scripts/mame_mame/liboptional.a(coco_gmc.o):(.rodata+0x6c):
> multiple definition of `typeinfo name for
> device_finder<device_cococart_interface, false>'
> ../../../../../scripts/mame_mame/liboptional.a(coco_dcmodem.o):(.
> rodata+0x6c): first defined here
> could be a bug in binutils - Nick, please may you advise? Could this be
> related to OOM issue?

In theory no, but in practice I bet that it is.  Given that mame builds
just fine on other architectures, I would have to assume that the OOM
problem is the culprit.

A couple of suggestions:

  * Can you compile with -Os instead of -O2 ?

  * Does linking with the gold linker (-fuse-ld=gold) work ?

Cheers
  Nick
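
The two suggestions above could be tried roughly as follows. This is only a sketch: the helper names are hypothetical, the default "-O2 -g" is a stand-in for whatever flags the package build really uses, and only -Os and -fuse-ld=gold come from the comment.

```shell
#!/bin/bash
# Replace a standalone -O2 with -Os, leaving every other flag untouched.
use_size_optimization() {
    local out=() word
    # Unquoted on purpose: split the flag string on whitespace.
    for word in $1; do
        if [ "$word" = "-O2" ]; then
            word="-Os"
        fi
        out+=("$word")
    done
    printf '%s\n' "${out[*]}"
}

# Ask the g++ driver to link with gold instead of the default ld.bfd.
use_gold_linker() {
    printf '%s\n' "${1:+$1 }-fuse-ld=gold"
}

CXXFLAGS="$(use_size_optimization "${CXXFLAGS:--O2 -g}")"
LDFLAGS="$(use_gold_linker "${LDFLAGS:-}")"
echo "CXXFLAGS=$CXXFLAGS"
echo "LDFLAGS=$LDFLAGS"
```

Trying the two changes one at a time makes it easier to tell which of them, if either, affects the failure.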

Comment 9 Julian Sikorski 2018-06-09 16:23:34 UTC
Hi Nick,
thanks for the pointers! I had to remove -Wl,--reduce-memory-overheads when using ld.gold. Unfortunately both the multiple definition and the OOM issues persist:
https://koji.fedoraproject.org/koji/taskinfo?taskID=27501609

Comment 10 Julian Sikorski 2018-06-10 15:47:05 UTC
With -Os the OOM no longer occurs, but linking still fails due to the multiple definition issue:
https://koji.fedoraproject.org/koji/taskinfo?taskID=27533958

Comment 11 Julian Sikorski 2018-06-10 15:59:13 UTC
Actually I realised that the build linked in comment 5 also fails only due to multiple definition and not due to OOM.
According to upstream the code is correct, which is also supported by the fact that it links fine on other architectures. The following comment appears in github issue 3605:
You need a newer linker. It's legal to have the same explicit template instantiation in different compilation units - it's only illegal to duplicate it in the same compilation unit. This became an issue with C++11, so a linker not updated to handle C++11 may not handle this correctly.

Comment 12 Nick Clifton 2018-06-11 13:15:36 UTC
Hi Julian,

  Hmmm - but you are linking with the latest official release from the
  FSF, so the multiple definition problems are not due to using an out
  of date release.  (Unless that comment from github was referring to
  a fix being in the current linker development sources, rather than
  in a released linker).

  Does the multiple definition failure happen for other architectures
  if you link with the gold linker and compile with -Os ?  (I am 
  wondering if this is a generic problem or ARM specific.  I assume
  that it will be generic, but it is good to be sure).

  Did compiling with -Os and then linking with ld.bfd work ?

  I am not sure what else to suggest.  It seems to me that mame might
  just be too big to link on the ARM.  :-(

Cheers
  Nick

Comment 13 Julian Sikorski 2018-06-11 16:51:50 UTC
Hi Nick,
I am not sure what the comment from github was referring to, but I believe Vas was pointing out that the code is correct and the linker is at fault.
The multiple definition failure happens on %arm only it seems. So far I have tried the following (all approaches already use -g1):
- -g1 only: OOM and multiple definition [1]
- -Wl,--no-keep-memory -Wl,--reduce-memory-overheads and -g1: no OOM, failure due to multiple definition [2]
- -Wl,--no-keep-memory -fuse-ld=gold: OOM and multiple definition [3]
- -Os and -Wl,--no-keep-memory -fuse-ld=gold: no OOM, failure due to multiple definition [4]

Summing up, in two of the four listed cases (and in the issue reported in github issue 3605), the linking failure seems to be due to multiple definition alone, without any specific mention of OOM. Is it possible that ld is failing due to insufficient memory without the log containing a specific reference to this?

[1] https://koji.fedoraproject.org/koji/taskinfo?taskID=27319209
[2] https://koji.fedoraproject.org/koji/taskinfo?taskID=27460044
[3] https://koji.fedoraproject.org/koji/taskinfo?taskID=27501611
[4] https://koji.fedoraproject.org/koji/taskinfo?taskID=27533958

Comment 14 Nick Clifton 2018-06-12 14:24:34 UTC
(In reply to Julian Sikorski from comment #13)

Hi Julian,

> Is it possible that ld is failing due
> to insufficient memory without the log containing a specific reference to
> this?

It is possible, but I think that it is unlikely.

I would like to fix the multiple definition problem if possible, since that
seems like it is a real bug.  Is there a way to produce a reduced testcase
that reproduces the problem ?  (I am not a C++ expert, so I am hoping that
someone else will be able to reduce the problem down to a more manageable
size...)

Cheers
  Nick

Comment 15 Julian Sikorski 2018-06-16 19:10:22 UTC
Vas was kind enough to provide a test case:
https://belegdol.fedorapeople.org/mame/testcase.zip

You can test it with:
for i in *.cpp; do g++ -c "$i" -o "${i%.cpp}.o"; done; g++ -o testcase *.o

It works on x86_64 but fails on armv7hl - I tested using mock and qemu.

Comment 16 Nick Clifton 2018-06-18 12:12:16 UTC
Hi Julian,

  Thanks for the reproducer.  It turns out that you do not need to use
  mock or qemu, all that is needed is an ARM cross compiler.

  Anyway, I have reported the bug upstream with the FSF:

    https://sourceware.org/bugzilla/show_bug.cgi?id=23304

  With a bit of luck one of the ARM maintainers will take an interest and
  fix it.  Otherwise muggins here will have to have a go...

Cheers
  Nick
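
A rough sketch of building the test case with a cross compiler instead of mock and qemu. The compiler name arm-linux-gnueabihf-g++ is an assumption (use whatever your cross toolchain is called, via the CXX variable), and build_testcase is a hypothetical helper name.

```shell
#!/bin/bash
# Assumption: the ARM cross compiler is installed under this name.
CXX="${CXX:-arm-linux-gnueabihf-g++}"

# Compile every .cpp file in a directory to an object file, then link them all.
build_testcase() {
    local dir="$1" src
    for src in "$dir"/*.cpp; do
        "$CXX" -c "$src" -o "${src%.cpp}.o" || return 1
    done
    "$CXX" -o "$dir/testcase" "$dir"/*.o
}
```

After unpacking testcase.zip, running build_testcase on the unpacked directory should end with the link step either succeeding or reporting the multiple definition errors.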

Comment 17 Nick Clifton 2018-06-19 11:44:37 UTC
Hi Julian,

  It turns out that the multiple symbol definition problem is an artefact
  of the default ARM ABI.  Specifically the default ABI (AAPCS) says:

     3.2.5.4 of the ARM C++ ABI says that class data only 
     has vague linkage if the class has no key function.

  Which translates into a requirement for only one typeinfo definition
  for a given template for the entire program.

  Other architectures have a more sane ABI, which allows for multiple
  definitions, one per compilation unit.

  If you use an alternative ARM ABI then you can get the behaviour you
  desire.  For example if you compile the testcase with the "-mabi=apcs-gnu"
  option then it will compile, assemble and link correctly.  Of course
  the program may not run correctly because the libraries involved have
  presumably all been compiled with the default ABI.

  Anyway, I think that this is as far as we can take this particular issue.
  It seems that MAME is just too big for the ARM, and the default ARM ABI 
  is too broken to support it.  Sorry. :-(

Cheers
  Nick
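
One way to see the difference described above is to check whether the typeinfo name symbols in an object file are weak (vague linkage, duplicates are merged) or not (two such definitions clash at link time). The sketch below classifies lines of "nm -C" output by their type letter. The helper names are hypothetical, and it assumes defined symbols, so each line has the value, type and name fields.

```shell
#!/bin/bash
# Classify one "nm -C" line by its type letter (second field).
# V/v and W/w mark weak symbols; anything else is treated as not weak.
classify_nm_line() {
    local value type name
    read -r value type name <<<"$1"
    case "$type" in
        V|v|W|w) echo "weak" ;;
        *)       echo "not weak" ;;
    esac
}

# Read nm output on stdin and report the binding of each typeinfo name symbol.
typeinfo_bindings() {
    local line value type name
    while IFS= read -r line; do
        case "$line" in
            *"typeinfo name for"*)
                read -r value type name <<<"$line"
                printf '%s: %s\n' "$(classify_nm_line "$line")" "$name"
                ;;
        esac
    done
}
```

Piping the output of nm -C for one of the test case objects into typeinfo_bindings, once for an x86_64 build and once for an ARM build, should show whether the same symbol changes from weak to not weak between the two.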

Comment 18 Julian Sikorski 2018-07-25 12:27:32 UTC
0.200 builds on armv7hl again; the offending code has been refactored.