Bug 1584711
Summary: | ld runs out of memory when linking mame-0.198-1 | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Julian Sikorski <belegdol> |
Component: | mame | Assignee: | Julian Sikorski <belegdol> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | rawhide | CC: | belegdol, nickc, pbrobinson |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | 0.200-1.fc29 | Doc Type: | If docs needed, set a value |
Last Closed: | 2018-06-19 11:44:37 UTC | Type: | Bug |
Bug Blocks: | 485251 |
Description
Julian Sikorski
2018-05-31 13:29:13 UTC
The ARMv7 builders have 24 GB of RAM, but I suspect this is a single-process limit, which on 32-bit would be ~3 GB. That makes me wonder why 32-bit x86 is fine, what differs between the two, and why this has regressed in F-28.

Hi, it didn't regress in F28, it regressed in mame-0.198. 0.197 builds fine on armv7hl on all branches: https://koji.fedoraproject.org/koji/packageinfo?packageID=22597

> it didn't regress in F28, it regressed in mame-0.198. 0.197 builds fine on
> armv7hl on all branches:
Has it been reported upstream?

It has now: https://github.com/mamedev/mame/issues/3639

FYI, adding -Wl,--no-keep-memory -Wl,--reduce-memory-overheads to LDFLAGS did not help: koji.fedoraproject.org/koji/taskinfo?taskID=27460043

I am being told the warnings of type:
  ../../../../../scripts/mame_mame/liboptional.a(coco_gmc.o):(.rodata+0x6c): multiple definition of `typeinfo name for device_finder<device_cococart_interface, false>'
  ../../../../../scripts/mame_mame/liboptional.a(coco_dcmodem.o):(.rodata+0x6c): first defined here
could be a bug in binutils - Nick, please may you advise? Could this be related to OOM issue?

Please note that on f26 and f27 ld actually crashes and the following is written into the log:
  /usr/bin/ld: BFD version 2.29.1-23.fc28 assertion fail elf32-arm.c:4812

Hi Julian,
> I am being told the warnings of type:
> ../../../../../scripts/mame_mame/liboptional.a(coco_gmc.o):(.rodata+0x6c):
> multiple definition of `typeinfo name for
> device_finder<device_cococart_interface, false>'
> ../../../../../scripts/mame_mame/liboptional.a(coco_dcmodem.o):(.
> rodata+0x6c): first defined here
> could be a bug in binutils - Nick, please may you advise? Could this be
> related to OOM issue?
In theory no, but in practice I bet that it is. Given that mame builds
just fine on other architectures, I would have to assume that the OOM
problem is the culprit.
A couple of suggestions:
* Can you compile with -Os instead of -O2 ?
* Does linking with the gold linker (-fuse-ld=gold) work ?
Cheers
Nick
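
A minimal sketch of how the two suggestions above might be passed to a build. It assumes the build honours the conventional CXX, CXXFLAGS and LDFLAGS environment variables, which may not match what the mame build system actually reads. The helper name use_Os is hypothetical.

```shell
#!/bin/sh
# Sketch only: CXXFLAGS/LDFLAGS are the conventional variable names and are
# an assumption here, not necessarily what the mame makefiles read.
CXX="${CXX:-g++}"

# Hypothetical helper: replace any -O<level> flag in a flag string with -Os,
# leaving all other flags untouched.
use_Os() {
    out=""
    for flag in $1; do
        case "$flag" in
            -O*) flag="-Os" ;;
        esac
        out="${out:+$out }$flag"
    done
    printf '%s\n' "$out"
}

# Optimise for size instead of speed, and ask the compiler driver to link
# with gold instead of the default ld.bfd.
CXXFLAGS="$(use_Os "${CXXFLAGS:--O2 -g1}")"
LDFLAGS="${LDFLAGS:+$LDFLAGS }-fuse-ld=gold"
export CXX CXXFLAGS LDFLAGS
```

Both -Os and -fuse-ld=gold are options of the GCC driver, so they go to the compiler invocation rather than to ld directly.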

Hi Nick, thanks for the pointers! I had to remove -Wl,--reduce-memory-overheads when using ld.gold. Unfortunately the multiple definition and OOM issues persist: https://koji.fedoraproject.org/koji/taskinfo?taskID=27501609

With -Os the OOM no longer occurs, but linking still fails due to the multiple definition issue: https://koji.fedoraproject.org/koji/taskinfo?taskID=27533958

Actually, I realised that the build linked in comment 5 also fails only due to multiple definition and not due to OOM.

According to upstream the code is correct, which is also supported by the fact that it links fine on other architectures. There is the following comment in GitHub issue 3605:
> You need a newer linker. It's legal to have the same explicit template
> instantiation in different compilation units - it's only illegal to
> duplicate it in the same compilation unit. This became an issue with C++11,
> so a linker not updated to handle C++11 may not handle this correctly.

Hi Julian,
Hmmm - but you are linking with the latest official release from the FSF, so the multiple definition problems are not due to using an out of date release. (Unless that comment from github was referring to a fix being in the current linker development sources, rather than in a released linker).
Does the multiple definition failure happen for other architectures if you link with the gold linker and compile with -Os ? (I am wondering if this is a generic problem or ARM specific. I assume that it will be generic, but it is good to be sure).
Did compiling with -Os and then linking with ld.bfd work ?
I am not sure what else to suggest. It seems to me that mame might just be too big to link on the ARM. :-(
Cheers
Nick

Hi Nick, I am not sure what the comment from github was referring to, but I believe Vas was pointing out that the code is correct and the linker is at fault. The multiple definition failure happens on %arm only, it seems.
So far I have tried the following (all approaches already use -g1):
- -g1 only: OOM and multiple definition [1]
- -Wl,--no-keep-memory -Wl,--reduce-memory-overheads and -g1: no OOM, failure due to multiple definition [2]
- -Wl,--no-keep-memory -fuse-ld=gold: OOM and multiple definition [3]
- -Os and -Wl,--no-keep-memory -fuse-ld=gold: no OOM, failure due to multiple definition [4]
Summing up, in two out of the four listed cases (and in the issue reported in GitHub issue 3605), the linking failure seems to be due to multiple definition alone, without any specific mention of OOM. Is it possible that ld is failing due to insufficient memory without the log containing a specific reference to this?
[1] https://koji.fedoraproject.org/koji/taskinfo?taskID=27319209
[2] https://koji.fedoraproject.org/koji/taskinfo?taskID=27460044
[3] https://koji.fedoraproject.org/koji/taskinfo?taskID=27501611
[4] https://koji.fedoraproject.org/koji/taskinfo?taskID=27533958

(In reply to Julian Sikorski from comment #13)
Hi Julian,
> Is it possible that ld is failing due
> to insufficient memory without the log containing a specific reference to
> this?
It is possible, but I think that it is unlikely. I would like to fix the multiple definition problem if possible, since that seems like it is a real bug.
Is there a way to produce a reduced testcase that reproduces the problem ? (I am not a C++ expert, so I am hoping that someone else will be able to reduce the problem down to a more manageable size...)
Cheers
Nick

Vas was kind enough to provide a test case: https://belegdol.fedorapeople.org/mame/testcase.zip
You can test it with:
  for i in *.cpp; do b=${i/cpp/o}; g++ -o $b -c $i; done; g++ -o testcase *.o
It works on x86_64 but fails on armv7hl - I tested using mock and qemu.

Hi Julian,
Thanks for the reproducer. It turns out that you do not need to use mock or qemu, all that is needed is an ARM cross compiler.
Anyway, I have reported the bug upstream with the FSF: https://sourceware.org/bugzilla/show_bug.cgi?id=23304
With a bit of luck one of the ARM maintainers will take an interest and fix it. Otherwise muggins here will have to have a go...
Cheers
Nick

Hi Julian,
It turns out that the multiple symbol definition problem is an artefact of the default ARM ABI. Specifically the default ABI (AAPCS) says:
  3.2.5.4 of the ARM C++ ABI says that class data only has vague linkage if the class has no key function.
Which translates into a requirement for only one typeinfo definition for a given template for the entire program. Other architectures have a more sane ABI, which allows for multiple definitions, one per compilation unit.
If you use an alternative ARM ABI then you can get the behaviour you desire. For example if you compile the testcase with the "-mabi=apcs-gnu" option then it will compile, assemble and link correctly. Of course the program may not run correctly because the libraries involved have presumably all been compiled with the default ABI.
Anyway, I think that this is as far as we can take this particular issue. It seems that MAME is just too big for the ARM, and the default ARM ABI is too broken to support it. Sorry. :-(
Cheers
Nick

0.200 builds on armv7hl again, the offending code has been refactored.
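
For reference, the reproducer loop quoted earlier in this bug can be sketched as a small POSIX shell script that also works with a cross compiler. The function names obj_name and build_testcase are hypothetical, and the cross compiler name in the usage note is only an example. The real name depends on the toolchain installed.

```shell
#!/bin/sh
# Rebuild the reproducer one translation unit at a time, then link.
# CXX defaults to the native g++; pointing it at an ARM cross compiler
# is an assumption about the local setup.
CXX="${CXX:-g++}"

# Map a source file name to its object file name (foo.cpp -> foo.o).
# Only a trailing ".cpp" is stripped, so "cpp" elsewhere in the name is kept.
obj_name() {
    printf '%s\n' "${1%.cpp}.o"
}

# Compile every .cpp file in the current directory, then link the objects
# into an executable called "testcase". Stops at the first compile error.
build_testcase() {
    for src in *.cpp; do
        [ -e "$src" ] || continue
        "$CXX" -c "$src" -o "$(obj_name "$src")" || return 1
    done
    "$CXX" -o testcase ./*.o
}
```

Saved as, say, build.sh in the unpacked testcase directory, it could be run with something like `CXX=arm-linux-gnueabihf-g++ sh -c '. ./build.sh && build_testcase'`. Per the comments above, the link step would be expected to fail with the multiple definition error on ARM and succeed on x86_64.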