Bug 1658940

Summary: Firefox fails to build on arm - /usr/bin/ld: final link failed: memory exhausted
Product: [Fedora] Fedora
Reporter: Martin Stransky <stransky>
Component: firefox
Assignee: Peter Robinson <pbrobinson>
Status: ASSIGNED
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 29
CC: 0xalen, anto.trande, gecko-bugs-nobody, jeremy.linton, jhorak, john.j5live, kengert, mattias.ellert, pbrobinson, pjasicek, rhughes, rstrode, sandmann, satellitgo, stransky, tstellar
Target Milestone: ---   
Target Release: ---   
Hardware: armv7hl   
OS: Unspecified   
Type: Bug
Bug Depends On:    
Bug Blocks: 245418    
Attachments:
Description                            Flags
failed build log with cairo failure    none
Fedora 29 build failure                none

Description Martin Stransky 2018-12-13 08:39:19 UTC
Description of problem:
https://kojipkgs.fedoraproject.org//work/tasks/122/31420122/build.log

422:20.46 toolkit/library/symverscript.stub
422:20.84 toolkit/library/libxul.so
447:08.94 /usr/bin/ld: final link failed: memory exhausted


Version-Release number of selected component (if applicable):
firefox-64.0

Comment 1 Peter Robinson 2019-01-08 11:28:50 UTC
Please at least add it to the ARMTracker so ARM people are aware of the issue :)

Comment 2 Peter Robinson 2019-01-10 04:28:16 UTC
Martin: I've been trying to fix/reproduce this. I'm currently seeing a build failure in the bundled cairo, which looks like it's trying to use NEON again. Fedora ARMv7 doesn't enable NEON by default; libraries are expected to use runtime detection/fast paths rather than enable it explicitly. It looks like we need to explicitly pass -DHAVE_ARM_NEON=0, but none of the ways I've tried (below) appear to work. Can you provide some direction/assistance here?


@@ -435,6 +432,10 @@ echo "ac_add_options --with-system-libvpx" >> .mozconfig
 echo "ac_add_options --without-system-libvpx" >> .mozconfig
 %endif
 
+%ifarch {arm}
+echo "export HAVE_ARM_NEON=0" >> .mozconfig
+%endif
+
 %ifarch s390 s390x
 echo "ac_add_options --disable-ion" >> .mozconfig
 %endif
@@ -511,7 +512,12 @@ echo "ac_add_options --enable-linker=gold" >> .mozconfig
 export RUSTFLAGS="-Cdebuginfo=0"
 %endif
 export CFLAGS=$MOZ_OPT_FLAGS
+%ifarch %{arm}
+export CXXFLAGS="$MOZ_OPT_FLAGS -DHAVE_ARM_NEON=0"
+%endif
+%ifnarch %{arm}
 export CXXFLAGS=$MOZ_OPT_FLAGS
+%endif
 export LDFLAGS=$MOZ_LINK_FLAGS
 
 export PREFIX='%{_prefix}'
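
For illustration only: the diff above adds the define to CXXFLAGS but leaves CFLAGS unchanged, so any source compiled as C would not see it. The following is a minimal sketch of applying the same define to both variables on 32-bit ARM. The helper name `neon_flags` and the placeholder values are assumptions, and the thread does not confirm that the bundled cairo build honors -DHAVE_ARM_NEON=0 at all.

```shell
#!/bin/sh
# Hypothetical helper: append the define to the base flags on 32-bit ARM only.
# $1 = target architecture name, $2 = base compiler flags.
neon_flags() {
    arch=$1
    base=$2
    case "$arch" in
        arm*) printf '%s -DHAVE_ARM_NEON=0\n' "$base" ;;
        *)    printf '%s\n' "$base" ;;
    esac
}

MOZ_OPT_FLAGS="-O2 -g"     # placeholder value for this example
TARGET_ARCH="armv7hl"      # placeholder; a spec file would test %ifarch %{arm}

# Use the same flags for C and C++ so C sources also receive the define.
CFLAGS=$(neon_flags "$TARGET_ARCH" "$MOZ_OPT_FLAGS")
CXXFLAGS=$CFLAGS
export CFLAGS CXXFLAGS
echo "CFLAGS=$CFLAGS"
```

In a spec file the architecture test would be done by the `%ifarch` conditional rather than by a shell `case`; the shell form is used here only so the sketch can be run on its own.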

Comment 3 Martin Stransky 2019-01-10 09:11:36 UTC
Can you provide me a build log? I didn't see that on our koji builds, it failed on linker.

Comment 4 Peter Robinson 2019-01-10 09:47:52 UTC
(In reply to Martin Stransky from comment #3)
> Can you provide me a build log? I didn't see that on our koji builds, it
> failed on linker.

https://koji.fedoraproject.org/koji/taskinfo?taskID=31909938

Will attach the build.log too as the failed scratch builds get cleaned up quickly.

Comment 5 Peter Robinson 2019-01-10 09:49:27 UTC
Created attachment 1519714 [details]
failed build log with cairo failure

Comment 6 Martin Stransky 2019-01-10 10:08:29 UTC
I see, that's rawhide. I prefer to fix F29/28 first and then look at rawhide as it brings new failures.
Let's concentrate on the actual showstopper, which is the memory exhaustion - we can't do any builds with that.

Comment 7 Peter Robinson 2019-01-10 10:19:04 UTC
(In reply to Martin Stransky from comment #6)
> I see, that's rawhide. I prefer to fix F29/28 first and then look at rawhide
> as it brings new failures.
> Let's concentrate on the actual showstopper, which is the memory exhaustion -
> we can't do any builds with that.

Sure, but you hadn't previously mentioned that's what you preferred and I normally work rawhide and then roll backwards.

I'll take a look at f29 then, but either way the build is explicitly enabling NEON, which it should not, so we need to deal with that separately as well. My question above still stands and is relevant on all branches.

Comment 8 Peter Robinson 2019-01-11 00:00:52 UTC
Created attachment 1519959 [details]
Fedora 29 build failure

https://koji.fedoraproject.org/koji/taskinfo?taskID=31939286

Same cairo failure as on rawhide.

Comment 9 Martin Stransky 2019-01-11 08:44:52 UTC
I don't understand why the same package (Firefox 64) compiled fine before and is broken now. I haven't made any ARM-related changes to the package, so there's no reason for that unless the build system was changed somehow.

Can you please try to build the firefox-64.0-2.fc29 version (build task was https://koji.fedoraproject.org/koji/taskinfo?taskID=31420059)? That's the package that failed with memory exhaustion and led to this bug report.

If firefox-64.0-2.fc29 now fails to build because of cairo/NEON, something may be wrong with the build system.

Comment 10 Peter Robinson 2019-01-11 12:42:55 UTC
I've submitted a scratch build as follows:

koji build --scratch --arch-override=armv7hl f29 git+https://src.fedoraproject.org/rpms/firefox.git#3336f2b99462836caf87ad455525be1a20b05809
Created task: 31956796
Task info: https://koji.fedoraproject.org/koji/taskinfo?taskID=31956796

It is based on the above build. No doubt a number of things could have changed in the last month (the gcc changes/improvements, for one), but at this point it's not even getting to the linking stage.

Comment 11 Martin Stransky 2019-01-11 15:19:36 UTC
I see your build is still running, so I hope the cairo failure you see is a regression introduced in the 64.0-7 package; it will be easy to find then.

Comment 12 Peter Robinson 2019-01-12 00:06:16 UTC
So it appears to be a regression introduced since -2

435:10.72 In file included from /builddir/build/BUILD/firefox-64.0/objdir/dom/canvas/Unified_cpp_dom_canvas5.cpp:101:
435:10.72 /builddir/build/BUILD/firefox-64.0/dom/canvas/WebGLUniformLocation.cpp: In member function 'JS::Value mozilla::WebGLUniformLocation::GetUniform(JSContext*) const':
435:10.72 /builddir/build/BUILD/firefox-64.0/dom/canvas/WebGLUniformLocation.cpp:177:32: note: parameter passing for argument of type 'JS::MutableHandle<JS::Value>' changed in GCC 7.1
435:10.72              if (!dom::ToJSValue(js, boolBuffer, elemSize, &val)) {
435:10.72                   ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
435:16.00 toolkit/library/symverscript.stub
435:16.44 toolkit/library/libxul.so
461:37.52 /usr/bin/ld: final link failed: memory exhausted
461:37.52 collect2: error: ld returned 1 exit status
461:37.52 gmake[4]: *** [/builddir/build/BUILD/firefox-64.0/config/rules.mk:712: libxul.so] Error 1
461:37.52 gmake[3]: *** [/builddir/build/BUILD/firefox-64.0/config/recurse.mk:74: toolkit/library/target] Error 2
461:37.52 gmake[2]: *** [/builddir/build/BUILD/firefox-64.0/config/recurse.mk:34: compile] Error 2
461:37.53 gmake[1]: *** [/builddir/build/BUILD/firefox-64.0/config/rules.mk:431: default] Error 2
461:37.53 gmake: *** [client.mk:125: build] Error 2
461:37.57 287 compiler warnings present.

Comment 13 Martin Stransky 2019-01-12 08:59:49 UTC
Great! I'll fix that regression if you manage to address the memory issue.

Comment 14 Jeremy Linton 2019-01-29 17:10:51 UTC
Tried to reproduce this, but in the current rawhide rustc is crashing fairly early.

see bz:1670502

Comment 15 Martin Stransky 2019-02-06 12:58:05 UTC
Rawhide has recently been broken by the gcc9 update.

Comment 16 Jeremy Linton 2019-02-08 01:26:53 UTC
OK, so it's possible to get past the memory exhaustion by disabling debuginfo and optimization (-C opt-level=0 -C debuginfo=0). This is apparently a problem on all 32-bit arches at this point.

https://github.com/rust-lang/rust/issues/45854
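
As a rough illustration of the workaround above, the sketch below selects those two rustc codegen flags only when the build environment is 32-bit. The helper name `rust_flags_for` and the use of `getconf LONG_BIT` to detect the word size are assumptions made for this example, not something taken from the Firefox spec file.

```shell
#!/bin/sh
# Hypothetical helper: choose rustc codegen flags from the word size.
# $1 = word size in bits (defaults to what getconf reports for this system).
rust_flags_for() {
    bits=${1:-$(getconf LONG_BIT)}
    if [ "$bits" = "32" ]; then
        # Flags quoted in the comment above: no optimization, no debuginfo.
        printf '%s\n' '-C opt-level=0 -C debuginfo=0'
    else
        # Leave the flags empty on 64-bit systems.
        printf '%s\n' ''
    fi
}

RUSTFLAGS=$(rust_flags_for)
export RUSTFLAGS
echo "RUSTFLAGS=$RUSTFLAGS"
```

Building with opt-level=0 trades runtime performance for lower compiler memory use, so this would be a stopgap rather than a fix.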

Comment 17 Martin Stransky 2019-02-08 07:51:22 UTC
(In reply to Jeremy Linton from comment #16)
> OK, so it's possible to get past the memory exhaustion by disabling debuginfo
> and optimization (-C opt-level=0 -C debuginfo=0). This is apparently a
> problem on all 32-bit arches at this point.
> 
> https://github.com/rust-lang/rust/issues/45854

AFAIK the problem here is linking of libxul.so and it's not related to rust (at least not directly). Rust build failure was Bug 1523912.

Comment 18 Jeremy Linton 2019-02-19 22:53:18 UTC
Spent some more time trying to get a clean build. A couple of comments: the memory exhaustion in libxul can be avoided if all the files in that link pass are stripped with '-xd', which strips all local and debug symbols. This happens with both gold and the normal BFD linker. It doesn't help that the Rust library is a GB by itself before stripping.

I also continue to have issues with the Rust pass, and need the opt-level and debuginfo reset, the LinuxSignal.h patch updated, and a few other tweaks. If/when I get a clean build from the .spec file I will post the delta. At the moment I've got some ugly hacks to work around problems with the Firefox profiler and the fact that apparently armv8 doesn't trigger {arm} stanzas in the .spec files.
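
The stripping step described above could look roughly like the following. This is a minimal sketch, assuming GNU strip and assuming the link inputs are the `.o` and `.a` files under an object directory; the function name `strip_link_inputs` and the `DRY_RUN` switch are hypothetical, and the actual set of files used in the libxul link is not given in this thread.

```shell
#!/bin/sh
# Hypothetical helper: strip local (-x) and debug (-d) symbols from link inputs.
# $1 = directory to search. With DRY_RUN=1 the commands are printed, not run.
strip_link_inputs() {
    dir=$1
    find "$dir" -type f \( -name '*.o' -o -name '*.a' \) | sort |
    while IFS= read -r f; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'strip -x -d %s\n' "$f"
        else
            strip -x -d "$f"
        fi
    done
}

# Example (dry run): DRY_RUN=1 strip_link_inputs /path/to/objdir
```

Running it with `DRY_RUN=1` first shows which files would be touched before anything is modified.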

Comment 19 Jeremy Linton 2019-03-04 22:16:57 UTC
Well, at the moment I'm at a loss as to why it's failing in koji. I'm going to spin up a different environment and see if I can duplicate it.

Anyway, right now it's still failing with:

662:45.59 /usr/bin/ld.gold: fatal error: libxul.so: mmap: failed to allocate 562853416 bytes for output file: Cannot allocate memory

which should _NOT_ be happening, given that I've got `-Wl,--no-mmap-output-file` in MOZ_LINK_FLAGS. In fact the whole thing is odd, since I'm sitting at just about 2G of address space utilization when I build locally. It almost looks like my flags aren't being propagated through to the autogenerated build files.
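
One way to test the suspicion that the flags are not reaching the link step is to search a build log for each of them. The sketch below is only an illustration: the function name `missing_link_flags` is hypothetical, and it assumes the log actually echoes full linker command lines, which may require a verbose build.

```shell
#!/bin/sh
# Hypothetical check: print every expected flag that never appears in a log.
# $1 = path to the build log; remaining arguments = flags to look for.
missing_link_flags() {
    log=$1
    shift
    for flag in "$@"; do
        # -F: fixed string, -q: quiet, -e: lets the pattern start with '-'
        if ! grep -F -q -e "$flag" "$log"; then
            printf '%s\n' "$flag"
        fi
    done
}

# Example: missing_link_flags build.log -Wl,--no-mmap-output-file
```

An empty result means every flag was found somewhere in the log; it does not prove the flag was on the libxul.so link line specifically.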

Anyway, the current build tweaks are roughly:

MOZ_LINK_FLAGS="-Wl,--no-keep-memory -Wl,--no-keep-files-mapped -Wl,--no-map-whole-files -Wl,--no-mmap-output-file"


MOZ_RUST_DEFAULT_FLAGS="-Cdebuginfo=0 -Copt-level=0"


I have a sed to turn off NEON support for ycbcr; otherwise it seems gas has problems with jumps that are too far away.

sed -i -e "s/MOZILLA_MAY_SUPPORT_NEON/MOZILLA_MAY_SUPPORT_NEONXXXX/g" gfx/ycbcr/*
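
To show what this substitution does, here is the same expression applied to standard input instead of the files in gfx/ycbcr. The wrapper name `rename_neon_macro` and the sample line are made up for this example and are not taken from the Firefox sources.

```shell
#!/bin/sh
# Same substitution as the sed command above, reading stdin instead of files.
rename_neon_macro() {
    sed -e 's/MOZILLA_MAY_SUPPORT_NEON/MOZILLA_MAY_SUPPORT_NEONXXXX/g'
}

# Made-up sample line; prints "#ifdef MOZILLA_MAY_SUPPORT_NEONXXXX".
printf '#ifdef MOZILLA_MAY_SUPPORT_NEON\n' | rename_neon_macro
```

Because the replacement text still contains the original macro name, the substitution is not idempotent: running it a second time over the same files appends another XXXX.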


I've also replaced patch415 with:


+++ firefox-66.0/mfbt/LinuxSignal.h     2019-02-19 22:32:03.127639819 +0000
@@ -22,7 +22,7 @@ __attribute__((naked)) void SignalTrampo
                                              void* aContext) {
   asm volatile("nop; nop; nop; nop" : : : "memory");
 
-  asm volatile("b %0" : : "X"(H) : "memory");
+  H(aSignal, aInfo, aContext);
 }

Comment 20 Peter Robinson 2019-04-29 17:58:53 UTC
For reference a new gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90273

Comment 21 Peter Robinson 2019-05-19 17:30:11 UTC
*** Bug 1641623 has been marked as a duplicate of this bug. ***