Bug 1666335 - Firefox fails to start
Summary: Firefox fails to start
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: firefox
Version: 29
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: Martin Stransky
QA Contact: Fedora Extras Quality Assurance
Reported: 2019-01-15 14:40 UTC by David
Modified: 2019-11-27 23:04 UTC (History)

Last Closed: 2019-11-27 23:04:52 UTC
Type: Bug


Attachments
Backtrace (4.97 KB, text/plain), 2019-01-15 15:35 UTC, David
Backtrace full (5.29 KB, text/plain), 2019-01-15 15:36 UTC, David
Strace output (145.64 KB, application/gzip), 2019-01-16 11:17 UTC, David

Description David 2019-01-15 14:40:55 UTC
Description of problem:

After updating Firefox, it doesn't start anymore. It immediately triggers the Crash Reporter.


Version-Release number of selected component (if applicable): 64.0.2-1


How reproducible:

Always.


These are the messages I get on the terminal:

$ firefox --safe-mode
ExceptionHandler::GenerateDump cloned child 5926
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
ExceptionHandler::WaitForContinueSignal waiting for continue signal...


Additional info:

Adding --safe-mode doesn't change anything.

I removed ~/.mozilla and .cache/mozilla, but the problem persists.

Comment 1 Martin Stransky 2019-01-15 15:04:25 UTC
Can you please try to get a backtrace with gdb? The how-to is here:
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Application_crash

Thanks.

Comment 2 David 2019-01-15 15:35:47 UTC
Created attachment 1520791 [details]
Backtrace

Here is the output of bt from gdb.

Unfortunately, bt full segfaults in step 19, despite having set: ulimit -S -s unlimited.

Comment 3 David 2019-01-15 15:36:14 UTC
Created attachment 1520792 [details]
Backtrace full

Comment 4 Martin Stransky 2019-01-16 09:21:59 UTC
Hm, I don't see any crash info in the backtrace. Do you have a crash ID from the Mozilla Crash Reporter? It can be found on the about:crashes page.

Comment 6 Martin Stransky 2019-01-16 11:09:26 UTC
Thanks, that comes from https://dxr.mozilla.org/mozilla-release/source/js/xpconnect/src/XPCJSContext.cpp#155
Can you try to run firefox under strace to find out why PR_CreateThread() failed?

Something like:

strace -ff -o output.txt firefox

There may be several output files, since Firefox uses many threads and strace needs to follow all of them.
You should find relevant info about the failed thread creation in one of them.
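A minimal sketch for sifting through those files, assuming they are named output.txt.<pid> (which is what strace -ff -o output.txt writes) and that failed calls follow the usual "name(args) = -1 ERRNO (message)" layout. The shortlist of calls is a hypothetical starting point, not a complete list of what thread creation touches, and calls split into "unfinished"/"resumed" halves are not matched.

```python
import glob
import re

# Matches a failed call in strace output: name(args) = -1 ERRNO (message)
FAILED_CALL = re.compile(r"^(?P<name>\w+)\(.*\)\s+= -1 (?P<errno>E[A-Z0-9]+)")

# Hypothetical shortlist of calls related to thread creation and memory setup.
INTERESTING = {"clone", "mmap", "mprotect", "madvise", "prlimit64", "setrlimit"}


def failed_calls(lines, interesting=INTERESTING):
    """Return (syscall, errno) pairs for failed calls on the shortlist."""
    hits = []
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if match and match.group("name") in interesting:
            hits.append((match.group("name"), match.group("errno")))
    return hits


def scan_strace_files(pattern="output.txt.*"):
    """Scan every per-process file and keep those with failures of interest."""
    report = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            hits = failed_calls(handle)
        if hits:
            report[path] = hits
    return report


if __name__ == "__main__":
    for path, hits in scan_strace_files().items():
        for name, errno in hits:
            print(f"{path}: {name} failed with {errno}")
```

Run it in the directory that holds the strace output. Failures of unrelated calls (for example access() probing for optional files) are normal and are filtered out by the shortlist.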

Comment 7 David 2019-01-16 11:17:02 UTC
Created attachment 1520965 [details]
Strace output

I am not sure I am doing it correctly. Running the command fails very quickly, but attached are the files produced.

🦑 strace -ff -o output.txt firefox
ExceptionHandler::GenerateDump cloned child ExceptionHandler::WaitForContinueSignal waiting for continue signal...
19684
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
2019-01-16 12:12:33: minidump.cc:5094: ERROR: Minidump could not open minidump /home/david/.mozilla/firefox/38fnfq0k.default/minidumps/52f77df7-0bd0-8f67-d733-b953a305986e.dmp, error 2: No such file or directory
2019-01-16 12:12:33: minidump.cc:5191: ERROR: Minidump cannot open minidump

Comment 8 Martin Stransky 2019-01-16 14:05:36 UTC
Hm, frankly I have no idea what's going on here. This seems to be relevant, but it looks odd:

madvise(0x7ffda755c000, 8372224, MADV_NOHUGEPAGE) = -1 ENOMEM (Cannot allocate memory)

Do you see any relevant line in the system log (journalctl -b), for instance?

Comment 9 Martin Stransky 2019-01-16 14:10:43 UTC
Btw, which is the latest Firefox version that works for you?

Comment 10 David 2019-01-16 15:05:45 UTC
Nothing is being written to journalctl -b or dmesg.

dnf downgrade firefox takes me to 62.0.3-1.fc29, which works.

Comment 11 David 2019-01-16 15:22:47 UTC
I have downloaded the new build from koji, firefox-64.0.2-2, and the problem persists.

I think I was using 64.0-7 before, but I cannot find how to download it from Koji to confirm.

Comment 12 Martin Stransky 2019-01-16 19:53:15 UTC
You can find a particular package at https://koji.fedoraproject.org/koji/packageinfo?packageID=37

64.0-7 for F29 is here - https://koji.fedoraproject.org/koji/buildinfo?buildID=1177702

Comment 13 David 2019-01-17 10:20:00 UTC
Both 64.0-7 and 63.0.3-3 fail.

I don't know whether they worked before, but I am sure I upgraded at least once and Firefox kept working.

Comment 14 Martin Stransky 2019-01-17 10:25:16 UTC
Please try the 64.0-4 build [1]. It has PGO+LTO optimizations disabled and was the first 64.0 release for F29. Thanks.

[1] https://koji.fedoraproject.org/koji/buildinfo?buildID=1171746

Comment 15 David 2019-01-17 13:35:28 UTC
Still fails. Here is the crash report: https://crash-stats.mozilla.com/report/index/e7618d2f-7b59-4664-b401-7da9d0190117

Comment 16 Martin Stransky 2019-01-21 08:35:47 UTC
Can you please try an upstream binary from Mozilla? Download the package here:

https://www.mozilla.org/en-US/firefox/download/thanks/

Untar it and run it as ./firefox on the command line. Thanks.

Comment 17 David 2019-01-22 12:56:32 UTC
It still crashes, here is the report:

https://crash-stats.mozilla.com/report/index/619288c9-24b4-43cb-845e-dcd580190122

Should I try other versions from the upstream binary?

Comment 18 David 2019-01-22 15:45:40 UTC
For the record, the previous reports were from my workstation, but I am getting the same on my laptop.

https://crash-stats.mozilla.com/report/index/5626f584-3335-4f08-8016-a152e0190122

Both run the same version of Fedora. I'd suspect some addon, but starting in safe mode should disable them all, right?

Comment 19 Martin Stransky 2019-01-23 13:40:52 UTC
(In reply to David from comment #18)
> For the record, the previous reports were from my workstation, but I am
> getting the same on my laptop.
> 
> https://crash-stats.mozilla.com/report/index/5626f584-3335-4f08-8016-a152e0190122
> 
> Both run the same version of Fedora. I'd suspect some addon, but starting in
> safe mode should disable them all, right?

Yes, -safe-mode is without addons. Also Firefox has already disabled binary
addons so those should not affect it. And if you removed ~/.mozilla you have
a clean environment already.

I suspect there's something wrong with the system, as PR_CreateThread() should not fail
unless there is no memory or other resource left.

But if 62.0.3 works and 64 fails... there should not be such radical changes between 62 and 64.

For instance, might you have some memory restriction set?
https://stackoverflow.com/questions/40002586/why-cant-firefox-cope-with-a-memory-limit-set-by-ulimit
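
A minimal sketch for checking this from inside a process started in the same shell, using Python's standard resource module. The selection of limits is an assumption about which ones matter for thread creation. Note that the memory-related values are reported in bytes, while ulimit -a prints kbytes.

```python
import resource

# Limits that plausibly affect thread creation (an assumed shortlist).
LIMITS = {
    "virtual memory (ulimit -v)": resource.RLIMIT_AS,
    "data segment (ulimit -d)": resource.RLIMIT_DATA,
    "stack size (ulimit -s)": resource.RLIMIT_STACK,
    "user processes (ulimit -u)": resource.RLIMIT_NPROC,
    "open files (ulimit -n)": resource.RLIMIT_NOFILE,
}


def describe(value):
    """Render a limit value the way ulimit does for 'no limit'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the limits listed above."""
    return {label: resource.getrlimit(res) for label, res in LIMITS.items()}


def finite_soft_limits(limits):
    """Keep only the limits whose soft value is actually restricted."""
    return {
        label: soft
        for label, (soft, _hard) in limits.items()
        if soft != resource.RLIM_INFINITY
    }


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={describe(soft)} hard={describe(hard)}")
```

A finite soft limit is not a problem by itself; it only marks a candidate worth raising before retrying.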

Comment 20 David 2019-02-11 14:04:03 UTC
I have been trying to change limits, still crashing. Here is the ulimit status:

$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 62804
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 62804
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I have also tested Plasma KDE, but that doesn't fix it either. Both my machines have plenty of memory to spare, so that shouldn't be a problem. Any more ideas?

Comment 21 David 2019-02-11 14:50:01 UTC
More information: I created a new user on the same machine, and that user can run Firefox 65. I have double-checked: removing ~/.cache/mozilla and ~/.mozilla does not fix the problem, so there must be something else in my user account. $LD_LIBRARY_PATH is empty, and $CPATH points to :/home/david/.local/include

Is there any other user setting that could affect Firefox?
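
One way to narrow this down is to compare the environment of the failing account with that of the fresh one. A minimal sketch, assuming each account's environment was saved with "env > somefile" (the file names are up to you) and that no value spans several lines. The set of variables expected to differ is a hypothetical shortlist.

```python
import sys

# Variables that always differ between two accounts (hypothetical shortlist).
EXPECTED_TO_DIFFER = {"HOME", "USER", "LOGNAME", "MAIL", "PWD", "XDG_RUNTIME_DIR"}


def parse_env(text):
    """Parse the KEY=VALUE lines printed by `env` into a dict."""
    variables = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:
            variables[key] = value
    return variables


def env_differences(failing, working):
    """Return {name: (failing_value, working_value)} for variables that differ."""
    names = sorted(set(failing) | set(working))
    return {
        name: (failing.get(name), working.get(name))
        for name in names
        if failing.get(name) != working.get(name)
    }


def suspicious_differences(failing, working):
    """Drop the differences that are expected between any two accounts."""
    diffs = env_differences(failing, working)
    return {k: v for k, v in diffs.items() if k not in EXPECTED_TO_DIFFER}


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as first, open(sys.argv[2]) as second:
        failing, working = parse_env(first.read()), parse_env(second.read())
    for name, (bad, good) in suspicious_differences(failing, working).items():
        print(f"{name}: failing={bad!r} working={good!r}")
```

Pass the failing account's file first and the working account's file second. A value of None means the variable is unset in that account.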

Comment 22 David 2019-02-14 20:21:54 UTC
I found the cause! TL;DR: I had LD_PRELOAD=/usr/lib64/libopenblaso.so set; after removing it, Firefox works.

Here is the long version: since the plain binaries from Mozilla had the same problem, this pointed to a system issue. And since it happened on two of my machines, but not on anyone else's, there had to be something in my configuration. So I decided to poke around with strace to look at what was being loaded. I found it was opening openblas, which reminded me that I am preloading the OpenMP version to automatically parallelise linear algebra in Python. That would explain it! Most people are unlikely to care about linear algebra performance. Indeed, removing it made Firefox work again.
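
For anyone hitting something similar, here is a minimal sketch that lists what the dynamic linker is told to preload, assuming the two usual sources: the LD_PRELOAD variable and the system-wide /etc/ld.so.preload file. Both accept entries separated by whitespace or colons, and the file may contain # comments.

```python
import os
import re


def split_preload(value):
    """Split a preload list; ld.so accepts spaces or colons as separators."""
    return [item for item in re.split(r"[\s:]+", value) if item]


def read_preload_file(path):
    """Return the entries of an ld.so.preload-style file, or [] if it is absent."""
    try:
        with open(path) as handle:
            lines = handle.read().splitlines()
    except OSError:
        return []
    entries = []
    for line in lines:
        # Everything after '#' on a line is a comment.
        entries.extend(split_preload(line.split("#", 1)[0]))
    return entries


def preloaded_libraries(env=None, preload_file="/etc/ld.so.preload"):
    """Collect preloaded libraries from the environment and the system file."""
    env = os.environ if env is None else env
    return {
        "LD_PRELOAD": split_preload(env.get("LD_PRELOAD", "")),
        preload_file: read_preload_file(preload_file),
    }


if __name__ == "__main__":
    for source, libraries in preloaded_libraries().items():
        print(f"{source}: {libraries or 'nothing preloaded'}")
```

To test a single program without the variable, something like "env -u LD_PRELOAD firefox" leaves the rest of the session untouched.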

The question now is why Firefox fails in the newer versions but not before, and why it loads and calls openblas. The rpm does not depend on any of the openblas packages (dnf remove openbl* does not attempt to remove Firefox).

Comment 23 Martin Stransky 2019-02-18 13:29:19 UTC
That's interesting. I don't see any reference to libopenblaso in the Firefox codebase, so I expect it's loaded by some dependent library. LD_DEBUG may tell you more about how it interacts with Firefox. (Run "LD_DEBUG=all LD_DEBUG_OUTPUT=out.txt firefox", but be prepared for out.txt to grow quickly.)

Comment 24 David 2019-02-18 16:02:19 UTC
This is enough to trigger it:


LD_PRELOAD=/usr/lib64/libopenblaso.so LD_DEBUG=all LD_DEBUG_OUTPUT=out.txt firefox --new-instance

There is indeed a lot of output, but I cannot see any clear sign of what is wrong. First, everything in LD_PRELOAD and its dependencies (libgcc, libgomp...) is loaded, followed by a lot of symbol lookups. madvise showed up earlier in the strace output, but its lookup seems clean:

     20144:     symbol=madvise;  lookup in file=/usr/lib64/firefox/firefox [0]
     20144:     symbol=madvise;  lookup in file=/usr/lib64/libopenblaso.so [0]
     20144:     symbol=madvise;  lookup in file=/usr/lib64/libpthread.so.0 [0]
     20144:     symbol=madvise;  lookup in file=/usr/lib64/libdl.so.2 [0]
     20144:     symbol=madvise;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     20144:     symbol=madvise;  lookup in file=/usr/lib64/libm.so.6 [0]
     20144:     symbol=madvise;  lookup in file=/usr/lib64/libgcc_s.so.1 [0]
     20144:     symbol=madvise;  lookup in file=/usr/lib64/libc.so.6 [0]
     20144:     binding file /usr/lib64/firefox/firefox [0] to /usr/lib64/libc.so.6 [0]: normal symbol `madvise' [GLIBC_2.2.5]

Grepping for symbols that may come from openblas, I get malloc:

binding file /usr/lib64/libopenblaso.so [0] to /usr/lib64/firefox/firefox [0]: normal symbol `malloc' [GLIBC_2.2.5]
binding file /usr/lib64/libopenblaso.so [0] to /usr/lib64/firefox/minidump-analyzer [0]: normal symbol `malloc' [GLIBC_2.2.5]
binding file /usr/lib64/libopenblaso.so [0] to /usr/lib64/libc.so.6 [0]: normal symbol `malloc' [GLIBC_2.2.5]
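
A minimal sketch for summarising such binding lines instead of grepping by hand, assuming the "binding file A [n] to B [n]: normal symbol `name'" layout shown above. The function name and the command-line handling are illustrative only.

```python
import re
import sys
from collections import defaultdict

# Matches LD_DEBUG binding lines; a leading "<pid>:" prefix is tolerated.
BINDING = re.compile(
    r"binding file (?P<source>\S+) \[\d+\] to (?P<target>\S+) \[\d+\]: "
    r"normal symbol `(?P<symbol>[^']+)'"
)


def bindings_by_target(lines, source_suffix, symbol=None):
    """Map each providing file to the symbols the source library binds to in it."""
    providers = defaultdict(set)
    for line in lines:
        match = BINDING.search(line)
        if not match or not match.group("source").endswith(source_suffix):
            continue
        if symbol is None or match.group("symbol") == symbol:
            providers[match.group("target")].add(match.group("symbol"))
    return dict(providers)


if __name__ == "__main__":
    # Usage: script.py <LD_DEBUG output file> [more files...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            summary = bindings_by_target(handle, "libopenblaso.so")
        for target, symbols in sorted(summary.items()):
            print(f"{path}: {target} provides {', '.join(sorted(symbols))}")
```

Run per output file, this shows which file supplies each symbol to the preloaded library in each process. Whether a difference in the malloc provider is related to the crash is not established here.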

Any pointers on where or how to dig?

Comment 25 Martin Stransky 2019-02-19 11:03:56 UTC
Hm, sorry, no idea.

Comment 26 Ben Cotton 2019-10-31 19:01:46 UTC
This message is a reminder that Fedora 29 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '29'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 29 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 27 Ben Cotton 2019-11-27 23:04:52 UTC
Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

