Created attachment 797052
Firefox 23.0.1 with the add-on from https://eff.org/https-everywhere installed
Start Firefox, load any HTTPS page, quit.
Firefox always crashes on exit if any HTTPS page has been loaded during the session.
This is a Fedora-specific bug (or a 64-bit-specific one).
I tested the 32-bit Linux Firefox binary provided by Mozilla with the same Firefox profile, and it didn't crash.
(Reproduced using a clean Firefox profile, so it's not dependent on special settings.)
I'll attach a stack trace.
Since the stack shows NSS, I'm cc'ing Elio and Bob.
Since the stack shows jemalloc, I'm cc'ing Stef, because we had to deal with a related bug a couple of months ago.
It seems this is specific to a non-standard environment variable I had set, which enables a newer NSS storage mode.
Without it (i.e., with the default cert8/dbm mode), there is no crash.
Note that the upstream binary with cert9/sql doesn't crash.
Maybe something specific to sqlite or memory allocation in Fedora 19?
(In reply to Kai Engert (:kaie) from comment #0)
> Since the stack shows jemalloc, I'm cc'ing Stef, because we had to deal with
> a related bug a couple of months ago.
The previous bug was about a broken/incompatible strndup() symbol exported by Firefox.
It seems that in this case the memory was allocated through an allocator callback (explicitly passed into sqlite3) and is now being deallocated the same way, so I'm not sure this is related to exported symbols.
Does jemalloc support valgrind? If so, that would be one way to get more information here.
I tried to start Firefox using
firefox -g -d valgrind
but that doesn't work; it complains that some mode being used is incompatible with valgrind.
I don't know how to run Firefox under valgrind.
Try: valgrind --trace-children=yes /usr/bin/firefox
but I'm not sure how useful that output is. And it's extremely slow.
(In reply to Martin Stransky from comment #4)
> Try: valgrind --trace-children=yes /usr/bin/firefox
Doesn't work. Firefox doesn't come up.
==6785== Unsupported clone() flags: 0x800600
==6785== The only supported clone() uses are:
==6785== - via a threads library (LinuxThreads or NPTL)
==6785== - via the implementation of fork or vfork
==6785== Valgrind detected that your program requires
==6785== the following unimplemented functionality:
==6785== Valgrind does not support general clone().
==6785== This may be because the functionality is hard to implement,
==6785== or because no reasonable program would behave this way,
==6785== or because nobody has yet needed it. In any case, let us know at
==6785== www.valgrind.org and/or try to work around the problem, if you can.
==6785== Valgrind has to exit now. Sorry. Bye!
Strange. I'm sure Mozilla has a valgrind test configuration; I saw the bug somewhere. Also, I can run Firefox under valgrind with the command line I provided.
Hm, it's a bit tricky: I see the "Unsupported clone()" error too, but Firefox comes up in safe mode.
With Fedora's Firefox 25, HTTPS Everywhere and the shared DB, cookies don't work at all!
This bug is driving me crazy.
Maybe the problem is jemalloc. I see you disabled it on s390 because it doesn't work there at all.
I'll try a scratch build on x86_64 that disables jemalloc.
I don't crash using a build with jemalloc disabled!
Steps to reproduce:
Use Fedora 19.
I am able to reproduce this bug with both Firefox 23 and 25 (fedora RPM builds).
Open a terminal
Set this environment variable which uses a modern NSS database:
Create a separate Firefox profile
firefox -CreateProfile testdb
From the terminal, with the environment variable set, start Firefox
using that profile:
firefox -P testdb
Click "install in firefox".
Allow the add-on to use the Observatory.
Start Firefox again
firefox -P testdb
Wait a few seconds.
I wonder whether memory that was not allocated by jemalloc is being freed by jemalloc.
I identified the cause of this problem. It's indeed a mix-up of memory allocators.
The following happens when using the NSS shared database code, which uses sqlite:
- Mozilla inits NSS
- NSS calls a sqlite API (sqlite3_mprintf)
- sqlite detects the need for init, and initializes itself using defaults
- Mozilla proceeds to init a storage service,
which registers a different allocator to be used by sqlite
This results in a crash on shutdown.
I'll file an upstream bug. I had the chance to discuss this briefly today; one idea was to ensure that Mozilla initializes sqlite before initializing NSS. However, I don't like that idea: it seems fragile, since another Mozilla engineer might change that code in the future, and we would face the same hard-to-diagnose problem again.
I think Fedora should patch Mozilla to enforce the use of the default allocators for sqlite. A patch to the Makefiles is sufficient: the behavior depends on whether the symbol MOZ_STORAGE_MEMORY is defined (see storage/src/Makefile.in).
Mozilla already disables it when building for Android, probably because that is also a modular environment. I'll propose upstream that it be disabled by default on Linux as well.
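To make the allocator mix-up easier to reason about, here is a toy C model of the init sequence described above, together with the two remedies discussed (initialize storage first, or never register a custom allocator). It is only a sketch under stated assumptions: lib_alloc, lib_free, lib_set_allocator and run_scenario are hypothetical names, not the sqlite3, NSS or jemalloc APIs. It assumes, as the analysis above describes, that the later allocator registration takes effect. Instead of crashing, the toy records which allocator created each block and counts frees done by a different one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, NOT the sqlite3/NSS/jemalloc APIs.
 * Two allocator identities stand in for the system malloc and jemalloc. */
typedef enum { ALLOC_NONE, ALLOC_SYSTEM, ALLOC_JEMALLOC } alloc_id;

/* Header in front of every block; the union keeps the payload aligned. */
typedef union {
    alloc_id owner;
    max_align_t align;
} block_header;

static alloc_id lib_allocator = ALLOC_NONE; /* ALLOC_NONE = not initialized */
static int mismatched_frees = 0;

/* Lazy init: the first allocating call falls back to the default
 * allocator if none was registered (cf. sqlite's auto-initialization). */
static void lib_ensure_init(void) {
    if (lib_allocator == ALLOC_NONE)
        lib_allocator = ALLOC_SYSTEM;
}

/* Assumption of this toy: registration always takes effect, even after
 * blocks were already handed out by the previous allocator. */
static void lib_set_allocator(alloc_id id) { lib_allocator = id; }

static void *lib_alloc(size_t n) {
    lib_ensure_init();
    block_header *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->owner = lib_allocator; /* remember which allocator "owns" the block */
    return h + 1;
}

static void lib_free(void *p) {
    if (p == NULL)
        return;
    block_header *h = (block_header *)p - 1;
    /* A real allocator handed a foreign pointer may corrupt the heap or
     * crash here; the toy only counts the mismatch. */
    if (h->owner != lib_allocator)
        mismatched_frees++;
    free(h);
}

typedef enum {
    NSS_FIRST,     /* order in this bug: NSS triggers lazy init first */
    STORAGE_FIRST, /* proposed: storage registers its allocator first */
    DEFAULTS_ONLY  /* Fedora patch: never register a custom allocator */
} init_order;

/* Returns how many blocks were freed by a different allocator than the
 * one that allocated them. */
static int run_scenario(init_order order) {
    lib_allocator = ALLOC_NONE;
    mismatched_frees = 0;

    if (order == STORAGE_FIRST)
        lib_set_allocator(ALLOC_JEMALLOC);

    void *early = lib_alloc(64); /* stands in for NSS calling sqlite3_mprintf */
    assert(early != NULL);

    if (order == NSS_FIRST)
        lib_set_allocator(ALLOC_JEMALLOC); /* storage service starts later */

    lib_free(early); /* shutdown-time cleanup */
    return mismatched_frees;
}
```

In this model, run_scenario(NSS_FIRST) reports one mismatched free, while STORAGE_FIRST and DEFAULTS_ONLY report none. STORAGE_FIRST is only safe because of call order, which is the fragility objected to above; DEFAULTS_ONLY removes the second allocator entirely, which is what leaving MOZ_STORAGE_MEMORY undefined aims for.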
The bug has been fixed upstream for Mozilla 28.
Could we pick it up in the Fedora builds of xulrunner/firefox/thunderbird until we move to Mozilla 28?
Added to xulrunner-25.0-5 packages.
Martin, adding it to xulrunner isn't sufficient. I ran into the bug again after upgrading to Mozilla 26.
After I rebuilt the firefox package with the patch included as well, it was fixed.
I see, okay, added to Firefox too. Packages firefox-26.0-4.