Bug 1007603 - NSS and cert9 (sql): firefox crash on exit with https-everywhere installed
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: firefox
Version: 19
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Martin Stransky
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-09-12 22:23 UTC by Kai Engert (:kaie) (inactive account)
Modified: 2013-12-17 12:56 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-17 12:56:52 UTC
Type: Bug
Embargoed:


Attachments
stack (4.53 KB, text/plain)
2013-09-12 22:23 UTC, Kai Engert (:kaie) (inactive account)


Links
System ID Private Priority Status Summary Last Updated
Mozilla Foundation 938730 0 None None None Never

Description Kai Engert (:kaie) (inactive account) 2013-09-12 22:23:19 UTC
Created attachment 797052
stack

Fedora 19
firefox 23.0.1 with add-on from https://eff.org/https-everywhere installed

Start FF, load any https page, quit.
FF always crashes (on exit) if any https page has been loaded during the session.

This is a Fedora specific bug (or a 64 bit specific bug).

I tested using the Firefox Linux binary (32bit) made available by Mozilla, using the same Firefox profile, and it didn't crash.

(Reproduced using a clean Firefox profile, so it's not dependent on special settings.)

I'll attach a stack trace.

Since the stack shows NSS, I'm cc'ing Elio and Bob.

Since the stack shows jemalloc, I'm cc'ing Stef, because we had to deal with a related bug a couple of months ago.

Comment 1 Kai Engert (:kaie) (inactive account) 2013-09-12 22:25:23 UTC
It seems this is specific to my non-standard environment variable
  NSS_DEFAULT_DB_TYPE="sql"

which enables a newer NSS storage mode.

Without that (and default cert8/dbm mode), no crash.

Note that upstream binary + cert9/sql doesn't crash.

Maybe something specific to sqlite or memory allocation in Fedora 19?

Comment 2 Stef Walter 2013-09-13 06:05:29 UTC
(In reply to Kai Engert (:kaie) from comment #0)
> Since the stack shows jemalloc, I'm cc'ing Stef, because we had to deal with
> a related bug a couple of months ago.

The previous bug was about a broken/incompatible strndup() symbol exported by firefox.

It seems like in this case the memory has been allocated using an allocator callback (explicitly passed into sqlite3) and is now being deallocated similarly. So not sure this is related to exported symbols.

Does jemalloc support valgrind? If so, that would be one way to get more information here.

Comment 3 Kai Engert (:kaie) (inactive account) 2013-09-13 12:36:16 UTC
I tried to start firefox using
  firefox -g -d valgrind
but that doesn't work; it complains about some mode being used that's incompatible with valgrind.

I don't know how to run firefox under valgrind.

Comment 4 Martin Stransky 2013-10-11 20:30:42 UTC
Try: valgrind --trace-children=yes /usr/bin/firefox, but I'm not sure how useful that output is. It's also extremely slow.

Comment 5 Kai Engert (:kaie) (inactive account) 2013-10-17 14:18:55 UTC
(In reply to Martin Stransky from comment #4)
> Try: valgrind --trace-children=yes /usr/bin/firefox

Doesn't work. Firefox doesn't come up.

==6785== Unsupported clone() flags: 0x800600
==6785== 
==6785== The only supported clone() uses are:
==6785==  - via a threads library (LinuxThreads or NPTL)
==6785==  - via the implementation of fork or vfork
==6785== 
==6785== Valgrind detected that your program requires
==6785== the following unimplemented functionality:
==6785==    Valgrind does not support general clone().
==6785== This may be because the functionality is hard to implement,
==6785== or because no reasonable program would behave this way,
==6785== or because nobody has yet needed it.  In any case, let us know at
==6785== www.valgrind.org and/or try to work around the problem, if you can.
==6785== 
==6785== Valgrind has to exit now.  Sorry.  Bye!

Comment 6 Martin Stransky 2013-10-17 14:20:34 UTC
Strange. I'm sure mozilla has a valgrind test config...I saw the bug somewhere. Plus I can run Firefox inside valgrind with the command line I provided.

Comment 7 Martin Stransky 2013-10-17 14:26:46 UTC
Hm, it's a bit tricky, I see the "Unsupported clone()" too, but Firefox comes up in safe-mode.

Comment 8 Kai Engert (:kaie) (inactive account) 2013-11-06 12:07:27 UTC
Another issue:
With Fedora's Firefox 25, https-everywhere and shared db, cookies don't work at all!

Comment 9 Kai Engert (:kaie) (inactive account) 2013-11-13 21:20:50 UTC
Bug is driving me crazy.

Maybe the problem is jemalloc. I see you disabled it on s390, because it doesn't work at all on s390.

I'll try a scratch build on x86_64 that disables jemalloc.

Comment 10 Kai Engert (:kaie) (inactive account) 2013-11-14 11:19:16 UTC
I don't crash using a build with jemalloc disabled!

Comment 11 Kai Engert (:kaie) (inactive account) 2013-11-14 11:20:01 UTC
Steps to reproduce:


Use Fedora 19.
I am able to reproduce this bug with both Firefox 23 and 25 (fedora RPM builds).

Open a terminal
Set this environment variable, which selects the newer NSS database format:
  export NSS_DEFAULT_DB_TYPE="sql"

Create a separate Firefox profile
  firefox -CreateProfile testdb

Start Firefox from the terminal, with the environment variable set
and using the new profile:
  firefox -P testdb

Open
  https://www.eff.org/https-everywhere

Click "install in firefox".
Restart.
Allow to use the observatory.
Quit

Start Firefox again
  firefox -P testdb

Wait a few seconds.
Press CTRL-Q
crash

Comment 12 Kai Engert (:kaie) (inactive account) 2013-11-14 14:17:05 UTC
I wonder if memory that was not allocated using jemalloc is being freed using jemalloc.

Comment 13 Kai Engert (:kaie) (inactive account) 2013-11-14 19:26:55 UTC
I identified the cause of this problem. It's indeed a mix of memory allocators.

The following happens when using the NSS shared database code, which uses sqlite:

- Mozilla inits NSS
- NSS calls a sqlite API (sqlite3_mprintf)
- sqlite detects the need for init, and initializes itself using defaults
- Mozilla proceeds to init a storage service,
  which registers a different allocator to be used by sqlite

This results in a crash on shutdown.

(Although I cannot explain why, this also causes cookies to be broken for me.)
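
The sequence above can be sketched as a toy model. This is only an illustration, written in Python for brevity: it does not use the real sqlite, NSS, or jemalloc APIs, and every name in it (Allocator, ToyLibrary, run) is hypothetical. It assumes, as described above, that the allocator swap takes effect even though the library has already been used once.

```python
class Allocator:
    """Toy allocator that remembers which blocks it handed out."""

    def __init__(self, name):
        self.name = name
        self._live = set()
        self._counter = 0

    def malloc(self):
        self._counter += 1
        block = (self.name, self._counter)
        self._live.add(block)
        return block

    def free(self, block):
        # A real allocator would corrupt its heap or abort here;
        # the model raises instead so the mismatch is visible.
        if block not in self._live:
            raise RuntimeError(f"{self.name} asked to free foreign block {block}")
        self._live.remove(block)


class ToyLibrary:
    """Stand-in for a library that starts out on default allocators."""

    def __init__(self, default_allocator):
        self.allocator = default_allocator
        self.blocks = []

    def configure(self, allocator):
        # Assumption of this model: the swap is not rejected after first use.
        self.allocator = allocator

    def use(self):
        # First use allocates with whatever allocator is current.
        self.blocks.append(self.allocator.malloc())

    def shutdown(self):
        # Everything is freed through the allocator that is current *now*.
        for block in self.blocks:
            self.allocator.free(block)
        self.blocks.clear()


def run(order):
    """order: 'configure_late', 'configure_early', or 'never_configure'."""
    system = Allocator("system")
    custom = Allocator("custom")
    lib = ToyLibrary(system)

    if order == "configure_late":
        lib.use()               # first caller triggers use with defaults
        lib.configure(custom)   # second caller swaps the allocator afterwards
    elif order == "configure_early":
        lib.configure(custom)   # swap happens before anything is allocated
        lib.use()
    elif order == "never_configure":
        lib.use()               # default allocators are used throughout
    else:
        raise ValueError(order)

    try:
        lib.shutdown()
        return "clean shutdown"
    except RuntimeError as exc:
        return f"crash: {exc}"


if __name__ == "__main__":
    for order in ("configure_late", "configure_early", "never_configure"):
        print(order, "->", run(order))
```

In this model only the "configure_late" ordering fails at shutdown. "configure_early" corresponds to initializing sqlite before NSS, and "never_configure" corresponds to always using the default allocators.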

I'll file an upstream bug. I've had the chance to do some initial discussion today, where one idea was to ensure that Mozilla initializes sqlite prior to initializing NSS. However, I don't like that idea: it seems fragile, and another Mozilla engineer might change that code in the future, leaving us with the same difficult-to-diagnose problem again.

I think Fedora should patch Mozilla and enforce the use of the default allocators for sqlite. A patch to the Makefiles is sufficient; the behavior depends on whether the symbol MOZ_STORAGE_MEMORY is defined (storage/src/Makefile.in).

Mozilla already disables that when building on Android, probably because that's a modular environment, too. I'll propose to upstream to disable it by default on Linux, too.

Comment 14 Kai Engert (:kaie) (inactive account) 2013-11-20 19:31:13 UTC
The bug has been fixed upstream for Mozilla 28.

Could we pick it up in Fedora builds of xulrunner/firefox/thunderbird, until we use Mozilla 28?

Comment 15 Kai Engert (:kaie) (inactive account) 2013-11-20 19:33:32 UTC
Patch: https://hg.mozilla.org/integration/mozilla-inbound/rev/247ff2131af5

Comment 16 Martin Stransky 2013-11-21 11:58:06 UTC
Added to xulrunner-25.0-5 packages.

Comment 17 Kai Engert (:kaie) (inactive account) 2013-12-17 11:36:10 UTC
Martin, adding it to xulrunner isn't sufficient. I ran into the bug again after upgrading to mozilla 26.

After I rebuilt the firefox package with the patch included, too, it's fixed.

Comment 18 Martin Stransky 2013-12-17 12:56:52 UTC
I see, okay, added to Firefox too. Packages firefox-26.0-4.

