Bug 1524507 - libproxy-mozjs causes gnome-weather to crash
Summary: libproxy-mozjs causes gnome-weather to crash
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: libproxy
Version: 30
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: David King
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-12-11 15:53 UTC by nvwarr
Modified: 2019-07-14 16:15 UTC (History)

Fixed In Version: libproxy-0.4.15-10.fc29
Clone Of:
Environment:
Last Closed: 2019-02-20 03:04:58 UTC
Type: Bug
Embargoed:


Attachments
Trace generated by gnome-weather (9.71 KB, text/plain)
2017-12-11 15:53 UTC, nvwarr


Links
System  ID                           Private  Priority  Status  Summary  Last Updated
Github  libproxy libproxy issues 81  0        None      None    None     2018-04-16 19:37:17 UTC
Github  libproxy libproxy issues 82  0        None      None    None     2018-04-16 19:40:50 UTC

Description nvwarr 2017-12-11 15:53:37 UTC
Created attachment 1366093
Trace generated by gnome-weather

Description of problem:

gnome-weather gives a segmentation fault on startup if libproxy-mozjs is installed.

Version-Release number of selected component (if applicable):

gnome-weather-3.26.0-2.fc27.noarch
libproxy-mozjs-0.4.15-4.fc27.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Install libproxy-mozjs
2. Start gnome-weather

Actual results:

Segmentation fault

Expected results:

Should give the weather forecast

Additional info:

Uninstalling libproxy-mozjs causes gnome-weather to work properly.

The problem seems to have been fixed upstream with the commit "Add symbol versions - be ready to introduce new APIs as needed". I applied this patch to 0.4.15, and gnome-weather works even if libproxy-mozjs is installed.

https://github.com/libproxy/libproxy/commit/2fb80a007a1de4cf8603d7f686fdbaf46ded961a

I have only triggered this with gnome-weather, but I guess it impacts quite a lot of other GNOME software as well (anywhere libproxy is used). That is why I've given it a high severity. I've seen lots of tracebacks that look a bit like mine but start with a different program.

Comment 1 nvwarr 2018-02-14 09:29:54 UTC
The problem persists in rawhide.

Comment 2 Michael Catanzaro 2018-04-16 19:09:38 UTC
I recommend reporting this upstream at https://github.com/libproxy/libproxy/issues, since nobody will notice it here

Comment 3 nvwarr 2018-04-16 19:20:02 UTC
But it was already fixed upstream _before_ I reported it here. I even gave a link to the commit that fixes it! I thought someone might apply that fix in Fedora. The fix works for me.

Comment 4 Michael Catanzaro 2018-04-16 19:32:06 UTC
Ah wow cool, the problem here is that gjs and libproxy are loading incompatible versions of mozjs in the same process: it's a guaranteed explosion. I've been rambling on about this theoretical possibility for a while now, and was confused why nobody had ever reported it before. This is actually the first bug report I've seen. You must be trying to run gnome-weather outside of GNOME, right? In that case, libproxy gets dlopened by glib-networking at runtime, then libproxy dlopens mozjs38, and meanwhile gjs of course links to mozjs52... boom. The fact that more people have not reported this probably indicates how rare it is that people try to run gjs applications outside of GNOME.

So the fact that the crashes went away after that commit is actually a really bad sign. I haven't investigated, but I suspect it means libproxy can't load its own builtin modules anymore, perhaps because the symbols are no longer exported. It looks like libproxy doesn't have any testsuite and hasn't had a release since that commit, which might explain why this hasn't been noticed yet.

So we should definitely not backport that commit to Fedora. And this would actually be a *really good* issue to report upstream, because this entire situation is nuts.

There is https://github.com/libproxy/libproxy/issues/71 but that doesn't solve the underlying problem, which is that dlopening mozjs is inherently unsafe as applications don't expect it.

More relevant discussion in: https://github.com/libproxy/libproxy/issues/68

But I would recommend reporting a new issue specifically for this. Actually, two new issues: one for the crash, one for the symbol versioning breaking the module loader.
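
As a rough way to confirm the situation described above (two different mozjs versions mapped into one process), the following sketch scans the text of a Linux memory-map listing for libmozjs entries. The function names, the regular expression, and the sample mapping lines are assumptions made up for illustration; only the library paths come from this report.

```python
import re
from collections import defaultdict

# Matches e.g. "/lib64/libmozjs-52.so.0" and captures the major version "52".
MOZJS_RE = re.compile(r"(/\S*libmozjs-(\d+)\.so[\w.]*)")


def mozjs_versions(maps_text):
    """Return {version: set of paths} for every libmozjs mapping found."""
    found = defaultdict(set)
    for path, version in MOZJS_RE.findall(maps_text):
        found[version].add(path)
    return dict(found)


def has_conflict(maps_text):
    """True if more than one mozjs major version is mapped."""
    return len(mozjs_versions(maps_text)) > 1


# Hypothetical excerpt in the style of /proc/<pid>/maps.
SAMPLE_MAPS = """\
7f10a0000000-7f10a0400000 r-xp 00000000 fd:01 1234 /lib64/libmozjs-52.so.0
7f10b0000000-7f10b0300000 r-xp 00000000 fd:01 5678 /lib64/libmozjs-38.so
7f10c0000000-7f10c0100000 r-xp 00000000 fd:01 9012 /lib64/libproxy.so.1
"""

if __name__ == "__main__":
    print(mozjs_versions(SAMPLE_MAPS))
    print("conflict:", has_conflict(SAMPLE_MAPS))
```

On a live system the same functions could be fed the contents of /proc/<pid>/maps for the running gnome-weather process.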

Comment 5 Michael Catanzaro 2018-04-16 19:33:12 UTC
I'll open them

Comment 6 nvwarr 2018-04-17 07:44:10 UTC
Yes, I'm running gnome-weather outside of GNOME. So I guess the problem doesn't occur when gnome-weather starts within GNOME... at least not in this specific way. From reading your post here and the ones you linked to, it sounds like quite a deep problem, of the sort that causes weird, apparently unconnected failures all over the place. Clearly the "it works for me" workaround is not generally useful. It would be nice if a proper fix could be done.

Indeed if I run strace I see libmozjs-52 is opened once and libmozjs-38 twice:
openat(AT_FDCWD, "/lib64/libmozjs-52.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libmozjs-38.so", O_RDONLY|O_CLOEXEC) = 12
openat(AT_FDCWD, "/lib64/libmozjs-38.so", O_RDONLY|O_CLOEXEC) = 12

(with the workaround). Without the workaround it only loads each one once. I suspect you are right, that adding the symbols has just broken the loading of libmozjs-38.so. If I delete this shared object, gnome-weather works, with or without the workaround. So definitely, libmozjs-38.so is causing the trouble.
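
The counting done by eye above can be sketched as a small script that tallies successful openat calls per path. The regular expression and function name are illustrative assumptions; the trace lines are the three quoted in this comment.

```python
import re
from collections import Counter

# Captures the path and the return value of an openat() line from strace.
OPENAT_RE = re.compile(r'openat\(AT_FDCWD, "([^"]+)", [^)]*\) = (-?\d+)')


def successful_opens(strace_text):
    """Count openat() calls per path, ignoring calls that returned an error."""
    counts = Counter()
    for path, ret in OPENAT_RE.findall(strace_text):
        if int(ret) >= 0:
            counts[path] += 1
    return counts


TRACE = '''\
openat(AT_FDCWD, "/lib64/libmozjs-52.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libmozjs-38.so", O_RDONLY|O_CLOEXEC) = 12
openat(AT_FDCWD, "/lib64/libmozjs-38.so", O_RDONLY|O_CLOEXEC) = 12
'''

if __name__ == "__main__":
    for path, count in successful_opens(TRACE).items():
        print(count, path)
```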

I guess that if libproxy could be ported to libmozjs-52, it would be safe, but some mechanism is needed to prevent similar breakage in the future.

I guess px_proxy_factory_get_proxies needs to be made thread safe. In the patch you posted in issue 68, the code it removes is definitely not thread safe. There's a race between the delete this->pr; and the create immediately after. If it gets preempted between the two by a thread on the same path, this->pr is not NULL but points to a deleted object, and bang! Probably hard to trigger, but not impossible. The replacement certainly looks safer. However, I suspect this is only the tip of the iceberg. It doesn't look like it is ever safe to run two threads with libproxy simultaneously. I think this problem is separate from the original one I had, though; i.e. issue 68 isn't an issue with the port to libmozjs-52 but a general multithreading issue.

Comment 7 Dan Winship 2018-04-17 13:29:46 UTC
(In reply to nvwarr from comment #6)
> I guess px_proxy_factory_get_proxies needs to be made thread safe. In the
> patch you posted in issue 68, the code it removes is definitely not thread
> safe. There's a race between the delete this->pr; and the create immediately
> after. If it gets preempted between the two, by a thread on the same path,
> this->pr is not NULL, but points to a deleted object and bang!

px_proxy_factory_get_proxies() uses a mutex already. The problem isn't a race condition with anything else in libproxy, it's just that the webkit/mozjs APIs get upset if you create a JSGlobalContextRef/JSContext object in one thread, and then later use it from a different thread, even if you aren't using it in more than one thread at the same time.

(In the mozjs case, this is because it explicitly checks that you're calling from the right thread. I'm not sure if webkit does the same thing, or if it uses other threads internally and ends up racing with itself when you misuse it.)
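
The kind of check described above can be illustrated with a minimal sketch. This is not mozjs or webkit code: ThreadBoundContext and its methods are hypothetical stand-ins that only model an object remembering its creating thread and rejecting use from any other thread, even when the calls never overlap in time.

```python
import threading


class ThreadBoundContext:
    """Hypothetical stand-in for a JS context tied to its creating thread."""

    def __init__(self):
        self._owner = threading.get_ident()

    def _check_thread(self):
        if threading.get_ident() != self._owner:
            raise RuntimeError("context used from a thread other than its creator")

    def evaluate(self, expression):
        self._check_thread()
        return f"evaluated {expression!r}"


def call_from_other_thread(ctx):
    """Use ctx from a fresh thread and report what happened."""
    result = {}

    def worker():
        try:
            result["value"] = ctx.evaluate("FindProxyForURL(url, host)")
        except RuntimeError as exc:
            result["error"] = str(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()  # no concurrent use: the main thread just waits
    return result


if __name__ == "__main__":
    ctx = ThreadBoundContext()
    print(ctx.evaluate("1 + 1"))        # same thread: accepted
    print(call_from_other_thread(ctx))  # different thread: rejected
```

A mutex around the call does not help in this model, because the failure depends on which thread calls, not on whether two threads call at once.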

Comment 8 Ben Cotton 2018-11-27 14:52:39 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30 Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 27 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 9 Michael Catanzaro 2018-11-27 16:02:30 UTC
(In reply to Dan Winship from comment #7)
> (In the mozjs case, this is because it explicitly checks that you're calling
> from the right thread. I'm not sure if webkit does the same thing, or if it
> uses other threads internally and ends up racing with itself when you misuse
> it.)

For the record: if you want to use JSC in multiple threads, you indeed need to spin up a new JSC VM for each thread, and never share anything between threads.
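
A minimal sketch of the "one VM per thread, nothing shared" pattern, using thread-local storage. FakeVM is a hypothetical placeholder for a real JSC VM, and the helper names are assumptions for illustration only.

```python
import threading


class FakeVM:
    """Hypothetical placeholder for a per-thread JavaScript VM."""

    def __init__(self):
        self.owner = threading.get_ident()


_local = threading.local()


def vm_for_current_thread():
    """Create the calling thread's VM on first use, then keep returning it."""
    vm = getattr(_local, "vm", None)
    if vm is None:
        vm = FakeVM()
        _local.vm = vm
    return vm


def collect_vms(n_threads):
    """Start n_threads workers and return the VM each one ended up with."""
    vms = []
    lock = threading.Lock()

    def worker():
        vm = vm_for_current_thread()
        # A second lookup in the same thread must give back the same VM.
        assert vm is vm_for_current_thread()
        with lock:
            vms.append(vm)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return vms


if __name__ == "__main__":
    vms = collect_vms(4)
    print(len({id(vm) for vm in vms}), "distinct VMs for 4 threads")
```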

Comment 10 Fedora Update System 2019-02-08 09:51:14 UTC
libproxy-0.4.15-10.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-7285a847c4

Comment 11 Fedora Update System 2019-02-10 04:28:18 UTC
libproxy-0.4.15-10.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-7285a847c4

Comment 12 Michael Catanzaro 2019-02-10 21:05:11 UTC
That update should fix it for F29, yes, but it will break again in a subsequent Fedora release unless we change libproxy to use a separate process for loading mozjs
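
The separate-process idea can be sketched as follows. This is not libproxy's design or API: the helper script, the JSON request format, the ".internal" rule, the proxy address, and the "direct://" fallback on failure are all assumptions chosen for illustration. The point is only that whatever the helper loads stays out of the caller's address space.

```python
import json
import subprocess
import sys

# Hypothetical helper program. A real helper would load the JS engine and
# evaluate the PAC script; this stand-in applies one made-up rule instead.
HELPER_SOURCE = r"""
import json, sys
from urllib.parse import urlsplit

request = json.load(sys.stdin)
host = urlsplit(request["url"]).hostname or ""
if host.endswith(".internal"):
    proxies = ["http://proxy.example:3128"]
else:
    proxies = ["direct://"]
json.dump({"proxies": proxies}, sys.stdout)
"""


def get_proxies_out_of_process(url, timeout=10.0):
    """Ask a short-lived helper process for the proxy list for url."""
    completed = subprocess.run(
        [sys.executable, "-c", HELPER_SOURCE],
        input=json.dumps({"url": url}),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if completed.returncode != 0:
        # Assumed policy: a crashed helper degrades to a direct connection
        # instead of taking the calling application down with it.
        return ["direct://"]
    return json.loads(completed.stdout)["proxies"]


if __name__ == "__main__":
    print(get_proxies_out_of_process("http://wiki.internal/page"))
    print(get_proxies_out_of_process("http://example.com/"))
```

With this split, a crash inside the helper shows up as a non-zero exit status in the caller rather than as a segmentation fault in the application.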

Comment 13 Ben Cotton 2019-02-19 17:11:46 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 30 development cycle.
Changing version to '30'.

Comment 14 Fedora Update System 2019-02-20 03:04:58 UTC
libproxy-0.4.15-10.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

