Bug 1842473

Summary: webkit2gtk segfault on wayland
Product: [Fedora] Fedora
Reporter: Carlos Mogas da Silva <r3pek>
Component: egl-wayland
Assignee: leigh scott <leigh123linux>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 31
CC: erack, gnome-sig, kwizart, leigh123linux, mcatanza, negativo17, tpopela
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: egl-wayland-1.1.5-3.fc31 egl-wayland-1.1.5-3.fc32 egl-wayland-1.1.5-3.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-24 01:05:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Carlos Mogas da Silva 2020-06-01 11:05:40 UTC
Description of problem:
While running Wayland on an NVIDIA card (using the proprietary drivers), Evolution crashes right after opening (I think it's because it's trying to display an email).


Version-Release number of selected component (if applicable):
evolution-3.34.4-1.fc31
webkit2gtk3-2.28.2-1.fc31

How reproducible: Every time


Steps to Reproduce:
1. Install the NVIDIA proprietary driver
2. Allow gdm and gnome-shell to use Wayland
3. Try to run Evolution under Wayland

Actual results:
Crashes with this stack trace:
                Stack trace of thread 7930:
                #0  0x00007fa2c6a51625 raise (libc.so.6)
                #1  0x00007fa2c6a3a8d9 abort (libc.so.6)
                #2  0x00007fa2c6a3a7a9 __assert_fail_base.cold (libc.so.6)
                #3  0x00007fa2c6a49a66 __assert_fail (libc.so.6)
                #4  0x00007fa2a54d5b7d wlExternalApiLock (libnvidia-egl-wayland.so.1)
                #5  0x00007fa2a54da4ab wlEglGetInternalHandleExport (libnvidia-egl-wayland.so.1)
                #6  0x00007fa2a58584ef n/a (libEGL_nvidia.so.0)
                #7  0x00007fa2a57dfeeb n/a (libEGL_nvidia.so.0)
                #8  0x00007fa2a54d7752 wl_eglstream_display_bind (libnvidia-egl-wayland.so.1)
                #9  0x00007fa2a54d6355 wlEglBindDisplaysHook (libnvidia-egl-wayland.so.1)
                #10 0x00007fa2a58543f3 n/a (libEGL_nvidia.so.0)
                #11 0x00007fa2a57dc775 n/a (libEGL_nvidia.so.0)
                #12 0x00007fa2c50aab11 _ZN2WS8Instance10initializeEPv (libWPEBackend-fdo-1.0.so.1)
                #13 0x00007fa2c797cbf6 _ZN6WebKit14WebProcessPool28platformInitializeWebProcessERKNS_15WebProcessProxyERNS_28WebProcessCreationParametersE (libwebkit2gtk-4.0.so.37)
                #14 0x00007fa2c784fdfa _ZN6WebKit14WebProcessPool23initializeNewWebProcessERNS_15WebProcessProxyEPNS_16WebsiteDataStoreENS1_11IsPrewarmedE (libwebkit2gtk-4.0.so.37)
                #15 0x00007fa2c7850f17 _ZN6WebKit14WebProcessPool19createNewWebProcessEPNS_16WebsiteDataStoreENS_15WebProcessProxy11IsPrewarmedE (libwebkit2gtk-4.0.so.37)
                #16 0x00007fa2c785166d _ZN6WebKit14WebProcessPool27processForRegistrableDomainERNS_16WebsiteDataStoreEPNS_12WebPageProxyERKN7WebCore17RegistrableDomainE (libwebkit2gtk-4.0.so.37)
                #17 0x00007fa2c7851757 _ZN6WebKit12WebPageProxy13launchProcessERKN7WebCore17RegistrableDomainENS0_19ProcessLaunchReasonE (libwebkit2gtk-4.0.so.37)
                #18 0x00007fa2c78552ce _ZN6WebKit12WebPageProxy8loadDataERKN3IPC13DataReferenceERKN3WTF6StringES8_S8_PN3API6ObjectEN7WebCore28ShouldOpenExternalURLsPolicyE (libwebkit2gtk-4.0.so.37)
                #19 0x00007fa2c78f11a0 webkit_web_view_load_bytes (libwebkit2gtk-4.0.so.37)
                #20 0x00007fa2c6db9c90 web_view_load_string (libevolution-util.so)
                #21 0x00007fa2bd4b0cc9 mail_reader_set_folder (libevolution-mail.so)
                #22 0x00007fa2bd49f53c mail_paned_view_set_folder (libevolution-mail.so)
                #23 0x00007fa2bd2a6314 mail_shell_view_got_folder_cb (module-mail.so)
                #24 0x00007fa2ca2f670a g_task_return_now (libgio-2.0.so.0)
                #25 0x00007fa2ca2f674d complete_in_idle_cb (libgio-2.0.so.0)
                #26 0x00007fa2cacb2e8b g_idle_dispatch (libglib-2.0.so.0)
                #27 0x00007fa2cacb6570 g_main_context_dispatch (libglib-2.0.so.0)
                #28 0x00007fa2cacb6900 g_main_context_iterate.isra.0 (libglib-2.0.so.0)
                #29 0x00007fa2cacb6bf3 g_main_loop_run (libglib-2.0.so.0)
                #30 0x00007fa2ca7a043d gtk_main (libgtk-3.so.0)
                #31 0x0000556c6206278d main (evolution)
                #32 0x00007fa2c6a3c1a3 __libc_start_main (libc.so.6)
                #33 0x0000556c620628ee _start (evolution)
                



Expected results:
Evolution should work normally.


Additional info:
I looked in the Evolution bug tracker and found this bug report [1], which says that webkitgtk is the culprit and that version 2.29.1 *should* fix the issue (not guaranteed). That version is only in Rawhide at the moment, so I don't know whether it's possible to upgrade the F31/F32 version.

[1] https://gitlab.gnome.org/GNOME/evolution/-/issues/927

Comment 1 Michael Catanzaro 2020-06-01 13:55:21 UTC
I don't see any evidence that this would be fixed in 2.29.1. I won't upgrade F31/F32 to unstable WebKit anyway. If it's really fixed in 2.29.1, which I doubt, then we'd need to identify the related commit and request it be backported to the next 2.28 release.

Anyway, to make progress, please post a proper backtrace taken with gdb 'bt full', showing where in libnvidia-egl-wayland the crash occurs. You're lucky that component is open source, because otherwise this would be CANTFIX.

Comment 2 Carlos Mogas da Silva 2020-06-01 17:03:01 UTC
The webkit2gtk3 part of the backtrace is huge, so I pasted the complete "bt full" output here: https://l.r3pek.org/288be

This is just the first 14 calls.

(gdb) bt full
#0  0x00007ffff3a9c625 in raise () at /lib64/libc.so.6
#1  0x00007ffff3a858d9 in abort () at /lib64/libc.so.6
#2  0x00007ffff3a857a9 in _nl_load_domain.cold () at /lib64/libc.so.6
#3  0x00007ffff3a94a66 in annobin_assert.c_end () at /lib64/libc.so.6
#4  0x00007fffe4109b7d in wlExternalApiLock () at ../src/wayland-thread.c:87
        __PRETTY_FUNCTION__ = "wlExternalApiLock"
#5  0x00007fffe410e4ab in wlEglGetInternalHandleExport (dpy=0x5555566dad60, type=13233, handle=0x5555566dad60) at ../src/wayland-eglhandle.c:146
#6  0x00007fffd65574ef in  () at /lib64/libEGL_nvidia.so.0
#7  0x00007fffd64deeeb in  () at /lib64/libEGL_nvidia.so.0
#8  0x00007fffe410b752 in wl_eglstream_display_bind (data=data@entry=0x5555566cc5c0, wlDisplay=wlDisplay@entry=0x55555649b360, eglDisplay=eglDisplay@entry=0x5555566dad60)
    at ../src/wayland-eglstream-server.c:311
        wlStreamDpy = 0x555556b69f90
        exts = 0x0
        env = 0x0
#9  0x00007fffe410a355 in wlEglBindDisplaysHook (data=0x5555566cc5c0, dpy=0x5555566dad60, nativeDpy=0x55555649b360) at ../src/wayland-egldisplay.c:87
        res = 0
#10 0x00007fffd65533f3 in  () at /lib64/libEGL_nvidia.so.0
#11 0x00007fffd64db775 in  () at /lib64/libEGL_nvidia.so.0
#12 0x00007ffff20f5b11 in WS::Instance::initialize(void*) () at /lib64/libWPEBackend-fdo-1.0.so.1
#13 0x00007ffff49c7bf6 in WebKit::WebProcessPool::platformInitializeWebProcess(WebKit::WebProcessProxy const&, WebKit::WebProcessCreationParameters&) (this=this@entry=0x7fffe42ee000, process=
    ..., parameters=...) at ../Source/WebKit/UIProcess/glib/WebProcessPoolGLib.cpp:119
#14 0x00007ffff489adfa in WebKit::WebProcessPool::initializeNewWebProcess(WebKit::WebProcessProxy&, WebKit::WebsiteDataStore*, WebKit::WebProcessProxy::IsPrewarmed)
    (this=<optimized out>, process=..., websiteDataStore=0x7fffe42e4000, isPrewarmed=WebKit::WebProcessProxy::IsPrewarmed::No) at ../Source/WebKit/UIProcess/WebProcessPool.cpp:1044
        initializationActivity = {m_ref = std::unique_ptr<WebKit::ProcessThrottler::Activity<(WebKit::ProcessThrottler::ActivityType)0>> = {get() = 0x0}}
        parameters = <snip here>


It looks like the open-source part of the driver is having trouble with locking (relevant code below):
    if (!wlMutexInitialized || pthread_mutex_lock(&wlMutex)) {
        assert(!"failed to lock pthread mutex");
        return -1;
    }
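
The quoted condition combines two different failure modes in a single assert: either the `wlMutexInitialized` flag was never set, or `pthread_mutex_lock()` itself returned an error. The backtrace alone does not say which one fired. A minimal sketch of how the two cases could be told apart is below. Only the names `wlMutex` and `wlMutexInitialized` come from the snippet above; the enum, the `sketch_*` helpers, and the init routine are hypothetical and are not egl-wayland's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins for the library's globals. Only these two names appear in the
 * quoted snippet; everything else in this block is hypothetical. */
static pthread_mutex_t wlMutex;
static bool wlMutexInitialized = false;

typedef enum {
    LOCK_OK = 0,
    LOCK_NOT_INITIALIZED, /* flag never set: init did not run in this process */
    LOCK_PTHREAD_ERROR    /* pthread_mutex_lock() returned non-zero */
} LockResult;

/* Hypothetical init helper: creates the mutex once and sets the flag. */
static int sketch_mutex_init(void)
{
    if (wlMutexInitialized) {
        return 0;
    }
    if (pthread_mutex_init(&wlMutex, NULL) != 0) {
        return -1;
    }
    wlMutexInitialized = true;
    return 0;
}

/* Same two checks as the quoted condition, evaluated separately so the
 * caller can see which one failed instead of hitting one shared assert. */
static LockResult sketch_api_lock(void)
{
    if (!wlMutexInitialized) {
        return LOCK_NOT_INITIALIZED;
    }
    if (pthread_mutex_lock(&wlMutex) != 0) {
        return LOCK_PTHREAD_ERROR;
    }
    return LOCK_OK;
}

/* Releases the lock taken by sketch_api_lock(). */
static void sketch_api_unlock(void)
{
    int rc = pthread_mutex_unlock(&wlMutex);
    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is defined */
}
```

Calling `sketch_api_lock()` before `sketch_mutex_init()` returns `LOCK_NOT_INITIALIZED`, and after a successful init it returns `LOCK_OK`. Whether the real crash comes from the first or the second check is not established in this report.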

Comment 3 Michael Catanzaro 2020-06-01 18:25:19 UTC
One final question before I reassign component: does the crash still occur if you run with the environment variable WEBKIT_FORCE_SANDBOX=0? We just found a sandbox bug that can cause certain syscalls to randomly fail so let's eliminate that potential cause first.

Comment 4 Carlos Mogas da Silva 2020-06-01 18:28:41 UTC
(In reply to Michael Catanzaro from comment #3)
> One final question before I reassign component: does the crash still occur
> if you run with the environment variable WEBKIT_FORCE_SANDBOX=0? We just
> found a sandbox bug that can cause certain syscalls to randomly fail so
> let's eliminate that potential cause first.

Yes, same error.

Comment 5 Michael Catanzaro 2020-06-01 18:33:06 UTC
OK -> egl-wayland for further diagnosis

Comment 6 leigh scott 2020-06-01 19:28:52 UTC
I have built the latest version

https://bodhi.fedoraproject.org/updates/FEDORA-2020-be2c4beb82

https://koji.fedoraproject.org/koji/buildinfo?buildID=1519004


If it still reproduces after that, you will need to report it upstream:

https://github.com/NVIDIA/egl-wayland/issues

Comment 7 Carlos Mogas da Silva 2020-06-01 22:09:38 UTC
Nope, still the same error on the latest version.

Reported upstream.

Comment 8 leigh scott 2020-06-02 10:57:57 UTC
(In reply to Carlos Mogas da Silva from comment #7)
> Nope, still the same error on the latest version.
> 
> Reported upstream.

Thank you for forwarding the issue upstream. Without the debug symbols for libEGL_nvidia.so.0, it is hard to make sense of it.

Comment 9 Carlos Mogas da Silva 2020-08-14 16:06:49 UTC
Upstream bug closed with a fix applied. They didn't version bump, but you can pick the patch up if you want ;)

Comment 10 leigh scott 2020-08-14 18:12:03 UTC
(In reply to Carlos Mogas da Silva from comment #9)
> Upstream bug closed with a fix applied. They didn't version bump, but you
> can pick the patch up if you want ;)

I have added a comment to the commit

https://github.com/NVIDIA/egl-wayland/commit/9558ec02d0f7bbf30dc1f9ee4c0b06c9b0c49afe

Comment 11 Fedora Update System 2020-08-15 00:40:14 UTC
FEDORA-2020-6900d113da has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-6900d113da

Comment 12 Fedora Update System 2020-08-15 00:40:14 UTC
FEDORA-EPEL-2020-83d4434be7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2020-83d4434be7

Comment 13 Fedora Update System 2020-08-15 00:40:15 UTC
FEDORA-2020-fcc03a2706 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-fcc03a2706

Comment 14 Fedora Update System 2020-08-16 01:30:12 UTC
FEDORA-2020-fcc03a2706 has been pushed to the Fedora 32 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-fcc03a2706`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-fcc03a2706

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 15 Fedora Update System 2020-08-16 01:36:39 UTC
FEDORA-EPEL-2020-83d4434be7 has been pushed to the Fedora EPEL 7 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2020-83d4434be7

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 16 Fedora Update System 2020-08-16 01:38:44 UTC
FEDORA-2020-6900d113da has been pushed to the Fedora 31 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-6900d113da`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-6900d113da

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Fedora Update System 2020-08-24 01:05:57 UTC
FEDORA-2020-6900d113da has been pushed to the Fedora 31 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 18 Fedora Update System 2020-08-24 01:12:49 UTC
FEDORA-2020-fcc03a2706 has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 19 Fedora Update System 2020-08-31 16:28:25 UTC
FEDORA-EPEL-2020-83d4434be7 has been pushed to the Fedora EPEL 7 stable repository.
If the problem still persists, please make note of it in this bug report.