Bug 1936071 - Firefox 86 freezes / becomes unresponsive during startup or browsing
Summary: Firefox 86 freezes / becomes unresponsive during startup or browsing
Keywords:
Status: MODIFIED
Alias: None
Product: Fedora
Classification: Fedora
Component: firefox
Version: 33
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Martin Stransky
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-06 13:17 UTC by Matthew Krupcale
Modified: 2021-04-28 09:45 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug


Attachments
Partial GDB thread backtrace of freeze with new profile and a single window on GNOME/Wayland (72.22 KB, text/plain)
2021-03-06 13:17 UTC, Matthew Krupcale
firefox-87.0-2.fc33 partial GDB thread backtrace of freeze with default profile and multiple windows on GNOME/Wayland (188.38 KB, text/plain)
2021-03-25 19:33 UTC, Matthew Krupcale
(partial?) backtrace of the frozen process (322.67 KB, text/plain)
2021-03-29 15:21 UTC, yulinux
uncomplete_bt_via_coredumpctl (533.48 KB, text/plain)
2021-03-29 18:44 UTC, yulinux
firefox-87.0-4.fc33 partial GDB thread backtrace of freeze with default profile and multiple windows playing video on GNOME/X11 (Xwayland) (74.57 KB, text/plain)
2021-03-29 23:20 UTC, Matthew Krupcale
different backtrace from another computer (still firefox-87.0-2.fc33.x86_64) (538.34 KB, text/plain)
2021-04-06 13:05 UTC, yulinux

Description Matthew Krupcale 2021-03-06 13:17:12 UTC
Created attachment 1761131 [details]
Partial GDB thread backtrace of freeze with new profile and a single window on GNOME/Wayland

Description of problem:
Firefox 86 freezes / becomes unresponsive either on startup or sometime during regular usage (probably less than 3 hours), i.e. browsing a page or watching a video. If watching a video, the video will freeze, but audio can continue to play at least for some time.

I can then SIGTERM the main process and restart firefox and continue browsing until it freezes again.

See the attached (partial) GDB backtrace, obtained as per [1,2]. I've only attached one BT here (new profile, single window, Wayland), but I have several others taken under other specific conditions if you want them.

I did not have this issue with Firefox 85.

Version-Release number of selected component (if applicable):
firefox-86.0-7.fc33.x86_64
mesa-dri-drivers-20.3.4-2.fc33.x86_64
mutter-3.38.3-1.fc33.x86_64
xorg-x11-drv-nouveau-1.0.15-10.fc33.x86_64

How reproducible:
Consistently, either on startup or after minutes or hours of browsing.

Steps to Reproduce:
1. Start Firefox 86
2. If no freeze during startup, open web pages, browse, watch video until freeze

Actual results:
Browser freezes and becomes unresponsive.

Expected results:
Browser does not freeze or become unresponsive.

Additional info:

I'm able to reproduce with:
1. GNOME/Wayland and GNOME/X11
2. Default or new profile

I have reproduced the issue with the official Mozilla firefox-86.0 binary, but it seems to take longer / be more difficult to reproduce than with the Fedora version. I have not yet reproduced the issue with Mozilla firefox-87.0b6.

Compositing: Basic
GPU: GF114 [GeForce GTX 560], mesa/nouveau drivers
Desktop: GNOME/Wayland

I was not able to get a full BT because GDB itself stopped responding after printing several thread backtraces, at which point I must SIGKILL it.
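A possible way to avoid the interactive session stalling is to run GDB non-interactively and write everything to a file. The sketch below only assumes that `gdb` is on the PATH; the helper names and the timeout are illustrative, and it is not certain that batch mode avoids the hang described above.

```python
import subprocess


def gdb_backtrace_argv(pid):
    """Build a non-interactive gdb command line that dumps all thread backtraces."""
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "set pagination off",        # never stop at a "--Type <RET>--" prompt
        "-ex", "thread apply all bt full",  # same command used interactively
    ]


def dump_backtrace(pid, out_path, timeout_s=600):
    """Run gdb against a live process and save its output (hypothetical helper).

    subprocess.run kills gdb and raises TimeoutExpired if it runs longer
    than timeout_s; whatever was written to out_path so far is kept.
    """
    with open(out_path, "w") as out:
        subprocess.run(
            gdb_backtrace_argv(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
            check=False,
        )
```

Calling `dump_backtrace(pid, "firefox_bt.txt")` with the PID of the frozen main process would leave a partial log behind even if GDB has to be killed.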

On my default profile, I have many Firefox windows (17) and tabs (104) open across 3 workspaces, but I'm able to consistently reproduce with a new profile and a single tab.

In the GDB session before the freeze, I see SIGPIPEs, e.g.:

Thread 7 "IPC I/O Parent" received signal SIGPIPE, Broken pipe.
Thread 41 "Socket Thread" received signal SIGPIPE, Broken pipe.
Thread 35 "Socket Thread" received signal SIGPIPE, Broken pipe.

and sometimes seccomp violations:

Sandbox: seccomp sandbox violation: pid 565887, tid 565895, syscall 203, args 565895 128 140737045826848 4 140737045829184 1.
Sandbox: seccomp sandbox violation: pid 565887, tid 565893, syscall 203, args 565893 128 140737062612256 8 140737062614592 1.
Sandbox: seccomp sandbox violation: pid 565887, tid 565892, syscall 203, args 565892 128 140737344695584 4 140737344697920 1.
Sandbox: seccomp sandbox violation: pid 565887, tid 565894, syscall 203, args 565894 128 140737054219552 4 140737054221888 1.
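For readers decoding these lines: on x86_64 Linux, syscall number 203 is sched_setaffinity. The following is a minimal sketch of a parser for the log format shown above; the function and table names are made up for illustration, and the table contains only the one syscall that appears in this report.

```python
import re

# Only the syscall seen in this report; the number is specific to x86_64 Linux.
X86_64_SYSCALLS = {203: "sched_setaffinity"}

VIOLATION_RE = re.compile(
    r"seccomp sandbox violation: pid (?P<pid>\d+), tid (?P<tid>\d+), "
    r"syscall (?P<nr>\d+), args (?P<args>[\d ]+)\."
)


def parse_violation(line):
    """Return the fields of one violation line as a dict, or None if it does not match."""
    m = VIOLATION_RE.search(line)
    if m is None:
        return None
    nr = int(m.group("nr"))
    return {
        "pid": int(m.group("pid")),
        "tid": int(m.group("tid")),
        "syscall": X86_64_SYSCALLS.get(nr, f"unknown({nr})"),
        "args": [int(a) for a in m.group("args").split()],
    }
```

In the lines above, the first argument equals the reporting tid and the second is 128, which would be consistent with each thread trying to set its own CPU affinity with a 128-byte mask. Whether this is related to the freeze is not established in this report.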

[1] https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Attach_debugger_to_running_application
[2] https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Obtain_crash_stack_trace

Comment 1 Matthew Krupcale 2021-03-25 19:33:57 UTC
Created attachment 1766403 [details]
firefox-87.0-2.fc33 partial GDB thread backtrace of freeze with default profile and multiple windows on GNOME/Wayland

Unfortunately, I'm still having this issue with firefox-87.0-2.fc33.x86_64. I've attached the resulting (partial) backtrace of my default session which has frozen on startup.

Should I continue testing with official Mozilla binaries? Since I had trouble reproducing the freeze with the Mozilla firefox-87.0b6 binary, I hoped that it might be fixed in F33 firefox-87, but that appears not to be the case.

Comment 2 yulinux 2021-03-28 20:46:07 UTC
I have the exact same problem on two different stable machines, beginning at the same time a few weeks ago after an update (probably the firefox package). Both machines use older AMD processors (one dual-core, the other quad-core), use onboard AMD graphics chips (RS780 and RS880), use the newest Fedora x86_64 release, use Wayland, and have already been updated to the newest Fedora release several times via dist-upgrade...

I also tried to obtain backtraces by killing the frozen firefox processes, see [1] via abrt from one machine and [2] from the other machine via the Firefox crash reporter. Neither backtrace is complete and, AFAICS, neither is as good as the ones posted in this bug report.

Please let me know if I can help to debug this problem further. 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1938665
 -> https://retrace.fedoraproject.org/faf/reports/90413/

[2] https://crash-stats.mozilla.org/report/index/f50e5a10-ee22-4666-af77-fbd000210328

Comment 3 Martin Stransky 2021-03-29 13:20:27 UTC
Can you use coredumpctl to get a backtrace?

https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Using_coredumpctl_to_get_backtrace

Please also do:

- test Firefox 87.0 which should be available for Fedora now.
- try to disable WebRender - https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Check_WebRender
- try Firefox-x11 (i.e. without Wayland) - https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Check_Firefox_X11_Gtk.2B_backend_.28Gnome_only.29

Comment 4 yulinux 2021-03-29 15:21:41 UTC
Created attachment 1767393 [details]
(partial?) backtrace of the frozen process

Today firefox froze again, this time with version firefox-87.0-2.fc33.x86_64. I tried to get a better backtrace by installing the debuginfo package beforehand:
# dnf debuginfo-install firefox-87.0-2.fc33.x86_64

Then I attached gdb to the frozen process:
$ gdb firefox -p 2573
Output in gdb:
[...]
[New LWP 7832]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfos, use: dnf debuginfo-install at-spi2-atk-2.38.0-1.fc33.x86_64 
[...]
 zlib-1.2.11-23.fc33.x86_64
--Type <RET> for more, q to quit, c to continue without paging--c
0x00007f301f496a5f in __GI___poll (fds=0x7fff5fa7a310, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29	  return SYSCALL_CANCEL (poll, fds, nfds, timeout);
(gdb) 

I tried to obtain the backtrace doing:
set logging on crash_bt_f_new
print DumpJSStack()
thread apply all bt full

After a minute or so, the output stops, so I have to kill the gdb process. I had to copy the last lines of the backtrace from the terminal window into the file crash_bt_f_new.
I attach my new backtrace crash_bt_f_new. I also found one more report of this bug [1], which also hints at sandboxing, as do the end of my backtrace and the original reporter's logs. According to the original reporter, it should not make a difference whether Wayland or X11 is used. Next I will try to disable WebRender as suggested.

[1] https://ask.fedoraproject.org/t/firefox-unstable-after-update-to-86-0-64-bit-under-fc33/12884

Comment 5 yulinux 2021-03-29 18:44:18 UTC
Created attachment 1767444 [details]
uncomplete_bt_via_coredumpctl

I tried to get the backtrace according to the wiki, but the process stopped automatically:
$ coredumpctl debug 6319
[...]
(gdb) thread apply all bt full
[...]
gdb terminated by signal ABRT.

I think my root file system is running full: there are just 4 GB left, so the backtrace cannot be completed. I will have to resize the partitions, if possible...

BTW AFAICS WebRender is disabled in Firefox anyway on this machine:
about:support 
-> Compositing	Basic

about:config
-> gfx.webrender.all	false
-> gfx.webrender.enabled	false
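The same check can be scripted against a profile's prefs.js, which stores user-modified preferences as `user_pref("name", value);` lines. This is only a sketch: the function name is hypothetical, and a preference left at its default value normally does not appear in prefs.js at all, so a missing key does not mean the pref is false.

```python
import re

# Matches one line of the form: user_pref("some.pref.name", value);
PREF_RE = re.compile(r'^user_pref\("(?P<name>[^"]+)",\s*(?P<value>.+)\);\s*$')


def read_prefs(prefs_js_text, names):
    """Return {name: raw_value_string} for the requested prefs found in the text."""
    found = {}
    for line in prefs_js_text.splitlines():
        m = PREF_RE.match(line.strip())
        if m and m.group("name") in names:
            found[m.group("name")] = m.group("value")
    return found
```

Passing the contents of prefs.js and `{"gfx.webrender.all", "gfx.webrender.enabled"}` returns whichever of the two were explicitly set.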

Comment 6 Matthew Krupcale 2021-03-29 23:20:46 UTC
Created attachment 1767503 [details]
firefox-87.0-4.fc33 partial GDB thread backtrace of freeze with default profile and multiple windows playing video on GNOME/X11 (Xwayland)

(In reply to Martin Stransky from comment #3)
> Can you use coredumpctl to get a backtrace?

I don't see a relevant firefox coredump in `coredumpctl list`. Firefox doesn't actually crash, it just locks up and becomes unresponsive.
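Because a hung process never dumps core on its own, one common approach is to send it SIGABRT, whose default action is to terminate the process with a core dump that the system's core handler (typically systemd-coredump on Fedora) can record. The sketch below is an assumption-laden illustration: the helper name is made up, and Firefox's own crash reporter may intercept the signal instead, as apparently happened in comment #2.

```python
import os
import signal


def abort_for_coredump(pid):
    """Send SIGABRT to a (hung) process so that a core dump can be captured.

    The default disposition of SIGABRT is "terminate and dump core"; whether
    a core file is actually stored depends on system configuration.
    """
    os.kill(pid, signal.SIGABRT)
```

If a dump is recorded, it should then show up in `coredumpctl list`.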

> - test Firefox 87.0 which should be available for Fedora now.

I still have this issue with firefox-87.0-2.fc33.x86_64 on GNOME/Wayland (see comment #1).

> - try to disable WebRender - https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Check_WebRender

I don't have WebRender enabled (my hardware doesn't qualify since I'm running NVIDIA/nouveau). I'm just using basic compositor.

> - try Firefox-x11 (i.e. without Wayland) - https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Check_Firefox_X11_Gtk.2B_backend_.28Gnome_only.29

I still have this issue with firefox-87.0-4.fc33.x86_64 on GNOME/X11 (Xwayland). See the attached (partial) BT, which also includes glibc debuginfo. Unfortunately, it doesn't look too useful, with all threads (shown) waiting.

(In reply to yulinux from comment #2)
> Both machines use older AMD processors

I wonder if this is relevant? I'm running AMD Phenom II X4 965. Maybe older/slower CPU makes this issue more likely?

It's also interesting that we both have

#0 recvmsg
#1 mozilla::SandboxBrokerCommon::RecvWithFd
#2 mozilla::SandboxBroker::ThreadMain
...

in our Wayland backtraces, with the other threads basically stuck on

#0 futex_*wait*
#1 __pthread_cond_wait_common
#2 pthread_cond_{timed,}wait
#3 mozilla::detail::ConditionVariableImpl::wait
#4 mozilla::OffTheBooksCondVar::Wait
...
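To compare backtraces like these across attachments, it can help to count which function each thread is sitting in at frame #0. The following is a rough sketch, not a robust GDB parser: the function name is hypothetical, and the regular expression only assumes frame lines of the form `#0  [0xADDR [in]] function ...`, as seen in the GDB and coredumpctl output in this report.

```python
import re
from collections import Counter

# Frame #0, with an optional address and an optional "in" before the function name.
FRAME0_RE = re.compile(r"^\s*#0\s+(?:0x[0-9a-fA-F]+\s+(?:in\s+)?)?([^\s(]+)")


def top_frames(bt_text):
    """Count the innermost (frame #0) function of every thread in a backtrace log."""
    counts = Counter()
    for line in bt_text.splitlines():
        m = FRAME0_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Calling `top_frames(log).most_common()` on a saved log gives a quick view of how many threads are parked in futex waits versus blocked in calls such as recvmsg or poll.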

Comment 7 Martin Stransky 2021-03-30 17:39:24 UTC
Yes, it looks like a deadlock in our Wayland basic compositor code. The uncomplete_bt_via_coredumpctl attachment is very useful here; it shows the potential deadlock between our painting code and the vsync handler:

                #2  0x00007f9c00f941e0 __restore_rt (libpthread.so.0 + 0x141e0)
                #3  0x00007f9c00b5ba5f __GI___poll (libc.so.6 + 0xf6a5f)
                #4  0x00007f9bff2d392c wl_display_dispatch_queue (libwayland-client.so.0 + 0x892c)
                #5  0x00007f9bf82cf31d mozilla::widget::nsWaylandDisplay::WaitForSyncEnd() (libxul.so + 0x1e6831d)
                #6  0x00007f9bf82cf34d mozilla::widget::nsWaylandDisplay::SyncBegin() (libxul.so + 0x1e6834d)
                #7  0x00007f9bf9b715e1 operator() (libxul.so + 0x370a5e1)
                #8  0x00007f9bf8f0e8ce mozilla::RunnableTask::Run() (libxul.so + 0x2aa78ce)
                #9  0x00007f9bf8f0e467 mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (libxul.so + 0x2aa7467)
                #10 0x00007f9bf91a8129 operator() (libxul.so + 0x2d41129)
                #11 0x00007f9bf8f10257 nsThread::ProcessNextEvent(bool, bool*) (libxul.so + 0x2aa9257)
                #12 0x00007f9bf8f0fd00 NS_ProcessNextEvent(nsIThread*, bool) (libxul.so + 0x2aa8d00)
                #13 0x00007f9bf8f27880 mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (libxul.so + 0x2ac0880)
                #14 0x00007f9bf9325bb5 MessageLoop::RunInternal() (libxul.so + 0x2ebebb5)
                #15 0x00007f9bf9b52fcd nsBaseAppShell::Run() (libxul.so + 0x36ebfcd)
                #16 0x00007f9bf9eb9826 nsAppStartup::Run() (libxul.so + 0x3a52826)
                #17 0x00007f9bf9efa261 XREMain::XRE_mainRun() (libxul.so + 0x3a93261)
                #18 0x00007f9bf9ef760a XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) (libxul.so + 0x3a9060a)
                #19 0x00007f9bf9ef726e XRE_main(int, char**, mozilla::BootstrapConfig const&) (libxul.so + 0x3a9026e)
                #20 0x0000562fa02ad28f do_main (firefox + 0x5328f)
                #21 0x0000562fa029d9f6 main (firefox + 0x439f6)
                #22 0x00007f9c00a8d1e2 __libc_start_main (libc.so.6 + 0x281e2)
                #23 0x0000562fa02acede _start (firefox + 0x52ede)

and:

                #0  0x00007f9c00f8f6c2 futex_wait_cancelable (libpthread.so.0 + 0xf6c2)
                #1  0x00007f9bff2d275b wl_display_read_events (libwayland-client.so.0 + 0x775b)
                #2  0x00007f9bff2d3939 wl_display_dispatch_queue (libwayland-client.so.0 + 0x8939)
                #3  0x00007f9bf82cf31d mozilla::widget::nsWaylandDisplay::WaitForSyncEnd() (libxul.so + 0x1e6831d)
                #4  0x00007f9bf82c689e mozilla::widget::WindowSurfaceWayland::Lock(mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&) (libxul.so + 0x1e5f89e)
                #5  0x00007f9bf9b6b42b mozilla::widget::WindowSurfaceProvider::StartRemoteDrawingInRegion(mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel>&, mozilla::layers::BufferMode*) (libxul.so + 0x370442b)
                #6  0x00007f9bf9511771 mozilla::layers::BasicCompositor::BeginFrameForWindow(mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&) (libxul.so + 0x30aa771)

It also explains why it helps to set widget.wayland_vsync.enabled to false at about:config.

Comment 8 Martin Stransky 2021-03-30 17:44:29 UTC
(In reply to Martin Stransky from comment #7)
> It also explains why it helps to set widget.wayland_vsync.enabled to false
> at about:config.

I was wrong about widget.wayland_vsync.enabled; it won't help here.

Comment 9 yulinux 2021-03-30 23:14:14 UTC
I resized my partitions and installed all the listed debuginfo packages, but gdb still crashes at some point while producing the backtrace, even though enough RAM and disk space are available (I filed bug 1944906 for that). Is a complete backtrace still necessary, or should I try to reproduce the freeze in an environment without Wayland, try disabling sandboxing, or try something else? Thanks so far for looking into it!

Comment 10 Martin Stransky 2021-03-31 16:32:45 UTC
I think I have all the data I need, thanks.

Comment 11 Martin Stransky 2021-04-01 18:03:38 UTC
Upstream bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1702606

Comment 12 Martin Stransky 2021-04-01 18:28:12 UTC
Added to firefox-87.0-9 packages.

Comment 13 Matthew Krupcale 2021-04-05 01:05:49 UTC
I've been running firefox-87.0-9 for over 48 hours now without freezing, so I suspect this is fixed now. I don't intend to run firefox X11, but considering that the fix was Wayland-related, I'm curious if you suspect this also fixes the freeze with X11?

Comment 14 yulinux 2021-04-06 13:05:43 UTC
Created attachment 1769562 [details]
different backtrace from another computer (still firefox-87.0-2.fc33.x86_64)

Great, thank you. 
I uploaded a backtrace from the other computer "uncomplete_bt_via_coredumpctl_different_computer", which looks different to me (just one time wl_display_dispatch_queue) but still from firefox-87.0-2.fc33.x86_64. Should I open a separate bug report for this (same problem) and/or do you see something suspicious from that backtrace?

Comment 15 Martin Stransky 2021-04-28 09:45:57 UTC
This should be fixed in Firefox 89. It's a bug in vsync. Try setting widget.wayland_vsync.enabled to false in about:config.
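As an alternative to toggling the pref by hand in about:config, it can be placed in a user.js file in the profile directory, which Firefox reads at startup. Below is a small sketch that renders such a line; the helper name is hypothetical, and note that comment 8 above expressed doubt about whether this pref helps with this particular freeze.

```python
def user_js_line(name, value):
    """Render one user.js entry of the form user_pref("name", value);"""
    if isinstance(value, bool):       # check bool first: bool is a subclass of int
        rendered = "true" if value else "false"
    elif isinstance(value, int):
        rendered = str(value)
    else:
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        rendered = '"' + escaped + '"'
    return f'user_pref("{name}", {rendered});'
```

`user_js_line("widget.wayland_vsync.enabled", False)` produces the line to append to user.js.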

