Bug 1936071

Summary: Firefox 86 freezes / becomes unresponsive during startup or browsing
Product: [Fedora] Fedora
Reporter: Matthew Krupcale <mkrupcale>
Component: firefox
Assignee: Martin Stransky <stransky>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 33
CC: erack, gecko-bugs-nobody, jhorak, kai-engert-fedora, mkrupcale, pjasicek, rhughes, robatino, rstrode, sandmann, stransky, wcohen, yulinux
Hardware: x86_64
OS: Unspecified
Fixed In Version: firefox-105.0.2-1.fc38
Last Closed: 2021-11-30 18:40:21 UTC
Type: Bug
Attachments (flags: none on all):
- Partial GDB thread backtrace of freeze with new profile and a single window on GNOME/Wayland
- firefox-87.0-2.fc33 partial GDB thread backtrace of freeze with default profile and multiple windows on GNOME/Wayland
- (partial?) backtrace of the frozen process
- uncomplete_bt_via_coredumpctl
- firefox-87.0-4.fc33 partial GDB thread backtrace of freeze with default profile and multiple windows playing video on GNOME/X11 (Xwayland)
- different backtrace from another computer (still firefox-87.0-2.fc33.x86_64)

Description Matthew Krupcale 2021-03-06 13:17:12 UTC
Created attachment 1761131 [details]
Partial GDB thread backtrace of freeze with new profile and a single window on GNOME/Wayland

Description of problem:
Firefox 86 freezes / becomes unresponsive either on startup or sometime during regular usage (probably less than 3 hours), i.e. browsing a page or watching a video. If watching a video, the video will freeze, but audio can continue to play at least for some time.

I can then SIGTERM the main process and restart firefox and continue browsing until it freezes again.

See the attached (partial) GDB backtrace, obtained as per [1,2]. I've only attached one BT here (new profile, single window, Wayland), but I have several others taken under specific conditions if you want them.

I did not have this issue with Firefox 85.

Version-Release number of selected component (if applicable):
firefox-86.0-7.fc33.x86_64
mesa-dri-drivers-20.3.4-2.fc33.x86_64
mutter-3.38.3-1.fc33.x86_64
xorg-x11-drv-nouveau-1.0.15-10.fc33.x86_64

How reproducible:
Consistently, either on startup or after minutes or hours of browsing.

Steps to Reproduce:
1. Start Firefox 86
2. If no freeze during startup, open web pages, browse, watch video until freeze

Actual results:
Browser freezes and becomes unresponsive.

Expected results:
Browser does not freeze or become unresponsive.

Additional info:

I'm able to reproduce with:
1. GNOME/Wayland and GNOME/X11
2. Default or new profile

I have reproduced the issue with the official Mozilla firefox-86.0 binary, but it seems to take longer / be more difficult to reproduce than with the Fedora version. I have not yet reproduced the issue with Mozilla firefox-87.0b6.

Compositing: Basic
GPU: GF114 [GeForce GTX 560], mesa/nouveau drivers
Desktop: GNOME/Wayland

I was not able to get a full BT because GDB itself stopped responding after printing several thread backtraces, at which point I must SIGKILL it.

On my default profile, I have many Firefox windows (17) and tabs (104) open across 3 workspaces, but I'm able to consistently reproduce with a new profile and a single tab.

In the GDB session before the freeze, I see SIGPIPEs, e.g.:

Thread 7 "IPC I/O Parent" received signal SIGPIPE, Broken pipe.
Thread 41 "Socket Thread" received signal SIGPIPE, Broken pipe.
Thread 35 "Socket Thread" received signal SIGPIPE, Broken pipe.

and sometimes seccomp violations:

Sandbox: seccomp sandbox violation: pid 565887, tid 565895, syscall 203, args 565895 128 140737045826848 4 140737045829184 1.
Sandbox: seccomp sandbox violation: pid 565887, tid 565893, syscall 203, args 565893 128 140737062612256 8 140737062614592 1.
Sandbox: seccomp sandbox violation: pid 565887, tid 565892, syscall 203, args 565892 128 140737344695584 4 140737344697920 1.
Sandbox: seccomp sandbox violation: pid 565887, tid 565894, syscall 203, args 565894 128 140737054219552 4 140737054221888 1.
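
For reading these lines: on x86_64 Linux, syscall 203 is sched_setaffinity, and in each line the first argument equals the reporting tid, with 128 as the second argument. A minimal Python sketch for decoding such lines follows. The syscall table is a small hand-written excerpt rather than a complete list, and the function name parse_violation is an illustrative assumption, not anything provided by Firefox.

```python
import re

# Hand-written excerpt of the x86_64 Linux syscall table (assumption: only
# the entries around 203 are listed; extend as needed).
X86_64_SYSCALLS = {202: "futex", 203: "sched_setaffinity", 204: "sched_getaffinity"}

VIOLATION_RE = re.compile(
    r"seccomp sandbox violation: pid (?P<pid>\d+), tid (?P<tid>\d+), "
    r"syscall (?P<nr>\d+), args (?P<args>[\d ]+)"
)


def parse_violation(line):
    """Decode one 'seccomp sandbox violation' log line, or return None."""
    m = VIOLATION_RE.search(line)
    if m is None:
        return None
    nr = int(m.group("nr"))
    return {
        "pid": int(m.group("pid")),
        "tid": int(m.group("tid")),
        "syscall": nr,
        "name": X86_64_SYSCALLS.get(nr, "unknown"),
        "args": [int(a) for a in m.group("args").split()],
    }


line = ("Sandbox: seccomp sandbox violation: pid 565887, tid 565895, "
        "syscall 203, args 565895 128 140737045826848 4 140737045829184 1.")
info = parse_violation(line)
print(info["name"], info["args"][:2])
```

Decoding the number only identifies which call the sandbox rejected; it does not by itself show that the violation is related to the freeze.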

[1] https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Attach_debugger_to_running_application
[2] https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Obtain_crash_stack_trace

Comment 1 Matthew Krupcale 2021-03-25 19:33:57 UTC
Created attachment 1766403 [details]
firefox-87.0-2.fc33 partial GDB thread backtrace of freeze with default profile and multiple windows on GNOME/Wayland

Unfortunately, I'm still having this issue with firefox-87.0-2.fc33.x86_64. I've attached the resulting (partial) backtrace of my default session which has frozen on startup.

Should I continue testing with official Mozilla binaries? Since I had trouble reproducing this with the Mozilla firefox-87.0b6 binary, I hoped that it might be fixed in F33 firefox-87, but that appears not to be the case.

Comment 2 yulinux 2021-03-28 20:46:07 UTC
I have the exact same problem on two different stable machines, beginning at the same time a few weeks ago after an update (probably the firefox package). Both machines use older AMD processors (one dual-core, the other quad-core), have onboard AMD graphics chips (RS780 and RS880), run the newest Fedora x86_64 release, use Wayland, and have already been upgraded to the newest Fedora release several times via dist-upgrade...

I also tried to obtain backtraces by killing the frozen firefox processes; see [1], via abrt from one machine, and [2], via the Firefox crash reporter from the other machine. Neither backtrace is complete and, AFAICS, neither is as good as the ones posted in this bug report.

Please let me know if I can help to debug this problem further. 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1938665
 -> https://retrace.fedoraproject.org/faf/reports/90413/

[2] https://crash-stats.mozilla.org/report/index/f50e5a10-ee22-4666-af77-fbd000210328

Comment 3 Martin Stransky 2021-03-29 13:20:27 UTC
Can you use coredumpctl to get a backtrace?

https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Using_coredumpctl_to_get_backtrace

Please also do:

- test Firefox 87.0 which should be available for Fedora now.
- try to disable WebRender - https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Check_WebRender
- try Firefox-x11 (i.e. without Wayland) - https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Check_Firefox_X11_Gtk.2B_backend_.28Gnome_only.29

Comment 4 yulinux 2021-03-29 15:21:41 UTC
Created attachment 1767393 [details]
(partial?) backtrace of the frozen process

Today firefox froze again, this time with version firefox-87.0-2.fc33.x86_64. I tried to get a better backtrace by installing the debuginfo package beforehand:
# dnf debuginfo-install firefox-87.0-2.fc33.x86_64

Then I attached gdb to the frozen process:
$ gdb firefox -p 2573
Output in gdb:
[...]
[New LWP 7832]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfos, use: dnf debuginfo-install at-spi2-atk-2.38.0-1.fc33.x86_64 
[...]
 zlib-1.2.11-23.fc33.x86_64
--Type <RET> for more, q to quit, c to continue without paging--c
0x00007f301f496a5f in __GI___poll (fds=0x7fff5fa7a310, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29	  return SYSCALL_CANCEL (poll, fds, nfds, timeout);
(gdb) 

I tried to obtain the backtrace by doing:
set logging on crash_bt_f_new
print DumpJSStack()
thread apply all bt full

After a minute or so, gdb stops making progress, so I have to kill the gdb process. I then have to copy the last lines of the backtrace from the terminal window into the file crash_bt_f_new.
I attach my new backtrace crash_bt_f_new. I also found one more report of this bug [1], which also hints at sandboxing, as do the end of my backtrace and the original reporter's description. According to the original reporter, using Wayland or X11 should not make a difference. Next I will try to disable WebRender as suggested.
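
One way to avoid copying output from the terminal by hand is to run gdb non-interactively and redirect its output to a file. The Python sketch below builds such an invocation from gdb's standard -p, -batch and -ex options. The helper names gdb_batch_command and dump_backtrace, the output path and the timeout are illustrative assumptions, and nothing here works around gdb itself hanging or crashing.

```python
import subprocess


def gdb_batch_command(pid, commands=("thread apply all bt full",)):
    """Build a non-interactive gdb invocation that attaches to `pid`."""
    argv = ["gdb", "-p", str(pid), "-batch", "-ex", "set pagination off"]
    for cmd in commands:
        argv += ["-ex", cmd]
    return argv


def dump_backtrace(pid, out_path, timeout=600):
    """Run gdb in batch mode, sending stdout and stderr to `out_path`.

    subprocess.run raises TimeoutExpired if gdb does not finish in time;
    whatever gdb had already flushed to the file stays there.
    """
    with open(out_path, "w") as out:
        result = subprocess.run(
            gdb_batch_command(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            timeout=timeout,
            check=False,
        )
    return result.returncode


# Show the command line that would be run for the PID used above.
print(" ".join(gdb_batch_command(2573)))
```

Calling dump_backtrace(2573, "crash_bt_f_new") would attach to the process from the session above; it needs gdb installed and permission to attach to that process.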

[1] https://ask.fedoraproject.org/t/firefox-unstable-after-update-to-86-0-64-bit-under-fc33/12884

Comment 5 yulinux 2021-03-29 18:44:18 UTC
Created attachment 1767444 [details]
uncomplete_bt_via_coredumpctl

I tried to get the backtrace according to the wiki, but the process stopped automatically:
$ coredumpctl debug 6319
[...]
(gdb) thread apply all bt full
[...]
gdb terminated by signal ABRT.

I think my root file system is running full: there are just 4 GB left, so the backtrace cannot be completed. I will have to resize the partitions, if possible...

BTW AFAICS WebRender is disabled in Firefox anyway on this machine:
about:support 
-> Compositing	Basic

about:config
-> gfx.webrender.all	false
-> gfx.webrender.enabled	false

Comment 6 Matthew Krupcale 2021-03-29 23:20:46 UTC
Created attachment 1767503 [details]
firefox-87.0-4.fc33 partial GDB thread backtrace of freeze with default profile and multiple windows playing video on GNOME/X11 (Xwayland)

(In reply to Martin Stransky from comment #3)
> Can you use coredumpctl to get a backtrace?

I don't see a relevant firefox coredump in `coredumpctl list`. Firefox doesn't actually crash, it just locks up and becomes unresponsive.

> - test Firefox 87.0 which should be available for Fedora now.

I still have this issue with firefox-87.0-2.fc33.x86_64 on GNOME/Wayland (see comment #1).

> - try to disable WebRender - https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Check_WebRender

I don't have WebRender enabled (my hardware doesn't qualify since I'm running NVIDIA/nouveau). I'm just using basic compositor.

> - try Firefox-x11 (i.e. without Wayland) - https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Check_Firefox_X11_Gtk.2B_backend_.28Gnome_only.29

I still have this issue with firefox-87.0-4.fc33.x86_64 on GNOME/X11 (Xwayland). See the attached (partial) BT, which also includes glibc debuginfo. Unfortunately it doesn't look too useful, since all of the threads shown are waiting.

(In reply to yulinux from comment #2)
> Both machines use older AMD processors

I wonder if this is relevant? I'm running AMD Phenom II X4 965. Maybe older/slower CPU makes this issue more likely?

It's also interesting that we both have

#0 recvmsg
#1 mozilla::SandboxBrokerCommon::RecvWithFd
#2 mozilla::SandboxBroker::ThreadMain
...

in our Wayland backtraces, with the other threads basically stuck on

#0 futex_*wait*
#1 __pthread_cond_wait_common
#2 pthread_cond_{timed,}wait
#3 mozilla::detail::ConditionVariableImpl::wait
#4 mozilla::OffTheBooksCondVar::Wait
...

Comment 7 Martin Stransky 2021-03-30 17:39:24 UTC
Yes, it looks like a deadlock in our Wayland basic compositor code. The uncomplete_bt_via_coredumpctl attachment is very useful here; it shows the potential deadlock between our painting code and the vsync handler:

                #2  0x00007f9c00f941e0 __restore_rt (libpthread.so.0 + 0x141e0)
                #3  0x00007f9c00b5ba5f __GI___poll (libc.so.6 + 0xf6a5f)
                #4  0x00007f9bff2d392c wl_display_dispatch_queue (libwayland-client.so.0 + 0x892c)
                #5  0x00007f9bf82cf31d mozilla::widget::nsWaylandDisplay::WaitForSyncEnd() (libxul.so + 0x1e6831d)
                #6  0x00007f9bf82cf34d mozilla::widget::nsWaylandDisplay::SyncBegin() (libxul.so + 0x1e6834d)
                #7  0x00007f9bf9b715e1 operator() (libxul.so + 0x370a5e1)
                #8  0x00007f9bf8f0e8ce mozilla::RunnableTask::Run() (libxul.so + 0x2aa78ce)
                #9  0x00007f9bf8f0e467 mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (libxul.so + 0x2aa7467)
                #10 0x00007f9bf91a8129 operator() (libxul.so + 0x2d41129)
                #11 0x00007f9bf8f10257 nsThread::ProcessNextEvent(bool, bool*) (libxul.so + 0x2aa9257)
                #12 0x00007f9bf8f0fd00 NS_ProcessNextEvent(nsIThread*, bool) (libxul.so + 0x2aa8d00)
                #13 0x00007f9bf8f27880 mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (libxul.so + 0x2ac0880)
                #14 0x00007f9bf9325bb5 MessageLoop::RunInternal() (libxul.so + 0x2ebebb5)
                #15 0x00007f9bf9b52fcd nsBaseAppShell::Run() (libxul.so + 0x36ebfcd)
                #16 0x00007f9bf9eb9826 nsAppStartup::Run() (libxul.so + 0x3a52826)
                #17 0x00007f9bf9efa261 XREMain::XRE_mainRun() (libxul.so + 0x3a93261)
                #18 0x00007f9bf9ef760a XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) (libxul.so + 0x3a9060a)
                #19 0x00007f9bf9ef726e XRE_main(int, char**, mozilla::BootstrapConfig const&) (libxul.so + 0x3a9026e)
                #20 0x0000562fa02ad28f do_main (firefox + 0x5328f)
                #21 0x0000562fa029d9f6 main (firefox + 0x439f6)
                #22 0x00007f9c00a8d1e2 __libc_start_main (libc.so.6 + 0x281e2)
                #23 0x0000562fa02acede _start (firefox + 0x52ede)

and:

                #0  0x00007f9c00f8f6c2 futex_wait_cancelable (libpthread.so.0 + 0xf6c2)
                #1  0x00007f9bff2d275b wl_display_read_events (libwayland-client.so.0 + 0x775b)
                #2  0x00007f9bff2d3939 wl_display_dispatch_queue (libwayland-client.so.0 + 0x8939)
                #3  0x00007f9bf82cf31d mozilla::widget::nsWaylandDisplay::WaitForSyncEnd() (libxul.so + 0x1e6831d)
                #4  0x00007f9bf82c689e mozilla::widget::WindowSurfaceWayland::Lock(mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&) (libxul.so + 0x1e5f89e)
                #5  0x00007f9bf9b6b42b mozilla::widget::WindowSurfaceProvider::StartRemoteDrawingInRegion(mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel>&, mozilla::layers::BufferMode*) (libxul.so + 0x370442b)
                #6  0x00007f9bf9511771 mozilla::layers::BasicCompositor::BeginFrameForWindow(mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&) (libxul.so + 0x30aa771)

It also explains why it helps to set widget.wayland_vsync.enabled to false at about:config.
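
The two stacks above can also be compared mechanically. The Python sketch below extracts the symbol from each coredumpctl-style frame and lists the symbols that both stacks contain. The helper names frame_symbols and shared_frames are illustrative assumptions, and the regular expression is written only for the frame layout shown in this comment.

```python
import re

# Matches frames such as:
#   #4  0x00007f9bff2d392c wl_display_dispatch_queue (libwayland-client.so.0 + 0x892c)
FRAME_RE = re.compile(
    r"#(\d+)\s+0x[0-9a-f]+\s+(.+?)\s+\((\S+) \+ 0x[0-9a-f]+\)\s*$"
)


def frame_symbols(trace):
    """Return the symbol of every frame in a coredumpctl-style stack."""
    symbols = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line.strip())
        if m:
            symbols.append(m.group(2))
    return symbols


def shared_frames(trace_a, trace_b):
    """Symbols present in both stacks, in the order they appear in trace_a."""
    in_b = set(frame_symbols(trace_b))
    return [s for s in frame_symbols(trace_a) if s in in_b]


# Excerpt of the first stack (the one that runs under XRE_main).
stack_a = """
#3  0x00007f9c00b5ba5f __GI___poll (libc.so.6 + 0xf6a5f)
#4  0x00007f9bff2d392c wl_display_dispatch_queue (libwayland-client.so.0 + 0x892c)
#5  0x00007f9bf82cf31d mozilla::widget::nsWaylandDisplay::WaitForSyncEnd() (libxul.so + 0x1e6831d)
#6  0x00007f9bf82cf34d mozilla::widget::nsWaylandDisplay::SyncBegin() (libxul.so + 0x1e6834d)
"""

# Excerpt of the second stack (the BasicCompositor path).
stack_b = """
#0  0x00007f9c00f8f6c2 futex_wait_cancelable (libpthread.so.0 + 0xf6c2)
#1  0x00007f9bff2d275b wl_display_read_events (libwayland-client.so.0 + 0x775b)
#2  0x00007f9bff2d3939 wl_display_dispatch_queue (libwayland-client.so.0 + 0x8939)
#3  0x00007f9bf82cf31d mozilla::widget::nsWaylandDisplay::WaitForSyncEnd() (libxul.so + 0x1e6831d)
"""

common = shared_frames(stack_a, stack_b)
print(common)
```

A shared frame only shows that both threads sit in the same wait path (here WaitForSyncEnd calling wl_display_dispatch_queue); it is a triage hint and does not prove a deadlock on its own.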

Comment 8 Martin Stransky 2021-03-30 17:44:29 UTC
(In reply to Martin Stransky from comment #7)
> It also explains why it helps to set widget.wayland_vsync.enabled to false
> at about:config.

I was wrong about widget.wayland_vsync.enabled; it won't help here.

Comment 9 yulinux 2021-03-30 23:14:14 UTC
I resized my partitions and installed all of the listed debuginfo packages, but at some point while taking the backtrace gdb crashes, although enough RAM and disk space are available (created bug 1944906). Is a complete backtrace still necessary, or should I try to reproduce it in an environment without Wayland, try to disable sandboxing, or try something else? Thanks so far for looking into it!

Comment 10 Martin Stransky 2021-03-31 16:32:45 UTC
I think I have all the data I need, thanks.

Comment 11 Martin Stransky 2021-04-01 18:03:38 UTC
Upstream bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1702606

Comment 12 Martin Stransky 2021-04-01 18:28:12 UTC
Added to firefox-87.0-9 packages.

Comment 13 Matthew Krupcale 2021-04-05 01:05:49 UTC
I've been running firefox-87.0-9 for over 48 hours now without freezing, so I suspect this is fixed now. I don't intend to run firefox X11, but considering that the fix was Wayland-related, I'm curious if you suspect this also fixes the freeze with X11?

Comment 14 yulinux 2021-04-06 13:05:43 UTC
Created attachment 1769562 [details]
different backtrace from another computer (still firefox-87.0-2.fc33.x86_64)

Great, thank you. 
I uploaded a backtrace from the other computer, "uncomplete_bt_via_coredumpctl_different_computer", which looks different to me (wl_display_dispatch_queue appears only once) but is still from firefox-87.0-2.fc33.x86_64. Should I open a separate bug report for this (same problem), and/or do you see something suspicious in that backtrace?

Comment 15 Martin Stransky 2021-04-28 09:45:57 UTC
This should be fixed in Firefox 89. It's a bug in vsync. Try setting widget.wayland_vsync.enabled to false in about:config.

Comment 16 Ben Cotton 2021-11-04 14:37:20 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 18 Ben Cotton 2021-11-30 18:40:21 UTC
Fedora 33 changed to end-of-life (EOL) status on 2021-11-30. Fedora 33 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 19 Fedora Update System 2022-10-05 12:46:41 UTC
FEDORA-2022-f0988ea008 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2022-f0988ea008

Comment 20 Fedora Update System 2022-10-05 12:57:08 UTC
FEDORA-2022-f0988ea008 has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make a note of it in this bug report.