Bug 1895920

Summary: Chromium 86 crashes on WebRTC videos when switching window
Product: Fedora
Component: chromium
Version: 32
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Reporter: Eric Lavarde <elavarde>
Assignee: Tom "spot" Callaway <spotrh>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: blu2lz, dietrich.moerman, gombosg, k, ltoscano, mcl, rday, rgarcia, spotrh, tpopela, yaneti
Fixed In Version: chromium-87.0.4280.66-1.fc33 chromium-87.0.4280.66-1.fc32
Type: Bug
Last Closed: 2020-11-22 01:25:07 UTC

Description Eric Lavarde 2020-11-09 12:50:43 UTC
Description of problem:
I upgraded Chromium from 85 to 86 on Friday, and today Chromium crashes whenever I press Alt+Tab while in a videoconference (BlueJeans or Google Meet, no difference).

Version-Release number of selected component (if applicable):

chromium-86.0.4240.111-1.fc32.x86_64
chromium-common-86.0.4240.111-1.fc32.x86_64

How reproducible:
Always (even after a reboot)

Steps to Reproduce:
1. Open a videoconferencing tool (BlueJeans or Google Meet). You don't even need to join the room; seeing your own video before joining is sufficient.
2. Press Alt+Tab (or sometimes just wait)

Actual results:

Chromium crashes. `journalctl --user` shows the following from browser start until the crash:

```
Nov 09 13:11:56 myuser-t590 gnome-shell[5002]: ATTENTION: default value of option allow_rgb10_configs overridden by environment.
Nov 09 13:11:56 myuser-t590 gnome-keyring-daemon[3447]: asked to register item /org/freedesktop/secrets/collection/login/206, but it's already registered
Nov 09 13:11:56 myuser-t590 gnome-shell[5002]: [5002:5002:1109/131156.828651:ERROR:vaapi_wrapper.cc(546)] vaInitialize failed: unknown libva error
Nov 09 13:11:56 myuser-t590 gnome-shell[5002]: [5002:5002:1109/131156.837549:ERROR:sandbox_linux.cc(374)] InitializeSandbox() called with multiple threads in process gpu-process.
Nov 09 13:11:57 myuser-t590 gnome-keyring-daemon[3447]: asked to register item /org/freedesktop/secrets/collection/login/206, but it's already registered
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: Received signal 11 SEGV_MAPERR 000000000000
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #0 0x556c6285ad19 base::debug::CollectStackTrace()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #1 0x556c627c48b6 base::debug::StackTrace::StackTrace()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #2 0x556c6285a759 base::debug::(anonymous namespace)::StackDumpSignalHandler()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #3 0x7f175f072a90 (/usr/lib64/libpthread-2.31.so+0x14a8f)
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #4 0x556c5f081694 _ZN5mediaL15RoundedDivisionEli.cold
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #5 0x556c6317b81e x11::Connection::Dispatch()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #6 0x556c6285be6b base::FileDescriptorWatcher::Controller::RunCallback()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #7 0x556c62822712 base::TaskAnnotator::RunTask()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #8 0x556c62835b28 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #9 0x556c62835ece base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #10 0x556c6287fdc1 base::MessagePumpLibevent::Run()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #11 0x556c62834fd3 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #12 0x556c62805f3c base::RunLoop::Run()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #13 0x556c6010a9ba content::BrowserProcessSubThread::IOThreadRun()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #14 0x556c628453b8 base::Thread::ThreadMain()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #15 0x556c6286d0b5 base::(anonymous namespace)::ThreadFunc()
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #16 0x7f175f067432 start_thread
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: #17 0x7f175d2dc913 __GI___clone
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]:   r8: 0000000000000000  r9: 0000000000000000 r10: 0000556c6b059ce0 r11: 0000000000000000
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]:  r12: 00002f6d0a4ab550 r13: 00002f6d0b721d20 r14: 00007f1749fd12f0 r15: 00002f6d0bd7d9c0
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]:   di: 00002f6d0a4ab540  si: 00007f1749fd12f0  bp: 00007f1749fd1340  bx: 00002f6d0b721b00
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]:   dx: 00000002f6d0b000  ax: 0000556c6a9ce710  cx: 0000556c6b0dc0c0  sp: 00007f1749fd12d8
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]:   ip: 0000556c5f081694 efl: 0000000000010293 cgf: 002b000000000033 erf: 0000000000000004
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]:  trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000000
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: [end of stack trace]
Nov 09 13:12:06 myuser-t590 gnome-shell[4972]: Calling _exit(1). Core file will not be generated.
Nov 09 13:12:06 myuser-t590 systemd[3428]: gnome-launched-chrom-4972.scope: Succeeded.
Nov 09 13:12:06 myuser-t590 systemd[3428]: gnome-launched-chrom-4972.scope: Consumed 9.427s CPU time.
```

Expected results:

Chromium doesn't crash :-/

Additional info:

The issue definitely did not occur with Chromium 85 before Friday. I downloaded chromium and chromium-common from https://koji.fedoraproject.org/koji/buildinfo?buildID=1615463, installed them, and since then the issue is gone again.

I run GNOME on X11 (not Wayland). Let me know if you need more information.

The issue could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1894980, but there isn't enough information there for me to decide.

Comment 1 Tom "spot" Callaway 2020-11-09 15:28:49 UTC
It looks like you have hardware-accelerated video decoding enabled. Try turning it off and see whether the issue goes away:

chrome://flags/#enable-accelerated-video-decode

It would also be useful to know whether you can reproduce this in the current stable version of Google Chrome (with hardware-accelerated video decode either enabled or disabled). I suspect there are a _LOT_ of bugs in Linux VAAPI that Chrome/Chromium are flushing out.

Comment 2 Eric Lavarde 2020-11-10 07:39:14 UTC
Chromium 85 doesn't have this flag, only #disable-accelerated-video-decode, which was set to "Enabled" (meaning video acceleration should have been disabled). #enable-accelerated-video-decode did indeed appear in 86, and it is set to "Disabled" (given how much CPU video conferences eat, anything else would have been surprising).

The issue doesn't happen with Google Chrome Stable Version 86.0.4240.193 (Official Build) (64-bit). I filtered the flags for "video", and the settings are exactly the same in Chrome and Chromium; the same holds when filtering for "GPU". And, as already written, the issue wasn't present in Chromium 85.

As a side note, I see only one possibly relevant fix, "[M-86][VideoCapture] Handle GPU context lost for the zero-copy path", in the Chrome release notes https://chromium.googlesource.com/chromium/src/+log/86.0.4240.183..86.0.4240.193 (though I have no idea whether it's actually relevant).

OK, for now I'm removing Chrome and rolling Chromium back again. Let me know if I can provide more information.

Comment 3 Torsten Casselt 2020-11-15 20:54:14 UTC
This happens on five different laptops in my family after upgrading to Chromium 86, with the same stack traces as already posted, on a mix of Fedora 32 and 33. Hardware acceleration is disabled (the default since 86, according to the changelog), but I also tested with it enabled.

Reproducer: go to meet.jit.si and start a meeting (it is free). Most of the time it crashes immediately when I trigger Activities (upper-left corner); sometimes it survives a little longer. Anything that moves focus away from the Chromium window has a high chance of crashing Chromium. If I can debug anything, just let me know; I have a Jitsi server set up.

Comment 4 Torsten Casselt 2020-11-15 20:56:15 UTC
Forgot to mention: I'm using Wayland, so this is not an X11-only issue.

Comment 5 Tom "spot" Callaway 2020-11-16 16:07:20 UTC
I just made an update for 86.0.4240.198. Please retest with that build.

Comment 6 Torsten Casselt 2020-11-16 20:24:38 UTC
(In reply to Tom "spot" Callaway from comment #5)
> I just made an update for 86.0.4240.198. Please retest with that build.

Tested; it is still crashing.
Since 86 changed the hardware acceleration code, and you had a patch that you used before that, I would like to test a build without VAAPI enabled. I looked at the spec file: could you provide a build with “use_vaapi 0”?

If that does not fix it, I suspect this has something to do with the way the package is built for Fedora, since Chromium 86.0.4240.75 works just fine with Jitsi on my Ubuntu work laptop. I see that you only built 75 for Rawhide. Could you provide another build of 75 for F33? That might help narrow the cause down to either the code or the spec/patch set.

Comment 7 Tom "spot" Callaway 2020-11-16 20:57:20 UTC
Let's start with .198, because older builds have giant security holes (that are being actively exploited).

This scratch build is for Fedora 32 (with vaapi force disabled):

https://koji.fedoraproject.org/koji/taskinfo?taskID=55705265

It will take a while to complete, but you can get the x86_64 builds before the aarch64 builds finish (they are much slower).

Comment 8 Eric Lavarde 2020-11-17 07:53:20 UTC
I also tested 86.0.4240.198, and it still crashes, but apparently less often: before, switching windows with Alt+Tab crashed it almost every time; now it's more like 10% of the time. It's hard to judge for sure, but it seems to depend on the instance: if I restart Chromium and it doesn't crash the first time I switch windows, it won't crash afterwards either.

Otherwise, the stack trace looks exactly the same, so I'm not re-attaching it.

The novaapi build for x86_64 isn't finished yet; I'll look into it later.

Comment 9 Gergely Gombos 2020-11-17 09:55:15 UTC
I also have this issue on F33 with Chromium 86.0.4240.183. The hardware acceleration flag is apparently disabled.
The software is GoToMeeting. Same stack trace.

Comment 10 Gergely Gombos 2020-11-17 10:02:34 UTC
A workaround is to use Chrome for a while (I don't experience the bug in Chrome 86.0.4240.198).

Comment 11 Eric Lavarde 2020-11-17 10:25:45 UTC
Installed the novaapi version; no crash so far, but as written above, with build 198 it no longer happened every time anyway.

$ rpm -qa | grep chromium
chromium-common-86.0.4240.198-1.fc32.novaapi.x86_64
fedora-chromium-config-1.1-4.fc32.noarch
chromium-86.0.4240.198-1.fc32.novaapi.x86_64

Comment 12 Tom "spot" Callaway 2020-11-17 14:59:49 UTC
Please keep trying to reproduce it in the no vaapi build. If you cannot, then the next step is to try removing the tiny patches I have in there and just leave the default vaapi functionality on (because obviously, people want vaapi).

Comment 13 Gergely Gombos 2020-11-17 16:03:08 UTC
I also got better results with that build. It crashed once, but not again afterwards, even after something like 100 window switches. :)

Comment 14 Tom "spot" Callaway 2020-11-17 19:01:40 UTC
This build (in progress) has just the VAAPI bits that upstream chrome has and the "use_vaapi = true" flag:

https://koji.fedoraproject.org/koji/taskinfo?taskID=55753777

If this is stable (or stable with chrome://flags/#enable-accelerated-video-decode set to "Disabled"), then those patches are to blame. If not, then it's the upstream VAAPI bits.

Comment 15 Torsten Casselt 2020-11-17 21:31:37 UTC
Here are my results: as others already mentioned, 183 crashed often when switching windows. With 198, I initially had the same impression as others: less crashing. I looked for a reproducer and found one. Disclaimer first: I installed the F32 builds on F33, but that should not make a difference here.

1. Close Chromium
2. Trigger sleep/standby
3. Wake up the system again
4. Open Chromium, go to jitsi
5. Switch windows (overview, win+tab …)

I'd say this “works” about 80–90% of the time; if it doesn't, just repeat the steps. It is important to close every Chromium instance before suspending.
Once the first crash has happened (or not, if you are lucky), Chromium will not crash again until you suspend or restart.

Weird but this is how it is. Reproducible on two laptops.

With the novaapi build I first thought things were better, but I managed to make it crash. With this version I'd say it crashes about 50% of the time. Maybe I just didn't try enough and got lucky, so the rate might be the same as 198 with VAAPI.
As expected, this build no longer logs the “unknown libva error” at startup.

I tested with the accelerated video decode flag both disabled and enabled; it makes no difference.

I'd still like a 75 build, because that build runs without problems on Ubuntu. No worries about the security issues; of course I won't use it in production. If you don't want to link it publicly here, you could send it to me by mail. I'd be glad to help by testing.

Comment 16 Gergely Gombos 2020-11-17 22:37:08 UTC
Tom, that build 55753777 seems to be stuck.
Torsten, are you sure that it never crashes before suspend, i.e. after a restart? I experienced crashes before suspend, too.

Comment 17 Torsten Casselt 2020-11-17 23:48:57 UTC
(In reply to Gergely Gombos from comment #16)
> Tom, that build 55753777 seems to be stuck.
> Torsten, are you sure that it never crashes before suspend, i.e. after a
> restart? I experienced crashes before suspend, too.

It also crashes after a restart; suspending was just quicker for a reproducer. To me it seems to happen nearly every time after a hardware initialization.

Does it happen to anyone not on Intel GPUs? The five laptops I've seen it on all have Intel, and I don't have other hardware to test.

Comment 18 Gergely Gombos 2020-11-18 07:54:05 UTC
Oh, the build wasn't stuck; it just takes that long to complete. :D
I managed to crash it the first time, though... then it didn't crash any more despite a lot of Alt+Tabbing. Interesting.

Yes, Torsten: for me, crashes happen on NVIDIA hardware.

Comment 19 Torsten Casselt 2020-11-18 09:08:50 UTC
@spot:

I installed Rawhide on a USB stick and tested two builds:

86.0.4240.75-1.fc34   crash
85.0.4183.121-1.fc34  no crash

I don't need a 75 build anymore; it also crashes. So the problem has to be in the way the package is built for Fedora, or Canonical patches something else in their Ubuntu builds.

Rawhide has debugging enabled for Chromium, so if you need the logs, just tell me. I did not see anything different from a working build, though; since it is a SEGV_MAPERR, I did not expect to find anything there anyway.

Comment 20 Torsten Casselt 2020-11-18 11:58:13 UTC
(In reply to Tom "spot" Callaway from comment #14)
> This build (in progress) has just the VAAPI bits that upstream chrome has
> and the "use_vaapi = true" flag:
> 
> https://koji.fedoraproject.org/koji/taskinfo?taskID=55753777
> 
> If this is stable (or stable with
> chrome://flags/#enable-accelerated-video-decode set to "Disabled"), then
> those patches are to blame. If not, then it's the upstream VAAPI bits.

I tested this build, and it crashes as often as 183. For me it is a step backwards.

Builds tested so far:

85.0.4183.121-1.fc34          no crash
86.0.4240.75-1.fc34           crashes often
86.0.4240.111-1.fc33          crashes often
86.0.4240.183-1.fc33          crashes often
86.0.4240.198-1.fc33          crashes once after restart/suspend (80–90% of the time)
86.0.4240.198-1.fc32.novaapi  crashes once after restart/suspend (maybe a bit less than normal 198)
86.0.4240.198-1.fc32.vaapi    crashes often

For me this now seems unrelated to vaapi. Did you change anything else in the build process or patch sets with the switch to 86?

Comment 21 Tom "spot" Callaway 2020-11-18 15:42:28 UTC
I did not change anything in the general build process; however, the upstream code change across major versions is HUGE. Since we build with gcc (and upstream doesn't test building with gcc, only with their own fork of llvm), it is plausible that something either in the patchset that we apply to fix gcc issues (you can see the main set of those patches here: https://github.com/stha09/chromium-patches) or something specific to gcc is to blame.

Debugging this is made harder by how fast Chromium upstream moves: they just released 87.0.4280.66 today. They often silently fix bugs, and those fixes do not make it into their "stable" branch. At this point, I need to focus on getting a 87.0.4280.66 build done. If we're lucky, that build will resolve this issue; if not, we'll keep troubleshooting.

Comment 22 Tom "spot" Callaway 2020-11-18 15:52:11 UTC
Also, FWIW, I can reproduce this (86.0.4240.198-1.fc33, with meet.jit.si). Stacktrace for reference (basically identical to Eric's):

Received signal 11 SEGV_MAPERR 000000000000
#0 0x562f787f4269 base::debug::CollectStackTrace()
#1 0x562f7875da46 base::debug::StackTrace::StackTrace()
#2 0x562f787f3ca9 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7fec5a0091e0 (/usr/lib64/libpthread-2.32.so+0x141df)
#4 0x562f74fde694 _ZN5mediaL15RoundedDivisionEli.cold
#5 0x562f7911dd3e x11::Connection::Dispatch()
#6 0x562f787f541b base::FileDescriptorWatcher::Controller::RunCallback()
#7 0x562f787bba42 base::TaskAnnotator::RunTask()
#8 0x562f787cf078 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl()
#9 0x562f787cf41e base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork()
#10 0x562f78819431 base::MessagePumpLibevent::Run()
#11 0x562f787ce523 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run()
#12 0x562f7879f26c base::RunLoop::Run()
#13 0x562f76071c4a content::BrowserProcessSubThread::IOThreadRun()
#14 0x562f787de908 base::Thread::ThreadMain()
#15 0x562f788066c5 base::(anonymous namespace)::ThreadFunc()
#16 0x7fec59ffe3f9 start_thread
#17 0x7fec5830c903 __GI___clone
  r8: 0000000000000000  r9: 0000000000000000 r10: 0000562f8103bce0 r11: 0000000000000293
 r12: 00002508bd9a7630 r13: 00002508be8f22a0 r14: 00007fec4521a250 r15: 00002508bec19b70
  di: 00002508bd9a7620  si: 00007fec4521a250  bp: 00007fec4521a2a0  bx: 00002508be8f2080
  dx: 00000002508be000  ax: 0000562f809b0710  cx: 0000562f810be0c0  sp: 00007fec4521a238
  ip: 0000562f74fde694 efl: 0000000000010293 cgf: 002b000000000033 erf: 0000000000000004
 trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]

Comment 23 Tom "spot" Callaway 2020-11-18 16:25:05 UTC
I just found a patch in Gentoo's 86 tree that looks promising for this: https://gitweb.gentoo.org/repo/gentoo.git/tree/www-client/chromium/files/chromium-87-xproto-crash.patch
This change is already included in 87.0.4280.66; fingers crossed.

Comment 24 Eric Lavarde 2020-11-18 16:54:49 UTC
Even with the novaapi 198 version, I just had a crash, again while looking for something in another window, and it wasn't my first video conference in Chromium since the last reboot. Same error message(s) as usual (SEGV_MAPERR etc.).

Comment 25 Tom "spot" Callaway 2020-11-18 18:21:30 UTC
Yeah, I'm pretty convinced at this point that this issue has nothing to do with VAAPI.

Comment 26 Tom "spot" Callaway 2020-11-20 16:24:05 UTC
All: Please test this build:

Fedora 32:
https://koji.fedoraproject.org/koji/taskinfo?taskID=55881427
(I linked directly to the x86_64 build here since the aarch64 build is ... still ... building ...)

Fedora 33:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1643411

In my local tests, I can no longer reproduce this crash.

Comment 27 Krzysztof Adamski 2020-11-20 17:14:14 UTC
I had a 100% reproduction rate with 86.0.4240.183-1.fc33 and the web version of Microsoft Teams: just switching to another workspace in i3wm was enough to crash it. I can't seem to reproduce this on 87.0.4280.66-1.fc33 any more. I will be able to try harder on Monday, when I have some meetings.

Thank you!

Comment 28 Torsten Casselt 2020-11-20 19:09:57 UTC
Tested and my reproducer does not work anymore either. I was not able to provoke a crash.

Thanks for the build!

Comment 29 Fedora Update System 2020-11-20 22:48:10 UTC
FEDORA-2020-10ec8aca61 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-10ec8aca61

Comment 30 Fedora Update System 2020-11-20 22:48:11 UTC
FEDORA-2020-3e005ce2e0 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-3e005ce2e0

Comment 31 Fedora Update System 2020-11-21 02:44:05 UTC
FEDORA-2020-10ec8aca61 has been pushed to the Fedora 33 testing repository.
In a short time, you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-10ec8aca61`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-10ec8aca61

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 32 Fedora Update System 2020-11-21 02:46:34 UTC
FEDORA-2020-3e005ce2e0 has been pushed to the Fedora 32 testing repository.
In a short time, you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-3e005ce2e0`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-3e005ce2e0

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 33 Fedora Update System 2020-11-22 01:25:07 UTC
FEDORA-2020-10ec8aca61 has been pushed to the Fedora 33 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 34 Fedora Update System 2020-11-24 01:22:42 UTC
FEDORA-2020-3e005ce2e0 has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make note of it in this bug report.
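To check whether an installed build already includes the fix (this bug's "Fixed In Version" is 87.0.4280.66), something along the following lines may help. It is a minimal sketch, assuming GNU coreutils `sort -V` for version ordering; it compares only the upstream `%{VERSION}` reported by `rpm` and ignores epoch and release. The helper names `version_ge` and `check_chromium_fixed` are hypothetical and not part of any Fedora tooling.

```shell
#!/bin/sh
# Hypothetical helpers (not Fedora tooling): check whether the installed
# chromium package is at or above the Fixed In Version of this bug.

# version_ge A B: succeed (exit 0) if version A >= version B.
# Assumes GNU `sort -V` (version sort); epoch and release are not considered.
version_ge() {
  # Version sort puts the smaller version first; A >= B iff that line is B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_chromium_fixed: compare the installed upstream VERSION with 87.0.4280.66.
# Returns 2 if rpm cannot find the package, 1 if it predates the fix.
check_chromium_fixed() {
  installed=$(rpm -q --qf '%{VERSION}\n' chromium) || return 2
  if version_ge "$installed" 87.0.4280.66; then
    echo "chromium $installed includes the fix"
  else
    echo "chromium $installed predates the fix; upgrade to 87.0.4280.66 or later"
    return 1
  fi
}
```

Usage: source the snippet and run `check_chromium_fixed`. Version sort matters here: a plain lexical sort would wrongly place 86.0.4240.198 before 86.0.4240.75.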