Bug 1895920
Summary: | Chromium 86 crashes on WebRTC videos when switching window | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Eric Lavarde <elavarde>
Component: | chromium | Assignee: | Tom "spot" Callaway <spotrh>
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | urgent | Priority: | unspecified
Version: | 32 | CC: | blu2lz, dietrich.moerman, gombosg, k, ltoscano, mcl, rday, rgarcia, spotrh, tpopela, yaneti
Hardware: | x86_64 | OS: | Linux
Fixed In Version: | chromium-87.0.4280.66-1.fc33 chromium-87.0.4280.66-1.fc32 | Doc Type: | If docs needed, set a value
Last Closed: | 2020-11-22 01:25:07 UTC | Type: | Bug
Description
Eric Lavarde
2020-11-09 12:50:43 UTC
Looks like you have hardware accelerated video decoding on. Try turning that off and seeing if the issue goes away: chrome://flags/#enable-accelerated-video-decode

It would also be useful to know if you are able to reproduce this in the current stable version of Google Chrome (either with hardware accelerated video decode enabled or disabled). I suspect there are a _LOT_ of bugs in Linux VAAPI that Chrome/Chromium are flushing out.

I don't have this flag in 85, only: #disable-accelerated-video-decode (it was set to "Enabled" which should have meant that video acceleration is disabled) but #enable-accelerated-video-decode appeared indeed in 86, and is set to "Disabled" (given the CPU performance video confs eat, anything else would have been surprising).

The issue doesn't happen with Google Chrome Stable Version 86.0.4240.193 (Official Build) (64-bit). I filtered for "video" among the flags and the settings are exactly the same in Chrome and Chromium. Same thing when filtering on "GPU". And, as already written, the issue wasn't present in Chromium 85.

As a side note, I see only one possibly relevant fix "[M-86][VideoCapture] Handle GPU context lost for the zero-copy path" in the Chrome release notes https://chromium.googlesource.com/chromium/src/+log/86.0.4240.183..86.0.4240.193 (but I have absolutely no clue if it's really relevant).

OK, I remove for now Chrome and roll-back again Chromium... Let me know if I can provide more information.

Happens on five different laptops in my family after upgrade to Chromium 86. Same stack traces as already posted. Fedora 32 and 33 mixed. HW acceleration disabled (default since 86 according to changelog) but also tested with enabled.

Reproducer: Go to meet.jit.si and start a meeting, it is free. Most of the time it crashes directly if I trigger the activities (left upper edge), sometimes it works a little longer. Anything that triggers leaving the focus of the Chromium window has a high chance of crashing Chromium.
If I can debug anything, just let me know, I have a jitsi server set up.

Forgot: Using it on wayland, thus not a X11 issue only.

I just made an update for 86.0.4240.198. Please retest with that build.

(In reply to Tom "spot" Callaway from comment #5)
> I just made an update for 86.0.4240.198. Please retest with that build.

Tested, it is still crashing.

Since 86 changed hw acceleration code and you had a patch that you used before that, I would like to test a build without vaapi enabled. I looked at the spec file: Could you provide me a build with “use_vaapi 0”?

If that does not fix it I suspect this has something to do with the way the package is built for Fedora since I use my work laptop with Chromium 86.0.4240.75 on Ubuntu just fine with Jitsi. I see that you only built 75 for rawhide. Could you provide me another build of 75 for f33? This might help to narrow it down to code or spec/patch set.

Let's start with .198, because older builds have giant security holes (that are being actively exploited).

This scratch build is for Fedora 32 (with vaapi force disabled): https://koji.fedoraproject.org/koji/taskinfo?taskID=55705265

It will take a while to complete, but you can get the x86_64 builds before the aarch64 builds finish (they are much slower).

I also tested with 86.0.4240.198 and it's still crashing but apparently less: before, I just needed to switch window with Alt+Tab and it would crash almost each time, now it's more like only 10% of the time. It's difficult to judge for sure but it seems to depend on the instance: if I restart Chromium and it doesn't crash the first time I switch window, it won't crash after either. Else, the stack trace looks exactly the same so I don't re-attach it.

The novaapi build for x86_64 isn't yet finished, I'll look into it later.

I also have this issue in F33, Chromium 86.0.4240.183. HW acceleration flag is apparently disabled. Software is GoToMeeting. Same stack trace. Workaround is using Chrome for a while.
(I don't experience the bug in 86.0.4240.198 Chrome)

Version novaapi installed, no crash so far, but as written above, it didn't happen any more all the time with build 198.

$ rpm -qa | grep chromium
chromium-common-86.0.4240.198-1.fc32.novaapi.x86_64
fedora-chromium-config-1.1-4.fc32.noarch
chromium-86.0.4240.198-1.fc32.novaapi.x86_64

Please keep trying to reproduce it in the no vaapi build. If you cannot, then the next step is to try removing the tiny patches I have in there and just leave the default vaapi functionality on (because obviously, people want vaapi).

I also got better results with that build. It crashed once but not any more afterwards, even after like 100 windows switches. :)

This build (in progress) has just the VAAPI bits that upstream chrome has and the "use_vaapi = true" flag:

https://koji.fedoraproject.org/koji/taskinfo?taskID=55753777

If this is stable (or stable with chrome://flags/#enable-accelerated-video-decode set to "Disabled"), then those patches are to blame. If not, then it's the upstream VAAPI bits.

Here are my results:

As already mentioned by others, 183 crashed often when switching windows.

With 198, first I had the same feeling as others stated, less crashing. I tried to find a reproducer and found it. Disclaimer first: I installed the f32 builds on f33 but this should not make a difference here.

1. Close Chromium
2. Trigger sleep/standby
3. Wake up the system again
4. Open Chromium, go to jitsi
5. Switch windows (overview, win+tab …)

This “works” about 80 to 90 % of the time I’d say. If it did not work, just repeat the steps. It is important to close every Chromium instance before suspending. If the first crash happened (or not if you are lucky) Chromium will not crash again until you suspend or restart again. Weird but this is how it is. Reproducible on two laptops.

With the novaapi build at first I thought things would be better. I managed to make it crash though. For this version I’d say about 50 % of the time it crashes.
Maybe I just didn’t try enough and was lucky. So the percentage might be the same as in 198 with vaapi. The build obviously does not throw the “unknown libva error” at start anymore. Tested with accelerated video decode flag disabled and enabled, makes no difference.

I’d still like to have a 75 build because I see Chromium running without problems on Ubuntu with this build. No worries about the security issues, of course I won’t use it in production. If you don’t want to link it publicly here you could provide it to me by mail. I’d be glad to help by testing.

Tom, that build 55753777 seems to be stuck.

Torsten, are you sure that it never crashes before suspend, i.e. after a restart? I experienced crashes before suspend, too.

(In reply to Gergely Gombos from comment #16)
> Tom, that build 55753777 seems to be stuck.
> Torsten, are you sure that it never crashes before suspend, i.e. after a
> restart? I experienced crashes before suspend, too.

It also crashes after a restart. For a reproducer it was quicker to suspend though. To me it seems that it happens nearly every time after a hardware initialization.

Does it happen to anyone not on Intel GPU chips? Said five laptops I've seen it on all have Intel but I don't have different hardware to test.

Oh, the build wasn't stuck, it just takes that much time to complete. :D

I managed to crash it though for the first time... then it didn't crash any more after a lot of alt-tabbing. Interesting.

Yes, Torsten, for me, crashes happen on nvidia hardware.

@spot: I installed rawhide to a USB stick and tested two builds:

86.0.4240.75-1.fc34 crash
85.0.4183.121-1.fc34 no crash

I don’t need a 75 build anymore, it also crashes. So the problem has to be in the way the package is built for Fedora. Or Canonical patches something else in their Ubuntu builds.

Rawhide has debugging on for Chromium, so if you need the logs, just tell me. I did not see anything different from a working build though.
Since it is a SEGV_MAPERR, I did not expect anything to be there anyway.

(In reply to Tom "spot" Callaway from comment #14)
> This build (in progress) has just the VAAPI bits that upstream chrome has
> and the "use_vaapi = true" flag:
>
> https://koji.fedoraproject.org/koji/taskinfo?taskID=55753777
>
> If this is stable (or stable with
> chrome://flags/#enable-accelerated-video-decode set to "Disabled"), then
> those patches are to blame. If not, then it's the upstream VAAPI bits.

Tested this build and it crashes as often as 183. For me it is a step back.

Builds tested so far:

85.0.4183.121-1.fc34 no crash
86.0.4240.75-1.fc34 crashes often
86.0.4240.111-1.fc33 crashes often
86.0.4240.183-1.fc33 crashes often
86.0.4240.198-1.fc33 crashes once after restart/suspend (80–90% of the time)
86.0.4240.198-1.fc32.novaapi crashes once after restart/suspend (maybe a bit less than normal 198)
86.0.4240.198-1.fc32.vaapi crashes often

For me this now seems unrelated to vaapi. Did you change anything else in the build process or patch sets with the switch to 86?

I did not change anything in the general build process, however, the upstream code change across major versions is HUGE. Since we build with gcc (and upstream doesn't test building with gcc, only their own fork of llvm), it is plausible that something either in the patchset that we apply to fix gcc issues (you can see the main set of those patches here: https://github.com/stha09/chromium-patches) or something specific to gcc is to blame.

The challenge with debugging this is compounded by how fast the Chromium upstream moves, they just released 87.0.4280.66 today. They often fix bugs silently that do not make it into their "stable" branch.

At this point, I need to focus on getting a 87.0.4280.66 build done. If we're lucky, that build will resolve this issue, and if not, we'll keep troubleshooting.

Also, FWIW, I can reproduce this (86.0.4240.198-1.fc33, with meet.jit.si).
Stacktrace for reference (basically identical to Eric's):

Received signal 11 SEGV_MAPERR 000000000000
#0 0x562f787f4269 base::debug::CollectStackTrace()
#1 0x562f7875da46 base::debug::StackTrace::StackTrace()
#2 0x562f787f3ca9 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7fec5a0091e0 (/usr/lib64/libpthread-2.32.so+0x141df)
#4 0x562f74fde694 _ZN5mediaL15RoundedDivisionEli.cold
#5 0x562f7911dd3e x11::Connection::Dispatch()
#6 0x562f787f541b base::FileDescriptorWatcher::Controller::RunCallback()
#7 0x562f787bba42 base::TaskAnnotator::RunTask()
#8 0x562f787cf078 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl()
#9 0x562f787cf41e base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork()
#10 0x562f78819431 base::MessagePumpLibevent::Run()
#11 0x562f787ce523 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run()
#12 0x562f7879f26c base::RunLoop::Run()
#13 0x562f76071c4a content::BrowserProcessSubThread::IOThreadRun()
#14 0x562f787de908 base::Thread::ThreadMain()
#15 0x562f788066c5 base::(anonymous namespace)::ThreadFunc()
#16 0x7fec59ffe3f9 start_thread
#17 0x7fec5830c903 __GI___clone
r8: 0000000000000000 r9: 0000000000000000 r10: 0000562f8103bce0 r11: 0000000000000293
r12: 00002508bd9a7630 r13: 00002508be8f22a0 r14: 00007fec4521a250 r15: 00002508bec19b70
di: 00002508bd9a7620 si: 00007fec4521a250 bp: 00007fec4521a2a0 bx: 00002508be8f2080
dx: 00000002508be000 ax: 0000562f809b0710 cx: 0000562f810be0c0 sp: 00007fec4521a238
ip: 0000562f74fde694 efl: 0000000000010293 cgf: 002b000000000033 erf: 0000000000000004
trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]

I just found a patch in gentoo's 86 tree that looks promising for this: https://gitweb.gentoo.org/repo/gentoo.git/tree/www-client/chromium/files/chromium-87-xproto-crash.patch

This change is already applied in 87.0.4280.66, fingers crossed.
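
The register dump at the end of the stack trace above can be decoded mechanically. What follows is only a minimal sketch, assuming that the `trp`, `erf` and `cr2` fields carry the x86-64 trap number, page-fault error code and faulting address, as their names suggest. The helper names `parse_registers` and `describe_fault` are hypothetical and exist only for this illustration.

```python
import re

# Last two lines of the register dump, copied from the stack trace in this report.
REGISTER_DUMP = """
ip: 0000562f74fde694 efl: 0000000000010293 cgf: 002b000000000033 erf: 0000000000000004
trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000000
"""


def parse_registers(dump: str) -> dict:
    """Turn 'name: hexvalue' pairs into a {name: int} mapping."""
    pairs = re.findall(r"(\w+):\s*([0-9a-fA-F]+)", dump)
    return {name: int(value, 16) for name, value in pairs}


def describe_fault(regs: dict) -> str:
    """Summarise a page fault from the trap number, error code and address."""
    if regs["trp"] != 14:  # 14 (0xe) is the x86 page-fault exception
        return f"trap {regs['trp']}: not a page fault"
    err = regs["erf"]
    # x86 page-fault error code: bit 0 = page was present, bit 1 = write, bit 2 = user mode
    cause = "protection violation" if err & 1 else "page not mapped"
    access = "write" if err & 2 else "read"
    mode = "user" if err & 4 else "kernel"
    return f"{mode}-mode {access} at {regs['cr2']:#x} ({cause})"


regs = parse_registers(REGISTER_DUMP)
print(describe_fault(regs))
print(f"faulting instruction at {regs['ip']:#x}")
```

Under those assumptions the dump describes a user-mode read of address 0 on an unmapped page, which agrees with the `SEGV_MAPERR 000000000000` line, and the `ip` value matches frame #4 of the trace.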
Even with the novaapi 198 version I just had a crash, again while looking for something in another window, and it wasn't my first videoconf in Chromium since last reboot. Same error message(s) as usual, SEGV_MAPERR etc.

Yeah, I'm pretty convinced at this point that this issue has nothing to do with VAAPI.

All: Please test this build:

Fedora 32: https://koji.fedoraproject.org/koji/taskinfo?taskID=55881427 (I linked directly to the x86_64 build here since the aarch64 build is ... still ... building ...)
Fedora 33: https://koji.fedoraproject.org/koji/buildinfo?buildID=1643411

In my local tests, I can no longer reproduce this crash.

I had a 100% reproduction ratio on a 86.0.4240.183-1.fc33 and a web version of Microsoft Teams. It was enough to just change to another workspace on my i3wm and it crashed. I cannot seem to be able to reproduce this on 87.0.4280.66-1.fc33 any more. I will be able to try harder on Monday when I will be having some meetings. Thank you!

Tested and my reproducer does not work anymore either. I was not able to provoke a crash. Thanks for the build!

FEDORA-2020-10ec8aca61 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-10ec8aca61

FEDORA-2020-3e005ce2e0 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-3e005ce2e0

FEDORA-2020-10ec8aca61 has been pushed to the Fedora 33 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-10ec8aca61`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-10ec8aca61

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2020-3e005ce2e0 has been pushed to the Fedora 32 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-3e005ce2e0`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-3e005ce2e0

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2020-10ec8aca61 has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.

FEDORA-2020-3e005ce2e0 has been pushed to the Fedora 32 stable repository.
If problem still persists, please make note of it in this bug report.
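
To check whether a given Chromium version is at or past the build named in "Fixed In Version" (87.0.4280.66), the dotted version strings can be compared numerically. This is only a rough sketch: it assumes plain dotted-integer versions and ignores the RPM epoch and release suffix (such as `-1.fc33`), which a real RPM comparison would take into account. The function names are hypothetical.

```python
FIXED_VERSION = "87.0.4280.66"  # from the "Fixed In Version" field of this bug


def version_tuple(version: str) -> tuple:
    """Split '86.0.4240.198' into (86, 0, 4240, 198) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def at_or_after_fix(installed: str, fixed: str = FIXED_VERSION) -> bool:
    """True if the installed version is the fixed build or newer."""
    return version_tuple(installed) >= version_tuple(fixed)


# Versions mentioned in this report.
for version in ("86.0.4240.183", "86.0.4240.198", "87.0.4280.66"):
    status = "at or after the fixed build" if at_or_after_fix(version) else "older than the fixed build"
    print(version, status)
```

Comparing tuples of integers avoids the trap of string comparison, where "86.0.4240.75" would sort after "86.0.4240.198". Being older than the fixed build does not by itself mean a version is affected: per the reports above, 85.0.4183.121 did not crash.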