Bug 1989726 - [abrt] gnome-shell: cogl_texture_get_gl_texture(): gnome-shell killed by SIGSEGV
Summary: [abrt] gnome-shell: cogl_texture_get_gl_texture(): gnome-shell killed by SIGSEGV
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: mesa
Version: 35
Hardware: aarch64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Karol Herbst
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:dfc94f771c2602d4e252ae744ce...
Depends On:
Blocks: ARMTracker F35FinalBlocker
Reported: 2021-08-03 19:42 UTC by Paul Whalen
Modified: 2021-10-21 15:29 UTC
CC List: 23 users

Fixed In Version: mesa-21.2.3-6.fc35
Clone Of:
Environment:
Last Closed: 2021-10-13 23:13:09 UTC
Type: ---
Embargoed:


Attachments
File: backtrace (136.98 KB, text/plain)
2021-08-03 19:42 UTC, Paul Whalen
File: core_backtrace (46.86 KB, text/plain)
2021-08-03 19:42 UTC, Paul Whalen
File: cpuinfo (1.22 KB, text/plain)
2021-08-03 19:42 UTC, Paul Whalen
File: dso_list (623 bytes, text/plain)
2021-08-03 19:42 UTC, Paul Whalen
File: environ (1.07 KB, text/plain)
2021-08-03 19:42 UTC, Paul Whalen
File: exploitable (82 bytes, text/plain)
2021-08-03 19:42 UTC, Paul Whalen
File: limits (1.29 KB, text/plain)
2021-08-03 19:42 UTC, Paul Whalen
File: maps (3.90 KB, text/plain)
2021-08-03 19:42 UTC, Paul Whalen
File: mountinfo (2.85 KB, text/plain)
2021-08-03 19:42 UTC, Paul Whalen
File: open_fds (6.97 KB, text/plain)
2021-08-03 19:43 UTC, Paul Whalen
File: proc_pid_status (1.22 KB, text/plain)
2021-08-03 19:43 UTC, Paul Whalen
File: var_log_messages (5.02 KB, text/plain)
2021-08-03 19:43 UTC, Paul Whalen
COGL_DEBUG.txt (894.06 KB, text/plain)
2021-09-07 22:48 UTC, Paul Whalen
COGL_DRIVER.gles2.txt (852.38 KB, text/plain)
2021-09-07 22:49 UTC, Paul Whalen
Patches to make Tegra work on Fedora 35 again (6.42 KB, application/x-xz)
2021-10-12 11:54 UTC, Karol Herbst

Description Paul Whalen 2021-08-03 19:42:45 UTC
Description of problem:
Boot disk image Fedora-Workstation-Rawhide-20210731.n.0.aarch64.raw on the Jetson Nano.

Version-Release number of selected component:
gnome-shell-40.3-2.fc35

Additional info:
reporter:       libreport-2.15.2
backtrace_rating: 4
cgroup:         0::/user.slice/user-984.slice/session-c1.scope
cmdline:        /usr/bin/gnome-shell
crash_function: cogl_texture_get_gl_texture
executable:     /usr/bin/gnome-shell
journald_cursor: s=99955af9029242668de5c893ee502444;i=11a1;b=2316346d19cc4428ac60716a8f23631a;m=549904c;t=5c8abb12bee8d;x=fa1b9aee751032e6
kernel:         5.14.0-0.rc4.36.fc35.aarch64
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            984

Truncated backtrace:
Thread no. 1 (10 frames)
 #0 cogl_texture_get_gl_texture at ../cogl/cogl/cogl-texture.c:315
 #1 flush_layers_common_gl_state_cb at ../cogl/cogl/driver/gl/cogl-pipeline-opengl.c:562
 #2 _cogl_pipeline_foreach_layer_internal at ../cogl/cogl/cogl-pipeline.c:511
 #3 _cogl_pipeline_flush_common_gl_state at ../cogl/cogl/driver/gl/cogl-pipeline-opengl.c:648
 #4 _cogl_pipeline_flush_gl_state at ../cogl/cogl/driver/gl/cogl-pipeline-opengl.c:1039
 #5 _cogl_gl_flush_attributes_state at ../cogl/cogl/driver/gl/cogl-attribute-gl.c:256
 #6 _cogl_flush_attributes_state at ../cogl/cogl/cogl-attribute.c:626
 #7 cogl_gl_framebuffer_draw_attributes at ../cogl/cogl/driver/gl/cogl-framebuffer-gl.c:332
 #8 cogl_framebuffer_driver_draw_attributes at ../cogl/cogl/cogl-framebuffer-driver.c:113
 #9 _cogl_framebuffer_draw_attributes at ../cogl/cogl/cogl-framebuffer.c:2449

Comment 1 Paul Whalen 2021-08-03 19:42:49 UTC
Created attachment 1810593
File: backtrace

Comment 2 Paul Whalen 2021-08-03 19:42:51 UTC
Created attachment 1810594
File: core_backtrace

Comment 3 Paul Whalen 2021-08-03 19:42:52 UTC
Created attachment 1810595
File: cpuinfo

Comment 4 Paul Whalen 2021-08-03 19:42:53 UTC
Created attachment 1810596
File: dso_list

Comment 5 Paul Whalen 2021-08-03 19:42:54 UTC
Created attachment 1810597
File: environ

Comment 6 Paul Whalen 2021-08-03 19:42:55 UTC
Created attachment 1810598
File: exploitable

Comment 7 Paul Whalen 2021-08-03 19:42:57 UTC
Created attachment 1810599
File: limits

Comment 8 Paul Whalen 2021-08-03 19:42:58 UTC
Created attachment 1810600
File: maps

Comment 9 Paul Whalen 2021-08-03 19:42:59 UTC
Created attachment 1810601
File: mountinfo

Comment 10 Paul Whalen 2021-08-03 19:43:00 UTC
Created attachment 1810602
File: open_fds

Comment 11 Paul Whalen 2021-08-03 19:43:02 UTC
Created attachment 1810603
File: proc_pid_status

Comment 12 Paul Whalen 2021-08-03 19:43:03 UTC
Created attachment 1810604
File: var_log_messages

Comment 13 Ben Cotton 2021-08-10 13:34:24 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 35 development cycle.
Changing version to 35.

Comment 14 Fedora Blocker Bugs Application 2021-08-23 11:45:17 UTC
Proposed as a Blocker for 35-beta by Fedora user pbrobinson using the blocker tracking app because:

 GNOME is crashing on the Jetson Nano, which is an aarch64 blocking device

Comment 15 Adam Williamson 2021-08-26 19:07:29 UTC
+3 in https://pagure.io/fedora-qa/blocker-review/issue/399 , marking accepted.

Comment 16 Adam Williamson 2021-08-30 17:12:18 UTC
mclasen: according to fmuellner: 'driver issue or mutter bug' -> reassigning to mutter for now.

Comment 17 Jonas Ådahl 2021-08-30 18:23:54 UTC
Is it crashing at launch 100% of the time, or are there reproduction steps?

Comment 18 Peter Robinson 2021-08-30 21:02:46 UTC
(In reply to Jonas Ådahl from comment #17)
> Is it crashing at launch 100% of the time, or are there reproduction steps?

100% of the attempts I've had.

Comment 19 Jonas Ådahl 2021-08-31 07:54:34 UTC
I faked the errors that were logged in the journal and could reproduce an infinite loop that resulted from a use-after-free. Fixed by https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1979.

Comment 20 Jonas Ådahl 2021-08-31 08:35:31 UTC
Scratch build with the above mentioned MR applied: https://koji.fedoraproject.org/koji/taskinfo?taskID=74843091

Comment 21 Peter Robinson 2021-08-31 14:46:33 UTC
(In reply to Jonas Ådahl from comment #20)
> Scratch build with the above mentioned MR applied:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=74843091

Still crashing, but it's not seeing it as a dupe so I wonder if it's different:
https://bugzilla.redhat.com/show_bug.cgi?id=1999681

Comment 22 Jonas Ådahl 2021-08-31 15:14:16 UTC
I wonder, does setting MUTTER_DEBUG_USE_KMS_MODIFIERS=1 in the environment have any effect?

Comment 23 Paul Whalen 2021-08-31 17:34:48 UTC
(In reply to Jonas Ådahl from comment #22)
> I wonder, does setting MUTTER_DEBUG_USE_KMS_MODIFIERS=1 in the environment
> have any effect?

None that I see, still crashing with "Aug 31 13:26:03 nano abrt-notification[3280]: [🡕] Process 1229 (gnome-shell) crashed in cogl_texture_get_gl_texture()"

Comment 24 Jonas Ådahl 2021-09-07 13:57:08 UTC
This seems more and more like a driver issue: the crash I see happens because the fallback OpenGL texture doesn't exist. I can reproduce the same backtrace simply by fake-failing the creation of that fallback texture. The fallback texture is a 1x1 premultiplied RGBA8888 texture filled with the pixel 0xffffffff, and it's allocated very early.

What does running with COGL_DEBUG=all say? Maybe COGL_DRIVER=gles2 has some effect?

Comment 25 Jonas Ådahl 2021-09-07 15:28:35 UTC
Created https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1994, but it will not fix anything, as the issue still seems to be the inability to allocate the 1x1 texture. With that MR the error message from the allocation failure can be seen, but I assume it will say "out of memory", as all the others seem to do.

Comment 26 Adam Williamson 2021-09-07 17:00:22 UTC
So if it's a driver issue, should we reassign to kernel? mesa? what driver is in question exactly here? CCing the regular graphics suspect...

Comment 27 Jonas Ådahl 2021-09-07 17:55:29 UTC
I would start with mesa first. The driver is tegra/nouveau.

Comment 28 Adam Williamson 2021-09-07 18:57:10 UTC
OK, let's try that for now.

Comment 29 Paul Whalen 2021-09-07 19:04:08 UTC
Created attachment 1821312
COGL_DEBUG.txt

Comment 30 Paul Whalen 2021-09-07 19:06:52 UTC
Created attachment 1821313
COGL_DRIVER.gles2.txt

Comment 31 Paul Whalen 2021-09-07 19:08:20 UTC
(In reply to Jonas Ådahl from comment #24)

> What does running with COGL_DEBUG=all say? Maybe COGL_DRIVER=gles2 has some
> effect?

Added logs. No change that I noticed.

Comment 32 Jonas Ådahl 2021-09-07 20:27:31 UTC
(In reply to Paul Whalen from comment #31)
> (In reply to Jonas Ådahl from comment #24)
> 
> > What does running with COGL_DEBUG=all say? Maybe COGL_DRIVER=gles2 has some
> > effect?
> 
> Added logs. No change that I noticed.

How were these env vars set? It doesn't seem like they had any effect. Note that they have to be read at login time by gnome-shell, so place them in e.g. /etc/environment and reboot.
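
One way to confirm that the variables actually reached the running process is to read its environment from /proc. This is only a minimal sketch: the helper name `environ_has` is made up for illustration, and the commented usage assumes a single gnome-shell process that `pgrep` can find.

```shell
#!/bin/sh
# environ_has FILE NAME (hypothetical helper): succeed if the
# NUL-separated environment dump in FILE defines a variable NAME.
environ_has() {
    tr '\0' '\n' < "$1" | grep -q "^$2="
}

# Example use against the running shell process. Run it as root or as
# the user owning the process, because /proc/PID/environ is not
# world-readable:
#   pid=$(pgrep -o -x gnome-shell)
#   environ_has "/proc/$pid/environ" COGL_DEBUG && echo "COGL_DEBUG is set"
```

If the check fails after a reboot, the variables were most likely set somewhere the login session does not read.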

Comment 33 Paul Whalen 2021-09-07 22:48:59 UTC
Created attachment 1821392
COGL_DEBUG.txt

Comment 34 Paul Whalen 2021-09-07 22:49:38 UTC
Created attachment 1821393
COGL_DRIVER.gles2.txt

Comment 35 Paul Whalen 2021-09-07 22:50:19 UTC
Sorry, logs updated.

Comment 36 Adam Williamson 2021-09-17 17:05:45 UTC
Hi Karol, is there any news on this? It's one of the last two blockers for F35 Beta at present (and we may be able to fudge the other one, possibly). Thanks!

Comment 37 Karol Herbst 2021-09-17 21:50:32 UTC
(In reply to Adam Williamson from comment #36)
> Hi Karol, is there any news on this? It's one of the last two blockers for
> F35 Beta at present (and we may be able to fudge the other one, possibly).
> Thanks!

Yeah. So, I figured out what is regressing this, but after reverting those patches I started to hit memory corruption bugs. I have figured out the corruptions, but I am still unsure what to do about the regression. I pinged Thierry Reding, who more or less maintains the Tegra driver bits inside mesa, and I might have to discuss this issue further with him.

Comment 38 Adam Williamson 2021-09-17 22:10:52 UTC
For the short term, could we go with reverting the thing that causes the regression and fixing the corruption bugs, if you have those figured out?

Comment 39 Karol Herbst 2021-09-17 22:39:37 UTC
(In reply to Adam Williamson from comment #38)
> For the short term, could we go with reverting the thing that causes the
> regression and fixing the corruption bugs, if you have those figured out?

I think so. The patches to fix the corruptions are not perfect yet, as I was focusing on identifying the problems. Next week, without XDC in the way, I will have more time to spend on this, so I should have a working solution by then, just not the one we might want long term. Is there a hard deadline by which you want a solution?

Comment 40 Adam Williamson 2021-09-17 23:55:05 UTC
Well, yes :D As I mentioned, it's blocking the Fedora 35 beta release. We already missed one date for that, and the next go/no-go is Thursday, which means we need a tested RC by then to ship. That means we need to build it by Tuesday at the latest. So, Tuesday is kinda the deadline I'm working with. If we miss that we wind up slipping F35 Beta by another week. Sorry for the tight timeline.

Comment 41 Adam Williamson 2021-09-21 19:45:49 UTC
Brief update here - fixing this is difficult and unlikely to be viable in a reasonable time. We're going to roll an RC for Beta without the fix, and discuss whether to waive this as a blocker. If we do, we'll have to message that graphics on Jetson are not working properly for Beta.

Comment 42 Ben Cotton 2021-09-23 18:01:54 UTC
Moving to F35 Final Blocker as it was waived in today's F35 Beta Go/No-Go meeting: https://meetbot.fedoraproject.org/fedora-meeting/2021-09-23/f35-beta-go_no_go-meeting.2021-09-23-17.00.log.html#l-99

Comment 43 Karol Herbst 2021-10-07 14:36:35 UTC
Okay, we now have fixes to make this all work again.

First we need to revert https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724. I think the easiest approach is to revert the patches in Fedora until we have a proper solution upstream (either reverting it or fixing it), but it's not clear yet what that will be.

Reverting it will solve the issue initially reported here, but then GNOME will constantly segfault due to a large number of use-after-free accesses, which will be fixed by this MR upstream: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13231

So I suggest we merge the latter MR (the patches look quite good, so I expect this to happen this week), pull those patches in, and revert the former MR to get at least this bug out of the way.

Sadly I still didn't find time to get push access to Fedora, so somebody else has to do the final steps. I can prepare the patches though.

Comment 44 Ben Cotton 2021-10-07 15:53:14 UTC
(In reply to Karol Herbst from comment #43)

> Sadly I still didn't find time to get push access to Fedora, so somebody
> else has to do the final steps. I can prepare the patches though.

Great! If you can get the patches prepared, I'll push the mesa maintainers in Fedora to make sure they get applied.

Comment 45 Adam Williamson 2021-10-07 22:29:17 UTC
Thanks a lot Karol! I can merge the fixes if need be, but I expect Pete or Peter will be able to :)

Comment 46 Karol Herbst 2021-10-12 11:54:29 UTC
Created attachment 1832192
Patches to make Tegra work on Fedora 35 again

Sadly we didn't get to merge it upstream yet, as I have some minor concerns about the patches. But I think those are probably not relevant for fixing the core issue, and at least I didn't hit any issues in my testing.

Comment 47 Adam Williamson 2021-10-12 18:32:53 UTC
Awesome, thanks very much. I'll try to get those merged.

Comment 48 Fedora Update System 2021-10-12 19:28:16 UTC
FEDORA-2021-236569b607 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-236569b607

Comment 49 Fedora Update System 2021-10-13 18:52:35 UTC
FEDORA-2021-236569b607 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-236569b607`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-236569b607

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 50 Fedora Update System 2021-10-13 23:13:09 UTC
FEDORA-2021-236569b607 has been pushed to the Fedora 35 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 51 Fabio Valentini 2021-10-21 13:09:22 UTC
I tried to run some pre-release tests of Fedora 35 Workstation on my Jetson Nano yesterday, but on a fully updated F35 Workstation install, I still see gnome-shell crashes that prevent even gdm from starting.

Is it possible that this bug was not fixed with the last mesa update after all?
(My install might also be borked, but the gdm logs look suspicious.)

Comment 52 Paul Whalen 2021-10-21 13:46:16 UTC
(In reply to Fabio Valentini from comment #51)
> I tried to run some pre-release tests of Fedora 35 Workstation on my Jetson
> Nano yesterday, but on a fully updated F35 Workstation install, I still see
> gnome-shell crashes that prevent even gdm from starting.
> 
> Is it possible that this bug was not fixed with the last mesa update after
> all?
> (My install might also be borked, but the gdm logs look suspicious.)

Could you try RC1.1 - https://kojipkgs.fedoraproject.org/compose/35/Fedora-35-20211020.0/compose/Workstation/aarch64/images/Fedora-Workstation-35-1.1.aarch64.raw.xz

There is still some screen tearing, but it seems to work as well as it did in F34.

Comment 53 Karol Herbst 2021-10-21 15:28:34 UTC
(In reply to Fabio Valentini from comment #51)
> I tried to run some pre-release tests of Fedora 35 Workstation on my Jetson
> Nano yesterday, but on a fully updated F35 Workstation install, I still see
> gnome-shell crashes that prevent even gdm from starting.
> 
> Is it possible that this bug was not fixed with the last mesa update after
> all?
> (My install might also be borked, but the gdm logs look suspicious.)

I just tried it out on my Jetson Nano and everything seems to work alright with the packages provided by dnf. Built from mesa-21.2.4-1.fc35.src.rpm here.
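
To rule out a stale package, the installed mesa build can be compared against the Fixed In Version recorded for this bug (mesa-21.2.3-6.fc35). This is a rough sketch, not an official check: the helper name `fixed_or_newer` is hypothetical, it uses GNU `sort -V` only as an approximation of RPM version ordering, and the commented usage assumes the drivers come from the `mesa-dri-drivers` package.

```shell
#!/bin/sh
# fixed_or_newer INSTALLED FIXED (hypothetical helper): succeed if
# INSTALLED sorts at or after FIXED under GNU sort's version ordering.
# This approximates, but is not identical to, RPM's own comparison.
fixed_or_newer() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use on the affected machine:
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' mesa-dri-drivers)
#   fixed_or_newer "$installed" 21.2.3-6.fc35 && echo "fix should be present"
```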

Comment 54 Karol Herbst 2021-10-21 15:29:44 UTC
(In reply to Paul Whalen from comment #52)
> (In reply to Fabio Valentini from comment #51)
> > I tried to run some pre-release tests of Fedora 35 Workstation on my Jetson
> > Nano yesterday, but on a fully updated F35 Workstation install, I still see
> > gnome-shell crashes that prevent even gdm from starting.
> > 
> > Is it possible that this bug was not fixed with the last mesa update after
> > all?
> > (My install might also be borked, but the gdm logs look suspicious.)
> 
> Could you try RC1.1 -
> https://kojipkgs.fedoraproject.org/compose/35/Fedora-35-20211020.0/compose/
> Workstation/aarch64/images/Fedora-Workstation-35-1.1.aarch64.raw.xz
> 
> There is still some screen tearing, but it seems to work as well as it did
> in F34.

Yeah, I discussed this tearing issue with Thierry on multiple occasions, and it's not clear what the solution should be here. It might be that we need to add a new UAPI to nouveau to handle this properly.

