Description of problem:
Boot disk image Fedora-Workstation-Rawhide-20210731.n.0.aarch64.raw on the Jetson Nano.

Version-Release number of selected component:
gnome-shell-40.3-2.fc35

Additional info:
reporter: libreport-2.15.2
backtrace_rating: 4
cgroup: 0::/user.slice/user-984.slice/session-c1.scope
cmdline: /usr/bin/gnome-shell
crash_function: cogl_texture_get_gl_texture
executable: /usr/bin/gnome-shell
journald_cursor: s=99955af9029242668de5c893ee502444;i=11a1;b=2316346d19cc4428ac60716a8f23631a;m=549904c;t=5c8abb12bee8d;x=fa1b9aee751032e6
kernel: 5.14.0-0.rc4.36.fc35.aarch64
rootdir: /
runlevel: N 5
type: CCpp
uid: 984

Truncated backtrace:
Thread no. 1 (10 frames)
 #0 cogl_texture_get_gl_texture at ../cogl/cogl/cogl-texture.c:315
 #1 flush_layers_common_gl_state_cb at ../cogl/cogl/driver/gl/cogl-pipeline-opengl.c:562
 #2 _cogl_pipeline_foreach_layer_internal at ../cogl/cogl/cogl-pipeline.c:511
 #3 _cogl_pipeline_flush_common_gl_state at ../cogl/cogl/driver/gl/cogl-pipeline-opengl.c:648
 #4 _cogl_pipeline_flush_gl_state at ../cogl/cogl/driver/gl/cogl-pipeline-opengl.c:1039
 #5 _cogl_gl_flush_attributes_state at ../cogl/cogl/driver/gl/cogl-attribute-gl.c:256
 #6 _cogl_flush_attributes_state at ../cogl/cogl/cogl-attribute.c:626
 #7 cogl_gl_framebuffer_draw_attributes at ../cogl/cogl/driver/gl/cogl-framebuffer-gl.c:332
 #8 cogl_framebuffer_driver_draw_attributes at ../cogl/cogl/cogl-framebuffer-driver.c:113
 #9 _cogl_framebuffer_draw_attributes at ../cogl/cogl/cogl-framebuffer.c:2449
Created attachment 1810593 [details] File: backtrace
Created attachment 1810594 [details] File: core_backtrace
Created attachment 1810595 [details] File: cpuinfo
Created attachment 1810596 [details] File: dso_list
Created attachment 1810597 [details] File: environ
Created attachment 1810598 [details] File: exploitable
Created attachment 1810599 [details] File: limits
Created attachment 1810600 [details] File: maps
Created attachment 1810601 [details] File: mountinfo
Created attachment 1810602 [details] File: open_fds
Created attachment 1810603 [details] File: proc_pid_status
Created attachment 1810604 [details] File: var_log_messages
This bug appears to have been reported against 'rawhide' during the Fedora 35 development cycle. Changing version to 35.
Proposed as a Blocker for 35-beta by Fedora user pbrobinson using the blocker tracking app because: GNOME is crashing on the Jetson Nano, which is an aarch64 blocking device
+3 in https://pagure.io/fedora-qa/blocker-review/issue/399 , marking accepted.
mclasen: according to fmuellner: 'driver issue or mutter bug' -> reassigning to mutter for now.
Is it crashing at launch 100% of the time or are there reproduction steps?
(In reply to Jonas Ådahl from comment #17)
> Is it crashing at launch 100% of the time or are there reproduction steps?

100% of the attempts I've had.
I faked the errors that were logged in the journal and could reproduce an infinite loop that resulted from a use-after-free. Fixed by https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1979.
Scratch build with the above mentioned MR applied: https://koji.fedoraproject.org/koji/taskinfo?taskID=74843091
(In reply to Jonas Ådahl from comment #20)
> Scratch build with the above mentioned MR applied:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=74843091

Still crashing, but it's not seeing it as a dupe so I wonder if it's different: https://bugzilla.redhat.com/show_bug.cgi?id=1999681
I wonder, does setting MUTTER_DEBUG_USE_KMS_MODIFIERS=1 in the environment have any effect?
(In reply to Jonas Ådahl from comment #22)
> I wonder, does setting MUTTER_DEBUG_USE_KMS_MODIFIERS=1 in the environment
> have any effect?

None that I see, still crashing with "Aug 31 13:26:03 nano abrt-notification[3280]: [🡕] Process 1229 (gnome-shell) crashed in cogl_texture_get_gl_texture()"
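To check whether a new build still crashes in the same function, the crash signature can be pulled out of the abrt-notification lines in the journal. The following is only a sketch: the helper name `crash_functions` is hypothetical, the pattern is modeled on the single log line quoted above, and the `journalctl` invocation in the comment assumes those lines are logged under the `abrt-notification` identifier.

```shell
#!/bin/sh
# crash_functions (hypothetical helper): read log lines on stdin and print
# "<process> <function>" for every line of the form
#   ... Process <pid> (<process>) crashed in <function>()
crash_functions() {
    sed -n 's/.*Process [0-9][0-9]* (\([^)]*\)) crashed in \([A-Za-z0-9_]*\)().*/\1 \2/p'
}

# On the affected machine the input would come from the journal, e.g.
# (assumption: the lines carry the abrt-notification identifier):
#   journalctl -b -t abrt-notification | crash_functions
#
# Demo with a sample line modeled on the one quoted in this comment:
printf '%s\n' \
    'Aug 31 13:26:03 nano abrt-notification[3280]: Process 1229 (gnome-shell) crashed in cogl_texture_get_gl_texture()' \
    | crash_functions
# prints: gnome-shell cogl_texture_get_gl_texture
```

If the printed function differs between two builds, the crashes are probably not the same bug, which is worth noting when a report is not flagged as a duplicate.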
This seems more and more like a driver issue: the crash I see happens because the fallback OpenGL texture doesn't exist. I can reproduce the same backtrace by simply fake-failing the creation of that fallback texture. The fallback texture is a 1x1 premultiplied RGBA8888 texture filled with the pixel 0xffffffff, and it's allocated very early. What does running with COGL_DEBUG=all say? Maybe COGL_DRIVER=gles2 has some effect?
Created https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1994 but it will not fix anything, as the issue still seems to be the inability to allocate the 1x1 texture. With that MR the error message from the allocation failure can be seen, but I assume it will say "out of memory" as all the others seem to do.
So if it's a driver issue, should we reassign to kernel? mesa? Which driver exactly is in question here? CCing the regular graphics suspect...
I would start with mesa first. The driver is tegra/nouveau.
OK, let's try that for now.
Created attachment 1821312 [details] COGL_DEBUG.txt
Created attachment 1821313 [details] COGL_DRIVER.gles2.txt
(In reply to Jonas Ådahl from comment #24)
> What does running with COGL_DEBUG=all say? Maybe COGL_DRIVER=gles2 has some
> effect?

Added logs. No change that I noticed.
(In reply to Paul Whalen from comment #31)
> (In reply to Jonas Ådahl from comment #24)
>
> > What does running with COGL_DEBUG=all say? Maybe COGL_DRIVER=gles2 has some
> > effect?
>
> Added logs. No change that I noticed.

How were these env vars set? It doesn't seem like they had any effect. Note that they have to be read at login time by gnome-shell, so place them in e.g. /etc/environment and reboot.
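A minimal sketch of one way to put these variables into an environment file so that they are in place at login time. The helper name `set_env_var` and the `ENV_FILE` variable are hypothetical; the demo writes to a temporary file, and on a real system the target would be /etc/environment (as root), followed by a reboot. The two variables can also be tried one at a time.

```shell
#!/bin/sh
# set_env_var (hypothetical helper): set NAME=VALUE in an environment file,
# replacing any earlier assignment of NAME and keeping all other lines.
set_env_var() {
    file=$1
    name=$2
    value=$3
    touch "$file"
    tmp=$(mktemp)
    # Keep every line that does not already assign NAME
    # (grep exits non-zero when nothing is left, hence "|| true").
    grep -v "^${name}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo target is a temporary file; pass ENV_FILE=/etc/environment (as root)
# to change the real system file.
ENV_FILE=${ENV_FILE:-$(mktemp)}
set_env_var "$ENV_FILE" COGL_DEBUG all
set_env_var "$ENV_FILE" COGL_DRIVER gles2
cat "$ENV_FILE"
```

Rewriting rather than appending means repeated runs do not leave conflicting assignments of the same variable behind.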
Created attachment 1821392 [details] COGL_DEBUG.txt
Created attachment 1821393 [details] COGL_DRIVER.gles2.txt
Sorry, logs updated.
Hi Karol, is there any news on this? It's one of the last two blockers for F35 Beta at present (and we may be able to fudge the other one, possibly). Thanks!
(In reply to Adam Williamson from comment #36)
> Hi Karol, is there any news on this? It's one of the last two blockers for
> F35 Beta at present (and we may be able to fudge the other one, possibly).
> Thanks!

Yeah. So.. I figured out what is causing the regression, and after reverting those patches I started hitting memory corruption bugs. I have figured out the corruptions, but I am still unsure what to do about the regression. I pinged Thierry Reding, who is more or less maintaining the Tegra driver bits inside mesa, about this, and I might have to discuss the issue further with him.
For the short term, could we go with reverting the thing that causes the regression and fixing the corruption bugs, if you have those figured out?
(In reply to Adam Williamson from comment #38)
> For the short term, could we go with reverting the thing that causes the
> regression and fixing the corruption bugs, if you have those figured out?

I think so. The patches to fix the corruptions are not perfect yet, as I was focusing on identifying the problems. Next week, without XDC in the way, I will have more time to spend on this, so I should have a working solution by then, just not the one we might want to have long term. Is there a hard deadline by which you need a solution?
Well, yes :D As I mentioned, it's blocking the Fedora 35 beta release. We already missed one date for that, and the next go/no-go is Thursday, which means we need a tested RC by then to ship. That means we need to build it by Tuesday at the latest. So, Tuesday is kinda the deadline I'm working with. If we miss that we wind up slipping F35 Beta by another week. Sorry for the tight timeline.
Brief update here - fixing this is difficult and unlikely to be viable in a reasonable time. We're going to roll an RC for Beta without the fix, and discuss whether to waive this as a blocker. If we do, we'll have to message that graphics on Jetson are not working properly for Beta.
Moving to F35 Final Blocker as it was waived in today's F35 Beta Go/No-Go meeting: https://meetbot.fedoraproject.org/fedora-meeting/2021-09-23/f35-beta-go_no_go-meeting.2021-09-23-17.00.log.html#l-99
Okay, we now have fixes to make this all work again.

First we need to revert https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724. I think the easiest is to revert the patches in Fedora until we have a proper solution upstream (either reverting or fixing it), but it's not clear yet what that will be.

Reverting this will solve the initially reported issue, but then GNOME will constantly segfault due to a massive amount of use-after-free accesses, which will be fixed by this MR upstream: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13231

So I suggest we merge the latter MR (the patches look quite good, so I expect this to happen this week), pull those patches in, and revert the former MR to get at least this bug out of the way.

Sadly I still didn't find time to get push access to Fedora, so somebody else has to do the final steps. I can prepare the patches though.
(In reply to Karol Herbst from comment #43)
> Sadly I still didn't find time to get push access to Fedora, so somebody
> else has to do the final steps. I can prepare the patches though.

Great! If you can get the patches prepared, I'll push the mesa maintainers in Fedora to make sure they get applied.
Thanks a lot Karol! I can merge the fixes if need be, but I expect Pete or Peter will be able to :)
Created attachment 1832192 [details] Patches to make Tegra work on Fedora 35 again

Sadly we didn't get to merge it upstream yet, as I have some minor concerns about the patches. But I do think those are probably not relevant for fixing the core issue, and I didn't hit any issues in my testing at least.
Awesome, thanks very much. I'll try to get those merged.
FEDORA-2021-236569b607 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-236569b607
FEDORA-2021-236569b607 has been pushed to the Fedora 35 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-236569b607` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-236569b607 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2021-236569b607 has been pushed to the Fedora 35 stable repository. If the problem still persists, please make note of it in this bug report.
I tried to run some pre-release tests of Fedora 35 Workstation on my Jetson Nano yesterday, but on a fully updated F35 Workstation install, I still see gnome-shell crashes that prevent even gdm from starting. Is it possible that this bug was not fixed with the last mesa update after all? (My install might also be borked, but the gdm logs look suspicious.)
(In reply to Fabio Valentini from comment #51)
> I tried to run some pre-release tests of Fedora 35 Workstation on my Jetson
> Nano yesterday, but on a fully updated F35 Workstation install, I still see
> gnome-shell crashes that prevent even gdm from starting.
>
> Is it possible that this bug was not fixed with the last mesa update after
> all?
> (My install might also be borked, but the gdm logs look suspicious.)

Could you try RC1.1 - https://kojipkgs.fedoraproject.org/compose/35/Fedora-35-20211020.0/compose/Workstation/aarch64/images/Fedora-Workstation-35-1.1.aarch64.raw.xz

There is still some screen tearing, but it seems to work as well as it did in F34.
(In reply to Fabio Valentini from comment #51)
> I tried to run some pre-release tests of Fedora 35 Workstation on my Jetson
> Nano yesterday, but on a fully updated F35 Workstation install, I still see
> gnome-shell crashes that prevent even gdm from starting.
>
> Is it possible that this bug was not fixed with the last mesa update after
> all?
> (My install might also be borked, but the gdm logs look suspicious.)

I just tried it out on my Jetson Nano and everything seems to work alright with the packages provided by dnf. Built from mesa-21.2.4-1.fc35.src.rpm here.
(In reply to Paul Whalen from comment #52)
> (In reply to Fabio Valentini from comment #51)
> > I tried to run some pre-release tests of Fedora 35 Workstation on my Jetson
> > Nano yesterday, but on a fully updated F35 Workstation install, I still see
> > gnome-shell crashes that prevent even gdm from starting.
> >
> > Is it possible that this bug was not fixed with the last mesa update after
> > all?
> > (My install might also be borked, but the gdm logs look suspicious.)
>
> Could you try RC1.1 -
> https://kojipkgs.fedoraproject.org/compose/35/Fedora-35-20211020.0/compose/Workstation/aarch64/images/Fedora-Workstation-35-1.1.aarch64.raw.xz
>
> There is still some screen tearing, but it seems to work as well as it did
> in F34.

Yeah, I discussed this tearing issue with Thierry on multiple occasions, and it's not clear what the solution should be here. It might be that we need to add a new UAPI to nouveau to handle this properly.