Bug 1908448 - Upgrading from version 20.2.3-1 to 20.2.4-1 results in no GUI
Summary: Upgrading from version 20.2.3-1 to 20.2.4-1 results in no GUI
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: mesa
Version: 33
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-16 17:50 UTC by Quentin Armitage
Modified: 2021-11-30 18:06 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2021-11-30 18:06:13 UTC
Type: Bug
Embargoed:


Attachments
The system journal entries at the time the problem occurs. (67.49 KB, text/plain)
2020-12-16 17:50 UTC, Quentin Armitage

Description Quentin Armitage 2020-12-16 17:50:39 UTC
Created attachment 1739727
The system journal entries at the time the problem occurs.

Description of problem:
Upgrading mesa-filesystem and mesa-dri-drivers from version 20.2.3-1 to version 20.2.4-1 results in:
Oh no! Something has gone wrong.
A problem has occurred and the system can't recover. Please contact a system administrator

Version-Release number of selected component (if applicable):
20.2.4-1

How reproducible:
Always

Steps to Reproduce:
1. Upgrade to version 20.2.4-1
2. Reboot

Actual results:
Oh no! message displayed as described above

Expected results:
Usual login screen displayed and able to login.

Additional info:
Dec 16 17:12:25 cain.armitage.org.uk /usr/libexec/gdm-x-session[1073]: dbus-daemon[1073]: [session uid=42 pid=1073] Activating service name='org.a11y.Bus' requested by ':1.0' (uid=42 pid=1079 comm="/usr/libexec/gnome-session-check-accelerated " label="system_u:system_r:xdm_t:s0-s0:c0.c1023")
Dec 16 17:12:25 cain.armitage.org.uk /usr/libexec/gdm-x-session[1073]: dbus-daemon[1073]: [session uid=42 pid=1073] Successfully activated service 'org.a11y.Bus'
Dec 16 17:12:25 cain.armitage.org.uk audit[1093]: ANOM_ABEND auid=4294967295 uid=42 gid=42 ses=4294967295 subj=system_u:system_r:xdm_t:s0-s0:c0.c1023 pid=1093 comm="gnome-session-c" exe="/usr/libexec/gnome-session-check-accelerated-gl-helper" sig=11 res=1
Dec 16 17:12:25 cain.armitage.org.uk kernel: gnome-session-c[1093]: segfault at 0 ip 0000000000000000 sp 00007fffdca2fcb8 error 14 in gnome-session-check-accelerated-gl-helper[564e5cc21000+2000]
Dec 16 17:12:25 cain.armitage.org.uk kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Dec 16 17:12:26 cain.armitage.org.uk audit: BPF prog-id=48 op=LOAD
Dec 16 17:12:26 cain.armitage.org.uk audit: BPF prog-id=49 op=LOAD
Dec 16 17:12:26 cain.armitage.org.uk audit: BPF prog-id=50 op=LOAD
Dec 16 17:12:26 cain.armitage.org.uk systemd[1]: Started Process Core Dump (PID 1099/UID 0).
Dec 16 17:12:26 cain.armitage.org.uk audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-1099-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 16 17:12:26 cain.armitage.org.uk systemd-coredump[1100]: Process 1093 (gnome-session-c) of user 42 dumped core.

                                                             Stack trace of thread 1093:
                                                             #0  0x0000000000000000 n/a (n/a + 0x0)
Dec 16 17:12:27 cain.armitage.org.uk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-1099-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 16 17:12:27 cain.armitage.org.uk systemd[1]: systemd-coredump: Succeeded.
Dec 16 17:12:27 cain.armitage.org.uk gnome-session-binary[1074]: WARNING: Failed to reset failed state of units: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1


The first segfault appears to occur in `/usr/libexec/gnome-session-check-accelerated-gl-helper --print-renderer'.
The gdb backtrace output is:

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/gnome-session-check-accelerated-gl-helper --print-renderer'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (Thread 0x7fe0438f1740 (LWP 1093))]
Missing separate debuginfos, use: dnf debuginfo-install elfutils-libelf-0.182-1.fc33.x86_64 expat-2.2.8-3.fc33.x86_64 glib2-2.66.3-1.fc33.x86_64 glibc-2.32-2.fc33.x86_64 libX11-1.6.12-3.fc33.x86_64 libX11-xcb-1.6.12-3.fc33.x86_64 libXau-1.0.9-4.fc33.x86_64 libXcomposite-0.4.5-3.fc33.x86_64 libXdamage-1.1.5-3.fc33.x86_64 libXext-1.3.4-4.fc33.x86_64 libXfixes-5.0.3-12.fc33.x86_64 libXxf86vm-1.1.4-14.fc33.x86_64 libdrm-2.4.102-2.fc33.x86_64 libedit-3.1-33.20191231cvs.fc33.x86_64 libffi-3.1-26.fc33.x86_64 libgcc-10.2.1-9.fc33.x86_64 libglvnd-1.3.2-2.fc33.x86_64 libglvnd-glx-1.3.2-2.fc33.x86_64 libselinux-3.1-2.fc33.x86_64 libxcb-1.13.1-5.fc33.x86_64 libxshmfence-1.3-7.fc33.x86_64 llvm-libs-11.0.0-1.fc33.x86_64 mesa-dri-drivers-20.2.3-1.fc33.x86_64 mesa-libGL-20.2.3-1.fc33.x86_64 mesa-libglapi-20.2.3-1.fc33.x86_64 ncurses-libs-6.2-3.20200222.fc33.x86_64 pcre-8.44-2.fc33.x86_64 pcre2-10.35-8.fc33.x86_64 sssd-client-2.4.0-2.fc33.x86_64 zlib-1.2.11-23.fc33.x86_64
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007fe035040f30 in flatshade_init_state () from /usr/lib64/dri/r300_dri.so
#2  0x00007fe034bc2f44 in st_translate_atifs_program () from /usr/lib64/dri/r300_dri.so
Backtrace stopped: Cannot access memory at address 0x7fffdca31b68

The two coredumps produced subsequently are from gnome-shell, and their backtraces are the same as above.

Comment 1 Fedora Update System 2020-12-17 07:49:32 UTC
FEDORA-2020-f04e9f1683 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-f04e9f1683

Comment 2 Quentin Armitage 2020-12-17 09:02:08 UTC
After updating to version 20.2.6.1, gnome-session-check-accelerated-gl-helper is still segfaulting, and the stack backtrace is:
#0  0x0000000000000000 in ?? ()
#1  0x00007f983d5e60bb in cso_destroy_context () from /usr/lib64/dri/r300_dri.so
#2  0x00007f983d167f34 in st_destroy_context_priv () from /usr/lib64/dri/r300_dri.so
#3  0x00007f983d1691d4 in st_destroy_context () from /usr/lib64/dri/r300_dri.so
#4  0x00007f983d14a982 in dri_destroy_context () from /usr/lib64/dri/r300_dri.so
#5  0x00007f983d5e4787 in driDestroyContext () from /usr/lib64/dri/r300_dri.so
#6  0x00007f983e932c33 in dri2_destroy_context () from /lib64/libGLX_mesa.so.0
#7  0x00007f983e921559 in glXDestroyContext () from /lib64/libGLX_mesa.so.0
#8  0x0000557b43c3793e in main ()

The journal entry is:
Dec 17 08:38:14 cain.armitage.org.uk kernel: gnome-session-c[1118]: segfault at 0 ip 0000000000000000 sp 00007ffd1ebd4ca8 error 14 in gnome-session-check-accelerated-gl-helper[557b43c35000+2000]
Dec 17 08:38:14 cain.armitage.org.uk kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.


This is followed by gnome-shell segfaulting, and the stack backtrace is:
(gdb) bt
#0  0x0000000000000000 in  ()
#1  0x00007ff956fe00bb in cso_destroy_context () at /usr/lib64/dri/r300_dri.so
#2  0x00007ff956b61f34 in st_destroy_context_priv () at /usr/lib64/dri/r300_dri.so
#3  0x00007ff956b631d4 in st_destroy_context () at /usr/lib64/dri/r300_dri.so
#4  0x00007ff956b44982 in dri_destroy_context () at /usr/lib64/dri/r300_dri.so
#5  0x00007ff956fde787 in driDestroyContext () at /usr/lib64/dri/r300_dri.so
#6  0x00007ff9651fbc33 in dri2_destroy_context () at /lib64/libGLX_mesa.so.0
#7  0x00007ff9651ea559 in glXDestroyContext () at /lib64/libGLX_mesa.so.0
#8  0x00007ff97c531aa2 in _cogl_winsys_display_destroy.lto_priv.0 () at /usr/lib64/mutter-7/libmutter-cogl-7.so.0
#9  0x00007ff97c500d8d in _cogl_object_display_indirect_free () at /usr/lib64/mutter-7/libmutter-cogl-7.so.0
#10 0x00007ff97c503875 in cogl_renderer_check_onscreen_template () at /usr/lib64/mutter-7/libmutter-cogl-7.so.0
#11 0x00007ff97cceaf24 in clutter_backend_x11_get_display () at /usr/lib64/mutter-7/libmutter-clutter-7.so.0
#12 0x00007ff97cc8e31e in clutter_backend_real_create_context () at /usr/lib64/mutter-7/libmutter-clutter-7.so.0
#13 0x00007ff97ccb6327 in clutter_init_real () at /usr/lib64/mutter-7/libmutter-clutter-7.so.0
#14 0x00007ff97ccb66c8 in post_parse_hook () at /usr/lib64/mutter-7/libmutter-clutter-7.so.0
#15 0x00007ff97d6a41c7 in g_option_context_parse () at /lib64/libglib-2.0.so.0
#16 0x00007ff97ccb69bd in clutter_init () at /usr/lib64/mutter-7/libmutter-clutter-7.so.0
#17 0x00007ff97ca8dec3 in meta_backend_initable_init () at /lib64/libmutter-7.so.0
#18 0x00007ff97cae504c in meta_init () at /lib64/libmutter-7.so.0
#19 0x000055987a34f9dc in main ()
(gdb) 

This is followed by another segfault in gnome-shell with the same stack backtrace (except for the addresses).

The following entries appear in the journal before the gnome-shell segfaults (twice for each segfault):
Dec 17 08:38:20 cain.armitage.org.uk /usr/libexec/gdm-x-session[1068]: (II) RADEON(0): EDID vendor "SEC", prod id 0
Dec 17 08:38:20 cain.armitage.org.uk /usr/libexec/gdm-x-session[1068]: (II) RADEON(0): Printing DDC gathered Modelines:
Dec 17 08:38:20 cain.armitage.org.uk /usr/libexec/gdm-x-session[1068]: (II) RADEON(0): Modeline "1280x800"x0.0   71.11  1280 1328 1360 1440  800 802 808 823 -hsync -vsync (49.4 kHz eP)

Comment 3 Pete Walter 2020-12-17 11:26:01 UTC
I suspect it could be a fallout from https://gitlab.freedesktop.org/mesa/mesa/-/commit/bd32ac29bbbe26e035c76a0d85c664be4c3ec0e4 that added an assert to cso_destroy_context(). Any chance you could file this upstream and say that it's a regression in 20.2?

Comment 4 Quentin Armitage 2020-12-17 16:29:04 UTC
I have built the mesa packages v20.2.6 with commit bd32ac29 reverted, and the segfault still occurs.

I have reported this upstream at https://gitlab.freedesktop.org/mesa/mesa/-/issues/3990

Comment 5 Quentin Armitage 2020-12-17 17:42:54 UTC
With v20.2.6 if I revert bd32ac29 as before and also revert 5f9912a4 and d6fd7acf it then works successfully.

Comment 6 Quentin Armitage 2021-01-17 16:07:37 UTC
After upgrading to v20.3.3 the segfault starts occurring again. Applying the following patch stops the segfault occurring, although I don't expect that the patch is the right way to resolve the problem. I have posted these details at https://gitlab.freedesktop.org/mesa/mesa/-/issues/3990.

diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c b/src/gallium/auxiliary/cso_cache/cso_context.c
index 1eef6aac70c..4dca8b41d38 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.c
+++ b/src/gallium/auxiliary/cso_cache/cso_context.c
@@ -412,7 +412,7 @@ void cso_destroy_context( struct cso_context *ctx )
             if (maxview > 0) {
                ctx->pipe->set_sampler_views(ctx->pipe, sh, 0, maxview, views);
             }
-            if (maxssbo > 0) {
+            if (maxssbo > 0 && ctx->pipe->set_shader_buffers) {
                ctx->pipe->set_shader_buffers(ctx->pipe, sh, 0, maxssbo, ssbos, 0);
             }
             for (int i = 0; i < maxcb; i++) {
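
For illustration only, here is a minimal standalone sketch of the guard that the patch above adds. It assumes the crash comes from calling an optional driver hook that is NULL, which is what frame #0 at address 0 in the backtraces would be consistent with. The names fake_pipe, record_call and unbind_shader_buffers are hypothetical and are not Mesa code; only the shape of the check mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver context with one optional hook. */
struct fake_pipe {
    /* NULL when the driver does not provide this hook (assumption). */
    void (*set_shader_buffers)(struct fake_pipe *pipe, int count);
    int calls; /* number of times the hook has run */
};

/* Hypothetical hook implementation that only counts invocations. */
void record_call(struct fake_pipe *pipe, int count)
{
    (void)count;
    pipe->calls++;
}

/*
 * Same shape as the patched condition: call the hook only when there is
 * something to unbind AND the hook pointer is non-NULL.
 * Returns 1 if the hook was invoked, 0 if it was skipped.
 */
int unbind_shader_buffers(struct fake_pipe *pipe, int maxssbo)
{
    assert(pipe != NULL);
    if (maxssbo > 0 && pipe->set_shader_buffers) {
        pipe->set_shader_buffers(pipe, maxssbo);
        return 1;
    }
    return 0;
}
```

Without the second half of the condition, a context whose hook is NULL would jump to address 0 as soon as maxssbo is greater than zero. With it, the call is skipped.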

Comment 7 Quentin Armitage 2021-02-24 09:25:24 UTC
Upstream commit https://gitlab.freedesktop.org/mesa/mesa/-/commit/58e43594fc457eaaf1b1e01e48948959a82080bc resolves this issue.

Would it be possible to backport the patch to Fedora 32 and 33?

Comment 8 Ben Cotton 2021-11-04 16:00:59 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 9 Ben Cotton 2021-11-30 18:06:13 UTC
Fedora 33 changed to end-of-life (EOL) status on 2021-11-30. Fedora 33 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

