Bug 1535080
Summary: | [Wayland] gnome-shell crash and process stay eating 100% CPU | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Tomas Pelka <tpelka> |
Component: | gnome-shell | Assignee: | Florian Müllner <fmuellner> |
Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.5 | CC: | jan.public, jkoten, lmiksik, mcepl, ofourdan |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | mutter-3.26.2-7.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-04-10 13:10:18 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Tomas Pelka
2018-01-16 15:49:22 UTC
All of the core files from the link in comment 0 are from Xwayland.

Would you have the core files from gnome-shell when it crashes?

(In reply to Olivier Fourdan from comment #3)
> All of the core files from the link in comment 0 are from Xwayland.
>
> Would you have the core files from gnome-shell when it crashes?

All of the core files point toward an xwl_read_events() (i.e. gnome-shell dead) *but* those two:

· core.13915
· core.19594

The backtrace of those is similar:

#0 0x00007fd8ce8e41a7 in raise () from /usr/lib64/libc.so.6
#1 0x00007fd8ce8e5898 in abort () from /usr/lib64/libc.so.6
#2 0x000000000058f1da in OsAbort () at utils.c:1361
#3 0x0000000000594ce3 in AbortServer () at log.c:877
#4 0x0000000000595b2d in FatalError (f=f@entry=0x5b7490 "Caught signal %d (%s). Server aborting\n") at log.c:1015
#5 0x000000000058c43c in OsSigHandler (signo=11, sip=<optimized out>, unused=<optimized out>) at osinit.c:154
#6 <signal handler called>
#7 xwl_glamor_pixmap_get_wl_buffer (pixmap=pixmap@entry=0x2994320) at xwayland-glamor.c:162
#8 0x0000000000424da5 in xwl_screen_post_damage (xwl_screen=0x215c750) at xwayland.c:514
#9 block_handler (data=0x215c750, timeout=<optimized out>) at xwayland.c:665
#10 0x0000000000557e46 in BlockHandler (pTimeout=pTimeout@entry=0x7ffc2ec84a04) at dixutils.c:388
#11 0x0000000000585ed9 in WaitForSomething (are_ready=0) at WaitFor.c:219
#12 0x00000000005531e1 in Dispatch () at dispatch.c:422
#13 0x000000000055744a in dix_main (argc=11, argv=0x7ffc2ec84be8, envp=<optimized out>) at main.c:287
#14 0x00007fd8ce8d0377 in __libc_start_main () from /usr/lib64/libc.so.6
#15 0x00000000004240fe in _start ()

(gdb) f 8
#8 0x0000000000424da5 in xwl_screen_post_damage (xwl_screen=0x215c750) at xwayland.c:514
514 buffer = xwl_glamor_pixmap_get_wl_buffer(pixmap);
(gdb) p *pixmap
$1 = {drawable = {type = 1 '\001', class = 0 '\000', depth = 24 '\030', bitsPerPixel = 32 ' ', id = 0, x = 0, y = 0, width = 0, height = 0, pScreen = 0x215c200, serialNumber = 1},
devPrivates = 0x2994368, refcnt = 1, devKind = 0, devPrivate = {ptr = 0x29943f0, val = 43598832, uval = 43598832, fptr = 0x29943f0}, screen_x = 0, screen_y = 0, usage_hint = 0, master_pixmap = 0x0}

So we have the window pixmap being empty and xwl_pixmap_get() returning NULL (thus causing the NULL pointer dereference).

The question is how we get there, in post_damage(), with a window that has a pixmap of size 0×0 and no buffer.

Worth noting: in both cases, the window that triggered the crash was "hexchat" (looking down the window's userProps):

(gdb) x /s xwl_window->window->optional->userProps->next->next->next->next->next->next->next->next->next->next->next->next->next->next->next->next->next->next->next->data
0x215bcf0: "hexchat"

There is a weird bug in F27 (which uses the same version of gnome) with gnome-shell and hexchat (bug 1525861), so I wonder if that's the same.

OK I removed hexchat (I had it from flathub, FYI) from autostart apps and I'm back in wayland session with no gnome crashes.

Just note that I still have one app (dropbox) that is started on login so definitely hexchat+gnome-shell issue.

(In reply to Olivier Fourdan from comment #4)
> All of the core files point toward an xwl_read_events() (i.e. gnome-shell
> dead) *but* those two:
>
> · core.13915
> · core.19594
>
> The backtrace of those is similar [...]

For that, I just posted the following patch upstream:

https://patchwork.freedesktop.org/series/36683/

But I would like to get the gnome-shell core files as well, to investigate on the gnome-shell/mutter side too.

(In reply to Tomas Pelka from comment #5)
> OK I removed hexchat (I had it from flathub, FYI) from autostart apps and
> I'm back in wayland session with no gnome crashes.
>
> Just note that I still have one app (dropbox) that is started on login so
> definitely hexchat+gnome-shell issue.
I am trying to reproduce the issue by installing hexchat via flatpak and adding it to the autostarted apps, with no success so far: several login attempts all worked fine, no crash...

While investigating bug 1529175, Matěj was able to reproduce this bug with pidgin (exact same Xwayland backtrace) and capture a core file for both gnome-shell and Xwayland, so this is not related specifically to hexchat.

Looking at the core file, I see that gnome-shell crashes in the save_phase_2_callback() of the xsession management code:

#0 0x00007f20a83ae941 in meta_workspace_index (workspace=0x0) at core/workspace.c:670
#1 0x00007f20a83b9a89 in save_phase_2_callback () at x11/session.c:953
#2 0x00007f20a83b9a89 in save_phase_2_callback (smc_conn=<optimized out>, client_data=0x1) at x11/session.c:455
#3 0x00007f2099c254be in _SmcProcessMessage () at /lib64/libSM.so.6
#4 0x00007f2099a15ea7 in IceProcessMessages () at /lib64/libICE.so.6
#5 0x00007f20a83b9390 in process_ice_messages (channel=<optimized out>, condition=<optimized out>, client_data=<optimized out>) at x11/session.c:96
#6 0x00007f20a23998f9 in g_main_context_dispatch (context=0xf4adf0) at gmain.c:3146
#7 0x00007f20a23998f9 in g_main_context_dispatch (context=context@entry=0xf4adf0) at gmain.c:3811
#8 0x00007f20a2399c58 in g_main_context_iterate (context=0xf4adf0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3884
#9 0x00007f20a2399f2a in g_main_loop_run (loop=0x11a8ca0) at gmain.c:4080
#10 0x00007f20a8395f4c in meta_run () at core/main.c:652
#11 0x0000000000402584 in main (argc=1, argv=0x7ffc42235328) at main.c:539

The window->workspace is 0x0, thus the null pointer dereference in meta_workspace_index().
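To illustrate the crash pattern in frame #0, here is a minimal sketch of a session-save style lookup that tolerates a window with no workspace. The `Workspace` and `Window` types and both function names are hypothetical stand-ins, not mutter's real definitions, and the fallback policy (use the active workspace index) is an assumption made for illustration; it is not claimed to be the fix shipped in mutter-3.26.2-7.el7.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for mutter's workspace and window
 * objects; only the fields discussed in this report are modelled. */
typedef struct {
    int index;
} Workspace;

typedef struct {
    Workspace *workspace;      /* 0x0 in the core file analysed above */
    bool on_all_workspaces;
} Window;

/* Unguarded lookup: like the crashing frame, it dereferences its
 * argument unconditionally, so a NULL workspace is fatal. */
static int workspace_index(const Workspace *workspace)
{
    return workspace->index;
}

/* Guarded lookup for a save path: when the window has no workspace,
 * return the caller-supplied active workspace index instead of
 * dereferencing NULL. The fallback choice is an assumption. */
static int window_workspace_index_for_save(const Window *window,
                                           int active_index)
{
    if (window->workspace == NULL)
        return active_index;

    return workspace_index(window->workspace);
}
```

With a window whose `workspace` is NULL, the guarded function returns the fallback index rather than crashing; with a regular window it returns the index of its own workspace.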
Looking at the window, we see:

window->type = META_WINDOW_NORMAL
window->rect = {x = 2008, y = 73, width = 357, height = 814}
window->monitor = 0x1180000,
window->override_redirect = 0,
window->unmanaging = 0,
window->workspace = 0x0,
window->always_sticky = 0,
window->initial_workspace = 0,
window->on_all_workspaces = 1,
window->on_all_workspaces_requested = 0,
window->initial_workspace_set = 1

So we have window->on_all_workspaces TRUE but window->on_all_workspaces_requested FALSE, and the window located quite far on the right, so chances are that it's on a secondary monitor while the primary is on the right.

Looking at the code, “window->on_all_workspaces” is set from “should_be_on_all_workspaces (window)”, and now things get interesting, because it reads:

static gboolean
should_be_on_all_workspaces (MetaWindow *window)
{
  if (window->always_sticky)
    return TRUE;

  if (window->on_all_workspaces_requested)
    return TRUE;

  if (window->override_redirect)
    return TRUE;

  if (meta_prefs_get_workspaces_only_on_primary () &&
      !window->unmanaging &&
      window->monitor &&
      !meta_window_is_on_primary_monitor (window))
    return TRUE;

  return FALSE;
}

-> So I strongly suspect the issue occurs with windows on the second monitor (not primary) with workspaces_only_on_primary set (the default being off means we may hit a seldom tested case here).

Nice catch Olivier! What you said is true, my hexchat was started on the secondary display. Sorry for not mentioning it earlier.

And what gives:

$ dconf read /org/gnome/mutter/workspaces-only-on-primary

on your account?

(In reply to Olivier Fourdan from comment #10)
> And what gives:
>
> $ dconf read /org/gnome/mutter/workspaces-only-on-primary
>
> on your account?
Nothing

(In reply to Tomas Pelka from comment #11)
> Nothing

Ah sorry, what about:

$ gsettings get org.gnome.mutter workspaces-only-on-primary

(In reply to Olivier Fourdan from comment #12)
> (In reply to Tomas Pelka from comment #11)
> > Nothing
>
> Ah sorry, what about:
>
> $ gsettings get org.gnome.mutter workspaces-only-on-primary

False for me, what about you Mateji?

(In reply to Tomas Pelka from comment #13)
> False for me, what about you Mateji?

False as well.

Oh, sorry, Jonas and Florian pointed out this setting can be overridden, what gives:

$ gsettings get org.gnome.shell.overrides workspaces-only-on-primary

(In reply to Olivier Fourdan from comment #15)
> Oh, sorry, Jonas and Florian pointed out this setting can be overridden,
> what gives:
>
> $ gsettings get org.gnome.shell.overrides workspaces-only-on-primary

That's true

Just wondering whether bug 1538756 is a duplicate of this one. Just something abrt dug up.

(In reply to Matěj Cepl from comment #18)
> Just wondering whether bug 1538756 is a duplicate of this one. Just
> something abrt dug up.

Not a duplicate, but a consequence of this bug I reckon.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0770
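
For readers retracing the analysis, the following is a small standalone model of the should_be_on_all_workspaces() decision quoted earlier in this report, fed with the field values printed from the gnome-shell core file. The `WindowState` type and the function names are hypothetical stand-ins rather than mutter's API, and `on_primary_monitor = false` is an assumption inferred from the reporter's statement that the application started on the secondary display.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the window fields that the
 * quoted should_be_on_all_workspaces() reads. */
typedef struct {
    bool always_sticky;
    bool on_all_workspaces_requested;
    bool override_redirect;
    bool unmanaging;
    bool has_monitor;          /* models window->monitor being non-NULL */
    bool on_primary_monitor;   /* models the primary-monitor check */
} WindowState;

/* Same branch order as the quoted function; the preference is passed
 * in as a parameter instead of being read from settings. */
static bool model_should_be_on_all_workspaces(const WindowState *w,
                                              bool workspaces_only_on_primary)
{
    if (w->always_sticky)
        return true;

    if (w->on_all_workspaces_requested)
        return true;

    if (w->override_redirect)
        return true;

    if (workspaces_only_on_primary &&
        !w->unmanaging &&
        w->has_monitor &&
        !w->on_primary_monitor)
        return true;

    return false;
}

/* Field values from the core file: always_sticky = 0,
 * on_all_workspaces_requested = 0, override_redirect = 0,
 * unmanaging = 0, monitor non-NULL. on_primary_monitor is assumed. */
static WindowState crashed_window_state(void)
{
    WindowState w = { false, false, false, false, true, false };
    return w;
}
```

In this model only the last branch can return true for the crashed window, and it does so only when the workspaces-only-on-primary preference is effectively enabled, which matches the org.gnome.shell.overrides value reported in the thread.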