Bug 460793 - metacity crashes in event_callback; meta_display_screen_for_root returns NULL
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: metacity
Version: 9
Hardware: i686 Linux
Priority: medium Severity: high
Assigned To: Søren Sandmann Pedersen
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-09-01 02:32 EDT by Andy Bakun
Modified: 2014-06-18 05:10 EDT
Last Closed: 2009-07-14 10:38:56 EDT

Attachments
script output of gdb session against crashed metacity (19.28 KB, text/plain)
2008-09-19 04:28 EDT, Andy Bakun
a patch that avoids this specific null pointer dereference (1.10 KB, patch)
2008-10-14 02:40 EDT, Andy Bakun
Description Andy Bakun 2008-09-01 02:32:39 EDT
Description of problem:

metacity crashes and spawns bugbuddy, which cannot be seen because metacity has crashed.  Windows cannot gain focus (I'm using focus-follows-mouse) and all mouse clicks are ignored, but if an xterm had focus at the time of the crash, I can spawn gdb and get a backtrace:

#0  0x00110416 in __kernel_vsyscall ()
#1  0x003cf233 in __waitpid_nocancel () from /lib/libc.so.6
#2  0x0059cd67 in g_spawn_sync () from /lib/libglib-2.0.so.0
#3  0x0059d0ac in g_spawn_command_line_sync () from /lib/libglib-2.0.so.0
#4  0x00121253 in ?? () from /usr/lib/gtk-2.0/modules/libgnomebreakpad.so
#5  0x0012131e in ?? () from /usr/lib/gtk-2.0/modules/libgnomebreakpad.so
#6  0x00121a17 in google_breakpad::ExceptionHandler::InternalWriteMinidump () from /usr/lib/gtk-2.0/modules/libgnomebreakpad.so
#7  0x00121e23 in google_breakpad::ExceptionHandler::HandleException () from /usr/lib/gtk-2.0/modules/libgnomebreakpad.so
#8  <signal handler called>
#9  0x08060f13 in gdk_rectangle_intersect ()
#10 0x080aaaef in gdk_rectangle_intersect ()
#11 0x030d6def in ?? () from /usr/lib/libgdk-x11-2.0.so.0
#12 0xbf9c1dc8 in ?? ()
#13 0x08e72a80 in ?? ()
#14 0x08e88cf8 in ?? ()
#15 0x08e56920 in ?? ()
#16 0x00000001 in ?? ()
#17 0x03121b80 in g_option_context_set_help_enabled () from /usr/lib/libgdk-x11-2.0.so.0
#18 0xbf9c1be8 in ?? ()
#19 0x00000000 in ?? ()

Window decorations don't get redrawn, but applications can still seem to draw into their windows (xterm, system monitor applet).  When this originally started happening, I was able to ssh into the machine, and kill bugbuddy and metacity; metacity would be restarted and then would immediately crash again.

Version-Release number of selected component (if applicable):
metacity --version: metacity 2.22.0
metacity-2.22.0-3.fc9.i386

gdk_rectangle_intersect appears to be defined in libgdk, owned by gtk+:
 gtk+-1.2.10-61.fc9.i386
 gtk+-devel-1.2.10-61.fc9.i386

gtk2 owns /usr/lib/libgdk-x11-2.0.so.0:
 gtk2-2.12.11-1.fc9.i386
 gtk2-devel-2.12.11-1.fc9.i386

g_option_context_set_help_enabled appears to be defined in libglib, owned by glib2:
 glib2-2.16.5-1.fc9.i386
 glib2-devel-2.16.5-1.fc9.i386

How reproducible:
Unknown pattern, but happens repeatedly, on both my Fedora 9 desktop (Pentium4) and a MacBookPro with F9 installed on it.

I'm usually running a bunch of xterms (up to 6), firefox 3.0.1, pidgin (2.4.3-1.fc9) (with guifications plugin enabled) on both machines, and opera and xclock on just the Pentium4.

Steps to Reproduce:
1. normal work
2. metacity crashes
3. control-alt-backspace to recover, re-login
  
Actual results:


Expected results:


Additional info:

Both machines I've experienced this on are dual head using Xinerama.
nvidia drivers:
xorg-x11-drv-nvidia-173.14.12-1.lvn9.i386
xorg-x11-drv-nvidia-libs-173.14.12-1.lvn9.i386
uname -r: 2.6.25.14-108.fc9.i686
Comment 1 Andy Bakun 2008-09-19 04:28:03 EDT
Created attachment 317166
script output of gdb session against crashed metacity
Comment 2 Andy Bakun 2008-09-19 04:37:39 EDT
Forget the above traceback.  I got the debuginfo packages installed, and the attached gdb session is much more useful.

It seems meta_display_screen_for_root is returning NULL, and there is no NULL check before the result is dereferenced in the arguments to meta_workspace_focus_default_window.
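
To make the failure mode concrete, here is a minimal sketch in plain C, not metacity's actual code. The `Sketch*` types and both functions are hypothetical stand-ins that only model a per-root screen lookup which can return NULL, and a caller that checks the result before touching `active_workspace`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the window manager's types (assumptions,
   not metacity's real structs, which are much larger). */
typedef struct { int index; } SketchWorkspace;
typedef struct { unsigned long xroot; SketchWorkspace *active_workspace; } SketchScreen;
typedef struct { SketchScreen *screens; size_t n_screens; } SketchDisplay;

/* Models a lookup that returns NULL when no managed screen owns `xroot`. */
static SketchScreen *sketch_screen_for_root(SketchDisplay *d, unsigned long xroot)
{
    for (size_t i = 0; i < d->n_screens; i++)
        if (d->screens[i].xroot == xroot)
            return &d->screens[i];
    return NULL;
}

/* Guarded caller: returns the workspace to focus, or NULL for an unknown root.
   The crashing pattern is this same code without the NULL test. */
static SketchWorkspace *sketch_workspace_to_focus(SketchDisplay *d, unsigned long xroot)
{
    SketchScreen *screen = sketch_screen_for_root(d, xroot);
    if (screen == NULL)
        return NULL;  /* unknown root window: skip instead of dereferencing */
    return screen->active_workspace;
}
```

With one screen registered for root 42, a lookup for root 42 yields that screen's workspace, while a lookup for any other root yields NULL rather than a segfault.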

Not sure what is causing this, or which window is supposed to be the default one.  This doesn't seem to happen when I'm switching workspaces.

I'm also running imwheel (which I mention because I believe it grabs something in the root window but doesn't have any windows itself).
Comment 3 Chris Underhill 2008-09-22 17:34:10 EDT
I'm seeing exactly the same problem - however I'm not running imwheel. I was able to attach to metacity remotely and using gdb, obtained:

Program received signal SIGSEGV, Segmentation fault.
0x000000000041c730 in event_callback (event=0x7fff4a530540, data=0x15951d0) at core/display.c:1988
1988              meta_workspace_focus_default_window (new_screen->active_workspace,
(gdb) bt
#0  0x000000000041c730 in event_callback (event=0x7fff4a530540, data=0x15951d0) at core/display.c:1988
#1  0x0000000000463106 in filter_func (xevent=0x1596bd0, event=<value optimized out>, data=0x839bedb)
    at ui/ui.c:83
#2  0x000000372ec5418b in gdk_event_apply_filters (xevent=Could not find the frame base for "gdk_event_apply_filters".
) at gdkevents-x11.c:345
#3  0x000000372ec54f0f in gdk_event_translate (display=Could not find the frame base for "gdk_event_translate".
) at gdkevents-x11.c:896
#4  0x000000372ec57a16 in _gdk_events_queue (display=Could not find the frame base for "_gdk_events_queue".
) at gdkevents-x11.c:2285
#5  0x000000372ec57bec in gdk_event_dispatch (source=Could not find the frame base for "gdk_event_dispatch".
) at gdkevents-x11.c:2345
#6  0x000000372c8374db in IA__g_main_context_dispatch (context=<value optimized out>) at gmain.c:2012
#7  0x000000372c83acbd in g_main_context_iterate (context=<value optimized out>,
    block=<value optimized out>, dispatch=<value optimized out>, self=<value optimized out>)
    at gmain.c:2645
#8  0x000000372c83b1ed in IA__g_main_loop_run (loop=<value optimized out>) at gmain.c:2853
#9  0x000000000042a9d3 in main (argc=1, argv=0x7fff4a530d58) at core/main.c:476

My setup is a triple-head with 2 NVidia GS7300 cards. This happens at random, usually after 2 or 3 days of uptime for my X server. 

rpms relevant are:

metacity-2.22.0-3.fc9.x86_64
xorg-x11-drv-nvidia-173.14.12-1.lvn9.x86_64
kmod-nvidia-173.14.12-3.lvn9.x86_64
xorg-x11-drv-nvidia-libs-173.14.12-1.lvn9.x86_64
kmod-nvidia-2.6.25.14-108.fc9.x86_64-173.14.12-3.lvn9.x86_64

I also note that this looks quite like bug 461885.
Comment 4 Chris Underhill 2008-10-12 08:15:39 EDT
With the latest updates, I'm lucky if metacity survives more than an hour or two before this gets triggered, making it almost unusable.

metacity-2.22.0-5.fc9.x86_64
xorg-x11-drv-nvidia-173.14.12-1.lvn9.x86_64
kmod-nvidia-173.14.12-5.lvn9.x86_64
xorg-x11-drv-nvidia-libs-173.14.12-1.lvn9.x86_64
kmod-nvidia-173.14.12-5.lvn9.x86_64
kernel-2.6.26.5-45.fc9.x86_64
Comment 5 Andy Bakun 2008-10-14 02:16:58 EDT
I mentioned I'm running imwheel, but this also happens on another install in which I'm not running imwheel.
Comment 6 Andy Bakun 2008-10-14 02:40:43 EDT
Created attachment 320258
a patch that avoids this specific null pointer dereference

Possible patch, that just puts the use of the variable holding NULL inside a check for NULL.

I notice that there are other places in the code that also use the result of meta_display_screen_for_root without checking for NULL.  meta_display_screen_for_root can explicitly return NULL, so a possible fix might be to change it so it always returns a valid screen.
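
The second option, a lookup that always hands back a usable screen, might look roughly like the sketch below. This is an illustration only: `DemoScreen`, `DemoDisplay` and `demo_screen_for_root_or_first` are made-up names, and falling back to the first screen is an assumption about what a sensible default would be, not something taken from the metacity source.

```c
#include <stddef.h>

/* Hypothetical minimal types, for illustration only. */
typedef struct { unsigned long xroot; } DemoScreen;
typedef struct { DemoScreen *screens; size_t n_screens; } DemoDisplay;

/* Returns the screen that owns `xroot`; if none matches, falls back to the
   first managed screen. Returns NULL only when there are no screens at all. */
static DemoScreen *demo_screen_for_root_or_first(DemoDisplay *d, unsigned long xroot)
{
    for (size_t i = 0; i < d->n_screens; i++)
        if (d->screens[i].xroot == xroot)
            return &d->screens[i];
    return d->n_screens > 0 ? &d->screens[0] : NULL;
}
```

The trade-off is that a fallback hides the unexpected root window from callers, so an event may be handled against the wrong screen; a per-call NULL check keeps that decision at each call site.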
Comment 7 Andy Bakun 2008-10-14 02:47:19 EDT
I've also applied the attached simple patch and generated a set of new i386 RPMs.  These are just 2.22.0-5.fc9 with this patch applied. They can be downloaded from 
http://thwartedefforts.org/software/metacity-2.22.0-6aab/

sha1sum:
8866c54e8c5aaa00d8caafb60fe5f76da0b60bd7  metacity-2.22.0-6aab.i386.rpm
7a38c00bbf47c04d80d99ab9041b11d15a6684d2  metacity-2.22.0-6aab.src.rpm
e32820220461fe4b8724427dbfddd4563ea9f09f  metacity-debuginfo-2.22.0-6aab.i386.rpm
06e0ad2e78349aca70afffef7f5a45d0e6ab4224  metacity-devel-2.22.0-6aab.i386.rpm
93ca3844aeea22c69ed689c57f0ecc18753339a7  metacity-meta_display_screen_for_root-nullderef.patch

There are a few other places in the code where the result of meta_display_screen_for_root is used which should be looked at for this same problem by someone who knows metacity internals better than I do (which is just about everyone).

It's worth pointing out that it doesn't seem to happen when I'm switching workspaces, but may be triggered when moving the mouse between different (Xinerama) screens.  I have not noticed a definite pattern, other than that it happens when I am, ahem, working on something I have not saved yet (ain't that always the way?).  It doesn't seem to happen when the machine is idle.
Comment 8 Denis Leroy 2008-10-16 05:47:02 EDT
Looks like a variant of this patch was applied to metacity 2.24.0.
Comment 9 Chris Underhill 2008-10-16 19:31:34 EDT
Looks like the problem around line 1988 of display.c was fixed upstream in revision 3664

http://svn.gnome.org/viewvc/metacity/branches/gnome-2-24/src/core/display.c?view=log&pathrev=3807
Comment 10 Chris Underhill 2008-10-22 03:10:07 EDT
I've applied the patch and recompiled metacity. It survived a couple of days this time before it went weird. The problem reappeared, but with no bug-buddy this time, so I could still switch between virtual displays using the keyboard shortcut, and cycle between windows using alt-tab. However, I was unable to focus on anything using the mouse. Note that I too am using focus-follows-mouse.
Comment 11 Chris Underhill 2009-01-03 17:03:48 EST
I've now upgraded to Fedora 10. However, the behaviour I reported in comment #10 applies to this release as well. It took about 4 hours before X began to ignore the mouse. TBH, this is making Gnome unusable for anything serious - I suspect I'll be trying KDE.

Relevant RPMs:

metacity-2.24.0-2.fc10.x86_64
xorg-x11-server-Xorg-1.5.3-6.fc10.x86_64
xorg-x11-drv-nvidia-177.82-1.fc10.x86_64
akmod-nvidia-177.82-1.fc10.4.x86_64
kernel-2.6.27.9-159.fc10.x86_64
Comment 12 Andy Bakun 2009-01-03 17:28:33 EST
Chris, your problem in comment #10 has a workaround (which I've applied maybe 3 times today already).  I can't remember where I found it.  If you use the keyboard to move a window (any window) between Xinerama displays, it fixes itself and mouse buttons are seen again.  Alt-tab to a window, Alt-Space to open the window menu, select move, and use the cursor keys to drag the window entirely onto a different screen.  I believe this problem is unrelated to the original metacity crashing bug-report, since metacity isn't crashing -- but I guess may be what happens when metacity doesn't crash due to the patch/new-upstream, and it still gets into an error state.
Comment 13 Chris Underhill 2009-01-03 22:56:41 EST
Thanks :-) 

Just happened to me again and the workaround got the mouse working again. Woo-hoo I'm saved from ctrl-alt-backspace, which has been a too-regular-occurrence over the last 2 or 3 months!
Comment 14 Chris Underhill 2009-02-25 18:36:32 EST
I wonder if this is related to bug 473825 for which a fix has just been released? I shall be updating and finding out if it is...
Comment 15 Chris Underhill 2009-03-03 19:31:59 EST
Looks like my problem at least is now fixed - not reproduced this for the last week or so since updating for bug 473825
Comment 16 Bug Zapper 2009-06-09 22:35:38 EDT
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 17 Bug Zapper 2009-07-14 10:38:56 EDT
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
