Bug 1054294 - [GTK3] Gtk3 GtkSocket and Gtk2 GtkPlug are missing X sync
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: firefox
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Martin Stransky
QA Contact: Fedora Extras Quality Assurance
Reported: 2014-01-16 15:01 UTC by Martin Stransky
Modified: 2014-06-09 04:52 UTC (History)
Last Closed: 2014-06-03 13:05:48 UTC
Type: Bug
Links: Mozilla Foundation bug 968196

Description Martin Stransky 2014-01-16 15:01:21 UTC
Description of problem:

firefox-gtk3-29.0-3.debug.fc19.x86_64

bt:
#6  0x00007f6ef425f6a3 in mozalloc_abort (
    msg=0x7fffd3e3f280 "[15008] ###!!! ABORT: X_SetInputFocus: BadMatch (invalid parameter attributes); 3 requests ago; id=0x3c0001a\nRe-running with MOZ_X_SYNC=1 in the environment may give a more helpful backtrace.: file /h"...)
    at /usr/src/debug/firefox-gtk3-29.0/mozilla-central-20140109/memory/mozalloc/mozalloc_abort.cpp:30
#7  0x00007f6eef80394d in Abort (
    aMsg=0x7fffd3e3f280 "[15008] ###!!! ABORT: X_SetInputFocus: BadMatch (invalid parameter attributes); 3 requests ago; id=0x3c0001a\nRe-running with MOZ_X_SYNC=1 in the environment may give a more helpful backtrace.: file /h"...)
    at /usr/src/debug/firefox-gtk3-29.0/mozilla-central-20140109/xpcom/base/nsDebugImpl.cpp:427
#8  0x00007f6eef803858 in NS_DebugBreak (aSeverity=3, 
    aStr=0x7f6ec4aa1bc8 "X_SetInputFocus: BadMatch (invalid parameter attributes); 3 requests ago; id=0x3c0001a\nRe-running with MOZ_X_SYNC=1 in the environment may give a more helpful backtrace.", aExpr=0x0, 
    aFile=0x7f6ef2cfffe0 "/home/komat/rpmbuild/BUILD/firefox-gtk3-29.0/mozilla-central-20140109/toolkit/xre/nsX11ErrorHandler.cpp", aLine=157) at /usr/src/debug/firefox-gtk3-29.0/mozilla-central-20140109/xpcom/base/nsDebugImpl.cpp:384
#9  0x00007f6ef1faed67 in X11Error (display=0x7f6ef77c9000, event=0x7fffd3e3ffd0)
    at /usr/src/debug/firefox-gtk3-29.0/mozilla-central-20140109/toolkit/xre/nsX11ErrorHandler.cpp:157
#10 0x000000318b643c2b in _XError (dpy=dpy@entry=0x7f6ef77c9000, rep=rep@entry=0x7f6ecdc7e460) at XlibInt.c:1463
#11 0x000000318b640c87 in handle_error (dpy=0x7f6ef77c9000, err=0x7f6ecdc7e460, in_XReply=<optimized out>) at xcb_io.c:213
#12 0x000000318b640d35 in handle_response (dpy=dpy@entry=0x7f6ef77c9000, response=0x7f6ecdc7e460, 
    in_XReply=in_XReply@entry=1) at xcb_io.c:325
#13 0x000000318b641c30 in _XReply (dpy=dpy@entry=0x7f6ef77c9000, rep=rep@entry=0x7fffd3e40180, extra=extra@entry=0, 
    discard=discard@entry=1) at xcb_io.c:627
#14 0x000000318b63ef83 in XTranslateCoordinates (dpy=0x7f6ef77c9000, src_win=src_win@entry=62914585, 
    dest_win=dest_win@entry=170, src_x=src_x@entry=0, src_y=src_y@entry=0, dst_x=dst_x@entry=0x7fffd3e40200, 
    dst_y=dst_y@entry=0x7fffd3e40204, child=child@entry=0x7fffd3e40208) at TrCoords.c:51
#15 0x000000319be58958 in gdk_window_x11_get_root_coords (window=0x7f6ef77fa630, x=0, y=0, root_x=0x7fffd3e40288, 
    root_y=0x7fffd3e4028c) at gdkwindow-x11.c:2941
#16 0x000000319be3236a in gdk_window_get_origin (window=0x7f6ef77fa630, x=0x7fffd3e40288, y=0x7fffd3e4028c)
    at gdkwindow.c:7180
#17 0x00007f6ef0aace7a in nsWindow::WidgetToScreenOffset (this=0x7f6ef77716a0)
    at /usr/src/debug/firefox-gtk3-29.0/mozilla-central-20140109/widget/gtk/nsWindow.cpp:1751
#18 0x00007f6ef12554e0 in nsDOMUIEvent::CalculateScreenPoint (aPresContext=0x7f6ecedb3800, aEvent=0x7fffd3e409c0)
    at /usr/src/debug/firefox-gtk3-29.0/mozilla-central-20140109/content/events/src/nsDOMUIEvent.h:60

Comment 1 Martin Stransky 2014-01-21 13:28:13 UTC
Seems to happen when more than one flash-plugin instance appears on the page.

Comment 2 Martin Stransky 2014-01-22 13:20:42 UTC
Happens only when MOZ_X_SYNC is unset and a page with the plugin is closed. Can be reproduced with any plugin (flash, vlc). A possibly related warning:

(firefox:10896): Gtk-WARNING **: GtkSocket 0x7fffd080a850 is mapped but visible=1 child_visible=1 parent MozContainer 0x7fffe513e220 mapped=0

Comment 3 Martin Stransky 2014-01-22 13:55:43 UTC
[20536] ###!!! ABORT: X_UnmapWindow: BadWindow (invalid Window parameter);

Comment 4 Martin Stransky 2014-01-22 14:04:03 UTC
May be related to https://bugzilla.mozilla.org/show_bug.cgi?id=540114#c10

Comment 5 Martin Stransky 2014-01-27 15:05:58 UTC
It looks like calling XSynchronize(display, True) helps here by synchronizing the unmap/destroy requests.
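
To illustrate why synchronous mode changes where an error shows up, here is a minimal toy model in C. It is only a sketch: ToyConn and the toyconn_* functions are hypothetical names, not Xlib API, and the model assumes that an asynchronous error surfaces only when a later request needs a reply, while in synchronous mode it surfaces at the failing request itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are Xlib API. */
enum { TOYCONN_MAX_PENDING = 16 };

typedef struct {
    bool synchronous;              /* stands in for XSynchronize(display, True) */
    unsigned long next_serial;     /* serial the next request will get */
    unsigned long pending[TOYCONN_MAX_PENDING]; /* failed, not yet reported */
    size_t n_pending;
    unsigned long failed_serial;   /* request that actually failed (0 = none) */
    unsigned long reported_serial; /* request during which the error surfaced */
} ToyConn;

void toyconn_init(ToyConn *c, bool synchronous)
{
    c->synchronous = synchronous;
    c->next_serial = 1;
    c->n_pending = 0;
    c->failed_serial = 0;
    c->reported_serial = 0;
}

/* Issue one request. Errors surface only on a round trip (needs_reply)
 * or, in synchronous mode, right after every request. */
unsigned long toyconn_request(ToyConn *c, bool fails, bool needs_reply)
{
    unsigned long serial = c->next_serial++;

    if (fails) {
        assert(c->n_pending < TOYCONN_MAX_PENDING);
        c->pending[c->n_pending++] = serial;
    }
    if ((c->synchronous || needs_reply) && c->n_pending > 0) {
        /* Report the oldest outstanding error, as seen from this request. */
        c->failed_serial = c->pending[0];
        c->reported_serial = serial;
        c->n_pending = 0;
    }
    return serial;
}

/* Distance between the faulty request and the one that noticed the error. */
unsigned long toyconn_requests_ago(const ToyConn *c)
{
    return c->reported_serial - c->failed_serial;
}
```

In this model, one failing request followed by two harmless ones and then a round trip gives a distance of 3, which has the same shape as the "3 requests ago" abort surfacing inside XTranslateCoordinates in the backtrace above. With the synchronous flag set the distance is 0, so the backtrace would point at the faulty call.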

Comment 6 Martin Stransky 2014-01-31 15:15:16 UTC
It looks like the situation is the opposite of the one in mozbz#540114. GDK still thinks the X window is attached to the GtkPlug, but the X window has already been deleted.

Comment 7 Martin Stransky 2014-02-03 21:21:24 UTC
The problem is that the main Firefox process still uses the already deleted GtkPlug and its X window:

plugin-container (flash-plugin) process:

----------------------------------------
AnswerNPP_Destroy() GtkSocket GdkWindow = 0x7fffef5cc360, GtkSocket x11 xid = 0x400052a
AnswerNPP_Destroy()  GtkPlug GdkWindow = 0x7fffef5cc480, x11 xid = 0x3e00003

Call NPP_Destroy() to flash-plugin

AnswerNPP_Destroy() GtkSocket GdkWindow = 0x7fffef5cc360, GtkSocket x11 xid = 0x400052a
AnswerNPP_Destroy()  GtkPlug GdkWindow = (nil), x11 xid = (nil)
----------------------------------------

So the GtkPlug and its X window have been removed. But the main process (Firefox) still has the GtkPlug connected to the GtkSocket after that:

----------------------------------------
socket_unrealize_cb(), GtkSocketWidget=0x7fffce5ee690, socket GdkX11Window = 0x7fffcff72300, socket x11 xid = 0x400052a
socket_unrealize_cb(), plug GdkX11Window = 0x7fffcff72540, plug x11 xid = 0x3e00003
----------------------------------------

Which is incorrect and causes:
ABORT: X_UnmapWindow: BadWindow (invalid Window parameter); 8 requests ago; id=0x3e00003

Comment 8 Martin Stransky 2014-02-03 21:23:23 UTC
Note: It's a mixed gtk3/gtk2 scenario, where the GtkSocket is Gtk3 and the GtkPlug is Gtk2.

Comment 9 Martin Stransky 2014-02-04 12:36:23 UTC
It looks like some kind of race condition. It works fine when X synchronization is enabled in the main process (Firefox with GtkSocket). It also works when debugged step by step in gdb.

Comment 10 Benjamin Otte 2014-02-04 13:39:47 UTC
So, I've investigated this a bit, here's what I think happens:

(0) Firefox does the bad XSetInputFocus call. In that case I can happily ignore this bug and blame it on Firefox. But let's assume the likely case that GDK is the culprit.

(1) GDK itself calls XSetInputFocus (the function causing the X error) in only two places:
https://git.gnome.org/browse/gtk+/tree/gdk/x11/gdkwindow-x11.c?id=3.10.0#n2260
https://git.gnome.org/browse/gtk+/tree/gdk/x11/gdkdisplay-x11.c?id=3.10.0#n1236

(2) GDK protects both these calls by an error trap (with an explanation of why this is necessary).

(3) Error traps were redesigned for GTK3. Instead of requiring an XSync() call, GDK keeps track of which X requests were called while inside a trap and if an error comes in, GDK's default X error handler discards it.

(4) Firefox sets its own error handler, so the GDK handler doesn't run.

(5) The Firefox error handler doesn't know that the error was trapped and should be just ignored, because only the GDK error handler would know that.

The easiest fix from my POV would be to not add a custom error handler in Firefox. Is there a good reason for why Firefox overrides the X error handler?
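
Steps (3) to (5) above can be sketched with a small toy model in C. This is an illustration under stated assumptions, not GDK's code: ToyDisplay, ToyTrap and the toytrap_* functions are hypothetical names, and the model assumes a trap is just a range of request serials that the toolkit's own handler consults when an error arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not GDK or Xlib API. */
enum { TOYTRAP_MAX = 8 };

typedef struct {
    unsigned long start; /* first serial covered by the trap */
    unsigned long end;   /* first serial after the trap (valid once closed) */
    bool open;           /* true until the trap is popped */
} ToyTrap;

typedef struct {
    unsigned long next_serial;
    ToyTrap traps[TOYTRAP_MAX];
    size_t n_traps;
} ToyDisplay;

typedef enum { TOYTRAP_IGNORED, TOYTRAP_FATAL } ToyVerdict;

void toytrap_display_init(ToyDisplay *d)
{
    d->next_serial = 1;
    d->n_traps = 0;
}

/* Sending a request only hands out a serial; no XSync() is involved. */
unsigned long toytrap_send_request(ToyDisplay *d)
{
    return d->next_serial++;
}

void toytrap_push(ToyDisplay *d)
{
    assert(d->n_traps < TOYTRAP_MAX);
    ToyTrap *t = &d->traps[d->n_traps++];
    t->start = d->next_serial;
    t->end = 0;
    t->open = true;
}

/* Close the innermost open trap without waiting for the server. */
void toytrap_pop_ignored(ToyDisplay *d)
{
    for (size_t i = d->n_traps; i > 0; i--) {
        ToyTrap *t = &d->traps[i - 1];
        if (t->open) {
            t->end = d->next_serial;
            t->open = false;
            return;
        }
    }
    assert(false && "pop without matching push");
}

/* Toolkit-style handler: discards errors whose serial lies inside a trap. */
ToyVerdict toytrap_toolkit_handler(const ToyDisplay *d, unsigned long serial)
{
    for (size_t i = 0; i < d->n_traps; i++) {
        const ToyTrap *t = &d->traps[i];
        if (serial >= t->start && (t->open || serial < t->end))
            return TOYTRAP_IGNORED;
    }
    return TOYTRAP_FATAL;
}

/* Application-style handler: has no access to the trap list, so it
 * treats every error as fatal. */
ToyVerdict toytrap_app_handler(const ToyDisplay *d, unsigned long serial)
{
    (void)d;
    (void)serial;
    return TOYTRAP_FATAL;
}
```

In this model the same late error is discarded by the toolkit-style handler and treated as fatal by the application-style handler, which is the situation described in (4) and (5).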

Comment 11 Martin Stransky 2014-02-04 13:51:15 UTC
Thanks for the update. Does that also apply to the scenario from comments 7 - 9?

I can't reproduce the XSetInputFocus error, but the X_UnmapWindow: BadWindow fault is 100% reproducible; it happens when I open and close any web page with the flash plug-in. I can provide you with a testcase and a test package for that too.

Comment 12 Martin Stransky 2014-02-04 13:53:03 UTC
Please also note the sync aspect - none of the bugs shows up when X synchronization is enabled. Actually I can't find any single point of failure.

Comment 13 Martin Stransky 2014-02-04 14:48:47 UTC
(In reply to Benjamin Otte from comment #10)
> The easiest fix from my POV would be to not add a custom error handler in
> Firefox. Is there a good reason for why Firefox overrides the X error
> handler?

I have no idea why Firefox uses the custom X error handler. But I can confirm the error does not show when it's removed from the Firefox main process.

Comment 14 Martin Stransky 2014-02-04 15:02:58 UTC
(In reply to Martin Stransky from comment #13)
> I have no idea why Firefox uses the custom X error handler. But I can
> confirm the error does not show when it's removed from the Firefox main
> process.

The custom X11 handler seems to be used for automated crash reporting (something like ABRT in Fedora). All crashes are uploaded to the Mozilla crash server.

Comment 15 Martin Stransky 2014-02-05 10:22:37 UTC
Benjamin, Mozilla uses the gdk_error_trap_push()/gdk_error_trap_pop() combo in Gtk2 Firefox. Do I need to update it for the Gtk3 code?

And can the Firefox X11 error handler be registered after the gdk one? Firefox needs that for the custom crash reporting.

Thanks!

Comment 16 Martin Stransky 2014-02-05 15:03:15 UTC
Benjamin, the X failure vanishes when the Gtk3 GtkSocket destroy callback looks like:

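static void
socket_deleted_cb(GtkWidget *widget, GtkWidget **widget_pointer)
{
  /* Trap X errors raised while the socket pointer is cleared. */
  gdk_error_trap_push();
  gtk_widget_destroyed(widget, widget_pointer);  /* sets *widget_pointer to NULL */
  /* Returns the X error code (0 if none); the result is ignored here. */
  gdk_error_trap_pop();
}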

Does Gtk2 have different window/error handling?

Comment 17 Benjamin Otte 2014-02-06 18:38:52 UTC
Yes, GTK3 has gdk_error_trap_pop_ignored(), which is the cause of your problems. I believe gdk_error_trap_pop() still works as in GTK2, so sprinkling it in the code will help against certain bugs.

And why does Mozilla need the abort handler in the X11 handler? Isn't it catching the error when GDK calls g_error() with its default crash handler?

Comment 18 Karl Tomlinson 2014-02-10 22:10:12 UTC
GDK calls XSetErrorHandler again each time gdk_x11_display_error_trap_push is
called [1], apparently to deal with the case of apps installing their own error
handler, but then, if using _pop_ignored(), restores the app's error handler
before the error has been received [2].

Xlib error handling is hard.  I haven't even thought how to make things work
with threads using the same display.  Perhaps using Display::async_handlers as
in gdkasync.c is useful.

[1] https://git.gnome.org/browse/gtk+/tree/gdk/x11/gdkdisplay-x11.c?id=3.10.7#n2596
[2] https://git.gnome.org/browse/gtk+/tree/gdk/x11/gdkdisplay-x11.c?id=3.10.7#n2669

It looks to me like GDK is not achieving what it is trying to achieve, which would mean there is a GDK bug here.
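
The ordering problem described in this comment can be sketched as a toy model in C. It is a simplification under assumptions taken from the description above: the push installs the toolkit's handler, the pop restores the application's handler immediately, and an asynchronous error arrives only after the pop. ToyHandlers and the toyrace_* functions are hypothetical names, not GDK or Xlib API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not GDK or Xlib API. */
typedef enum { TOYRACE_IGNORED, TOYRACE_ABORT } ToyOutcome;
typedef ToyOutcome (*ToyErrorHandler)(void);

/* Toolkit handler: knows the request was trapped and discards the error. */
ToyOutcome toyrace_toolkit_handler(void) { return TOYRACE_IGNORED; }

/* Application handler: knows nothing about traps and aborts. */
ToyOutcome toyrace_app_handler(void) { return TOYRACE_ABORT; }

typedef struct {
    ToyErrorHandler current; /* handler that receives the next error */
    ToyErrorHandler saved;   /* handler to restore on pop */
} ToyHandlers;

/* Install the toolkit handler, remembering the application's one. */
void toyrace_push(ToyHandlers *h)
{
    assert(h->saved == NULL);
    h->saved = h->current;
    h->current = toyrace_toolkit_handler;
}

/* Restore the application's handler right away, without a round trip. */
void toyrace_pop_ignored(ToyHandlers *h)
{
    assert(h->saved != NULL);
    h->current = h->saved;
    h->saved = NULL;
}

/* Send one failing request inside a trap and report who handled the error.
 * synchronous == true models a display running with XSynchronize on. */
ToyOutcome toyrace_trapped_failing_request(bool synchronous)
{
    ToyHandlers h = { toyrace_app_handler, NULL };
    ToyOutcome outcome = TOYRACE_IGNORED;
    bool error_in_flight = false;

    toyrace_push(&h);
    if (synchronous)
        outcome = h.current();   /* error arrives while the trap is active */
    else
        error_in_flight = true;  /* error is still on its way */
    toyrace_pop_ignored(&h);
    if (error_in_flight)
        outcome = h.current();   /* error arrives after the handler swap */
    return outcome;
}
```

In this model the synchronous run ends with the error ignored, while the asynchronous run ends in the application handler, which is consistent with the observation earlier in this bug that the aborts disappear when X synchronization is enabled.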

Comment 19 Martin Stransky 2014-06-03 13:05:48 UTC
Let's track it upstream - https://bugzilla.mozilla.org/show_bug.cgi?id=968196

Comment 20 Karl Tomlinson 2014-06-09 04:52:12 UTC
The problems described in comment 13 were reported in https://bugzilla.gnome.org/show_bug.cgi?id=629608#c8, but no solution was found.

