Bug 231394

Summary: Multi-threaded pygtk apps deadlock if accessibility is enabled
Product: [Fedora] Fedora Reporter: Daniel Berrangé <berrange>
Component: pygtk2Assignee: Matthew Barnes <mbarnes>
Status: CLOSED NOTABUG QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: rawhideCC: rstrode
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-03-13 21:29:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
GDB trace none

Description Daniel Berrangé 2007-03-07 23:48:18 UTC
Description of problem:
Running the latest rawhide, the virt-manager application will more or less
immediately deadlock after startup if the AT-SPI accessibility extensions are
enabled in the desktop session.  This is because the AT-SPI module will call
gdk_threads_enter from within the gtk_init() method. Since gtk_init() is called
automatically when doing 'import gtk', virt-manager has no way to initialize gdk
threading. This is major problem since AT-SPI appears to be enabled by default
in Fedora 7 rawhide, making virt-manager dead in the water, out of the box :-(

Version-Release number of selected component (if applicable):
pygtk2-2.10.4-1.fc7
gtk2-2.10.9-4.fc7
glib2-2.12.9-4.fc7
at-spi-1.17.2-1.fc7
gail-1.17.0-1.fc7
atk-1.17.0-1.fc7
virt-manager-0.3.1-3.fc7

How reproducible:
Always, if Accessibility is enabled

Steps to Reproduce:
1. Select Applications -> System Tools -> Virtual Machine Manager
2. Select 'Run unprivileged' in the console-helper dialog
3. Select Xen or QEMU hypervisor in the initial connect dialog
4. The main 'VM list' window now appears
  
Actual results:
The window does not respond to any input events

Expected results:
The window responds normally

Additional info:
This mailing list posting shows that as of a month ago the ATK bindings now
requires that  gdk_threads_init() is invoked *before*  gtk_init() is called.

http://mail.gnome.org/archives/desktop-devel-list/2007-February/msg00305.html

The PyGTK bindings to GTK do not make it possible to comply with this
requirement. eg

  import gtk
  gtk.gdk.threads_init()

Is the absolute shortest sequence with which I can initialized threading in
PyGTK. This is still too late though, because the  initializer run by the
'import gtk' call will invoke  gtk_init() :-(

If I set a breakpoint on 'atk_misc_threads_enter' it will stop at the following
location:

#0  0x0000003b9c6142d0 in atk_misc_threads_enter () from /usr/lib64/libatk-1.0.so.0
#1  0x00002aaab05e8cef in ?? () from /usr/lib64/gtk-2.0/modules/libatk-bridge.so
#2  0x00002aaab05e9289 in ?? () from /usr/lib64/gtk-2.0/modules/libatk-bridge.so
#3  0x00002aaab05eab49 in ?? () from /usr/lib64/gtk-2.0/modules/libatk-bridge.so
#4  0x0000003bc2b4321b in default_display_notify_cb (display_manager=<value
optimized out>) at gtkmodules.c:413
#5  0x0000003b9820af19 in g_closure_invoke () from /lib64/libgobject-2.0.so.0
#6  0x0000003b9821a788 in ?? () from /lib64/libgobject-2.0.so.0
#7  0x0000003b9821bbd4 in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
#8  0x0000003b9821bda3 in g_signal_emit () from /lib64/libgobject-2.0.so.0
#9  0x0000003b9820ecff in ?? () from /lib64/libgobject-2.0.so.0
#10 0x0000003b98210656 in g_object_notify () from /lib64/libgobject-2.0.so.0
#11 0x0000003bc2619ad5 in IA__gdk_display_open_default_libgtk_only () at gdk.c:289
#12 0x0000003bc2b2d4e4 in IA__gtk_init_check (argc=<value optimized out>,
argv=<value optimized out>) at gtkmain.c:889
#13 0x00002aaaaf2778f1 in _wrap_gtk_init_check (self=<value optimized out>,
args=<value optimized out>) at ./gtk.override:406
#14 0x0000003ba4cb4f3b in PyEval_EvalFrameEx (f=0x7264b0, throwflag=<value
optimized out>) at Python/ceval.c:3550
#15 0x0000003ba4cb538e in PyEval_EvalFrameEx (f=0x725690, throwflag=<value
optimized out>) at Python/ceval.c:3652
#16 0x0000003ba4cb5cd2 in PyEval_EvalCodeEx (co=0x2aaaae0e2a08, globals=<value
optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0,
kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833
#17 0x0000003ba4cb5d12 in PyEval_EvalCode (co=0x9fec00, globals=0x1,
locals=0x3b98e0f988) at Python/ceval.c:494
#18 0x0000003ba4cc65a2 in PyImport_ExecCodeModuleEx (name=0x7fff01ccafd0 "gtk",
co=0x2aaaae0e2a08, pathname=0x7fff01cc7e30
"/usr/lib64/python2.5/site-packages/gtk-2.0/gtk/__init__.pyc") at
Python/import.c:658
#19 0x0000003ba4cc6b22 in load_source_module (name=0x7fff01ccafd0 "gtk",
pathname=0x7fff01cc7e30
"/usr/lib64/python2.5/site-packages/gtk-2.0/gtk/__init__.pyc", fp=<value
optimized out>) at Python/import.c:942
#20 0x0000003ba4cc8c58 in load_package (name=0x7fff01ccafd0 "gtk",
pathname=<value optimized out>) at Python/import.c:998
#21 0x0000003ba4cc7b52 in import_submodule (mod=0x3ba4f3a5d0,
subname=0x7fff01ccafd0 "gtk", fullname=0x7fff01ccafd0 "gtk") at Python/import.c:2383
#22 0x0000003ba4cc8034 in load_next (mod=0x3ba4f3a5d0, altmod=0x3ba4f3a5d0,
p_name=<value optimized out>, buf=0x7fff01ccafd0 "gtk", p_buflen=0x7fff01ccafc8)
at Python/import.c:2203
#23 0x0000003ba4cc8241 in import_module_level (name=0x0, globals=0x2aaaaaae54d4,
locals=<value optimized out>, fromlist=0x3ba4f3a5d0, level=<value optimized
out>) at Python/import.c:1984
#24 0x0000003ba4cc86c5 in PyImport_ImportModuleLevel (name=0x2aaaaab3a864 "gtk",
globals=0x6242c0, locals=0x6242c0, fromlist=0x3ba4f3a5d0, level=-1) at
Python/import.c:2055
#25 0x0000003ba4cae2dd in builtin___import__ (self=<value optimized out>,
args=<value optimized out>, kwds=<value optimized out>) at Python/bltinmodule.c:47
#26 0x0000003ba4c3c053 in PyObject_Call (func=0x9fec00, arg=0x1,
kw=0x3b98e0f988) at Objects/abstract.c:1860
#27 0x0000003ba4cae8ae in PyEval_CallObjectWithKeywords (func=0x2aaaaaace5f0,
arg=0x2aaaaab2c7e0, kw=0x0) at Python/ceval.c:3435
#28 0x0000003ba4cb0dda in PyEval_EvalFrameEx (f=0x66a270, throwflag=<value
optimized out>) at Python/ceval.c:2065
#29 0x0000003ba4cb5cd2 in PyEval_EvalCodeEx (co=0x2aaaaaae7eb8, globals=<value
optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0,
kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833
#30 0x0000003ba4cb5d12 in PyEval_EvalCode (co=0x9fec00, globals=0x1,
locals=0x3b98e0f988) at Python/ceval.c:494
#31 0x0000003ba4ccf832 in run_mod (mod=<value optimized out>, filename=<value
optimized out>, globals=0x6242c0, locals=0x6242c0, flags=<value optimized out>,
arena=<value optimized out>) at Python/pythonrun.c:1264
#32 0x0000003ba4ccf906 in PyRun_FileExFlags (fp=0x601010,
filename=0x7fff01cccbe3 "/usr/share/virt-manager/virt-manager.py", start=257,
globals=0x6242c0, locals=0x6242c0, closeit=1, flags=0x7fff01ccc590) at
Python/pythonrun.c:1250
#33 0x0000003ba4cd0df8 in PyRun_SimpleFileExFlags (fp=<value optimized out>,
filename=0x7fff01cccbe3 "/usr/share/virt-manager/virt-manager.py", closeit=1,
flags=0x7fff01ccc590) at Python/pythonrun.c:870
#34 0x0000003ba4cda0a5 in Py_Main (argc=<value optimized out>,
argv=0x7fff01ccc6b8) at Modules/main.c:496
#35 0x0000003b96e1d9c4 in __libc_start_main () from /lib64/libc.so.6
#36 0x0000000000400629 in _start ()


The stack frame 14 corresponds to:

(gdb) frame 14
#14 0x0000003ba4cb4f3b in PyEval_EvalFrameEx (f=0x7264b0, throwflag=<value
optimized out>) at Python/ceval.c:3550
3550                                    C_TRACE(x, (*meth)(self,NULL));
(gdb) pyframe
/usr/lib64/python2.5/site-packages/gtk-2.0/gtk/__init__.py (59): _init

So, you can see the atk_threads_enter is being called from the gtk initializer
in 'import gtk'.

So basically any Python GTK application which uses threads is totally screwed if
AT-SPI is enabled, because there is no way for the app to initialize GTK
threading before AT-SPI is activated :-(

Comment 1 Daniel Berrangé 2007-03-08 00:00:25 UTC
Created attachment 149535 [details]
GDB trace

Bugzilla totally mangled the stack trace by line wrapping it - here's a file
with the unmangled trace which should be more readable.

Comment 2 Daniel Berrangé 2007-03-08 17:48:42 UTC
To test out whether calling 'gdk_threads_init()' before gtk_init() solves the
problem I created a dummy python module containing this:

$ cat gtkthreads.c
#include <Python.h>

#include <glib.h>
#include <gdk/gdk.h>

static struct PyMethodDef gtkthreads_module_methods[] = {
  { NULL, NULL, 0, NULL }
};

PyMODINIT_FUNC
initgtkthreads(void)
{
  Py_InitModule("gtkthreads", gtkthreads_module_methods);
  g_thread_init(NULL);
  gdk_threads_init();
}
$  gcc -Wall -fPIC -fpic -shared -o gtkthreads.so $(pkg-config --cflags --libs
gtk+-2.0 glib-2.0 gobject-2.0 gthread-2.0) -I /usr/include/python2.5  gtkthreads.c
$ mv gtkthreads.so /usr/lib64/python-2.5/site-packages/

And then changed

  import gtk

To

  import gtkthreads
  import gtk

With this nasty hack in place, I can successfully start virt-manager with
Accessibility enabled. So this confirms that if you call gdk_threads_init()
early enough, the problem goes away - just how to achieve this in a supportable
fashion with official PyGTK apis instead of my nasty hack. Perhaps pygtk itself
needs this mini  'gtkthreads' module which apps can import ?



Comment 3 Matthew Barnes 2007-03-08 18:05:36 UTC
Sounds like upstream needs to be alerted to this.

Has a b.g.o bug been filed yet?

Comment 4 Daniel Berrangé 2007-03-08 18:12:13 UTC
I've not filed any b.g.o  bug report, just this b.r.c report.

Comment 5 Ray Strode [halfline] 2007-03-08 19:20:26 UTC
well the mailing-list discussion mentioned in comment 0 has a bug report:

http://bugzilla.gnome.org/show_bug.cgi?id=329454

Something doesn't quite make sense to me though.

gtk.gdk.threads_enter() should be a noop at gtk_init time, because threading
hasn't been initialized yet!

It seems like the only potential for deadlock is the time between the
threads_init() call and the threads_enter() call.  if there are any at-spi calls
 between those two calls the problem would happen, but otherwise shouldn't?

So if you do gtk.gdk.threads_enter() immediately after gtk.gdk.threads_init() do
things work?

Is the backtrace above where the deadlock happens?


Comment 6 Daniel Berrangé 2007-03-08 20:46:17 UTC
No, the above trace was just to illustrate AT-SPI making a threads call before
threading was initialized. The trace of the actual hang is:

#0  0x0000003b98e0c6c8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003b98e08724 in _L_mutex_lock_103 () from /lib64/libpthread.so.0
#2  0x0000003b98e081c3 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000036a924437e in gdk_event_prepare (source=0x726130,
timeout=0x7fff2e1e2274) at gdkevents-x11.c:2272
#4  0x0000003b97e2f337 in IA__g_main_context_prepare (context=0x9e6730,
priority=0x7fff2e1e22d4) at gmain.c:2364
#5  0x0000003b97e2fa2d in g_main_context_iterate (context=0x9e6730, block=1,
dispatch=1, self=<value optimized out>) at gmain.c:2657
#6  0x0000003b97e300aa in IA__g_main_loop_run (loop=0x114bc80) at gmain.c:2881
#7  0x00000036a892d023 in IA__gtk_main () at gtkmain.c:1148
#8  0x00002aaaaf277544 in _wrap_gtk_main (self=<value optimized out>) at
./gtk.override:1133
#9  0x0000003ba4cb4f3b in PyEval_EvalFrameEx (f=0xfb1820, throwflag=<value
optimized out>) at Python/ceval.c:3550
#10 0x0000003ba4cb538e in PyEval_EvalFrameEx (f=0x66a270, throwflag=<value
optimized out>) at Python/ceval.c:3652
#11 0x0000003ba4cb5cd2 in PyEval_EvalCodeEx (co=0x2aaaaaae7eb8, globals=<value
optimized out>, locals=<value optimized out>, 
    args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
at Python/ceval.c:2833
#12 0x0000003ba4cb5d12 in PyEval_EvalCode (co=0x726130, globals=0x0,
locals=0x3b98e0f658) at Python/ceval.c:494
#13 0x0000003ba4ccf832 in run_mod (mod=<value optimized out>, filename=<value
optimized out>, globals=0x6242c0, locals=0x6242c0, 
    flags=<value optimized out>, arena=<value optimized out>) at
Python/pythonrun.c:1264
#14 0x0000003ba4ccf906 in PyRun_FileExFlags (fp=0x601010,
filename=0x7fff2e1e4be5 "/usr/share/virt-manager/virt-manager.py", 
    start=257, globals=0x6242c0, locals=0x6242c0, closeit=1,
flags=0x7fff2e1e2aa0) at Python/pythonrun.c:1250
#15 0x0000003ba4cd0df8 in PyRun_SimpleFileExFlags (fp=<value optimized out>, 
    filename=0x7fff2e1e4be5 "/usr/share/virt-manager/virt-manager.py",
closeit=1, flags=0x7fff2e1e2aa0) at Python/pythonrun.c:870
#16 0x0000003ba4cda0a5 in Py_Main (argc=<value optimized out>,
argv=0x7fff2e1e2bc8) at Modules/main.c:496
#17 0x0000003b96e1d9c4 in __libc_start_main () from /lib64/libc.so.6
#18 0x0000000000400629 in _start ()

The GDK docs say:

http://developer.gnome.org/doc/API/2.0/gdk/gdk-Threads.html#gdk-threads-init

 "gdk_threads_init()  This call must be made before any use of the main loop
from GTK+; to be safe, call it before gtk_init()."

So, with the original AT-SPI build at-spi-1.17.2-1.fc7, AT-SPI is calling
threads_enter(), and talking to the GTK main loop before the app has initialized
threads. With the new build at-spi-1.17.2-2.fc7, AT-SPI is not calling
threads_enter() but is still talking to the GTK main loop before the app has
initialized threads.

So from my reading of the GTK docs, the root cause is that AT-SPI is talking to
the GTK main loop functions before threading is initialized - it must not call
any main loop functions until the app has initialized threading. So the patch in
-2.fc7 had no effect, because making threads_enter() a no-op didn't change the
fact that AT-SPI called GTK main loop functions before threads were initialized.

My hack in comment #3 solved the problem because it ensured threads were
initialized before AT-SPI got anywhere near the main loop.


Comment 7 Ray Strode [halfline] 2007-03-08 21:43:25 UTC
hmm still something doesn't seem right to me.

Well, I don't think that gtk_init() runs and quits the main loop temporarily
while its being run, or anything like that.

So at-spi-1.17.2-1.fc7 was doing

threads_leave()
some_synchronous_corba_call()
threads_enter()

during gtk_init.  The "drop the lock, make the call, take the lock" formula
above is fine when it results from some gtk widget function getting called in a
callback invoked by the mainloop because it's the responsibility of that
callback to take the lock before using gtk functions (what i'm saying is that we
can be relatively sure that the threads_leave() call is paired with an earlier
threads_enter() call).

Now in the gtk_init case (which is called way before gtk_main is ever run) there
isn't a guarantee that the lock is already taken, so the threads_leave() call
above would be unpaired.  This isn't a problem if the user hasn't initialized
threading before calling gtk_init() because the threads_leave() call would be a
noop anyway.  It's only a problem if the user initializes threading, then calls,
gtk_init, the calls gdk_threads_enter right before running the mainloop.  The
sequence would be

call threads init which creates mutex
call gtk_init
  at-spi then unlocks new mutex that isn't locked (an error, but a benign one)
  make corba call
  lock mutex
do other application setup
lock_mutex (deadlock because mutex is already locked!)
run mainloop
  unlock mutex
  run callbacks
  lock mutex
 
at-spi-1.17.2-2.fc7 addressed that problem by explicitly not making the
threads_leave() call (and the threads_enter() call after the corba call) if the
corba call was made before the user called gtk_main(), and so then the
previously problematic sequence would be:

call threads init which creates mutex
call gtk_init
  make corba call
do other application setup
lock_mutex
run mainloop
  unlock mutex
  run callbacks
  lock mutex

which should work fine.

In the pygtk case though, it *seems* like the bad sequence mentioned above
should never be a problem because threads_init() isn't called until after gtk_init()

Do you have any other threads running when you get the above backtrace? what are
their traces like? 

Comment 8 Daniel Berrangé 2007-03-08 21:48:49 UTC
No, there's only a single thread running at this point in the game.

> Well, I don't think that gtk_init() runs and quits the main loop temporarily
> while its being run, or anything like that."

My take from the GTK docs was that you need to init threading before calling
*any* method related to the GTK event loop - eg that would include the addition
/ removal of a event source. I could be being overly paranoid in interpreting it
that way though - the doc is very unclearly worded in this regard.


Comment 9 Ray Strode [halfline] 2007-03-08 22:05:17 UTC
So if there is only one thread running and it's hanging waiting for the lock,
that means that the lock must have been taken at some point and not released (or
that the lock somehow got corrupted).

Comment 10 Ray Strode [halfline] 2007-03-08 22:23:34 UTC
So I talked with owen a bit on irc:

[17:17:39] <halfline> so does "gdk_threads_init() This call must be made before 
any use of the main loop from GTK+; to be safe, call it before gtk_init()."
[17:17:55] <halfline> mean don't call gdk_threads_init() from within a callback 
of the mainloop
[17:18:12] <halfline> or does it mean, don't call g_timeout_add() before calling
 gdk_threads_init() ?
[17:18:42] <owen> halfline: It's talking about entering the main loop
[17:18:57] <owen> Adding sources will be harmless


Comment 11 Daniel Berrangé 2007-03-09 01:21:51 UTC
I built a custom version of GTK which logged a stack trace for every single
gdk_threads_enter() and gdk_threads_exit() call, and discovered that
threads_exit() was being called twice in a row when AT-SPI was enabled. A little
futher investigation identified an idle callback in virt-manager which was not
protected with a threads_enter/exit call. I need to do a little further
debugging to verify that this has solved the issue, but it looks like an app bug
currently - previously it was a harmless issue, because only a single thread
would ever update the GTK UI, but with the AT-SPI hooks now doing explicitly
threading calls themselves look like it finally triggered the latent bug. I'll
close this ticket once confirmed for certain.


Comment 12 Daniel Berrangé 2007-03-13 21:29:17 UTC
After much more debugging I determined that the workaround in comment #2 was not
in fact reliable - it only worked 75% of the time. The root cause of the
problems was a missing threads_enter/exit in the idle loop callback in
virt-manager. This caused a mutex to be unlocked twice, which
eventually/sometimes caused deadlock much later in the program. So it looks like
there is no problem in at-spi itself - the new locking stuff in latest at-spi is
merely exposing a latent bug that was previously benign, but now critical.