Description of problem: Running the latest rawhide, the virt-manager application will more or less immediately deadlock after startup if the AT-SPI accessibility extensions are enabled in the desktop session. This is because the AT-SPI module will call gdk_threads_enter from within the gtk_init() method. Since gtk_init() is called automatically when doing 'import gtk', virt-manager has no way to initialize gdk threading. This is major problem since AT-SPI appears to be enabled by default in Fedora 7 rawhide, making virt-manager dead in the water, out of the box :-( Version-Release number of selected component (if applicable): pygtk2-2.10.4-1.fc7 gtk2-2.10.9-4.fc7 glib2-2.12.9-4.fc7 at-spi-1.17.2-1.fc7 gail-1.17.0-1.fc7 atk-1.17.0-1.fc7 virt-manager-0.3.1-3.fc7 How reproducible: Always, if Accessibility is enabled Steps to Reproduce: 1. Select Applications -> System Tools -> Virtual Machine Manager 2. Select 'Run unprivileged' in the console-helper dialog 3. Select Xen or QEMU hypervisor in the initial connect dialog 4. The main 'VM list' window now appears Actual results: The window does not respond to any input events Expected results: The window responds normally Additional info: This mailing list posting shows that as of a month ago the ATK bindings now requires that gdk_threads_init() is invoked *before* gtk_init() is called. http://mail.gnome.org/archives/desktop-devel-list/2007-February/msg00305.html The PyGTK bindings to GTK do not make it possible to comply with this requirement. eg import gtk gtk.gdk.threads_init() Is the absolute shortest sequence with which I can initialized threading in PyGTK. This is still too late though, because the initializer run by the 'import gtk' call will invoke gtk_init() :-( If I set a breakpoint on 'atk_misc_threads_enter' it will stop at the following location: #0 0x0000003b9c6142d0 in atk_misc_threads_enter () from /usr/lib64/libatk-1.0.so.0 #1 0x00002aaab05e8cef in ?? () from /usr/lib64/gtk-2.0/modules/libatk-bridge.so #2 0x00002aaab05e9289 in ?? () from /usr/lib64/gtk-2.0/modules/libatk-bridge.so #3 0x00002aaab05eab49 in ?? () from /usr/lib64/gtk-2.0/modules/libatk-bridge.so #4 0x0000003bc2b4321b in default_display_notify_cb (display_manager=<value optimized out>) at gtkmodules.c:413 #5 0x0000003b9820af19 in g_closure_invoke () from /lib64/libgobject-2.0.so.0 #6 0x0000003b9821a788 in ?? () from /lib64/libgobject-2.0.so.0 #7 0x0000003b9821bbd4 in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0 #8 0x0000003b9821bda3 in g_signal_emit () from /lib64/libgobject-2.0.so.0 #9 0x0000003b9820ecff in ?? () from /lib64/libgobject-2.0.so.0 #10 0x0000003b98210656 in g_object_notify () from /lib64/libgobject-2.0.so.0 #11 0x0000003bc2619ad5 in IA__gdk_display_open_default_libgtk_only () at gdk.c:289 #12 0x0000003bc2b2d4e4 in IA__gtk_init_check (argc=<value optimized out>, argv=<value optimized out>) at gtkmain.c:889 #13 0x00002aaaaf2778f1 in _wrap_gtk_init_check (self=<value optimized out>, args=<value optimized out>) at ./gtk.override:406 #14 0x0000003ba4cb4f3b in PyEval_EvalFrameEx (f=0x7264b0, throwflag=<value optimized out>) at Python/ceval.c:3550 #15 0x0000003ba4cb538e in PyEval_EvalFrameEx (f=0x725690, throwflag=<value optimized out>) at Python/ceval.c:3652 #16 0x0000003ba4cb5cd2 in PyEval_EvalCodeEx (co=0x2aaaae0e2a08, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833 #17 0x0000003ba4cb5d12 in PyEval_EvalCode (co=0x9fec00, globals=0x1, locals=0x3b98e0f988) at Python/ceval.c:494 #18 0x0000003ba4cc65a2 in PyImport_ExecCodeModuleEx (name=0x7fff01ccafd0 "gtk", co=0x2aaaae0e2a08, pathname=0x7fff01cc7e30 "/usr/lib64/python2.5/site-packages/gtk-2.0/gtk/__init__.pyc") at Python/import.c:658 #19 0x0000003ba4cc6b22 in load_source_module (name=0x7fff01ccafd0 "gtk", pathname=0x7fff01cc7e30 "/usr/lib64/python2.5/site-packages/gtk-2.0/gtk/__init__.pyc", fp=<value optimized out>) at Python/import.c:942 #20 0x0000003ba4cc8c58 in load_package (name=0x7fff01ccafd0 "gtk", pathname=<value optimized out>) at Python/import.c:998 #21 0x0000003ba4cc7b52 in import_submodule (mod=0x3ba4f3a5d0, subname=0x7fff01ccafd0 "gtk", fullname=0x7fff01ccafd0 "gtk") at Python/import.c:2383 #22 0x0000003ba4cc8034 in load_next (mod=0x3ba4f3a5d0, altmod=0x3ba4f3a5d0, p_name=<value optimized out>, buf=0x7fff01ccafd0 "gtk", p_buflen=0x7fff01ccafc8) at Python/import.c:2203 #23 0x0000003ba4cc8241 in import_module_level (name=0x0, globals=0x2aaaaaae54d4, locals=<value optimized out>, fromlist=0x3ba4f3a5d0, level=<value optimized out>) at Python/import.c:1984 #24 0x0000003ba4cc86c5 in PyImport_ImportModuleLevel (name=0x2aaaaab3a864 "gtk", globals=0x6242c0, locals=0x6242c0, fromlist=0x3ba4f3a5d0, level=-1) at Python/import.c:2055 #25 0x0000003ba4cae2dd in builtin___import__ (self=<value optimized out>, args=<value optimized out>, kwds=<value optimized out>) at Python/bltinmodule.c:47 #26 0x0000003ba4c3c053 in PyObject_Call (func=0x9fec00, arg=0x1, kw=0x3b98e0f988) at Objects/abstract.c:1860 #27 0x0000003ba4cae8ae in PyEval_CallObjectWithKeywords (func=0x2aaaaaace5f0, arg=0x2aaaaab2c7e0, kw=0x0) at Python/ceval.c:3435 #28 0x0000003ba4cb0dda in PyEval_EvalFrameEx (f=0x66a270, throwflag=<value optimized out>) at Python/ceval.c:2065 #29 0x0000003ba4cb5cd2 in PyEval_EvalCodeEx (co=0x2aaaaaae7eb8, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833 #30 0x0000003ba4cb5d12 in PyEval_EvalCode (co=0x9fec00, globals=0x1, locals=0x3b98e0f988) at Python/ceval.c:494 #31 0x0000003ba4ccf832 in run_mod (mod=<value optimized out>, filename=<value optimized out>, globals=0x6242c0, locals=0x6242c0, flags=<value optimized out>, arena=<value optimized out>) at Python/pythonrun.c:1264 #32 0x0000003ba4ccf906 in PyRun_FileExFlags (fp=0x601010, filename=0x7fff01cccbe3 "/usr/share/virt-manager/virt-manager.py", start=257, globals=0x6242c0, locals=0x6242c0, closeit=1, flags=0x7fff01ccc590) at Python/pythonrun.c:1250 #33 0x0000003ba4cd0df8 in PyRun_SimpleFileExFlags (fp=<value optimized out>, filename=0x7fff01cccbe3 "/usr/share/virt-manager/virt-manager.py", closeit=1, flags=0x7fff01ccc590) at Python/pythonrun.c:870 #34 0x0000003ba4cda0a5 in Py_Main (argc=<value optimized out>, argv=0x7fff01ccc6b8) at Modules/main.c:496 #35 0x0000003b96e1d9c4 in __libc_start_main () from /lib64/libc.so.6 #36 0x0000000000400629 in _start () The stack frame 14 corresponds to: (gdb) frame 14 #14 0x0000003ba4cb4f3b in PyEval_EvalFrameEx (f=0x7264b0, throwflag=<value optimized out>) at Python/ceval.c:3550 3550 C_TRACE(x, (*meth)(self,NULL)); (gdb) pyframe /usr/lib64/python2.5/site-packages/gtk-2.0/gtk/__init__.py (59): _init So, you can see the atk_threads_enter is being called from the gtk initializer in 'import gtk'. So basically any Python GTK application which uses threads is totally screwed if AT-SPI is enabled, because there is no way for the app to initialize GTK threading before AT-SPI is activated :-(
Created attachment 149535 [details] GDB trace Bugzilla totally mangled the stack trace by line wrapping it - here's a file with the unmangled trace which should be more readable.
To test out whether calling 'gdk_threads_init()' before gtk_init() solves the problem I created a dummy python module containing this: $ cat gtkthreads.c #include <Python.h> #include <glib.h> #include <gdk/gdk.h> static struct PyMethodDef gtkthreads_module_methods[] = { { NULL, NULL, 0, NULL } }; PyMODINIT_FUNC initgtkthreads(void) { Py_InitModule("gtkthreads", gtkthreads_module_methods); g_thread_init(NULL); gdk_threads_init(); } $ gcc -Wall -fPIC -fpic -shared -o gtkthreads.so $(pkg-config --cflags --libs gtk+-2.0 glib-2.0 gobject-2.0 gthread-2.0) -I /usr/include/python2.5 gtkthreads.c $ mv gtkthreads.so /usr/lib64/python-2.5/site-packages/ And then changed import gtk To import gtkthreads import gtk With this nasty hack in place, I can successfully start virt-manager with Accessibility enabled. So this confirms that if you call gdk_threads_init() early enough, the problem goes away - just how to achieve this in a supportable fashion with official PyGTK apis instead of my nasty hack. Perhaps pygtk itself needs this mini 'gtkthreads' module which apps can import ?
Sounds like upstream needs to be alerted to this. Has a b.g.o bug been filed yet?
I've not filed any b.g.o bug report, just this b.r.c report.
well the mailing-list discussion mentioned in comment 0 has a bug report: http://bugzilla.gnome.org/show_bug.cgi?id=329454 Something doesn't quite make sense to me though. gtk.gdk.threads_enter() should be a noop at gtk_init time, because threading hasn't been initialized yet! It seems like the only potential for deadlock is the time between the threads_init() call and the threads_enter() call. if there are any at-spi calls between those two calls the problem would happen, but otherwise shouldn't? So if you do gtk.gdk.threads_enter() immediately after gtk.gdk.threads_init() do things work? Is the backtrace above where the deadlock happens?
No, the above trace was just to illustrate AT-SPI making a threads call before threading was initialized. The trace of the actual hang is: #0 0x0000003b98e0c6c8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0 #1 0x0000003b98e08724 in _L_mutex_lock_103 () from /lib64/libpthread.so.0 #2 0x0000003b98e081c3 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00000036a924437e in gdk_event_prepare (source=0x726130, timeout=0x7fff2e1e2274) at gdkevents-x11.c:2272 #4 0x0000003b97e2f337 in IA__g_main_context_prepare (context=0x9e6730, priority=0x7fff2e1e22d4) at gmain.c:2364 #5 0x0000003b97e2fa2d in g_main_context_iterate (context=0x9e6730, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2657 #6 0x0000003b97e300aa in IA__g_main_loop_run (loop=0x114bc80) at gmain.c:2881 #7 0x00000036a892d023 in IA__gtk_main () at gtkmain.c:1148 #8 0x00002aaaaf277544 in _wrap_gtk_main (self=<value optimized out>) at ./gtk.override:1133 #9 0x0000003ba4cb4f3b in PyEval_EvalFrameEx (f=0xfb1820, throwflag=<value optimized out>) at Python/ceval.c:3550 #10 0x0000003ba4cb538e in PyEval_EvalFrameEx (f=0x66a270, throwflag=<value optimized out>) at Python/ceval.c:3652 #11 0x0000003ba4cb5cd2 in PyEval_EvalCodeEx (co=0x2aaaaaae7eb8, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833 #12 0x0000003ba4cb5d12 in PyEval_EvalCode (co=0x726130, globals=0x0, locals=0x3b98e0f658) at Python/ceval.c:494 #13 0x0000003ba4ccf832 in run_mod (mod=<value optimized out>, filename=<value optimized out>, globals=0x6242c0, locals=0x6242c0, flags=<value optimized out>, arena=<value optimized out>) at Python/pythonrun.c:1264 #14 0x0000003ba4ccf906 in PyRun_FileExFlags (fp=0x601010, filename=0x7fff2e1e4be5 "/usr/share/virt-manager/virt-manager.py", start=257, globals=0x6242c0, locals=0x6242c0, closeit=1, flags=0x7fff2e1e2aa0) at Python/pythonrun.c:1250 #15 0x0000003ba4cd0df8 in PyRun_SimpleFileExFlags (fp=<value optimized out>, filename=0x7fff2e1e4be5 "/usr/share/virt-manager/virt-manager.py", closeit=1, flags=0x7fff2e1e2aa0) at Python/pythonrun.c:870 #16 0x0000003ba4cda0a5 in Py_Main (argc=<value optimized out>, argv=0x7fff2e1e2bc8) at Modules/main.c:496 #17 0x0000003b96e1d9c4 in __libc_start_main () from /lib64/libc.so.6 #18 0x0000000000400629 in _start () The GDK docs say: http://developer.gnome.org/doc/API/2.0/gdk/gdk-Threads.html#gdk-threads-init "gdk_threads_init() This call must be made before any use of the main loop from GTK+; to be safe, call it before gtk_init()." So, with the original AT-SPI build at-spi-1.17.2-1.fc7, AT-SPI is calling threads_enter(), and talking to the GTK main loop before the app has initialized threads. With the new build at-spi-1.17.2-2.fc7, AT-SPI is not calling threads_enter() but is still talking to the GTK main loop before the app has initialized threads. So from my reading of the GTK docs, the root cause is that AT-SPI is talking to the GTK main loop functions before threading is initialized - it must not call any main loop functions until the app has initialized threading. So the patch in -2.fc7 had no effect, because making threads_enter() a no-op didn't change the fact that AT-SPI called GTK main loop functions before threads were initialized. My hack in comment #3 solved the problem because it ensured threads were initialized before AT-SPI got anywhere near the main loop.
hmm still something doesn't seem right to me. Well, I don't think that gtk_init() runs and quits the main loop temporarily while its being run, or anything like that. So at-spi-1.17.2-1.fc7 was doing threads_leave() some_synchronous_corba_call() threads_enter() during gtk_init. The "drop the lock, make the call, take the lock" formula above is fine when it results from some gtk widget function getting called in a callback invoked by the mainloop because it's the responsibility of that callback to take the lock before using gtk functions (what i'm saying is that we can be relatively sure that the threads_leave() call is paired with an earlier threads_enter() call). Now in the gtk_init case (which is called way before gtk_main is ever run) there isn't a guarantee that the lock is already taken, so the threads_leave() call above would be unpaired. This isn't a problem if the user hasn't initialized threading before calling gtk_init() because the threads_leave() call would be a noop anyway. It's only a problem if the user initializes threading, then calls, gtk_init, the calls gdk_threads_enter right before running the mainloop. The sequence would be call threads init which creates mutex call gtk_init at-spi then unlocks new mutex that isn't locked (an error, but a benign one) make corba call lock mutex do other application setup lock_mutex (deadlock because mutex is already locked!) run mainloop unlock mutex run callbacks lock mutex at-spi-1.17.2-2.fc7 addressed that problem by explicitly not making the threads_leave() call (and the threads_enter() call after the corba call) if the corba call was made before the user called gtk_main(), and so then the previously problematic sequence would be: call threads init which creates mutex call gtk_init make corba call do other application setup lock_mutex run mainloop unlock mutex run callbacks lock mutex which should work fine. In the pygtk case though, it *seems* like the bad sequence mentioned above should never be a problem because threads_init() isn't called until after gtk_init() Do you have any other threads running when you get the above backtrace? what are their traces like?
No, there's only a single thread running at this point in the game. > Well, I don't think that gtk_init() runs and quits the main loop temporarily > while its being run, or anything like that." My take from the GTK docs was that you need to init threading before calling *any* method related to the GTK event loop - eg that would include the addition / removal of a event source. I could be being overly paranoid in interpreting it that way though - the doc is very unclearly worded in this regard.
So if there is only one thread running and it's hanging waiting for the lock, that means that the lock must have been taken at some point and not released (or that the lock somehow got corrupted).
So I talked with owen a bit on irc: [17:17:39] <halfline> so does "gdk_threads_init() This call must be made before any use of the main loop from GTK+; to be safe, call it before gtk_init()." [17:17:55] <halfline> mean don't call gdk_threads_init() from within a callback of the mainloop [17:18:12] <halfline> or does it mean, don't call g_timeout_add() before calling gdk_threads_init() ? [17:18:42] <owen> halfline: It's talking about entering the main loop [17:18:57] <owen> Adding sources will be harmless
I built a custom version of GTK which logged a stack trace for every single gdk_threads_enter() and gdk_threads_exit() call, and discovered that threads_exit() was being called twice in a row when AT-SPI was enabled. A little futher investigation identified an idle callback in virt-manager which was not protected with a threads_enter/exit call. I need to do a little further debugging to verify that this has solved the issue, but it looks like an app bug currently - previously it was a harmless issue, because only a single thread would ever update the GTK UI, but with the AT-SPI hooks now doing explicitly threading calls themselves look like it finally triggered the latent bug. I'll close this ticket once confirmed for certain.
After much more debugging I determined that the workaround in comment #2 was not in fact reliable - it only worked 75% of the time. The root cause of the problems was a missing threads_enter/exit in the idle loop callback in virt-manager. This caused a mutex to be unlocked twice, which eventually/sometimes caused deadlock much later in the program. So it looks like there is no problem in at-spi itself - the new locking stuff in latest at-spi is merely exposing a latent bug that was previously benign, but now critical.