Bug 231394
Summary: | Multi-threaded pygtk apps deadlock if accessibility is enabled | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Daniel Berrangé <berrange> | ||||
Component: | pygtk2 | Assignee: | Matthew Barnes <mbarnes> | ||||
Status: | CLOSED NOTABUG | QA Contact: | |||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | rawhide | CC: | rstrode | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2007-03-13 21:29:17 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Daniel Berrangé
2007-03-07 23:48:18 UTC
Created attachment 149535 [details]
GDB trace
Bugzilla totally mangled the stack trace by line wrapping it - here's a file
with the unmangled trace which should be more readable.
To test out whether calling 'gdk_threads_init()' before gtk_init() solves the problem I created a dummy python module containing this: $ cat gtkthreads.c #include <Python.h> #include <glib.h> #include <gdk/gdk.h> static struct PyMethodDef gtkthreads_module_methods[] = { { NULL, NULL, 0, NULL } }; PyMODINIT_FUNC initgtkthreads(void) { Py_InitModule("gtkthreads", gtkthreads_module_methods); g_thread_init(NULL); gdk_threads_init(); } $ gcc -Wall -fPIC -fpic -shared -o gtkthreads.so $(pkg-config --cflags --libs gtk+-2.0 glib-2.0 gobject-2.0 gthread-2.0) -I /usr/include/python2.5 gtkthreads.c $ mv gtkthreads.so /usr/lib64/python-2.5/site-packages/ And then changed import gtk To import gtkthreads import gtk With this nasty hack in place, I can successfully start virt-manager with Accessibility enabled. So this confirms that if you call gdk_threads_init() early enough, the problem goes away - just how to achieve this in a supportable fashion with official PyGTK apis instead of my nasty hack. Perhaps pygtk itself needs this mini 'gtkthreads' module which apps can import ? Sounds like upstream needs to be alerted to this. Has a b.g.o bug been filed yet? I've not filed any b.g.o bug report, just this b.r.c report. well the mailing-list discussion mentioned in comment 0 has a bug report: http://bugzilla.gnome.org/show_bug.cgi?id=329454 Something doesn't quite make sense to me though. gtk.gdk.threads_enter() should be a noop at gtk_init time, because threading hasn't been initialized yet! It seems like the only potential for deadlock is the time between the threads_init() call and the threads_enter() call. if there are any at-spi calls between those two calls the problem would happen, but otherwise shouldn't? So if you do gtk.gdk.threads_enter() immediately after gtk.gdk.threads_init() do things work? Is the backtrace above where the deadlock happens? No, the above trace was just to illustrate AT-SPI making a threads call before threading was initialized. The trace of the actual hang is: #0 0x0000003b98e0c6c8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0 #1 0x0000003b98e08724 in _L_mutex_lock_103 () from /lib64/libpthread.so.0 #2 0x0000003b98e081c3 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00000036a924437e in gdk_event_prepare (source=0x726130, timeout=0x7fff2e1e2274) at gdkevents-x11.c:2272 #4 0x0000003b97e2f337 in IA__g_main_context_prepare (context=0x9e6730, priority=0x7fff2e1e22d4) at gmain.c:2364 #5 0x0000003b97e2fa2d in g_main_context_iterate (context=0x9e6730, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2657 #6 0x0000003b97e300aa in IA__g_main_loop_run (loop=0x114bc80) at gmain.c:2881 #7 0x00000036a892d023 in IA__gtk_main () at gtkmain.c:1148 #8 0x00002aaaaf277544 in _wrap_gtk_main (self=<value optimized out>) at ./gtk.override:1133 #9 0x0000003ba4cb4f3b in PyEval_EvalFrameEx (f=0xfb1820, throwflag=<value optimized out>) at Python/ceval.c:3550 #10 0x0000003ba4cb538e in PyEval_EvalFrameEx (f=0x66a270, throwflag=<value optimized out>) at Python/ceval.c:3652 #11 0x0000003ba4cb5cd2 in PyEval_EvalCodeEx (co=0x2aaaaaae7eb8, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2833 #12 0x0000003ba4cb5d12 in PyEval_EvalCode (co=0x726130, globals=0x0, locals=0x3b98e0f658) at Python/ceval.c:494 #13 0x0000003ba4ccf832 in run_mod (mod=<value optimized out>, filename=<value optimized out>, globals=0x6242c0, locals=0x6242c0, flags=<value optimized out>, arena=<value optimized out>) at Python/pythonrun.c:1264 #14 0x0000003ba4ccf906 in PyRun_FileExFlags (fp=0x601010, filename=0x7fff2e1e4be5 "/usr/share/virt-manager/virt-manager.py", start=257, globals=0x6242c0, locals=0x6242c0, closeit=1, flags=0x7fff2e1e2aa0) at Python/pythonrun.c:1250 #15 0x0000003ba4cd0df8 in PyRun_SimpleFileExFlags (fp=<value optimized out>, filename=0x7fff2e1e4be5 "/usr/share/virt-manager/virt-manager.py", closeit=1, flags=0x7fff2e1e2aa0) at Python/pythonrun.c:870 #16 0x0000003ba4cda0a5 in Py_Main (argc=<value optimized out>, argv=0x7fff2e1e2bc8) at Modules/main.c:496 #17 0x0000003b96e1d9c4 in __libc_start_main () from /lib64/libc.so.6 #18 0x0000000000400629 in _start () The GDK docs say: http://developer.gnome.org/doc/API/2.0/gdk/gdk-Threads.html#gdk-threads-init "gdk_threads_init() This call must be made before any use of the main loop from GTK+; to be safe, call it before gtk_init()." So, with the original AT-SPI build at-spi-1.17.2-1.fc7, AT-SPI is calling threads_enter(), and talking to the GTK main loop before the app has initialized threads. With the new build at-spi-1.17.2-2.fc7, AT-SPI is not calling threads_enter() but is still talking to the GTK main loop before the app has initialized threads. So from my reading of the GTK docs, the root cause is that AT-SPI is talking to the GTK main loop functions before threading is initialized - it must not call any main loop functions until the app has initialized threading. So the patch in -2.fc7 had no effect, because making threads_enter() a no-op didn't change the fact that AT-SPI called GTK main loop functions before threads were initialized. My hack in comment #3 solved the problem because it ensured threads were initialized before AT-SPI got anywhere near the main loop. hmm still something doesn't seem right to me. Well, I don't think that gtk_init() runs and quits the main loop temporarily while its being run, or anything like that. So at-spi-1.17.2-1.fc7 was doing threads_leave() some_synchronous_corba_call() threads_enter() during gtk_init. The "drop the lock, make the call, take the lock" formula above is fine when it results from some gtk widget function getting called in a callback invoked by the mainloop because it's the responsibility of that callback to take the lock before using gtk functions (what i'm saying is that we can be relatively sure that the threads_leave() call is paired with an earlier threads_enter() call). Now in the gtk_init case (which is called way before gtk_main is ever run) there isn't a guarantee that the lock is already taken, so the threads_leave() call above would be unpaired. This isn't a problem if the user hasn't initialized threading before calling gtk_init() because the threads_leave() call would be a noop anyway. It's only a problem if the user initializes threading, then calls, gtk_init, the calls gdk_threads_enter right before running the mainloop. The sequence would be call threads init which creates mutex call gtk_init at-spi then unlocks new mutex that isn't locked (an error, but a benign one) make corba call lock mutex do other application setup lock_mutex (deadlock because mutex is already locked!) run mainloop unlock mutex run callbacks lock mutex at-spi-1.17.2-2.fc7 addressed that problem by explicitly not making the threads_leave() call (and the threads_enter() call after the corba call) if the corba call was made before the user called gtk_main(), and so then the previously problematic sequence would be: call threads init which creates mutex call gtk_init make corba call do other application setup lock_mutex run mainloop unlock mutex run callbacks lock mutex which should work fine. In the pygtk case though, it *seems* like the bad sequence mentioned above should never be a problem because threads_init() isn't called until after gtk_init() Do you have any other threads running when you get the above backtrace? what are their traces like? No, there's only a single thread running at this point in the game.
> Well, I don't think that gtk_init() runs and quits the main loop temporarily
> while its being run, or anything like that."
My take from the GTK docs was that you need to init threading before calling
*any* method related to the GTK event loop - eg that would include the addition
/ removal of a event source. I could be being overly paranoid in interpreting it
that way though - the doc is very unclearly worded in this regard.
So if there is only one thread running and it's hanging waiting for the lock, that means that the lock must have been taken at some point and not released (or that the lock somehow got corrupted). So I talked with owen a bit on irc: [17:17:39] <halfline> so does "gdk_threads_init() This call must be made before any use of the main loop from GTK+; to be safe, call it before gtk_init()." [17:17:55] <halfline> mean don't call gdk_threads_init() from within a callback of the mainloop [17:18:12] <halfline> or does it mean, don't call g_timeout_add() before calling gdk_threads_init() ? [17:18:42] <owen> halfline: It's talking about entering the main loop [17:18:57] <owen> Adding sources will be harmless I built a custom version of GTK which logged a stack trace for every single gdk_threads_enter() and gdk_threads_exit() call, and discovered that threads_exit() was being called twice in a row when AT-SPI was enabled. A little futher investigation identified an idle callback in virt-manager which was not protected with a threads_enter/exit call. I need to do a little further debugging to verify that this has solved the issue, but it looks like an app bug currently - previously it was a harmless issue, because only a single thread would ever update the GTK UI, but with the AT-SPI hooks now doing explicitly threading calls themselves look like it finally triggered the latent bug. I'll close this ticket once confirmed for certain. After much more debugging I determined that the workaround in comment #2 was not in fact reliable - it only worked 75% of the time. The root cause of the problems was a missing threads_enter/exit in the idle loop callback in virt-manager. This caused a mutex to be unlocked twice, which eventually/sometimes caused deadlock much later in the program. So it looks like there is no problem in at-spi itself - the new locking stuff in latest at-spi is merely exposing a latent bug that was previously benign, but now critical. |