Description of problem: I'm running evolution-0:2.12.3-3.fc8.x86_64 and quite often now it seems to hang with "formatting message" in the status bar. I've only had this problem for about a week or so. From the log files:

/var/log/yum.log-20080129:Jan 25 11:21:42 Updated: evolution-2.12.3-1.fc8.x86_64
/var/log/yum.log-20080129:Jan 28 19:32:37 Updated: beagle-evolution-0.2.18-4.fc8.x86_64
/var/log/yum.log-20080312:Mar 07 21:36:54 Updated: evolution-2.12.3-3.fc8.x86_64

...I've also started running beagle "recently", as that doesn't eat all my CPU now ... so maybe it's related to that? I had thought it was related to having evolution-zimbra installed, so I uninstalled that over the weekend ... but it still does it. This morning I tried doing "taskset -p 1 <evolution-tids>" (as I have two cores) and so far that seems to have helped, so maybe it's just a normal deadlock somewhere.
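For anyone wanting to reproduce the taskset workaround from a script: "taskset -p 1 <tid>" applies the affinity mask 0x1, i.e. CPU 0 only. The following is a rough Python sketch of the same idea, assuming Linux; mask_to_cpus and pin_to_mask are hypothetical helper names, not part of Evolution or taskset.

```python
import os


def mask_to_cpus(mask):
    """Convert a taskset-style bitmask (e.g. 0x1) into a set of CPU indices."""
    return {bit for bit in range(mask.bit_length()) if mask & (1 << bit)}


def pin_to_mask(tid, mask):
    """Restrict thread/process `tid` (0 means the caller) to the CPUs in `mask`.

    Linux-only: relies on os.sched_setaffinity. Returns the resulting affinity.
    """
    os.sched_setaffinity(tid, mask_to_cpus(mask))
    return os.sched_getaffinity(tid)
```

For example, pin_to_mask(tid, 0x1) would be called once per Evolution thread ID to mirror the command above.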
Does it hang on some particular message, say one with a special attachment like a v-calendar, or an HTML mail with images? When it hangs, can you attach gdb to the running Evolution process and paste here the output of the "thread apply all bt" command from gdb? It will show us where it gets stuck. Thanks in advance.
It's happened on a number of different messages, and doing "evolution --force-shutdown" then running it again, and selecting the same message has worked every time. I have HTML email rendering turned off, so it's not that. As I said, it hasn't happened since I did the taskset thing ... so hopefully it's not going to happen again, until I reboot/re-taskset, which I'm not dying to do right now. I'll try and get you that data when I am running it on multiple CPUs again, assuming it hangs.
OK, thanks for the info. I forgot to ask: could you install the debug info packages for gtkhtml, evolution and evolution-data-server (and maybe evolution-exchange, if you use it), so the traces will have symbols?
Yeh, I guessed that'd help ... so I already did a "debuginfo-install evolution" so I should be covered.
(gdb) info threads
  5 Thread 1094719824 (LWP 29521) 0x0000003b0eacbd66 in __poll (fds=0x864bc0, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
  4 Thread 1147169104 (LWP 29543) 0x0000003b0eacbd66 in __poll (fds=0xecea00, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
  3 Thread 1147435344 (LWP 29544) 0x0000003b0eacbd66 in __poll (fds=0xed3930, nfds=7, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
  2 Thread 1126189392 (LWP 11246) 0x0000003b0f60a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 46912496455616 (LWP 29516) 0x0000003b0eacbd66 in __poll (fds=0xd14250, nfds=5, timeout=124) at ../sysdeps/unix/sysv/linux/poll.c:87

I'm guessing this is the problem:

(gdb) thread 2
[Switching to thread 2 (Thread 1126189392 (LWP 11246))]#0 0x0000003b0f60a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
Current language: auto; currently asm
(gdb) bt
#0 0x0000003b0f60a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000301fa1c8df in ?? () from /usr/lib64/libebook-1.2.so.9
#2 0x000000301fa1cadb in e_book_get_contacts () from /usr/lib64/libebook-1.2.so.9
#3 0x00002aaab0dbcc51 in em_utils_contact_photo (cia=<value optimized out>, local=<value optimized out>) at em-utils.c:2092
#4 0x00002aaab0daa994 in efh_format_message (emf=0x82d150, stream=0xe8dc70, part=0x2aaabc7d2d98, info=<value optimized out>) at em-format-html.c:1930
#5 0x00002aaab0da9640 in efh_format_exec (m=0xef0470) at em-format-html.c:1254
#6 0x00002aaab0dcac7a in mail_msg_proxy (msg=0xef0470) at mail-mt.c:500
#7 0x00000030e3e52669 in g_thread_pool_thread_proxy (data=<value optimized out>) at gthreadpool.c:265
#8 0x00000030e3e50b24 in g_thread_create_proxy (data=0xecb370) at gthread.c:635
#9 0x0000003b0f606407 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003b0ead4b0d in clone () from /lib64/libc.so.6

But here's the full list anyway:

(gdb) thread apply all bt

Thread 5 (Thread 1094719824 (LWP 29521)):
#0 0x0000003b0eacbd66 in __poll (fds=0x864bc0, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00000030e3e3209e in g_main_context_iterate (context=0x85fe80, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2996
#2 0x00000030e3e3255a in IA__g_main_loop_run (loop=0x85b990) at gmain.c:2898
#3 0x00000030eaa068c3 in libnm_glib_dbus_worker (user_data=0x854750) at libnm_glib.c:427
#4 0x00000030e3e50b24 in g_thread_create_proxy (data=0x85bae0) at gthread.c:635
#5 0x0000003b0f606407 in start_thread () from /lib64/libpthread.so.0
#6 0x0000003b0ead4b0d in clone () from /lib64/libc.so.6
Current language: auto; currently c

Thread 4 (Thread 1147169104 (LWP 29543)):
#0 0x0000003b0eacbd66 in __poll (fds=0xecea00, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00000030e3e3209e in g_main_context_iterate (context=0xe9bca0, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2996
#2 0x00000030e3e3255a in IA__g_main_loop_run (loop=0xecdac0) at gmain.c:2898
#3 0x000000301fa181fd in ?? () from /usr/lib64/libebook-1.2.so.9
#4 0x00000030e3e50b24 in g_thread_create_proxy (data=0xe9bd80) at gthread.c:635
#5 0x0000003b0f606407 in start_thread () from /lib64/libpthread.so.0
#6 0x0000003b0ead4b0d in clone () from /lib64/libc.so.6

Thread 3 (Thread 1147435344 (LWP 29544)):
#0 0x0000003b0eacbd66 in __poll (fds=0xed3930, nfds=7, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00000030e3e3209e in g_main_context_iterate (context=0xd153d0, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2996
#2 0x00000030e3e3255a in IA__g_main_loop_run (loop=0xd1ba60) at gmain.c:2898
#3 0x00000030e8a463b0 in link_io_thread_fn (data=<value optimized out>) at linc.c:396
#4 0x00000030e3e50b24 in g_thread_create_proxy (data=0x8bfb80) at gthread.c:635
#5 0x0000003b0f606407 in start_thread () from /lib64/libpthread.so.0
#6 0x0000003b0ead4b0d in clone () from /lib64/libc.so.6

Thread 2 (Thread 1126189392 (LWP 11246)):
#0 0x0000003b0f60a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000301fa1c8df in ?? () from /usr/lib64/libebook-1.2.so.9
#2 0x000000301fa1cadb in e_book_get_contacts () from /usr/lib64/libebook-1.2.so.9
#3 0x00002aaab0dbcc51 in em_utils_contact_photo (cia=<value optimized out>, local=<value optimized out>) at em-utils.c:2092
#4 0x00002aaab0daa994 in efh_format_message (emf=0x82d150, stream=0xe8dc70, part=0x2aaabc7d2d98, info=<value optimized out>) at em-format-html.c:1930
#5 0x00002aaab0da9640 in efh_format_exec (m=0xef0470) at em-format-html.c:1254
#6 0x00002aaab0dcac7a in mail_msg_proxy (msg=0xef0470) at mail-mt.c:500
#7 0x00000030e3e52669 in g_thread_pool_thread_proxy (data=<value optimized out>) at gthreadpool.c:265
#8 0x00000030e3e50b24 in g_thread_create_proxy (data=0xecb370) at gthread.c:635
#9 0x0000003b0f606407 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003b0ead4b0d in clone () from /lib64/libc.so.6
Current language: auto; currently asm

Thread 1 (Thread 46912496455616 (LWP 29516)):
#0 0x0000003b0eacbd66 in __poll (fds=0xd14250, nfds=5, timeout=124) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00000030e3e3209e in g_main_context_iterate (context=0x65f520, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2996
#2 0x00000030e3e3255a in IA__g_main_loop_run (loop=0x6a0e00) at gmain.c:2898
#3 0x00000030ebe2ce16 in bonobo_main () at bonobo-main.c:311
#4 0x0000000000415cfb in main (argc=<value optimized out>, argv=0x7fffeaaf4338) at main.c:602
#5 0x0000003b0ea1e074 in __libc_start_main (main=0x4159b0 <main>, argc=1, ubp_av=0x7fffeaaf4338, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7fffeaaf4328) at libc-start.c:220
#6 0x0000000000409dd9 in _start ()
Current language: auto; currently c
#0 0x0000003b0f60a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
Current language: auto; currently asm
I think you're right, looks like it's stuck waiting for a response from evolution-data-server. Can you try to get a backtrace of the evolution-data-server process when the hang occurs? Hopefully that will reveal the /real/ cause of the hang.
(gdb) bt
#0 0x0000003b0eacbd66 in __poll (fds=0x8a5ea4, nfds=1, timeout=10) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00002aaab0c40e84 in ldap_result () from /usr/lib64/evolution-data-server-1.2/extensions/libebookbackendldap.so
#2 0x00002aaab0c3d6d5 in ?? () from /usr/lib64/evolution-data-server-1.2/extensions/libebookbackendldap.so
#3 0x00000030e3e2f68b in g_timeout_dispatch (source=0x6b96f0, callback=0x1, user_data=0xa) at gmain.c:3488
#4 0x00000030e3e2ef53 in IA__g_main_context_dispatch (context=0x61b120) at gmain.c:2061
#5 0x00000030e3e3224d in g_main_context_iterate (context=0x61b120, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2694
#6 0x00000030e3e3255a in IA__g_main_loop_run (loop=0x63efa0) at gmain.c:2898
#7 0x00000030ebe2ce16 in bonobo_main () at bonobo-main.c:311
#8 0x0000000000403e8e in ?? ()
I've just turned off ldap lookups, by default, so this might go away for me now.
Thanks for the backtrace, but it's missing debugging symbols and doesn't show all active threads. Can I ask you to install evolution-data-server-debuginfo and, if you happen to see this again, run a "thread apply all bt" command from GDB?
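In case it helps with collecting the trace: gdb can attach, dump every thread's backtrace, and detach in one non-interactive run, which also avoids the pager prompts. Below is a minimal Python sketch of that, assuming gdb is on the PATH and you have permission to attach to the process; gdb_backtrace_command and capture_backtrace are hypothetical helper names.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to `pid` and dumps all threads."""
    return [
        "gdb", "-p", str(pid),
        "-batch",                      # run the commands below, then exit
        "-ex", "set pagination off",   # no "Type <return> to continue" prompts
        "-ex", "thread apply all bt",  # backtrace of every thread
    ]


def capture_backtrace(pid):
    """Run gdb against `pid` and return whatever it printed on stdout."""
    result = subprocess.run(
        gdb_backtrace_command(pid), capture_output=True, text=True
    )
    return result.stdout
```

Calling capture_backtrace with the PID of the hung evolution-data-server process should produce text that can be pasted or attached here directly.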
Sure, for some reason I didn't think that was threaded ... oh well. Also it's worth noting that the ldap is over a vpn, and the vpn had gone down when the above happened. As another point/bug, I killed the evolution-data-server with kill -9 and evolution itself still didn't recover.
(In reply to comment #10)
> Sure, for some reason I didn't think that was threaded ... oh well.

I wish it wasn't.

> Also it's worth noting that the ldap is over a vpn, and the vpn had gone down
> when the above happened.
>
> As another point/bug, I killed the evolution-data-server with kill -9 and
> evolution itself still didn't recover.

On the E-D-S side, it could have just been waiting for a socket to time out. Adding custom timeouts to connect() calls is, unfortunately, a bit tricky. I've seen these hangs myself on those /rare/ *ahem* occasions when our VPN drops out. Most of the time Evolution will eventually recover.

"kill -9" may have bypassed whatever mechanism Evolution uses to detect when E-D-S dies. In any other app this would just be a matter of listening for a SIGCHLD signal, but Evolution and E-D-S talk over Bonobo, and who knows what "kill -9" does to a CORBA server.
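To illustrate why a custom connect() timeout is fiddly: the usual approach is to make the socket non-blocking, start the connect, wait for writability with a deadline, and then read SO_ERROR to learn the real outcome. This is only a sketch of that general technique in Python, assuming a POSIX system and a numeric IPv4 address; connect_with_timeout is a hypothetical name and this is not how E-D-S or its LDAP backend is actually written.

```python
import errno
import os
import select
import socket


def connect_with_timeout(host, port, timeout):
    """Return a connected TCP socket, or raise OSError on failure or timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # A non-blocking connect normally returns EINPROGRESS straight away.
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            raise OSError(err, os.strerror(err))
        if err != 0:
            # The socket becomes writable once the attempt has finished,
            # whether it succeeded or failed.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("connect timed out after %s seconds" % timeout)
            # SO_ERROR carries the actual result of the attempt.
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err != 0:
                raise OSError(err, os.strerror(err))
        sock.setblocking(True)
        return sock
    except Exception:
        sock.close()
        raise
```

Note that this only bounds the connect itself; name resolution before it and reads after it would each need their own deadline, which is part of what makes retrofitting timeouts awkward.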
Any updates on this James? Still seeing the hangs?
I don't think so; at least, I haven't had it hang for ages ... but then I had to have it rebuild all of ~/evolution, so that might have helped.