181805 – Evolution hangs during startup while accessing imap server

Summary: Evolution hangs during startup while accessing imap server

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	evolution
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Matthew Barnes
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	FC5Target

Reported:	2006-02-16 18:56 UTC by Stephen Tweedie
Modified:	2007-11-30 22:11 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-12-16 23:11:23 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
GDB backtrace of all running threads (5.21 KB, text/plain) 2006-02-16 18:58 UTC, Stephen Tweedie	no flags
pstack of all running threads (4.15 KB, text/plain) 2006-02-16 18:58 UTC, Stephen Tweedie	no flags
ps uHx -L output for evo process (688 bytes, text/plain) 2006-02-16 19:02 UTC, Stephen Tweedie	no flags
strace output (4.34 KB, text/plain) 2006-02-16 19:04 UTC, Stephen Tweedie	no flags
pstack series (3.14 KB, text/plain) 2006-02-16 19:10 UTC, Stephen Tweedie	no flags
Workaround to disable imap LIST until it can be handled sanely. (632 bytes, patch) 2006-02-16 22:54 UTC, Stephen Tweedie	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
GNOME Bugzilla	331479	0	None	None	None	Never

Description Stephen Tweedie 2006-02-16 18:56:22 UTC

Description of problem:
I'm trying, repeatedly, to run evolution on rawhide, and failing.  For the past
couple of weeks, with and without SELinux on, on i386 and on x86_64, and using
either a new .evolution or one inherited from FC-4 or RHEL-4, the symptoms are
exactly the same: evo tries to download my imap data, gets about 100MB of
network download done, and then goes into a 100% CPU spin, making no further
progress.  The GUI remains responsive but the imap server is never accessible
and no further network IO is performed.

Version-Release number of selected component (if applicable):
evolution-2.5.90-2.1
evolution-data-server-1.5.90-2.2

How reproducible:
100%

Steps to Reproduce:
1. Run evolution.
2. Wait.  And wait.  And wait.
  
Actual results:
None.

Expected results:
My email.  :-)

Additional info:
In all cases, I have noticed that the status line has popped up
Pinging IMAP server $myserver (...)
when the CPU hang has occurred.  I believe that the actual timing of this popup
is related to the occurrence of the hang, although I cannot prove this.

pstack and gdb bt both show the initial mailbox load and the imap ping threads
trying to access the imap stream when the problem occurs.  I suspect this is not
an accident; we really should not be trying to ping a server when we *know* we
are very busy accessing it from another thread.  But even if this is the case,
the failure mode of the normal evo-data-server thread should be better than this.

The exact behaviour seen when the problem manifests will be attached below.

Comment 1 Stephen Tweedie 2006-02-16 18:58:11 UTC

Created attachment 124774 [details]
GDB backtrace of all running threads

Comment 2 Stephen Tweedie 2006-02-16 18:58:51 UTC

Created attachment 124775 [details]
pstack of all running threads

Comment 3 Stephen Tweedie 2006-02-16 19:02:47 UTC

Created attachment 124776 [details]
ps uHx -L output for evo process

ps output showing ~90 minutes of accumulated CPU time on the affected thread
and everything else idle.

Comment 4 Stephen Tweedie 2006-02-16 19:04:01 UTC

Created attachment 124777 [details]
strace output

30-second strace output showing the thread doing nothing but growing its own
memory over the interval being watched.

Comment 5 Stephen Tweedie 2006-02-16 19:10:17 UTC

Created attachment 124778 [details]
pstack series

Series of 5 pstack snapshots of the imap thread, showing that _something_ is
happening in there --- it's not being captured in the same place each time ---
but it is never getting out of camel_imap_store_summary_full_name().  A dozen
pstacks in succession show this near the top of the stack each time.

Comment 6 Stephen Tweedie 2006-02-16 20:42:59 UTC

OK, it's nothing to do with the server ping: disabling it by adding

--- evolution-data-server-1.5.91/camel/providers/imap/camel-imap-store.c~    2006-02-12 23:11:11.000000000 -0500
+++ evolution-data-server-1.5.91/camel/providers/imap/camel-imap-store.c     2006-02-16 15:06:58.000000000 -0500
@@ -1644,6 +1644,8 @@
        CamelImapResponse *response;
        CamelFolder *current_folder;

+       return;
+
        CAMEL_SERVICE_LOCK (imap_store, connect_lock);

        if (!camel_imap_store_connected(imap_store, ex))

to the e-d-s build results in the same symptoms but with no ping on the status
line or in the pstack output.

Comment 7 Stephen Tweedie 2006-02-16 22:51:31 UTC

Found it!  On fetching the server folder list, we call get_folders_sync(), which
does:

	/* We do a LIST followed by LSUB, and merge the results.  LSUB may not be a strict
	   subset of LIST for some servers, so we can't use either or separately */
	present = g_hash_table_new(folder_hash, folder_eq);
	for (j=0;j<2;j++) {
		response = camel_imap_command (imap_store, NULL, ex,
					       "%s \"\" %G", j==1 ? "LSUB" : "LIST",

This has two problems.  First, doing a LIST when we haven't even asked for a
full folder list is hideously expensive if you are running an imap server that
serves out of your homedir and you have a lot of files there (like, for example,
several exploded kernel trees!)

Second, the merging of these lists is O(N^2), as we call:

parse_list_response_as_folder_info
->camel_imap_store_summary_add_from_full
  ->camel_imap_store_summary_full_name

and this last routine, called for each folder added, compares the new name to
all previous ones by exhaustive search.  Whoops.
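
A minimal sketch of why this pattern is quadratic, assuming a flat array of
names searched linearly on every add.  FolderSummary, summary_find() and
summary_add() are hypothetical stand-ins written for illustration; they are not
the Camel API and do not reproduce camel_imap_store_summary_full_name().  The
comparison counter only makes the cost visible:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a store summary: a flat array of folder names. */
typedef struct {
    const char **names;        /* caller-provided storage */
    size_t count;              /* names stored so far */
    size_t capacity;           /* size of the names array */
    unsigned long comparisons; /* strcmp calls made so far */
} FolderSummary;

/* Exhaustive search: compares full_name against every stored name in turn.
 * Returns the index of the match, or -1 if the name is not present. */
int summary_find(FolderSummary *s, const char *full_name)
{
    for (size_t i = 0; i < s->count; i++) {
        s->comparisons++;
        if (strcmp(s->names[i], full_name) == 0)
            return (int)i;
    }
    return -1;
}

/* Adds a name unless it is already present.
 * Returns 1 if the name was added, 0 if it was a duplicate. */
int summary_add(FolderSummary *s, const char *full_name)
{
    assert(s != NULL && full_name != NULL);
    if (summary_find(s, full_name) >= 0)
        return 0;
    assert(s->count < s->capacity);
    s->names[s->count++] = full_name;
    return 1;
}
```

Adding N distinct names this way costs 0 + 1 + ... + (N-1) = N(N-1)/2 string
comparisons.  For a hypothetical 100,000 entries returned by LIST, that is
roughly 5 billion strcmp calls, whereas a hash-keyed lookup (the code already
builds a hash table for "present") would need about one probe per name.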

As a simple workaround, I changed the
	for (j=0;j<2;j++) {
in get_folders_sync() to
	for (j=1;j<2;j++) {
to disable the LIST and simply use LSUB to populate the folder list; this works
perfectly for accessing my existing subscribed folders and allows me access to
my email again.  Without this change, I simply cannot open my account.

Comment 8 Stephen Tweedie 2006-02-16 22:54:26 UTC

Created attachment 124793 [details]
Workaround to disable imap LIST until it can be handled sanely.

Comment 9 Dave Malcolm 2006-02-16 23:05:09 UTC

Did you change it to
 (j=1;j<2;j++) {
rather than 
 (j=0;j<2;j++) {
?

Comment 10 Stephen Tweedie 2006-02-16 23:35:36 UTC

Yes. :)  The attached patch has it right.

Comment 11 Stephen Tweedie 2006-02-16 23:37:32 UTC

Filed upstream as

http://bugzilla.gnome.org/show_bug.cgi?id=331479

Comment 13 Michel Alexandre Salim 2006-02-26 02:12:28 UTC

Could an interim release be pushed that has this change applied? I am in the
same situation (IMAP served out of home directory) and I can't even get a folder
listing (though CPU usage did not skyrocket).

Comment 14 David Woodhouse 2006-02-28 16:26:21 UTC

Please don't push an interim release with this precise change -- it'll break
imap for anyone who views all folders, not only subscribed folders. At least
make it conditional on the use_lsub configuration.
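
A rough sketch of what a conditional version of the workaround might look like,
assuming a boolean that reflects the "subscribed folders only" preference.  The
subscribed_only parameter and all three function names are hypothetical; the
sketch only mirrors the j==1 ? "LSUB" : "LIST" selection quoted in comment #7
and is not the actual get_folders_sync() code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: index of the first pass of the two-pass loop.
 * Pass 0 issues LIST and pass 1 issues LSUB, so starting at 1 skips LIST. */
int first_list_pass(bool subscribed_only)
{
    return subscribed_only ? 1 : 0;
}

/* Same selection as the quoted loop body: j==1 ? "LSUB" : "LIST". */
const char *command_for_pass(int j)
{
    assert(j == 0 || j == 1);
    return j == 1 ? "LSUB" : "LIST";
}

/* Collects the commands that would be issued, in order.
 * Returns the number of commands written to out. */
int plan_list_passes(bool subscribed_only, const char *out[2])
{
    int n = 0;
    for (int j = first_list_pass(subscribed_only); j < 2; j++)
        out[n++] = command_for_pass(j);
    return n;
}
```

With subscribed_only set, only LSUB is planned; otherwise both LIST and LSUB
are issued as before, so users who view all folders keep their full listing.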

Comment 15 David Woodhouse 2006-03-15 14:36:02 UTC

Note that there's no real need to be doing this separate LIST and LSUB on any
server which supports LISTEXT -- you can just use LIST (SUBSCRIBED) instead.
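
As an illustration only, the choice could be sketched as below, assuming the
caller already knows whether the server advertised the extension.  The
has_listext flag, the function name and the buffer sizes are hypothetical, and
the exact command syntax should be checked against the LIST extension
specification rather than taken from this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical planner: writes the folder-listing commands into out[] and
 * returns how many commands (round trips) are needed. */
int plan_folder_listing(bool has_listext, const char *pattern,
                        char out[2][64])
{
    assert(pattern != NULL);
    if (has_listext) {
        /* One command; the server filters on subscription itself. */
        snprintf(out[0], 64, "LIST (SUBSCRIBED) \"\" \"%s\"", pattern);
        return 1;
    }
    /* Fallback: separate LIST and LSUB, merged by the client afterwards. */
    snprintf(out[0], 64, "LIST \"\" \"%s\"", pattern);
    snprintf(out[1], 64, "LSUB \"\" \"%s\"", pattern);
    return 2;
}
```

The point of the single-command path is that the client no longer has to merge
two result sets at all.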

Comment 16 Matthew Barnes 2006-09-21 19:20:09 UTC

This bug just got reassigned to me.

Can someone give me an update on the status of this in Evolution 2.8?

Comment 17 Michel Alexandre Salim 2006-11-02 02:21:09 UTC

My mail account got moved to a dedicated server just before I started reusing
Evolution, unfortunately.

Comment 18 Matthew Barnes 2006-12-16 23:11:23 UTC

Resolving this bug as UPSTREAM since the problem has been filed upstream (see
comment #11) and there's been no confirmation that the bug still exists in the
current Rawhide release.

Please refer to the upstream bug report to continue tracking this issue.
