Bug 809364 - Evolution not closing after network failure
Summary: Evolution not closing after network failure
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: evolution-ews
Version: 17
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Matthew Barnes
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 835593 837821 839180 841729 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-04-03 08:14 UTC by Mikhail
Modified: 2012-08-08 11:26 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-04 12:07:35 UTC
Type: Bug


Attachments (Terms of Use)
backtrace (5.47 KB, text/plain)
2012-04-03 08:14 UTC, Mikhail
no flags Details
backtrace for evolution (8.53 KB, text/plain)
2012-07-05 12:27 UTC, Mikhail
no flags Details
backtrace for evolution 2 (8.53 KB, text/plain)
2012-07-05 12:27 UTC, Mikhail
no flags Details
yet another backtrace for evolution (evolution stuck on recieve message) (4.40 KB, text/plain)
2012-07-09 05:14 UTC, Mikhail
no flags Details
yet another backtrace for evolution (evolution stuck on exit) (7.27 KB, text/plain)
2012-07-09 05:15 UTC, Mikhail
no flags Details
stuck again.... backtrace.... (2.72 KB, text/plain)
2012-07-09 06:59 UTC, Mikhail
no flags Details
proof screenshot (362.43 KB, image/png)
2012-07-09 07:00 UTC, Mikhail
no flags Details
yet another backtrace for evolution (3.31 KB, text/plain)
2012-07-09 12:40 UTC, Mikhail
no flags Details
backtrace for evolution (4.40 KB, text/plain)
2012-07-11 10:02 UTC, Mikhail
no flags Details
backtrace for evolution (4.98 KB, text/plain)
2012-07-11 10:27 UTC, Mikhail
no flags Details
backtrace for evolution (4.39 KB, text/plain)
2012-07-11 10:28 UTC, Mikhail
no flags Details
backtrace for last case (8.53 KB, text/plain)
2012-07-11 14:47 UTC, Mikhail
no flags Details
backtrace for evolution (8.51 KB, text/plain)
2012-07-13 07:40 UTC, Mikhail
no flags Details
backtrace for evolution (7.67 KB, text/plain)
2012-07-17 06:56 UTC, Mikhail
no flags Details
backtrace for evolution (6.62 KB, text/plain)
2012-07-17 09:42 UTC, Mikhail
no flags Details
backtrace for evolution (4.77 KB, text/plain)
2012-07-17 09:43 UTC, Mikhail
no flags Details
backtrace for evolution (4.56 KB, text/plain)
2012-07-17 13:15 UTC, Mikhail
no flags Details
backtrace for evolution (evolution-ews-3.4.3-1.9) stuck again (4.72 KB, text/plain)
2012-07-19 13:14 UTC, Mikhail
no flags Details
backtrace (4.72 KB, text/plain)
2012-07-19 15:38 UTC, Mikhail
no flags Details
backtrace evolution (4.59 KB, text/plain)
2012-07-19 19:24 UTC, Mikhail
no flags Details
yet another backtrace for evolution (7.22 KB, text/plain)
2012-07-20 09:07 UTC, Mikhail
no flags Details
proposed ews patch (1.93 KB, patch)
2012-07-24 12:06 UTC, Milan Crha
no flags Details | Diff
backtrace evolution 2012-08-07 (7.50 KB, text/plain)
2012-08-07 20:31 UTC, Mikhail
no flags Details
backtrace evolution 2012-08-07.1 (8.26 KB, text/plain)
2012-08-07 20:32 UTC, Mikhail
no flags Details


Links
System ID Private Priority Status Summary Last Updated
GNOME Bugzilla 663383 0 None None None Never

Description Mikhail 2012-04-03 08:14:04 UTC
Created attachment 574778 [details]
backtrace

Description of problem:
Evolution not closed after network failure

Comment 1 Hendrik Borghorst 2012-04-03 20:39:04 UTC
I can confirm, there is something wrong with evolution-networkmanager. Evolution is also not reconnecting to imap after suspend or other wireless lan loss. 

Because the title is imprecise I changed it.

Comment 2 Milan Crha 2012-04-04 12:07:35 UTC
Thanks for a bug report. I'm afraid that Mikhail's issue is ot that much related to evolution-NetworkManager, it's rather about evolution-ews, not closing itself after connection lost. The suspend is also poorly implemented in the camel providers and calendar/book backends in evolution, which is a semi-known issue. There is an advice to stop evolution before you suspend your machine.

From the backtrace: I would expect seeing there the EWS account being stuck on a connection, but it isn't. Thread 4 is freeing the store, waiting for ews' soup thread to finish, but it doesn't need to do that for some reason. There is also Thread 3, which would like to update folder list, but it's waiting on the lock to be released by Thread 4.

I found basically the same bug report upstream [1] (there is surely for suspend too, how about network manager I'm not sure), thus I'm moving this there. Please see [1]  for any further updates.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=663383

Comment 3 Milan Crha 2012-07-04 16:07:07 UTC
*** Bug 835593 has been marked as a duplicate of this bug. ***

Comment 4 Milan Crha 2012-07-04 16:21:38 UTC
I made a test build [1] with upstream patches included. I'll appreciate if you could give it a try. Thanks in advance.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=4218557

Comment 5 Mikhail 2012-07-05 12:27:17 UTC
Created attachment 596396 [details]
backtrace for evolution

Comment 6 Mikhail 2012-07-05 12:27:53 UTC
Created attachment 596397 [details]
backtrace for evolution 2

Comment 7 Mikhail 2012-07-05 12:28:51 UTC
Today evolution with updated packages stuck twice on exit.
I attached backtraces here.

Comment 8 Mikhail 2012-07-06 03:31:08 UTC
This occurs after massive message deletion

Comment 9 Mikhail 2012-07-09 05:13:47 UTC
Today stuck again. I don't know what is happens....

Also I attach new backtraces...

Comment 10 Mikhail 2012-07-09 05:14:18 UTC
Created attachment 596989 [details]
yet another backtrace for evolution (evolution stuck on recieve message)

Comment 11 Mikhail 2012-07-09 05:15:46 UTC
Created attachment 596990 [details]
yet another backtrace for evolution (evolution stuck on exit)

Comment 12 Mikhail 2012-07-09 05:34:46 UTC
$ rpm -qa | grep evolution | sort
evolution-3.4.3-2.fc17.i686
evolution-data-server-3.4.3-1.fc17.i686
evolution-data-server-debuginfo-3.4.3-1.fc17.i686
evolution-debuginfo-3.4.3-2.fc17.i686
evolution-ews-3.4.3-1.3.fc17.i686
evolution-ews-debuginfo-3.4.3-1.3.fc17.i686
evolution-exchange-3.4.3-1.fc17.i686
evolution-exchange-debuginfo-3.4.3-1.fc17.i686
evolution-mapi-3.4.3-5.fc17.i686
evolution-mapi-debuginfo-3.4.3-5.fc17.i686
evolution-NetworkManager-3.4.3-2.fc17.i686

Comment 13 Milan Crha 2012-07-09 06:23:03 UTC
*** Bug 837821 has been marked as a duplicate of this bug. ***

Comment 14 Mikhail 2012-07-09 06:58:25 UTC
down/up em3 and we again hangs....

Comment 15 Mikhail 2012-07-09 06:59:09 UTC
Created attachment 597015 [details]
stuck again.... backtrace....

Comment 16 Mikhail 2012-07-09 07:00:30 UTC
Created attachment 597017 [details]
proof screenshot

Comment 17 Milan Crha 2012-07-09 07:09:29 UTC
I think it's waiting for a timeout on the connection, and after that it'll report failure to UI. I do not know what the default timeout is, but when I was testing my changes in the upstream bug then I made it smaller, to 10 seconds. Could you wait few minutes (I guess for 5 minutes max) to see whether this is really waiting for a connection timeout, please? If you run evolution from a terminal, then you might see changes there, like the information:
> Unable to fetch the folder hierarchy: server.com: code
where code is an error number, in your case usually 288, which is: EWS_CONNECTION_ERROR_NORESPONSE. You can see even more verbosity on the client-server communication if you run evolution with EWS_DEBUG=2 variable being set, though I'm not sure whether it'll be anyhow useful.

Interesting is that the last backtrace (comment #15) is basically empty, each thread is in idle, while the other backtraces are showing activities in multiple threads, all waiting for a server response.

Comment 18 Mikhail 2012-07-09 08:15:30 UTC
please check your mail. I send to you some info.

Comment 19 Milan Crha 2012-07-09 10:53:37 UTC
Got it, thanks. I see there that "WARNING **: No response: Cannot resolve hostname (xxx)", interestingly following network requests are working correctly. I also noticed in the log following runtime warnings:
   GLib-GObject-WARNING **: instance with invalid (NULL) class pointer

   GLib-GObject-CRITICAL **: g_signal_handler_disconnect: assertion
   `G_TYPE_CHECK_INSTANCE (instance)' failed

which are there before the "No response" warning.

I'm building [1] a test package which has set 10 seconds timeout on connection, maybe it'll help to narrow things down (and to not wait that long for timeout after your interface or connection is down).

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=4226662

Comment 20 Mikhail 2012-07-09 12:40:17 UTC
Created attachment 597064 [details]
yet another backtrace for evolution

Comment 21 Mikhail 2012-07-09 12:46:47 UTC
https://docs.google.com/open?id=0B0nwzlfiB4aQS0N3NE8zTzNYOGM

Please check your mail

Comment 22 Milan Crha 2012-07-10 08:50:28 UTC
Interesting. I see from the log that the lower timeout set on the soup session is not suitable for you, the log shows quite many timeouts. I also see in your video that the Send/Receive dialog is stuck in cancelling, though the backtrace shows that there is no pending operation in ews, basically the same as evolution being idle.

Comment 23 Milan Crha 2012-07-10 09:05:45 UTC
The issue in Send/Receive is not caused by EWS, it is a bug in evolution itself. I committed a fix for 3.4.4+ (commit dcab280) and 3.5.4+ (commit 82e9800) for it, and I'm currently building a test package of evolution at [1]. Please downgrade your evolution-ews package, to the previous version, without shortened timeout and then install test evolution package. It should be better now.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=4229524

Comment 24 Mikhail 2012-07-10 12:45:31 UTC
Thanks, I started testing. Another would be to defeat network problems in empathy. Empathy developers someday come here? I have an very interesting video for them. https://docs.google.com/open?id=0B0nwzlfiB4aQLTFlWHp2WUFfZDQ

Comment 25 Mikhail 2012-07-10 13:42:51 UTC
So Evolution also stop receive any message after I restore connection.
Please see my video, and check email for logs.

https://docs.google.com/open?id=0B0nwzlfiB4aQcjJwQkJSVG5keXM

Sorry for long video , but I want demonstrate how long time evolution do nothing. And helps only restart program.

Comment 26 Milan Crha 2012-07-11 06:41:30 UTC
(In reply to comment #24)
> Thanks, I started testing. Another would be to defeat network problems in
> empathy. Empathy developers someday come here? I have an very interesting
> video for them. https://docs.google.com/open?id=0B0nwzlfiB4aQLTFlWHp2WUFfZDQ

I'm afraid not. File a separate bug against emapthy, either here, or preferably at upstream: https://bugzilla.gnome.org/enter_bug.cgi?product=empathy

Comment 27 Milan Crha 2012-07-11 08:44:02 UTC
(In reply to comment #25)
> So Evolution also stop receive any message after I restore connection.
> Please see my video, and check email for logs.
> 
> https://docs.google.com/open?id=0B0nwzlfiB4aQcjJwQkJSVG5keXM
> 
> Sorry for long video , but I want demonstrate how long time evolution do
> nothing. And helps only restart program.

OK, it's either stuck or I missed one more place in evolution to report "done" state. The EWS logs are particularly useless, they do not bring anything interesting. I would prefer to see backtrace of evolution of this "Updating, but doing nothing" state.

This is interesting, looking around the libsoup code I realized that its default timeout is 0, which means no timeout, the request will never stop on its own. Let's set timeouts to some sane value, say 2 minutes.

Comment 28 Milan Crha 2012-07-11 09:06:59 UTC
Here [1] is a test build which presets default connection timeout to 2 minutes. You can tweak the timeout by setting environment variable EWS_CONN_TIMEOUT to seconds you'd like to have timeout set for. The value is plain number and if it is not set properly, like it's a tring, zero or a negative value, then the 2 minutes timeout is used. I committed this change as commit 2d693ca to gnome-3-4 for 3.4.4+.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=4232982

Comment 29 Mikhail 2012-07-11 10:02:56 UTC
Created attachment 597521 [details]
backtrace for evolution

Comment 30 Mikhail 2012-07-11 10:03:57 UTC
Build from [Comment 28] not help to solve above problem. I add backtrace for this case.

Comment 31 Mikhail 2012-07-11 10:27:33 UTC
Created attachment 597524 [details]
backtrace for evolution

Comment 32 Mikhail 2012-07-11 10:28:10 UTC
Created attachment 597525 [details]
backtrace for evolution

Comment 33 Mikhail 2012-07-11 14:46:48 UTC
And now all was hanging the whole UI.

https://docs.google.com/open?id=0B0nwzlfiB4aQUElsRkhuSVRwQkU

Comment 34 Mikhail 2012-07-11 14:47:42 UTC
Created attachment 597596 [details]
backtrace for last case

Comment 35 Milan Crha 2012-07-12 06:44:29 UTC
*** Bug 839180 has been marked as a duplicate of this bug. ***

Comment 36 Milan Crha 2012-07-12 08:01:07 UTC
The three backtraces starting at comment #29 are basically the same, one thread is waiting for finish of e_ews_connection_sync_folder_items() call.

Backtrace from comment #34 is similar, except there are 3 such threads. I see on one that it is called with NULL 'cancellable', which means it cannot be cancelled. Though it doesn't explain the UI freeze, because the main thread, which is responsible for UI, is not blocked in that backtrace. Ah, I see in the video, the UI is disabled, with grayed widgets. It's on purpose. Once you click to quit evolution it disables all its windows and then starts quitting. Such quitting is about synchronizing last changes into the server and evolution will not quit until there is no left pending operation. It had two pending operations in the status bar. There is also shown in the video that the UI is not frozen, because you resized the window and it properly updated inner widgets, thus it's just waiting for a response.

I'm currently out of idea what could happen to the response, either it's completely lost or there's a bug in ews code that doesn't call operation callback for this request for some reason. I only know that I'm not able to reproduce this on my machine that easily as you, unfortunately.

Comment 37 Mikhail 2012-07-13 07:40:37 UTC
Created attachment 597995 [details]
backtrace for evolution

Comment 38 Milan Crha 2012-07-16 17:14:39 UTC
Still the same pattern, this time three threads in e_ews_connection_update_items(). I cannot tell from it whether the right connection is used in the waiting thread or not, due to current code changes. I created yet another test build which will show in the backtrace whether the same connection is used. You can download it from:

http://koji.fedoraproject.org/koji/taskinfo?taskID=4244435

Comment 39 Mikhail 2012-07-17 06:56:20 UTC
Created attachment 598564 [details]
backtrace for evolution

Comment 40 Mikhail 2012-07-17 06:56:39 UTC
$ rpm -qa | grep evolution | sort
evolution-3.4.3-2.1.fc17.i686
evolution-data-server-3.4.3-1.fc17.i686
evolution-data-server-debuginfo-3.4.3-1.fc17.i686
evolution-debuginfo-3.4.3-2.1.fc17.i686
evolution-ews-3.4.3-1.6.fc17.i686
evolution-ews-debuginfo-3.4.3-1.6.fc17.i686
evolution-exchange-3.4.3-1.fc17.i686
evolution-exchange-debuginfo-3.4.3-1.fc17.i686
evolution-mapi-3.4.3-5.fc17.i686
evolution-mapi-debuginfo-3.4.3-5.fc17.i686
evolution-NetworkManager-3.4.3-2.1.fc17.i686

Comment 41 Mikhail 2012-07-17 09:42:22 UTC
Created attachment 598594 [details]
backtrace for evolution

Comment 42 Mikhail 2012-07-17 09:43:09 UTC
Created attachment 598596 [details]
backtrace for evolution

Comment 43 Mikhail 2012-07-17 13:15:53 UTC
Created attachment 598633 [details]
backtrace for evolution

Comment 44 Milan Crha 2012-07-17 15:19:37 UTC
Thanks for your testing and patience. I think I found finally the issue here. When I was looking into upstream bug for bug #807908, I realized that soup_session calls should be only done in soup_thread and if it's not, then in certain circumstances it can lead to such strange behaviour (see the upstream bug for more information). It should work fine, with this evolution-ews scratch build:

http://koji.fedoraproject.org/koji/taskinfo?taskID=4246707

Comment 45 Mikhail 2012-07-18 07:41:35 UTC
Great job, but evolution continue crashed
https://bugzilla.redhat.com/show_bug.cgi?id=809866

Comment 46 Mikhail 2012-07-19 13:13:27 UTC
Ops stuck with evolution-ews-3.4.3-1.9 version :(

Comment 47 Mikhail 2012-07-19 13:14:17 UTC
Created attachment 599140 [details]
backtrace for evolution (evolution-ews-3.4.3-1.9) stuck again

Comment 48 Mikhail 2012-07-19 15:38:02 UTC
Created attachment 599192 [details]
backtrace

Comment 49 Mikhail 2012-07-19 19:24:29 UTC
Created attachment 599228 [details]
backtrace evolution

Comment 50 Mikhail 2012-07-20 09:07:40 UTC
Created attachment 599339 [details]
yet another backtrace for evolution

Comment 51 Milan Crha 2012-07-24 11:34:50 UTC
Thanks for the update. I checked the changes and 1.9 didn't touch this part of the code, it did 1.8. Might be just matte of luck you didn't have 1.8 made stuck evolution.

From the bakctraces, all but the one from comment #49 are waiting for a response from the server, but there is missing ews_soup_thread, which means it was quit meanwhile, probably for server failure. The one in comment #49 does have that thread, for the connection it uses in one thread, thus that one may recover after some time, or it just didn't get to the state as the other 4 backtraces.

Comment 52 Milan Crha 2012-07-24 12:06:04 UTC
Created attachment 600002 [details]
proposed ews patch

I think this should help, but as I cannot reproduce, and I'm not completely sure, then I would like to ask you for testing it before I'll commit to upstream sources. I'm currently building the 1.10 version [1] with this patch included. Please run evolution from console, just in case there will be anything interesting when/if you manage to reproduce this issue again. I do not want any special arguments or environment variables being set, just run it like this:
   $ evolution
and attach here its output together with a backtrace. Thanks in advance.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=4325399

Comment 53 Milan Crha 2012-07-27 07:14:36 UTC
I do not see any backtraces in your bug reports which would suggest the patch from comment #52 is causing any issue. I'll wait over the weekend and if there will not raise anything, then I'll commit it to sources in the middle of the next week (I do not want to forget about it).

Comment 54 Milan Crha 2012-08-01 08:11:14 UTC
I committed the patch as commit b2b44d4 for master (3.5.5+) and as commit cd21324 for gnome-3-4 (3.4.4+) of evolution-ews.

Comment 55 Milan Crha 2012-08-03 10:58:16 UTC
*** Bug 841729 has been marked as a duplicate of this bug. ***

Comment 56 Mikhail 2012-08-07 20:30:33 UTC
Today evolution stuck on exit again...

$ rpm -qa | grep evolution | sort
evolution-3.4.3-2.1.fc17.i686
evolution-data-server-3.4.3-1.3.fc17.i686
evolution-data-server-debuginfo-3.4.3-1.3.fc17.i686
evolution-debuginfo-3.4.3-2.1.fc17.i686
evolution-ews-3.4.3-1.15.fc17.i686
evolution-ews-debuginfo-3.4.3-1.15.fc17.i686
evolution-mapi-debuginfo-3.4.2-1.fc17.i686
evolution-NetworkManager-3.4.3-2.1.fc17.i686

Comment 57 Mikhail 2012-08-07 20:31:43 UTC
Created attachment 602848 [details]
backtrace evolution 2012-08-07

Comment 58 Mikhail 2012-08-07 20:32:34 UTC
Created attachment 602849 [details]
backtrace evolution 2012-08-07.1

Comment 59 Milan Crha 2012-08-08 11:10:26 UTC
Thanks for the update. The backtraces look fine, they have one EWS thread, which is idle - waiting for orders, and two other threads, waiting for an operation to be finished. It's hard to guess from the backtrace whether the requests are just waiting for a reply in some callback which is pushed somewhere, or what happened here.

Do you have any idea what could happened before you closed evolution? In other words, what was the reason for closing evolution, was it a network disconnect?

I agree that it's harder to spot the issue, at least for me, because we (with your help) addressed most of the issues with unreliable connection, thus I guess you are facing here some left corner-case, which requires pretty strict steps for a reproducer. I do not see anything obvious from code reading, unfortunately.

Comment 60 Mikhail 2012-08-08 11:26:01 UTC
I am try to use Evolution with slow connection (64kbps)


Note You need to log in before you can comment on or make changes to this bug.