Bug 1567572

Summary: Stale connection used after resume in EWS backends
Product: [Fedora] Fedora Reporter: Douglas Summers <ajgringo619>
Component: evolution-ewsAssignee: Milan Crha <mcrha>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 27CC: ajgringo619, mcrha
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: evolution-data-server-3.28.2 evolution-ews-3.28.2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-26 07:20:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Douglas Summers 2018-04-15 00:24:51 UTC
Description of problem:
At random times, I get this error whenever I try to change the status of an MS Exchange calendar/task event:

Cannot modify calendar object: Connection terminated unexpectedly

Version-Release number of selected component (if applicable):
evolution-ews-3.26.6-1.2.fc27.x86_64

How reproducible:
It seems to happen regularly when the system returns from suspend, although it does happen at other times as well.

Steps to Reproduce:
1. Change task to "complete"
2. Wait for status to update
3. Message: Failed to modify a task in the task list “ajgringo619 : Tasks”

Actual results:
Tasks are now offline
When I reconnect, it works.

Expected results:
Tasks/calendar events should automatically update on the MS Exchange server


Additional info:
I'm going to run some tests with Fedora 28 (in a VM) to see if the latest version works any better. I did try disabling IPv6, but that didn't make any difference.

Comment 1 Milan Crha 2018-04-16 11:01:56 UTC
Thanks for a bug report. I agree with the tests on Fedora 28 with evolution-ews 3.28.1 or any newer. The 3.26 version is obsolet eby upstream already.

I do not recall seeing anything directly related, there was something about accepting meetings, but I'm not sure whether it's related at all. It can be that your server has certain inactivity connection timeout set, which can cause the disconnect by the server, while the client (evolution-ews) still thinks the connection is alive. But I'm only guessing here.

You can try to run the calendar factory with EWS debugging on, to see what the connection itself does. It logs raw communication between the server and the client, thus it's nothing to be shared without sanitizing private information. The command is:

   $ EWS_DEBUG=2 /usr/libexec/evolution-calendar-server -w

then run evolution and let it repeat the issue.

Comment 2 Douglas Summers 2018-04-16 16:13:49 UTC
I ran a couple of tests with Fedora 28, including returning from suspend, and everything seems OK. There does seem to be some kind of time-out with my Exchange calendars; I've had issues with my Tasks as well as a few sports calendars I'm subscribed to. I find that if I manually refresh each calendar that's lost connection before I try anything, then everything works as promised.

Comment 3 Milan Crha 2018-04-16 16:58:41 UTC
Thanks for testing it. There are some intentional timeouts being involved after connection state changes, it's like 5 seconds. It's there to avoid often cancellations of ongoing requests only due to the connection state not being settled yet. If I understand your comment properly, then you are not talking about these delays.

I looked into an EWS calendar properties and it doesn't offer any refresh time interval, like for example CalDAV (and some other remote calendars). Either evolution-ews checks for changes on the fly with "Listen for server change notifications" option in the Mail account properties on, or there is no periodic refresh at all and the calendar is refreshed only when it is connected. That's how it look like after a brief code lookup.

Having there a refresh interval setting in calendar/book properties would help, right? At least when listening for server changes doesn't work, I guess.

Comment 4 Douglas Summers 2018-04-16 17:08:08 UTC
To be honest, I can't remember if this problem started occurring (or occurring more frequently) when I enabled listening for server changes and disabled checking at prescribed intervals. The problem was worse when I was using WI-FI; never could get the drivers to work consistently. Really the only other time that this problem occurs is when I unlock my PC, but that's extremely rare.

Comment 5 Milan Crha 2018-04-17 06:30:08 UTC
I meant there is no refresh interval for EWS calendars/books/tasks/notes at the moment.

When the machine goes offline (it doesn't matter what the reason was, it could be due to suspend, intentional disconnect or unexpected disconnect), the calendars/... move to offline mode as well. They may go online when the machine goes online afterwards, but there can be some bug in the code preventing it. The online/offline state of the calendar/... can be recognized from the connection icon beside its name in the tree on the left.

Comment 6 Douglas Summers 2018-04-17 15:10:34 UTC
When I turned my PC on this morning (out of suspend mode, I mean), I checked all of the status icons of the Calendar and Tasks to make sure that they were are showing as connected. I did manual refreshes and (2) of them gave me the "Connection terminated..." error and went off-line. After a 2nd refresh everything worked as expected.

Comment 7 Milan Crha 2018-04-18 11:38:39 UTC
I see. It means they've been using stale connection. I'll try to find some way to deal with it somehow better. A quick approach would be to reconnect on that error, but that's not always correct and it can hide the main issue here, that the evolution-ews thinks it's online even after it had been notified about disconnection (I suppose it had been notified, just like the mail part).

Comment 8 Douglas Summers 2018-04-18 18:58:18 UTC
OK...I ran into the same issue with my Fedora-28 VM (updated as of this morning). (2) of the Calendars showed as off-line; when I refreshed them I got the 
 "Connection reset.." error.

Comment 9 Milan Crha 2018-04-26 07:20:47 UTC
I finally got into this and after some tests it turned out that when suspending the machine, the backends are notified about network unavailability and there's scheduled an update of the online state with 5 seconds delay (this delay is intentional, because there are many connection changes, thus it's there to avoid spurious changes). The problem is that the backend is in an online state while in this 5 second delay, when the machine is offline, and during those 5 seconds the machine can establish internet connection thus the upcoming test for host reachability succeeds, effectively settings the online state to TRUE, where it already is, which means a no-op. I made a change in evolution-data-server to set the backend offline immediately, when the network is reported as unavailable.

I also thought the backends can reuse "disconnected" connection, instead of creating a new (that's when there are for example multiple calendars and tasks, which are run in a single process and which reuse existing connection), but my tests didn't trigger it. Nonetheless, I made the change in evoluton-ews to prevent this issue anyway, just in case.

Created commit 62ad88803 in eds master (3.29.2+) [1]
Created commit fa48afd90 in eds gnome-3-28 (3.28.2+)

Created commit 1ec1802 in ews master (3.29.2+) [2]
Created commit 16b3999 in ews gnome-3-28 (3.28.2+)

[1] https://git.gnome.org/browse/evolution-data-server/commit/?id=62ad88803
[2] https://git.gnome.org/browse/evolution-ews/commit/?id=1ec1802

Comment 10 Douglas Summers 2018-04-26 20:39:20 UTC
The issue has started to come up during normal idle times now, not just coming out of sleep mode. Any time I don't use Evolution for more than about 15 minutes, I get terminated connections on every Exchange-based account - email, calendar, tasks, and contacts. I had thought this was tied into the network being down (as during sleep mode), but it seems to be something else.

Is there a setting I can change to prevent the connection from timing out?

Comment 11 Milan Crha 2018-04-27 05:35:57 UTC
It seems your server changed some setting, even evolution-ews uses
Connection: Close, thus it should not be a problem. Did you update any package recently, say glib2, glib-networking or libsoup?

With the mail you can open Preferences of the EWS account and change the refresh interval in the Receiving Options tab. Other parts (book/calendar/tasks/memos) do not have any such option.

Maybe run evolution with EWS debugging on, to see what had been received from the server:

   $ EWS_DEBUG=2 evolution

The log contains raw information sent between the server and the client, including message content, server addresses and such, thus it's not suitable for public sharing without editing it. I'm not interested in the whole log, I'd like to see the log in time of the failure only.

Comment 12 Milan Crha 2018-04-30 12:43:02 UTC
I see from the part of the log you sent me that there's not much clue what happened in the background. It really just failed. The server is alive, because that GetStreamingEventsResponseMessage had been received successfully, it's only that the connection used to do all other requests to the server is a separate connection, which could be left idle for too long, that the server though the connection is gone or something.

I also noticed that I've been wrong with that "Connection: Close". The evolution-ews doesn't use it, it uses "Connection: Keep-Alive" instead.

I'm afraid the only thing to do is to retry once again on this "HTTP/1.1 7 Connection terminated unexpectedly" error and fail only if also the second attempt fails (with the same or another error).

Comment 13 Milan Crha 2018-04-30 15:42:46 UTC
I've still hard time to reproduce this. I tried with an outlook.office365.com account, and with the "Listen for server change notification" options, which you seem to have enabled too, the connection is refreshed every 10 minutes, approximately. That seems to be enough time to not cause disconnection on the server side. I also tried with disabled "Listen for server change notifications" and I left the evolution idle for approximately 20 minutes, then I modified a task in the Tasks view and I also marked a mail message as read in the Mail view and none of the connections had been unexpectedly disconnected. This had been done with a up-to-date Fedora 27 as of today, using 'updates', not 'updates-testing', repository.

I suspect whether there's any issue specific to your machine and/or location, like some issue on the route from your machine to the Exchange server, especially when it begun to misbehave only recently (as I understood from your comments above).

Comment 14 Douglas Summers 2018-05-07 17:00:01 UTC
After examining all of the logs generated, it's looking more and more likely that the bug is with gnome-online-accounts and not evolution-ews. As soon as I created an account through GOA, my message log started to get flooded with this:

May  6 12:57:15 DSS-Fedora27 journal[1984]: goa_ews_client_autodiscover() failed
: 405 — Method Not Allowed

I've recreated all of my EWS accounts via Evolution and everything seems to be working (and much faster, too.) I should have noticed this before, but I didn't see an easy way to create an EWS account without using GOA. If I had only unchecked the "Look up mail server details..." box :(

Comment 15 Milan Crha 2018-05-07 17:23:34 UTC
(In reply to Douglas Summers from comment #14)
> If I had only unchecked the "Look up mail server details..." box :(

Right, that one can hide quite many things. I didn't think of it myself. You get some better experience with Edit->Accounts->Add->Collection Account in evolution 3.28.x, which can lookup also EWS accounts. This doesn't work for mail accounts, because EWS autodiscovery requires password.

I'm surprised you see the EWS configured in evolution quicker than configured through GOA, because GOA only provides account information and the password. All the rest is the same code. The password check can make things slower, though it is not supposed to be done often, only on connection changes, like after disconnect, when a new connection is created.