Bug 1207661 - EWS stuck waiting for a message download
Summary: EWS stuck waiting for a message download
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: evolution
Version: 21
Hardware: x86_64
OS: Unspecified
unspecified
high
Target Milestone: ---
Assignee: Milan Crha
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-03-31 12:52 UTC by Louis van Dyk
Modified: 2015-12-02 17:48 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-12-02 10:42:00 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
evolution backtrace 20150407 (31.18 KB, text/plain)
2015-04-07 14:20 UTC, Louis van Dyk
no flags Details
eds patch (with debugging) (2.17 KB, patch)
2015-04-08 08:11 UTC, Milan Crha
no flags Details | Diff
evolution backtrace 20150408 (26.96 KB, text/plain)
2015-04-08 15:22 UTC, Louis van Dyk
no flags Details
evolution backtrace 20150415-1 (28.45 KB, text/plain)
2015-04-15 14:13 UTC, Louis van Dyk
no flags Details
evolution backtrace 20150422 (27.11 KB, text/plain)
2015-04-22 12:23 UTC, Louis van Dyk
no flags Details

Description Louis van Dyk 2015-03-31 12:52:33 UTC
Description of problem:
Ever since upgrading to Fedora 21 I have had Evolution woes.  MY configuration is as follows:
1.  I have as my primary account, an IMAP+ connection to my Fedora 21 server at home, which I connect to via SSL on port 993.  I have 1 thread enabled.  I have tried with 2 but found no difference.
2.  I connect to my office's Exchange server, which is my primary calendar and provides an addressbook lookup.  The account is setup via Gnome Online Accounts.
3.  I have a *LOT* of Search Folders, some nested up to 3 levels.  Mostly these only look at the IMAP+ account, owing to prior performance issues when being applied to the Exchange mailbox.
4.  I colour my messages based on Message Filters on incoming messages.
5.  All my passwords are saved to prevent other Evolution timeouts.
6.  My internet connection is via a VPN Tunnel across the LAN to a Linux Server running OpenVPN and some iptables rules, which connects to the external Internet router.

When I open Evolution, it usually works OK for a while, although sometimes it gets stuck filtering messages as it processes the Inboxes.  Sometimes I just have to kill that by clicking the red circle on the task bar.  Sometimes it dies, other times it doesn't and then I must run "killev".

Assuming I get a working session, I works for a bit, but then weirdness ensues.  Say I am in a Search Folder, and I want to open a message.  Often the message preview just stays blank.  I double click the message, and the message window opens up.  Sometimes this shows Retrieving message ID long number, and other times shows nothing in the details box.

If I then go to another search folder, or sent items, or switch to say the Exchange inbox, sometimes absolutely nothing happens.  I can click whatever folder and it will be stuck on (usually) the IMAP+ Inbox.

By this time there is a ROW of little Gnome Feet with Red Cirles lining the bottom of the evolution window.  If I hover on them, like now, it shows the message "Unknown background operation".  Sometimes it shows nothing.  

I can go to the Calendar, and it works, but mail doesn't.  (Although sometimes Calendar itself can be full of issues too!)

If I click the red circles now, they change the text to "Unknown background operation (cancelling)"  but they just stay there.  If I select FILE-QUIT sometimes it quits and other times it does nothing.  Even the X on the Window Bar does nothing.  At those times it's again killev time.




Version-Release number of selected component (if applicable):
# rpm -qa | grep -Ei "evolution|libsoup|webkit" | sort
evolution-3.12.11-1.fc21.x86_64
evolution-data-server-3.12.11-1.fc21.x86_64
evolution-ews-3.12.11-1.fc21.x86_64
evolution-help-3.12.11-1.fc21.noarch
kdelibs-webkit-4.14.6-2.fc21.x86_64
libsoup-2.48.1-2.fc21.x86_64
qt5-qtwebkit-5.4.1-1.fc21.x86_64
qtwebkit-2.3.4-1.fc21.i686
qtwebkit-2.3.4-1.fc21.x86_64
webkitgtk-2.4.8-1.fc21.x86_64
webkitgtk3-2.4.8-2.fc21.x86_64
webkitgtk4-2.6.5-3.fc21.x86_64



How reproducible:
It happens every day.



Steps to Reproduce:
1.  Start evolution and use it.
2.  Jump between search folders, inbox and sent items, sometime with extended wait periods in between.
3.  Eventually it acts out.



Actual results:
The mail client becomes unresponsive and I have to close or killev it and restart.



Expected results:
It should be a stable reliable client that can be used at an office without feeling embarrassed around the Microsoft colleagues.


Additional info:
Let me know how I can debug or provide other useful information.  When running "evolution" from the command prompt, not much is reported.  I get the occasional:
(evolution:19402): GLib-GObject-WARNING **: invalid cast from 'CamelStreamFs' to 'CamelObject'

Comment 1 Milan Crha 2015-04-07 07:08:17 UTC
Thanks for a bug report. This really shouldn't be there:

> (evolution:19402): GLib-GObject-WARNING **: invalid cast from 'CamelStreamFs' to 'CamelObject'

but it's hard to guess where it exactly comes from. It can be either during the message download or somewhere around it. I didn't see any such console error myself, it would worth to investigate the place and the cause of it.

Those "Unknown background operation"-s can be caused by too many pending operations in the background, being stuck for some reason. Can it happen, that you intermittently lost the connection to one of your servers? I suppose evolution is waiting for a response from the server, which may partly explain it. To be sure, could you install debuginfo packages for evolution-data-server and evolution, and once evolution gets to this odd state get a backtrace of it, please? You can get the backtrace with command like this:
   $ gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt
Please check the bt.txt for any private information, like passwords, email address, server addresses,... I usually search for "pass" at least (quotes for clarity only).

Please note that the upcoming 3.16.1, part of Fedora 22, will have some related fixes. The "Unknown background operation" is gone there. It cannot fix the "wait for the server response", but it can have fixed other things around it.

Comment 2 Louis van Dyk 2015-04-07 14:20:35 UTC
Created attachment 1011775 [details]
evolution backtrace 20150407

Hi ... thanks for the reply.  

I have installed debuginfo packages for:
evolution-data-server-debuginfo-3.12.11-1.fc21.x86_64
evolution-debuginfo-3.12.11-1.fc21.x86_64
evolution-ews-debuginfo-3.12.11-1.fc21.x86_64
libsoup-debuginfo-2.48.1-2.fc21.x86_64

Are these sufficient?

Attached is the backtrace info as requested.  It took only a few minutes for the situation to occur.

Comment 3 Louis van Dyk 2015-04-07 14:30:52 UTC
(In reply to Milan Crha from comment #1)

> Those "Unknown background operation"-s can be caused by too many pending
> operations in the background, being stuck for some reason. Can it happen,
> that you intermittently lost the connection to one of your servers? I
> suppose evolution is waiting for a response from the server, which may
> partly explain it. 

To answer that, I don't think the connection itself is at fault, because a) it happens far too often, and b) when it's happened I have been ssh'd onto the server it would have timed-out speaking to.

However, I agree with the waiting for a response situation ... as menitoned, I have a lot of search folders.  I just counted ... there are 112.  Do you think that this could be the cause?  They did seem to work fine on Fedora 20, though.

I've also noticed that since Fedora 20 it seems to refresh these more often.  There are probably about 40 I could delete as they now have little or no info in them - the rules have followed from one export/backup to the next.

Comment 4 Milan Crha 2015-04-08 06:41:10 UTC
Thanks for the update. The backtrace is nice, it shows a (possible) deadlock in the evolution-ews code, when updating content of some search folders. Thread 22 is waiting for a response from the EWS server, while the other threads are waiting for it to finish. It can be that the EWS account lost connection or something and waits "indefinitely". There is a 90 seconds timeout for the connection, though the backtrace doesn't show that the connection is active.

I'm wondering whether this could be another variant of the thread unsafety of EAsyncClosure, which is the cause of the upstream bug [1]. If you can reproduce this more or less easily, then I'll provide a test build of evolution-data-server with a possible fix and you might give it a try.

I'm not aware of any particular changes in the search folders between 3.10 and 3.12, but it's a long time ago, thus there can be some related. If you open Properties of the search folders, then you can set there whether they should update automatically on any change in its folders. Turning that off will cause an update of the folder only when it's selected. It has some side-effects, like not accurate unread count on the folder. It will not help with the core issue, but you can make the search folders more relaxing, not consuming your CPU when it's not necessary.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=746276

Comment 5 Milan Crha 2015-04-08 08:11:48 UTC
Created attachment 1012075 [details]
eds patch (with debugging)

This is a patch I used to build test package [1]. Please try it and let me know whether it fixed the issue for you. It can eventually print a message on the console when the conditions for the error were met, but it's also possible that the conditions will be avoided due to added locking.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=9434338

Comment 6 Louis van Dyk 2015-04-08 15:22:06 UTC
Created attachment 1012272 [details]
evolution backtrace 20150408

Thanks for the patch.  Herewith my findings:

1.  After installing it, when I started evolution, the EWS connection timed out.  The EWS arrows went red, and after retrying, the pop-up message appeared just above the Inbox view.  I had run it from the command line, and the message displayed was:
(evolution:6989): evolution-mail-WARNING **: receive_update_got_folderinfo: Timeout was reached

2.  I left it for a while, as I was busy with something else, but then noticed that it had recovered and the EWS Inbox and the IMAP Inbox were both open.  The application worked well, and was responsive, without the usual "Unknown" tasks appearing at the bottom.

3.  I decided to shut down and restart evolution to see if the timeouts occurred again.  Gladly they did not ... however ... I was reading a message in the preview pane in the EWS Inbox which had a small PDF attachment.  I don't know when they happened, but when I closed the PDF (I used evince) the row of red dots and Unknown Background Operation messages was back.  I tried to open the PDF again, but at this point evolution was again stuck.

4.  Attached is the backtrace again.

Comment 7 Louis van Dyk 2015-04-08 16:30:13 UTC
Since restarting it's been good ...

There have been these logged to the console though ...

(evolution:14953): GLib-GObject-WARNING **: invalid cast from 'CamelStreamFs' to 'CamelObject'

(evolution:14953): GLib-GObject-WARNING **: invalid cast from 'CamelStreamFs' to 'CamelObject'

(evolution:14953): GLib-GObject-WARNING **: invalid cast from 'CamelStreamFs' to 'CamelObject'

(evolution:14953): GLib-GObject-WARNING **: invalid cast from 'CamelStreamFs' to 'CamelObject'

(evolution:14953): GLib-GObject-WARNING **: invalid cast from 'CamelStreamFs' to 'CamelObject'

(evolution:14953): GLib-GObject-WARNING **: invalid cast from 'CamelStreamFs' to 'CamelObject'


And then, just as I thought all was well, I went into the EWS Deleted Items folder, which has a lot of messages in it.  I again have the row of boxes at the bottom of the screen.  I right-clicked Deleted Items and selected "Empty Trash".  The boxes stayed there.  Some are updating various search folders, two have Generating Message List, others have the Unknown Background Operation and there is one Retrieving message item.  This time I was able to click the red dots and have them display cancelling.  It seems that when I cancelled the "Retrieving message" item that suddenly all the boxes had lines drawn through them, and they cleared.  The Deleted Items folder is now empty.

Comment 8 Milan Crha 2015-04-09 08:57:31 UTC
Thanks for the try of the test package. Your issue is different than I thought, the test package didn't have any influence on it. There is still one thread waiting for a response from the server and other threads waiting for it to finish. Once it'll be finished the rest will be done quite quickly.

(In reply to Louis van Dyk from comment #7)
> (evolution:14953): GLib-GObject-WARNING **: invalid cast from
> 'CamelStreamFs' to 'CamelObject'

I do not see these either. It would be nice to get a backtrace of it, to see where the invalid cast is happening and eventually fix it.

Comment 9 Milan Crha 2015-04-15 09:37:09 UTC
I realized yesterday that the eds patch wasn't correct, it didn't do anything good, thus I fixed it and committed it to sources, while I also built a test package with it for Fedora 21. You can find it here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=9478633

Please update to that version and watch whether anything changes. If it'll still not help, then I aim this into a wrong direction.

Comment 10 Louis van Dyk 2015-04-15 14:13:41 UTC
Created attachment 1014799 [details]
evolution backtrace 20150415-1

I have installed the latest patch: 
evolution-data-server-3.12.11-2.1.fc21.x86_64.rpm

This backtrace has the many stuck threads at the bottom displaying descriptions when hovered over, stating things similar to "generating message content" and so on.

I was no longer able to change view from the sent items window I was on.

Comment 11 Milan Crha 2015-04-15 17:20:09 UTC
Thanks for a quick testing. I see from the backtrace that the issue is really elsewhere, might be somewhere in the evolution-ews code probably. I'm sorry for misleading this bug report in a wrong direction.

Please try to catch the EWS log, maybe it'll shed a bit of light to the issue. You can capture the log with this command:

   $ EWS_DEBUG=2 evolution &>log.txt

only note that the log will contain raw communication between evolution and the exchange server, which means server addresses, your message and folder contents and so on. I would be interested only in the end of the log anyway, before this state happens, just to see whether anything was happening with the connection.

Feel free to send the log only to me, rather than share it publicly, only write a reference to this bug report in the subject, thus I'd not overlook it in my spam folder.

Comment 12 Louis van Dyk 2015-04-16 17:17:32 UTC
Thanks Milan.  No apologies necessary.  As IT people we are like Doctors.  We don't really know what the problem is but we make our best guess and then treat that symptom!  :)

I will do the requested log tomorrow - I was working from home today.

However, that does lead me to an interesting observation ...

So today I was working on my HOME PC, not my WORK laptop.  This is the Fedora 21 system that also hosts my IMAP Mail Server.  I was working on Evolution all day today and did not have the above problem at all.  I also have almost all the same search folders on this system - with a possible few exceptions - so the configuration is very similar.  The only notable exception is that I have configured EWS directly in Evolution at home, and did NOT use the Gnome Online Accounts.  I doubt it makes a difference - but I had to mention it.

More info tomorrow then ...

Comment 13 Louis van Dyk 2015-04-22 12:23:42 UTC
Created attachment 1017392 [details]
evolution backtrace 20150422

At the time I took this, some of the operations at the bottom still said "Updating folder SearchFolderName", until eventuall the unknown operation descriptions replaced all the message boxes, working from left to right.

I will send the EWS Log to you directly.

Interestingly, around line 3950 of the EWS log I had openend a calendar meeting request, which did not allow me to respond (Accept or Decline) or even open the calendar.

I opened another invite which did allow me to click Decline, but the thing I notice then is that the calendar now shows TWO items.  The original message is still there (and in Outlook Web Access it is still waiting for me to respond)

Do you think it's related, or should I open a new bug for that?

Comment 14 Milan Crha 2015-04-23 10:07:35 UTC
Thanks for the update. The backtraces show basically the same thing, one thread is filtering a folder content and reading a message data from the server while the other threads are waiting until the other search is finished. The problem is that the reading of the message data got stuck. The EWS_DEBUG logs show that evolution-ews received a response, but it was cut in the middle for some reason, which is not shown in the logs, unfortunately.

There is one interesting thing, the 'A' debug log ends at message 40, while
the 'B' log ends at message 39. It's probably just a coincidence, but a nice one.

(In reply to Louis van Dyk from comment #13)
> Interestingly, around line 3950 of the EWS log I had openend a calendar
> meeting request, which did not allow me to respond (Accept or Decline) or
> even open the calendar.

Once a meeting invitation is selected in the Mail view, the evolution-calendar-factory is contacted and is searched in all calendars for the existence of the meeting. The view has disabled all the buttons and the top of it shows "Searching for an existing version of this appointment". Once the search is over the text disappears and the UI gets enabled.

I suppose that the former invitation event wasn't downloaded in the backend yet, while the later was, thus the backend could return quickly that the meeting exists in that particular calendar.

I'd guess the duplication of the event is caused by an incorrect calendar being chosen (and/or the event not being found in the EWS Calendar). You might always pick the EWS calendar to store the event response in, because it's the one used by the Exchange server.

I'd like to know why you have such trouble with evolution-ews, while I do not see them here. One aspect can be those search folders, but that doesn't explain the behaviour with the calendar, because that all is done in the evolution-calendar-factory process, far from the Mail view of the Evolution. Which is also the reason why the EWS debug logs do not contain the decline request, that call to the server was not done within the Evolution process.

Comment 15 Louis van Dyk 2015-04-23 12:08:24 UTC
As a test, perhaps I should make a complete clean setup of Evolution.  I have been using it for many many many years, and that is why there are so may search folders.  The data folders have just been updated each release, or at best, an Evolution backup was made when a new version of Fedora is installed, and then the Evolution backup is restored.

I accept that there is going to be a lot of editing involved, but is there a nice way I can export my local address book, my search folders, and the message filter rules (I use them to colour my messages for easy identification)?  I would like to edit them in LibreOffice Calc and just keep what I need, and then import them somehow into a fresh clean installation of Evolution. 

What do you think?

Comment 16 Milan Crha 2015-04-23 16:57:01 UTC
(In reply to Louis van Dyk from comment #15)
> I accept that there is going to be a lot of editing involved, but is there a
> nice way I can

Good idea to try from scratch, though it'll be very time consuming for you. Make a backup before doing any changes, thus you'll have something to return to.

> export my local address book

Go to the Contacts view, right-click the address book and pick Save as vCard there. Verify briefly that it contains multiple contacts.

A `cat vcard.vcf | grep BEGIN:VCARD | wc -l` should return the same number as is the count of contacts in the address book.

> my search folders,

unfortunately not. They are stored in:
   ~/.config/evolution/mail/searches.xml

> and the message filter rules

similar here, unfortunately no tool for the backup. They are stored in:
   ~/.config/evolution/mail/filters.xml

Both might be hard to edit, and easy to break, as they are pure XML files with a very similar structure.

Comment 17 Louis van Dyk 2015-04-24 11:40:15 UTC
To quote Barney Stinson from How I Met Your Mother ...

Challenge ACCEPTED.

This should keep me busy for the weekend.  :)

Comment 18 Louis van Dyk 2015-04-24 12:57:19 UTC
Oh, and one other question: my IMAP mail spool file is very large.  7 GB to be precise.  Would that matter?

Comment 19 Milan Crha 2015-04-27 07:03:51 UTC
(In reply to Louis van Dyk from comment #18)
> Oh, and one other question: my IMAP mail spool file is very large.  7 GB to
> be precise.  Would that matter?

I guess you mean directory, the one at
   ~/.cache/evolution/mail/<imap-account-uid>/
right? The size as such might not matter. Good amount of it should be downloaded viewed messages, those you saw in the message preview. The other part is the folders.db file, where is a summary of known messages. Its size might have impact on performance, in certain cases, but otherwise nothing evolution specific.

Also, you said IMAP, did you mean EWS? Its cache is in the same folder, only the last part is the <ews-account-uid>.

Comment 20 Louis van Dyk 2015-05-22 11:19:19 UTC
Sorry I have left this alone for so long ...

It seems that a re-configuration was not needed.  On both my home and work devices, both which were experiencing the same thing, it seems that deleting (completely) the vfolders.xml file has done the trick.  I no longer get the rows of stuck tasks there.

However, I don't seem to be able to create many of these Virtual Folders before the problem begins again.  I am doing some testing to see how many I can support, and if using the IMAP or EWS accounts make a difference.  I see that the VFolders update every time that I change from any folder to another by default.  Perhaps turning that option off might also help.

I need a while to test conclusively and will report back when I have clearer findings.

Thanks for the help so far.

Comment 21 Fedora End Of Life 2015-11-04 11:51:48 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 22 Fedora End Of Life 2015-12-02 10:42:04 UTC
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.