Bug 1491627 - after standby/suspend, evolution cannot reconnect to exchange server
Summary: after standby/suspend, evolution cannot reconnect to exchange server
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: evolution-ews
Version: 26
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Milan Crha
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-14 10:03 UTC by ingli
Modified: 2018-05-29 12:47 UTC
1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-29 12:47:41 UTC
Type: Bug
Embargoed:


Attachments
bt file as proposed by Milan Crha 2017-09-14 11:52:15 EDT (3.49 KB, text/plain)
2017-09-15 13:12 UTC, ingli
bt file as proposed by Milan Crha 2017-09-14 11:52:15 EDT (2.76 KB, text/plain)
2017-09-15 13:15 UTC, ingli

Description ingli 2017-09-14 10:03:12 UTC
Description of problem:

When I turn on the system, Evolution connects fine to all the accounts configured with GNOME Online Accounts. After the system has been in standby or suspend, I can no longer edit the contacts/calendar items provided by the Exchange server. (I have not tested mail with it.)

I would expect the Exchange server to still be usable after the suspend.

Sorry, I am not very experienced with reporting bugs; I am happy to provide more detail.

Comment 1 Milan Crha 2017-09-14 15:52:15 UTC
Thanks for the bug report. I would expect that either evolution-ews or gnome-online-accounts (GOA) got stuck on a stale connection. As I saw a similar issue with GOA only recently, I'm more inclined to suspect GOA.

Could you help me to verify it, please?

It requires installing debuginfo packages for evolution-data-server, evolution-ews and GOA, with a command like this:

   $ sudo dnf install evolution-data-server-debuginfo evolution-ews-debuginfo \
        gnome-online-accounts-debuginfo --enablerepo=updates-debuginfo

Then just verify that the installed debuginfo package versions exactly match their corresponding binary packages, which can be done like this:

   $ rpm -qa 'evolution-*' 'gnome-online-accounts*'
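Comparing the versions by eye is easy to get wrong, so here is a minimal sketch that flags any -debuginfo package whose version-release differs from its base package. It assumes a POSIX shell with awk and input produced by rpm's `--qf` query format; the function name check_debuginfo_match is hypothetical, and the sketch ignores the architecture, so multilib systems with both i686 and x86_64 copies of a package may need extra handling.

```shell
#!/bin/sh
# Hypothetical helper (not part of rpm): flag *-debuginfo packages whose
# version-release differs from the matching base package.
# stdin: "NAME VERSION-RELEASE" lines, for example from
#   rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' 'evolution-*' 'gnome-online-accounts*'
# Prints one line per problem; exit status is 1 if any problem was found.
check_debuginfo_match() {
  awk '
    { ver[$1] = $2 }
    END {
      bad = 0
      for (name in ver) {
        if (name !~ /-debuginfo$/) continue
        base = substr(name, 1, length(name) - length("-debuginfo"))
        if (!(base in ver)) {
          print "missing base package for " name
          bad = 1
        } else if (ver[base] "" != ver[name] "") {
          # Appending "" forces a string comparison of the two versions.
          print name " " ver[name] " does not match " base " " ver[base]
          bad = 1
        }
      }
      exit bad
    }'
}
```

Pipe the rpm query shown in the comment into check_debuginfo_match; no output and a zero exit status mean every debuginfo package matches its binary package.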

Then, when you reproduce the issue, capture a backtrace of the running goa-daemon:

   $ gdb --batch --ex "t a a bt" -pid=`pidof goa-daemon` &>goa-bt.txt

and of the evolution calendar or address book subprocess that has "--factory ews" on its command line. You can look for such processes with:

   $ ps ax | grep evolution | grep "factory ews"

The first number in the output is the process ID; use it in place of PID in the command below:

   $ gdb --batch --ex "t a a bt" -pid=PID &>ews-bt.txt
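The two steps above (finding the PID, then attaching gdb) can also be combined. A minimal sketch, assuming procps `ps` and gdb are installed; the function names are hypothetical. It writes one ews-bt-PID.txt file per matching process, since both a calendar and an address book subprocess may be running with "--factory ews".

```shell
#!/bin/sh
# Hypothetical helper: print the PID of each Evolution factory subprocess
# whose command line contains "--factory ews".
# stdin: "PID ARGS..." lines, as produced by `ps ax -o pid=,args=`.
# The bracketed letters keep awk from matching its own command line,
# which would otherwise show up in the ps output.
ews_factory_pids() {
  awk '/[e]volution/ && /--[f]actory ews/ { print $1 }'
}

# Save a full backtrace ("thread apply all bt" is the long form of "t a a bt")
# of every EWS factory subprocess into ews-bt-PID.txt in the current directory.
capture_ews_backtraces() {
  local pid
  for pid in $(ps ax -o pid=,args= | ews_factory_pids); do
    gdb --batch -ex "thread apply all bt" -p "$pid" > "ews-bt-$pid.txt" 2>&1
    echo "wrote ews-bt-$pid.txt"
  done
}
```

Running capture_ews_backtraces while the calendar shows as offline produces the same information as the manual gdb command, one file per process.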

Please check the *-bt.txt files for any private information, like passwords, email addresses, server addresses and so on. I usually search at least for "pass" (quotes for clarity only).
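A grep along these lines can do a first pass. A minimal sketch; the function name is hypothetical. Besides "pass" it flags text shaped like an email address. This is only a heuristic that can miss private strings or flag harmless ones, so reading the files is still worthwhile.

```shell
#!/bin/sh
# Hypothetical helper: list lines in backtrace files that may contain private data.
# Matches "pass" (case-insensitive) and anything shaped like an email address.
# Prints FILE:LINE:TEXT for each hit; the exit status is grep's (0 = something found).
scan_bt_private() {
  grep -H -n -i -E 'pass|[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' -- "$@"
}
```

For example, `scan_bt_private goa-bt.txt ews-bt.txt` lists every suspicious line together with its file name and line number.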

You can also try to run:

   $ /usr/libexec/goa-daemon --replace

which runs goa-daemon in the terminal where it was started, replacing the running instance. I've been told that this fixes the issue, which would also support the idea that the problem is on the GOA side.

Comment 2 ingli 2017-09-15 13:12:38 UTC
Created attachment 1326466 [details]
bt file as proposed by Milan Crha 2017-09-14 11:52:15 EDT

Comment 3 ingli 2017-09-15 13:15:16 UTC
Created attachment 1326467 [details]
bt file as proposed by Milan Crha 2017-09-14 11:52:15 EDT

goa file

Comment 4 ingli 2017-09-15 13:17:58 UTC
Thank you for your immediate response.

I believe I have followed your suggestion and have now attached the two files produced.

> 
> You can also try to run:
> 
>    $ /usr/libexec/goa-daemon --replace
> 
> which runs the goa-daemon on the terminal where it had been started. I've
> been told that this fixes the issue and it also supports the idea of the
> problem being on the GOA side.

Running "goa-daemon --replace", as suggested, did not fix the issue.

Comment 5 Milan Crha 2017-09-18 10:51:48 UTC
Thanks for the update. Both backtraces show the processes idle. The one for evolution-calendar-factory-subprocess even shows that the EWS account is still offline and that there was no attempt to connect to the server at the time the backtrace was captured. The GOA backend is also idle, not trying to connect anywhere.

What was the Evolution UI showing at the moment the backtraces were captured, please?

Also, when you say the contacts/calendar items cannot be edited, what exactly happens, please? Is there any error message shown in the UI, or anything like that? If you run Evolution from a terminal, some information might be shown there.

Comment 6 ingli 2017-09-19 10:43:10 UTC
I just tested it again.

I resumed the computer from suspend.

I tried to move an event within the calendar.

Immediately, a blue error message showed up at the top of the Evolution window: 'Failed to modify an event in the calendar “[Exchange server calendar name]”: Cannot modify calendar object: Repository offline'.

The same happens when I open an event, edit it and try to save it.

Starting Evolution from the shell does not show any error messages.

I tried taking the respective Exchange account offline and back online within GOA. This does not seem to make any difference.

Comment 7 Milan Crha 2017-09-19 12:19:22 UTC
(In reply to ingli from comment #6)
> 'Failed to modify an event in the calendar “[Exchange server
> calendar name]”: Cannot modify calendar object: Repository offline'.

I see. It looks like the backend cannot reach the server address after resume, for some reason. There is a delay of around 5 seconds before the backend tries to reach the server address after the connection setup changes, but from your comment it was much longer than 5 seconds and you still face the issue. The backend runs in one of the evolution-calendar-factory-subprocess processes, as described in comment #1, and your later-attached backtrace confirms that the EWS backend is not connected. As you can access the Mail part, I believe the server as such is reachable after resume from suspend (i.e. it's not behind some VPN or anything like that).

Though unlikely (because the Mail part works), it could be that the connection change listener (a GNetworkMonitor object from glib2) didn't report the connection as reachable. In Edit->Preferences->Network Preferences there is an option to change which network monitor implementation is used. Depending on how your network is configured, the "networkmanager" implementation might work as expected.

You can also kill the "--factory ews" evolution-calendar-factory-subprocess at this stage (when the calendar is offline); it'll be restarted when needed. You may see some warnings in Evolution when you kill the subprocess, so I'd suggest closing Evolution as well. Re-running the subprocess is a more "drastic" way to re-check the connection information, just like after a fresh start.
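From a terminal, pkill can do this by matching the full command line. A minimal sketch, assuming procps-ng's pgrep and pkill; the function name is hypothetical. It lists the matching processes before terminating them, so you can see what is being stopped.

```shell
#!/bin/sh
# Hypothetical helper: terminate the EWS factory subprocess(es); as described
# above, they are started again when needed.
# "--" ends option parsing because the pattern begins with dashes; "[f]" still
# matches "f" but keeps the pattern from matching any command line (e.g. a
# "sh -c" wrapper) that contains this pattern text itself.
stop_ews_factory() {
  if ! pgrep -a -f -- '--[f]actory ews'; then
    echo "no EWS factory subprocess running"
    return 1
  fi
  pkill -f -- '--[f]actory ews'
}
```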

I still do not understand why the backend would stay offline when the connection had been re-established shortly after resume from suspend.

Comment 8 ingli 2017-09-19 13:27:56 UTC
Killing the "--factory ews" process seems to be an acceptable workaround.

I am happy to provide more information and answer further questions.

And yes, the server as such is reachable (I access it via Thunderbird/calendar without problems).

When closing Evolution, the shell reports a range of warnings like

   (evolution:15389): GLib-GObject-WARNING **: gsignal.c:2641: instance '0x555d35c222a0' has no handler with id '5926'

Comment 9 Milan Crha 2017-09-20 14:16:47 UTC
Thanks for the update. I gave this a try. Most of the time, after resume from suspend (or after disabling and re-enabling wifi, or disconnecting and reconnecting), the network monitor works properly once at least 10 seconds have passed. In some rare cases, though, it reports the network as available and NetworkManager shows the wifi as connected, yet the host cannot be reached, not even with 'ping' from a terminal. From that I'd guess that the issue lies even lower than the network monitor. I also saw a case where the network monitor reported the network as available, but it took more than a minute until I could ping an external server.
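To measure how long a host stays unreachable after resume, one option is to poll it with ping and print the elapsed time. A minimal sketch, assuming iputils ping (`-c` is the packet count, `-W` the reply timeout in seconds); the function name and the PROBE_CMD override, which lets you swap in a different check, are hypothetical. Some servers do not answer ICMP at all, in which case ping never succeeds even when the server is otherwise usable.

```shell
#!/bin/sh
# Hypothetical helper: probe HOST once per second until it answers or the
# timeout (seconds, default 120) expires, then report the elapsed time.
# PROBE_CMD may name another command that takes the host as its last argument.
wait_until_reachable() {
  local host timeout start elapsed
  host=$1
  timeout=${2:-120}
  start=$(date +%s)
  while :; do
    elapsed=$(( $(date +%s) - start ))
    # Left unquoted on purpose so "ping -c 1 -W 1" splits into separate words.
    if ${PROBE_CMD:-ping -c 1 -W 1} "$host" >/dev/null 2>&1; then
      echo "reachable after ${elapsed}s"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "not reachable within ${timeout}s"
      return 1
    fi
    sleep 1
  done
}
```

For example, running `wait_until_reachable exchange.example.com` (a placeholder host name) right after resuming shows how long the wait really is on a given network.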

You can find a little test program in bug #1302658 comment #4. It requires the glib2-devel package and gcc to be installed, and when you run it from a terminal it shows what the network monitor thinks about connections. While it listens for the "network-changed" and "notify::network-available" signals, evolution-data-server listens only for the "network-changed" signal.

One thing to mention: I had more trouble connecting to an Exchange server when using libsoup-2.58.0, but after I updated it to 2.58.2, the connection to the Exchange server began to work more reliably. This isn't directly related to the network monitor; it concerns only the Exchange server itself.

I know that your issue is different: the rest of the system can reach the network, and only evolution(-data-server) insists that the server is not reachable. I've seen that too, but only once. The backend had been set to offline while the destination host could be reached. I do not know how to reproduce it reliably; I guess either some "network-changed" signal is missing (unlikely) or there was some coincidence in the code execution in eds. I'll try to investigate this further and let you know if I find anything.

Comment 10 Milan Crha 2017-09-20 16:16:39 UTC
I've just got into a state where GNetworkMonitor reported the network as available and the other one (in another process) reported the server as reachable, yet it still claimed the Exchange server was unreachable. I _guess_ it's due to getinetaddress() also using a stale/old/invalidated connection.

I think there are many corner cases. A 5-second delay had also been added before testing the reachability of the servers, for exactly this reason: the connection changes heavily right after resume from suspend, but also after connecting to a wifi or wired connection. I'm afraid we cannot cover every case.
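One way to picture the purpose of such a delay is a debounce: each connection change restarts the timer, so a single reachability check runs once the connection has been quiet for the whole delay. This is a simplified model for illustration only, not the evolution-data-server code; whether the real implementation restarts the timer on every change is an assumption, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical model of a debounced reachability check (not the actual EDS code).
# stdin: ascending timestamps (seconds) of "network-changed" notifications.
# $1: delay in seconds (default 5).
# Prints the time at which each reachability check would run: a check fires
# only when no further change arrives within the delay after the previous one.
debounce_check_times() {
  awk -v delay="${1:-5}" '
    BEGIN { d = delay + 0 }
    NR > 1 && $1 - last >= d { print last + d }
    { last = $1 }
    END { if (NR > 0) print last + d }'
}
```

For example, changes at 0, 1, 2 and 30 seconds give checks at 7 and 35 seconds. In this model, if the server only became reachable at 20 seconds, the check at 7 seconds would keep reporting it unreachable until the next change notification arrived, which is one way to end up with a stale "unreachable" result.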

By the way, I have one idea: instead of killing the evolution-calendar-factory-subprocess process, try disconnecting from the network and connecting back again. That will cause GNetworkMonitor to re-emit its signals, and then, eventually, it'll report the server as reachable. It might be easier and cleaner than killing the background process.
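If the connection is managed by NetworkManager, nmcli can do the disconnect and reconnect from a terminal. A minimal sketch; note that `nmcli networking off` takes down every NetworkManager-managed connection, not just the one used for the Exchange server. The function name and the NMCLI dry-run override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn NetworkManager networking off and back on so that
# GNetworkMonitor sees the change and emits its signals again.
# $1: seconds to stay offline (default 3). Set NMCLI=echo for a dry run.
reconnect_network() {
  local pause
  pause=${1:-3}
  ${NMCLI:-nmcli} networking off || return 1
  sleep "$pause"
  ${NMCLI:-nmcli} networking on
}
```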

Comment 11 Fedora End Of Life 2018-05-03 08:31:24 UTC
This message is a reminder that Fedora 26 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 26. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '26'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 26 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 12 Fedora End Of Life 2018-05-29 12:47:41 UTC
Fedora 26 changed to end-of-life (EOL) status on 2018-05-29. Fedora 26
is no longer maintained, which means that it will not receive any
further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

