1155752 – High CPU load when evolution interact with evolution-data-server

Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	evolution-data-server
Version:	20
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Milan Crha
QA Contact:	Fedora Extras Quality Assurance

Reported:	2014-10-22 18:12 UTC by Bart Ratgers
Modified:	2014-10-27 09:26 UTC
CC List:	2 users
Last Closed:	2014-10-24 05:40:53 UTC
Type:	Bug

Attachments
Sample 1 of huge number of threads (16.39 KB, text/plain) 2014-10-22 18:12 UTC, Bart Ratgers
Another sample of active threads (32.48 KB, text/plain) 2014-10-22 18:17 UTC, Bart Ratgers
A scenario with more than 2400 threads (283.80 KB, text/plain) 2014-10-22 18:31 UTC, Bart Ratgers
Total of threads increased to more than 4550 (505.36 KB, text/plain) 2014-10-22 18:33 UTC, Bart Ratgers
Extract from the journalctl with evolution errors (1.25 MB, text/plain) 2014-10-22 18:52 UTC, Bart Ratgers
bt-evolution-1.txt --> Timestamp 1 (13.94 KB, text/plain) 2014-10-23 13:19 UTC, Bart Ratgers
bt-evolution-2.txt --> Timestamp 2 (13.64 KB, text/plain) 2014-10-23 13:21 UTC, Bart Ratgers
bt-evolution-calendar-2.txt --> Timestamp 2 (702.92 KB, text/plain) 2014-10-23 13:24 UTC, Bart Ratgers
bt-evolution-3.txt --> Timestamp 3 (14.90 KB, text/plain) 2014-10-23 13:27 UTC, Bart Ratgers
bt-evolution-calendar-3.txt --> Timestamp 3 (1.76 MB, text/plain) 2014-10-23 13:29 UTC, Bart Ratgers
bt-evolution-4.txt --> Timestamp 4 (141 bytes, text/plain) 2014-10-23 14:47 UTC, Bart Ratgers
bt-evolution-calendar-4.txt --> Timestamp 4 (209.56 KB, text/plain) 2014-10-23 14:48 UTC, Bart Ratgers

Links
System	ID	Private	Priority	Status	Summary	Last Updated
GNOME Bugzilla	739106	0	None	None	None	Never

Description Bart Ratgers 2014-10-22 18:12:06 UTC

Created attachment 949511 [details]
Sample 1 of huge number of threads

For at least a year I have had several problems with evolution whenever I do something that needs an interaction with the evolution-data-server. The problem does not happen all the time, but it looks like it occurs more frequently when the available network bandwidth is low.

In my current environment I have 3 Google e-mail addresses configured via GOA, including their contacts and calendars.
   
These are the scenarios in which I faced the problem:
1. When I reply to a mail and the evolution server appears to do a lookup of the contacts;
2. When I create a new mail and the evolution server appears to do a lookup of the contacts;
3. When I accept an appointment via the mail component;
4. When I create and save an appointment in the calendar component;
5. When I change an appointment in the calendar component.

Until today the evolution-data-server sometimes crashed, as in bug https://bugzilla.redhat.com/show_bug.cgi?id=1051770
After some investigation I realized that the crashes were caused by the limit on the allowed number of threads, see: https://bugzilla.gnome.org/show_bug.cgi?id=731554 and https://bugzilla.redhat.com/show_bug.cgi?id=432903

I changed the thread limit following this guide: http://rudametw.github.io/blog/posts/2014.04.10/not-enough-threads.html and set the limit to 8K.
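
To see which limit is in effect before and after such a change, something like the following can be used. It is only a minimal sketch: it assumes that the limit in question is the per-user process/thread limit (RLIMIT_NPROC, listed as "Max processes" in /proc/self/limits on Linux), and the function name soft_thread_limit is made up for illustration.

```shell
# Print the soft per-user process/thread limit inherited by this shell.
# Assumption: Linux, where /proc/self/limits has a "Max processes" row
# whose third field is the soft limit (a number or "unlimited").
soft_thread_limit() {
    awk '/^Max processes/ { print $3 }' /proc/self/limits
}

soft_thread_limit
```

The value printed is the soft limit; the fourth field of the same row would be the hard limit.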

The following occurred when I changed an appointment while the bandwidth was limited:
- The evolution application froze for 5 minutes.
- CPU utilization of the evolution-data-server became very high, more than 100%.
- The count of active threads increased to 250. Most of the threads were created by the evolution-data-server.
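
A thread count like the one above can be sampled from the shell. The sketch below assumes Linux, where every thread of a process shows up as one entry under /proc/<pid>/task; count_threads is a hypothetical helper name, not an existing tool.

```shell
# Count the threads of a process by listing /proc/<pid>/task (Linux only).
# Prints 0 if the process does not exist.
count_threads() {
    pid="$1"
    # Each entry under /proc/<pid>/task is one thread of the process.
    ls "/proc/$pid/task" 2>/dev/null | wc -l
}

# Example: thread count of the current shell.
count_threads "$$"
```

Calling it repeatedly with the PID of the affected process (for example from a loop with a sleep in between) would show whether the count keeps growing during a freeze.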


So I think that many bugs are related to thread creation by the evolution-data-server when the bandwidth is poor, in the case of Google accounts and maybe also other external calendar and contacts accounts.


Version-Release number of selected component (if applicable):


How reproducible:
I cannot guarantee that this also happens anywhere else.

Steps to Reproduce:
1. Add more than one Google account to evolution.
2. Reduce the available bandwidth. 
3. Change an appointment in the evolution calendar component.

Actual results:
- The evolution application freezes for more than 5 minutes.
- CPU utilization of the evolution-data-server becomes very high, more than 100%.
- The count of active threads increases to 250. Most of the threads are created by the evolution-data-server, see attachment.
- After 5 minutes evolution is available again, and both the CPU utilization and the number of active threads have dropped.


Expected results:
The appointment is saved and synchronized with the Google calendar.


Additional info:

Comment 1 Bart Ratgers 2014-10-22 18:17:00 UTC

Created attachment 949515 [details]
Another sample of active threads

Comment 2 Bart Ratgers 2014-10-22 18:20:56 UTC

Adding the relevant package versions as well:

evolution-3.10.4-4.fc20.x86_64
evolution-data-server-devel-3.10.4-6.fc20.x86_64
evolution-data-server-3.10.4-6.fc20.x86_64

Comment 3 Bart Ratgers 2014-10-22 18:31:04 UTC

Created attachment 949532 [details]
A scenario with more than 2400 threads

After saving a calendar item, evolution freezes again and the thread count increases to 2400.

Comment 4 Bart Ratgers 2014-10-22 18:33:32 UTC

Created attachment 949533 [details]
Total of threads increased to more than 4550

Comment 5 Bart Ratgers 2014-10-22 18:52:02 UTC

Created attachment 949534 [details]
Extract from the journalctl with evolution errors

Comment 6 Milan Crha 2014-10-23 08:53:50 UTC

Thanks for a bug report. These thread lists are not that useful, they do not show what the application does in those threads. As you mentioned "Reduced available bandwidth", then I guess that they try to resolve an address of a Google server, but that's only a guess.

The list of critical warnings is interesting, they surely should not be there.

I think you face something similar to bug #1148247. There were other bug reports upstream [1] where users claimed that setting the GMail accounts directly in evolution, instead of in GOA, helped them with some false password prompts. I do not know whether this issue belongs to the same basket, maybe it doesn't.

Could you get a backtrace of the running misbehaving process, please? You can get the backtrace with command like this:
   $ gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt
Please check bt.txt for any private information, such as passwords, email addresses, server addresses,... I usually search for "pass" at least (quotes for clarity only). Also make sure that you have the debuginfo package for evolution-data-server installed, in the same version as the binary package. That may show what the threads are doing (or waiting for).
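
The two steps above (taking the backtrace and checking it for private data) could be wrapped roughly as follows. This is only a sketch: dump_backtrace and scan_backtrace are hypothetical helper names, and the search patterns beyond "pass" are an illustrative assumption rather than a complete list.

```shell
# Dump backtraces of all threads of one process into a file.
# "thread apply all bt" is the long form of the "t a a bt" used above.
dump_backtrace() {
    pid="$1"
    out="$2"
    gdb --batch -ex "thread apply all bt" -p "$pid" > "$out" 2>&1
}

# List (with line numbers) the lines of a backtrace file that may contain
# private data. The pattern list is only an assumption; extend it as needed.
scan_backtrace() {
    grep -n -i -E 'pass|@|secret|token' "$1"
}

# Usage (the PID and file name are placeholders):
#   dump_backtrace 1234 bt.txt
#   scan_backtrace bt.txt
```

Passing the PID explicitly avoids attaching to the wrong process when several processes share a similar name. scan_backtrace prints nothing and returns a non-zero status when no line matches.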

[1] https://bugzilla.gnome.org/show_bug.cgi?id=728496

Comment 7 Milan Crha 2014-10-23 08:56:28 UTC

(In reply to Milan Crha from comment #6)
>    $ gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt

Oops, of course replace 'pidof evolution' in the above command with the right running executable process ID.

(In reply to Bart Ratgers from comment #0)
> 2. Reduce the available bandwidth. 

I forgot to ask, how do you do that? I'd like to be able to reproduce it here, so that I can investigate the root cause.

Comment 8 Bart Ratgers 2014-10-23 12:32:49 UTC

What I actually did was update the agenda while a huge Dropbox synchronization process was running in the background. The Dropbox client consumed almost the complete bandwidth to upload the files from my laptop to the Dropbox server.

I think you can reproduce this by starting an arbitrary upload that consumes the full bandwidth.

I will also try to reproduce it and take a backtrace.

Comment 9 Bart Ratgers 2014-10-23 12:34:51 UTC

(In reply to Milan Crha from comment #6)
> Thanks for a bug report. These thread lists are not that useful, they do not
> show what the application does in those threads. 

I agree that the list doesn't show what the program actually does, but it shows that the number of threads is growing enormously.

Comment 10 Bart Ratgers 2014-10-23 13:18:19 UTC

Okay. I again started a huge upload that consumes almost the complete bandwidth,
and played around a little bit by adding and deleting appointments.
After some time evolution froze and a segfault occurred, see journalctl-evolution-1.log

I started evolution again and played around a little bit with appointments again. Evolution froze again. And now the following happens:

Timestamp 1:
=================================================================================
Evolution freezes and the CPU load of evolution grows, see (top):
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                         
 8332 bart      20   0 4248320 667760  83900 R 107,7  8,3   6:21.32 evolution                                       
 4486 bart      20   0 2710800 231424  20144 S  26,6  2,9   2:14.03 evolution-calen  

I created a backtrace of the evolution process with: gdb --batch --ex "t a a bt" -pid=8332 &>bt-evolution-1.txt

 
Timestamp 2: 
=================================================================================
Evolution is still frozen and the load of the evolution-calendar server increases, as does the number of threads: > 500
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                         
 4486 bart      20   0 3114880 263368  20144 S 109,9  3,3   3:24.25 evolution-calen                                 
 8332 bart      20   0 4315920 689464  83904 R 104,3  8,5   8:28.94 evolution

I created the following backtraces:
gdb --batch --ex "t a a bt" -pid=4486 &>bt-evolution-calendar-2.txt
gdb --batch --ex "t a a bt" -pid=8332 &>bt-evolution-2.txt


Timestamp 3:
=================================================================================
Evolution is still frozen and the number of threads is still increasing: > 1500

I created the following backtraces:
gdb --batch --ex "t a a bt" -pid=4486 &>bt-evolution-calendar-3.txt
gdb --batch --ex "t a a bt" -pid=8332 &>bt-evolution-3.txt


I killed the evolution process.

Comment 11 Bart Ratgers 2014-10-23 13:19:41 UTC

Created attachment 949869 [details]
bt-evolution-1.txt --> Timestamp 1

The result of: gdb --batch --ex "t a a bt" -pid=8332 &>bt-evolution-1.txt

Comment 12 Bart Ratgers 2014-10-23 13:21:37 UTC

Created attachment 949870 [details]
bt-evolution-2.txt --> Timestamp 2

The result of: gdb --batch --ex "t a a bt" -pid=8332 &>bt-evolution-2.txt

Comment 13 Bart Ratgers 2014-10-23 13:24:25 UTC

Created attachment 949873 [details]
bt-evolution-calendar-2.txt --> Timestamp 2

The result of: gdb --batch --ex "t a a bt" -pid=4486 &>bt-evolution-calendar-2.txt

Comment 14 Bart Ratgers 2014-10-23 13:27:34 UTC

Created attachment 949874 [details]
bt-evolution-3.txt --> Timestamp 3

The result of: gdb --batch --ex "t a a bt" -pid=8332 &>bt-evolution-3.txt

Comment 15 Bart Ratgers 2014-10-23 13:29:57 UTC

Created attachment 949875 [details]
bt-evolution-calendar-3.txt --> Timestamp 3

The result of: gdb --batch --ex "t a a bt" -pid=4486 &>bt-evolution-calendar-3.txt

Comment 16 Bart Ratgers 2014-10-23 14:46:36 UTC

Just another temporary freeze (approximately 2 minutes) while opening a new e-mail window, after pushing the 'new E-mail' button:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                         
  543 bart      20   0 4143736 286312  80276 R 112,5  3,5   1:52.94 evolution                                       
 4486 bart      20   0 3090896 238912  20028 S 106,2  3,0   8:29.33 evolution-calen 


Backtrace:
=================================================================================
gdb --batch --ex "t a a bt" -pid=4486 &>bt-evolution-calendar-4.txt
gdb --batch --ex "t a a bt" -pid=8332 &>bt-evolution-4.txt

Comment 17 Bart Ratgers 2014-10-23 14:47:37 UTC

Created attachment 949948 [details]
bt-evolution-4.txt --> Timestamp 4

gdb --batch --ex "t a a bt" -pid=8332 &>bt-evolution-4.txt

Comment 18 Bart Ratgers 2014-10-23 14:48:42 UTC

Created attachment 949954 [details]
bt-evolution-calendar-4.txt --> Timestamp 4

gdb --batch --ex "t a a bt" -pid=4486 &>bt-evolution-calendar-4.txt

Comment 19 Milan Crha 2014-10-23 16:43:37 UTC

Thanks for the update. I see in the calendar factory that some CalDAV calendars (I cannot tell from it how many you have) are starting new views, and so is some file backend (an On This Computer calendar); that is what all those threads are about. How many calendars of each type do you have configured, please? It can partly influence the behaviour.

The evolution backtrace doesn't show why the flood began, but I'm sure this is fixed for 3.14.0 (to be released in the spring of next year). The change cannot be "backported" into 3.10 or even 3.12, because it was a huge change in the way calendars are dealt with in the Calendar views (including Memos and Tasks).

Comment 20 Bart Ratgers 2014-10-23 20:46:01 UTC

Hello Milan,

Thank you for the quick response. I'm glad to see that a possible fix will be released next year. For your information, I have 4 Google calendars configured. All those calendars use the same Google account.

Is there a way that I can investigate the origin of the flood?

Comment 21 Milan Crha 2014-10-24 05:40:53 UTC

(In reply to Bart Ratgers from comment #20)
> Is there a way that I can investigate the origin of the flood?

I think it's caused by changes in the left tree of the Calendar view, like when you enable/disable a calendar. The current code basically stops all views and starts new ones for all enabled calendars. This caused unnecessary workload on both the backend and the client side and is fixed in the current development version. As I cannot backport the whole change, I will try to fix at least this particular part for 3.12.x (Fedora 21), to limit the workload on both sides.

Thinking of it, better to deal with it upstream, for better visibility. Please see [1] for any further updates.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=739106

Comment 22 Milan Crha 2014-10-27 09:26:19 UTC

Just as a follow-up: I checked the 3.12.7 code and tried to reproduce it, but with no luck, so I consider this [2] fixed in 3.12.x.

[2] I mean the issue with full rebuilds of all calendar views when anything changes in the source selector - it is not what is happening in 3.12.7.
