1317369 – [abrt] evolution: camel_imapx_folder_set_mailbox(): evolution killed by SIGSEGV

Bug 1317369 - [abrt] evolution: camel_imapx_folder_set_mailbox(): evolution killed by SIGSEGV

Summary: [abrt] evolution: camel_imapx_folder_set_mailbox(): evolution killed by SIGSEGV

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	evolution
Sub Component:
Version:	23
Hardware:	x86_64
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Milan Crha
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	https://retrace.fedoraproject.org/faf...
Whiteboard:	abrt_hash:a25e1de1dcf5f563530cab1bc5c...
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2016-03-14 06:45 UTC by Heiko Adams
Modified:	2016-12-20 19:26 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2016-12-20 19:26:12 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
File: backtrace (127.90 KB, text/plain) 2016-03-14 06:45 UTC, Heiko Adams	no flags	Details
File: cgroup (148 bytes, text/plain) 2016-03-14 06:45 UTC, Heiko Adams	no flags	Details
File: core_backtrace (87.23 KB, text/plain) 2016-03-14 06:45 UTC, Heiko Adams	no flags	Details
File: dso_list (26.55 KB, text/plain) 2016-03-14 06:45 UTC, Heiko Adams	no flags	Details
File: environ (1.13 KB, text/plain) 2016-03-14 06:46 UTC, Heiko Adams	no flags	Details
File: exploitable (100 bytes, text/plain) 2016-03-14 06:46 UTC, Heiko Adams	no flags	Details
File: limits (1.29 KB, text/plain) 2016-03-14 06:46 UTC, Heiko Adams	no flags	Details
File: maps (127.51 KB, text/plain) 2016-03-14 06:46 UTC, Heiko Adams	no flags	Details
File: mountinfo (3.46 KB, text/plain) 2016-03-14 06:46 UTC, Heiko Adams	no flags	Details
File: namespaces (85 bytes, text/plain) 2016-03-14 06:46 UTC, Heiko Adams	no flags	Details
File: open_fds (4.78 KB, text/plain) 2016-03-14 06:46 UTC, Heiko Adams	no flags	Details
File: proc_pid_status (1.04 KB, text/plain) 2016-03-14 06:46 UTC, Heiko Adams	no flags	Details
File: var_log_messages (1.75 KB, text/plain) 2016-03-14 06:46 UTC, Heiko Adams	no flags	Details
Valgrind run from 5/23/2016 (70.25 KB, application/octet-stream) 2016-05-24 13:25 UTC, Michael H. Warfield	no flags	Details
Debug log file for imapx (2.61 KB, application/x-gzip) 2016-05-26 18:31 UTC, Michael H. Warfield	no flags	Details
Debug log from the Valgrind run w/ glib2, glib-networking, and gnutls symbols (47.11 KB, application/x-gzip) 2016-05-26 18:33 UTC, Michael H. Warfield	no flags	Details
Server log for Dovecot imap service from Remus.WittsEnd.com (3.92 KB, application/x-gzip) 2016-05-26 18:34 UTC, Michael H. Warfield	no flags	Details
View All

Description Heiko Adams 2016-03-14 06:45:46 UTC

Version-Release number of selected component:
evolution-3.18.5.1-1.fc23

Additional info:
reporter:       libreport-2.6.4
backtrace_rating: 4
cmdline:        evolution
crash_function: camel_imapx_folder_set_mailbox
executable:     /usr/bin/evolution
global_pid:     2936
kernel:         4.4.5-300.fc23.x86_64
runlevel:       N 5
type:           CCpp
uid:            1000

Truncated backtrace:
Thread no. 1 (6 frames)
 #0 camel_imapx_folder_set_mailbox at camel-imapx-folder.c:1194
 #1 camel_imapx_folder_list_mailbox at camel-imapx-folder.c:1290
 #2 imapx_synchronize_sync at camel-imapx-folder.c:774
 #3 camel_folder_synchronize_sync at camel-folder.c:3636
 #4 session_job_thread at camel-session.c:194
 #6 g_thread_proxy at gthread.c:778

Comment 1 Heiko Adams 2016-03-14 06:45:53 UTC

Created attachment 1136046 [details]
File: backtrace

Comment 2 Heiko Adams 2016-03-14 06:45:54 UTC

Created attachment 1136047 [details]
File: cgroup

Comment 3 Heiko Adams 2016-03-14 06:45:57 UTC

Created attachment 1136048 [details]
File: core_backtrace

Comment 4 Heiko Adams 2016-03-14 06:45:58 UTC

Created attachment 1136049 [details]
File: dso_list

Comment 5 Heiko Adams 2016-03-14 06:46:00 UTC

Created attachment 1136050 [details]
File: environ

Comment 6 Heiko Adams 2016-03-14 06:46:01 UTC

Created attachment 1136051 [details]
File: exploitable

Comment 7 Heiko Adams 2016-03-14 06:46:03 UTC

Created attachment 1136052 [details]
File: limits

Comment 8 Heiko Adams 2016-03-14 06:46:06 UTC

Created attachment 1136053 [details]
File: maps

Comment 9 Heiko Adams 2016-03-14 06:46:08 UTC

Created attachment 1136054 [details]
File: mountinfo

Comment 10 Heiko Adams 2016-03-14 06:46:09 UTC

Created attachment 1136055 [details]
File: namespaces

Comment 11 Heiko Adams 2016-03-14 06:46:10 UTC

Created attachment 1136056 [details]
File: open_fds

Comment 12 Heiko Adams 2016-03-14 06:46:12 UTC

Created attachment 1136057 [details]
File: proc_pid_status

Comment 13 Heiko Adams 2016-03-14 06:46:13 UTC

Created attachment 1136058 [details]
File: var_log_messages

Comment 14 Milan Crha 2016-03-14 18:46:36 UTC

Thanks for a bug report. I see that the crash happened when the evolution was about to update content of one of your IMAP folders, but it failed due to (probably) some memory issue, I'd guess use-after-free or similar, maybe caused by a re/unref imbalance.

Do you know what had been done in the evolution before the crash, please? I can think of actions like folder creation, rename, deletion, subscribing, unsubscribing and similar, involving changes on the folder list (probably).

Comment 15 Heiko Adams 2016-03-15 08:42:59 UTC

IIRC i just started Evolution and wanted to read a mail while Evolution was still fetching mails from the remote mailboxes.

Comment 16 Michael H. Warfield 2016-05-21 20:05:07 UTC

I may have run into a related problem (that may be this problem) with Evolution 3.18.5.2 that's a memory issue and could be related to comment #14.

I'm using Evolution on Fedora 23 and several months ago I noticed that my system would occasionally slow to a crawl and, eventually, Evolution would crash out (started from gui). I have a massive number of imap mailboxes from all the Linux related projects I've been involved in down through the years.

Last few weeks, looking into it, I noticed Evolution would be taking more CPU time than I expected (but not excessive) but that memory reported by top would be monotonically increasing over the course of hours until the system would be heavily into swap. The system is an IBM T530 laptop with 8G of RAM and 8G of swap. At one point I saw Evolution topping out over 85% of memory with less than a few meg of memory and less than 1G of swap left! It eventually crashed (presumably with a memory error) and the vast majority of that memory was immediately freed and system performance returned to normal. If I caught it soon enough, I could just exit Evolution and restart and the system would be back to normal for several more hours.

If I leave the system running with Evolution up overnight, Evolution will have crashed by the morning or the system will be highly unresponsive (load average over 16 this morning) until Evolution is quit at which time the system rapidly returns to being responsive as the rest of the running system is restored out of swap. It doesn't require any activity in the UI at moment of crash. It takes hours to reproduce as the memory utilization grows very slowly. I'm retesting this with 3.18.1 and reserving judgment as yet.

This has all the hallmarks of a severe memory leak in an Evolution component.

Comment 17 Michael H. Warfield 2016-05-21 20:12:40 UTC

Minor correction...  "massive number of imap mailboxes" is incorrect.  I have a "massive number of folders accross several imap mailboxes" would be more correct.

Comment 18 Milan Crha 2016-05-23 09:49:17 UTC

Thanks for the information. The IMAP terminology uses 'mailbox' for 'folders', thus the two are (more or less) interchangeable here. What surprised me more is that you wrote:
> I'm retesting this with 3.18.1 and reserving judgment as yet.
Did you really mean 3.18.1, or rather 3.20.1, thus the Fedora 24 version?

In any case, could I ask you to install debuginfo packages for the evolution-data-server and evolution, it's enough for these two only, thus with a command (run as root) like:
   # dnf install evolution-data-server-debuginfo evolution-debuginfo \
        --enablerepo=updates-debuginfo
eventually
   # dnf update ...
Only make sure that the installed main package and the debuginfo package will have the same version, thus the output below will be more accurate and useful.
   $ rpm -qa evolution* | sort
If you've installed other evolution packages, then install their debuginfo packages as well, please.

Once it's done, run the evolution under valgrind overnight, which may give some hints whether there are any full memory leaks, or whether some global cache is increasing its size in the time. The command (run as a regular user) is:
   $ G_SLICE=always-malloc valgrind --leak-check=full evolution &>log.txt

Ideally close the evolution gracefully in the morning, but even without that the definitely lost memory will be logged nicely, I hope. Before attaching it here, please check for any private information, though the valgrind doesn't show function arguments, neither local variables, in the log, thus the most some console messages might show your email address or a server name. Even that is unlikely, but one never knows. If you prefer, feel free to send the log only to me, to my Bugzilla emails address, only name the bug report in the subject, otherwise I would overlook the message in my spam folder.

Thanks in advance.

Comment 19 Michael H. Warfield 2016-05-23 13:21:11 UTC

No, I meant 3.18.1 as that was the Fedora 23 rollback / downgrade version.  I was under the impression that this problem had gotten worse, relatively recently, so I wanted to regression test it.  Been running 3.18.1 under gdb the last couple of nights but, while no crash, it still exhibits the monotonically increasing memory usage.  Maybe it was after upgrading to Fedora 23 (which I did in January) where this started to become problematical, and I just didn't notice it right away.

Running it under valgrind was something I had been considering but I understand it results in as much as an order of magnitude performance hit to an application, I was hesitating until that was the last option or it was requested.  It's now been requested.  May take a few days.  :-)

I'll first upgrade back to 3.18.5.2 and try that under valgrind before jumping to 3.20 (which was going to be my next option to test).

Comment 20 Milan Crha 2016-05-23 15:01:05 UTC

(In reply to Michael H. Warfield from comment #19)
> Running it under valgrind was something I had been considering but I
> understand it results in as much as an order of magnitude performance hit to
> an application, I was hesitating until that was the last option or it was
> requested.  It's now been requested.  May take a few days.  :-)

That's true, valgrind makes a big performance hit due to all the memory checking. That's why i suggested to run it during the night, especially when you see the same with the evolution just running, not only when using it (reading and writing emails and so on).

> I'll first upgrade back to 3.18.5.2 and try that under valgrind before
> jumping to 3.20 (which was going to be my next option to test).

Right, please try with the most recent version of the evolution/evolution-data-server you can find in the repositories (beware of the debuginfo packages version, they should be the same as the binary packages). The recent code contains more fixes, than the old code. I agree with the "regression test" in general, but when we end up on the valgrind run, then the most recent code is better. It's possible that the 3.20.x has some things already fixed, but it's fine to verify it from the valgrind log of the 3.18.5.2 (eds 3.18.5).

Thanks again for your help with this.

Comment 21 Michael H. Warfield 2016-05-24 01:12:27 UTC

Valgrind run is running now.  Seen some interesting anomalies.  Memory rose to 45% of "real memory".  That percentage hasn't changed a bit since starting it several hours ago.  Maybe an artifact of running it under valgrind.  "memcheck-amd64-"??? (the valgrind process) is running at 100% CPU on this quad core, so that's different as well.  Will let it continue to run through the night.

Not happy when I see these sorts of messages in the log...

"Conditional jump or move depends on uninitialised value(s)
   at 0xCEBA8D0: WTF::fastFree(void*) (in /usr/lib64/libjavascriptcoregtk-3.0.so.0.16.19)"

May be nothing but...  Seeing a lot of them with different addresses.

Tried bumping to 3.20 using dnf and releasever=24.  Dnf wanted to update the entire known universe and I wasn't prepared for that.  Maybe it's getting time to jump to Fedora 24, but still...

Downloaded the source rpm's and tried "rpm --rebuild" on the evolution rpm.  It wanted the entire new universe in dependencies right down to systemd (YUCK!) just to build! (Note: this is a developement system and I build packages for LXC, Samba, IPSec, Quagga and others so I have routine -devel packages installed)  Guess I'm not going that route, either.  I think the dependency revs seem to be much too tight on that build.  Is it really saying 3.20 can not build and run on Fedora 23?

Comment 22 Michael H. Warfield 2016-05-24 01:16:46 UTC

Some preliminary results...  Still running but I'm running a tail -f on the log and just saw this...

==13804== 524,288 bytes in 1 blocks are possibly lost in loss record 39,806 of 39,872
==13804==    at 0x4C2A988: calloc (vg_replace_malloc.c:711)
==13804==    by 0xF443530: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.4600.2)
==13804==    by 0xF42C0B7: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
==13804==    by 0xF42C31E: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
==13804==    by 0xF42C709: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
==13804==    by 0x298FB61B: imapx_server_process_fetch_changes_infos.isra.5 (camel-imapx-server.c:4706)
==13804==    by 0x29902C87: imapx_server_fetch_changes (camel-imapx-server.c:4781)
==13804==    by 0x29903285: camel_imapx_server_refresh_info_sync (camel-imapx-server.c:4958)
==13804==    by 0x298E8670: imapx_conn_manager_refresh_info_run_sync (camel-imapx-conn-manager.c:1450)
==13804==    by 0x298F41CD: camel_imapx_job_run_sync (camel-imapx-job.c:473)
==13804==    by 0x298ECA12: camel_imapx_conn_manager_run_job_sync (camel-imapx-conn-manager.c:1234)
==13804==    by 0x298ED9D0: camel_imapx_conn_manager_refresh_info_sync (camel-imapx-conn-manager.c:1477)
==13804== 
==13804== 524,288 bytes in 1 blocks are possibly lost in loss record 39,807 of 39,872
==13804==    at 0x4C2A988: calloc (vg_replace_malloc.c:711)
==13804==    by 0xF443530: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.4600.2)
==13804==    by 0xF42C0D6: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
==13804==    by 0xF42C31E: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
==13804==    by 0xF42C709: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
==13804==    by 0x298FB61B: imapx_server_process_fetch_changes_infos.isra.5 (camel-imapx-server.c:4706)
==13804==    by 0x29902C87: imapx_server_fetch_changes (camel-imapx-server.c:4781)
==13804==    by 0x29903285: camel_imapx_server_refresh_info_sync (camel-imapx-server.c:4958)
==13804==    by 0x298E8670: imapx_conn_manager_refresh_info_run_sync (camel-imapx-conn-manager.c:1450)
==13804==    by 0x298F41CD: camel_imapx_job_run_sync (camel-imapx-job.c:473)
==13804==    by 0x298ECA12: camel_imapx_conn_manager_run_job_sync (camel-imapx-conn-manager.c:1234)
==13804==    by 0x298ED9D0: camel_imapx_conn_manager_refresh_info_sync (camel-imapx-conn-manager.c:1477)
==13804== 
==13804== 1,002,776 bytes in 769 blocks are possibly lost in loss record 39,830 of 39,872
==13804==    at 0x4C28BF6: malloc (vg_replace_malloc.c:299)
==13804==    by 0x77E7656: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77C2D3B: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77C6396: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77C6E6B: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77FCC74: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77FDC8F: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77FDE77: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x781F701: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x7828A06: sqlite3_step (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x78298C9: sqlite3_exec (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x782A043: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804== 
==13804== 2,589,744 bytes in 1,986 blocks are possibly lost in loss record 39,854 of 39,872
==13804==    at 0x4C28BF6: malloc (vg_replace_malloc.c:299)
==13804==    by 0x77E7656: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77C2D3B: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77C6396: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77C6E6B: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77FCC74: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77FDC8F: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x77FDE77: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x781F701: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x7828A06: sqlite3_step (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x78298C9: sqlite3_exec (in /usr/lib64/libsqlite3.so.0.8.6)
==13804==    by 0x74C9577: cdb_sql_exec (camel-db.c:513)
==13804== 
==13804== LEAK SUMMARY:
==13804==    definitely lost: 159,155 bytes in 1,243 blocks
==13804==    indirectly lost: 271,352 bytes in 5,626 blocks
==13804==      possibly lost: 5,080,826 bytes in 3,257 blocks
==13804==    still reachable: 218,252,146 bytes in 4,462,942 blocks
==13804==                       of which reachable via heuristic:
==13804==                         length64           : 8,275,661 bytes in 56,679 blocks
==13804==                         newarray           : 8,688 bytes in 87 blocks
==13804==                         multipleinheritance: 1,800 bytes in 1 blocks
==13804==         suppressed: 0 bytes in 0 blocks
==13804== Reachable blocks (those to which a pointer was found) are not shown.
==13804== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==13804== 
==13804== For counts of detected and suppressed errors, rerun with: -v
==13804== Use --track-origins=yes to see where uninitialised values come from
==13804== ERROR SUMMARY: 21576 errors from 440 contexts (suppressed: 0 from 0)

Comment 23 Milan Crha 2016-05-24 08:32:50 UTC

(In reply to Michael H. Warfield from comment #21)
> Valgrind run is running now.  Seen some interesting anomalies.  Memory rose
> to 45% of "real memory".  That percentage hasn't changed a bit since
> starting it several hours ago.  Maybe an artifact of running it under
> valgrind.  "memcheck-amd64-"??? (the valgrind process) is running at 100%
> CPU on this quad core, so that's different as well.

It's expected.

> "Conditional jump or move depends on uninitialised value(s)
>    at 0xCEBA8D0: WTF::fastFree(void*) (in
> /usr/lib64/libjavascriptcoregtk-3.0.so.0.16.19)"
> 
> May be nothing but...  Seeing a lot of them with different addresses.

I have generated a valgrind suppression for it. It can happen due to read-ahead or something like that.

> Tried bumping to 3.20 using dnf and releasever=24.  Dnf wanted to update the
> entire known universe and I wasn't prepared for that.

That's correct, because you tried to install something built in koji, which uses the latest packages from it, thus requires them.

> I think the dependency revs seem to be much too tight on that build.
> Is it really saying 3.20 can not build and run on Fedora 23?

No, it can be build, but you do not need to build only the evolution, but also the evolution-data-server, then install it, and only then build the evolution. The evolution-data-server is used by other projects, like gnome-shell, which needs to be rebuild too in order to be able to install the newer evolution-data-server, and so on. Not using rpm/fedpkg, but build into a separate prefix is doable, I use it all the time. You then need to tell D-Bus where the new services are located ($PREFIX/share/dbus-1/services/) thus it knows where to find the correct background processes. It's still a nightmare under GNOME Shell, which keeps restarting the evolution-calendar-factory when it disappears from the D-Bus. Some developers (not me) use jhbuild.

In other words, it's doable to run 3.20.x of the evolution* software in Fedora 23, even 22 and maybe older, but it's not trivial to achieve it.

Comment 24 Milan Crha 2016-05-24 08:38:37 UTC

(In reply to Michael H. Warfield from comment #22)
> ==13804== 524,288 bytes in 1 blocks are possibly lost in loss record 39,806
> of 39,872
> ==13804==    at 0x4C2A988: calloc (vg_replace_malloc.c:711)
> ==13804==    by 0xF443530: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.4600.2)
> ==13804==    by 0xF42C0B7: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)

Hmm, something allocated in glib2. I hoped we'd not need this library (its debuginfo). If you'll be going to install then, then install also glib-networking-debuginfo, as they go side by side. Similarly with gnutls, but that's quite low.

> ...
> ==13804==    definitely lost: 159,155 bytes in 1,243 blocks
> ==13804==    indirectly lost: 271,352 bytes in 5,626 blocks
> ==13804==      possibly lost: 5,080,826 bytes in 3,257 blocks

The most interesting are the 'definitely lost', where it's assured that nothing references them. The 'possibly lost' is a memory, which is still referenced from somewhere, like from some global caches. These snapshots in the middle of the run are misleading, they usually happen on fork() or similar occasions, where the caches might not be properly freed. On the other hand, I agree that if the global cache is constantly made larger and freed at the application quit, then the valgrind log might not show the memory as not freed. Still, logs from valgrind with graceful quit are usually more useful than those where the application was killed or crashed.

Comment 25 Michael H. Warfield 2016-05-24 13:23:14 UTC

Ok...  Ran overnight.  Started late yesterday afternoon, so we're probably talking an 18+ hour run.  Running under Valgind, memory use didn't get above 22.8.  Skimmmed the logs.  No passwords, just my user id in a few places (but my userid and domain are known around the world anyways).

Comment 26 Michael H. Warfield 2016-05-24 13:25:49 UTC

Created attachment 1161046 [details]
Valgrind run from 5/23/2016

Valgrind log from overnight run of 3.18.5.2.

Comment 27 Michael H. Warfield 2016-05-24 13:28:24 UTC

Hope that helps.  Oh, and I gracefully exited the program this morning.  It didn't crash, I didn't kill it, and I didn't have to use --force-shutdown.  Just Cntrl-Q.

Comment 28 Milan Crha 2016-05-25 11:32:18 UTC

Thanks for the update. The most leaking part seems to be related to the connections to a server:

 458,032 (99,760 direct, 358,272 indirect) bytes in 2,494 blocks are definitely lost in loss record 30,172 of 30,241
    at 0xF1D4C88: g_type_create_instance (in /usr/lib64/libgobject-2.0.so.0.4600.2)
    by 0xF1B6B7A: ??? (in /usr/lib64/libgobject-2.0.so.0.4600.2)
    by 0xF1B8460: g_object_newv (in /usr/lib64/libgobject-2.0.so.0.4600.2)
    by 0xF1B8D93: g_object_new (in /usr/lib64/libgobject-2.0.so.0.4600.2)
    by 0xEE88340: g_native_socket_address_new (in /usr/lib64/libgio-2.0.so.0.4600.2)
    by 0xEE9D2D4: g_socket_address_new_from_native (in /usr/lib64/libgio-2.0.so.0.4600.2)
    by 0xEE9AA10: g_socket_receive_message (in /usr/lib64/libgio-2.0.so.0.4600.2)
    by 0xEEC7D3E: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
    by 0xEE97AE0: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
    by 0xF43DE39: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.4600.2)
    by 0xF43E1CF: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
    by 0xF43E4F1: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.4600.2)

Could you install debuginfo for the glib2 and the glib-networking packages, then rerun valgrind with the below command, please? It'll log in more detail (longer backtraces) too.

   $ G_SLICE=always-malloc valgrind --leak-check=full --num-callers=40 \
         evolution &>log.txt

By the way, do you have any issues with the connection to the server, causing the evolution to reconnect to the server often? I'd guess there is happening something with the connection. If you run evolution with a debugging for the IMAPx connection manager, then it'll show whether it needs reconnect or not. I would not mix such log with the valgrind log, thus rather run it separately. It also doesn't have any performance impact, thus it can be logged during your usual work. The command is:
   $ CAMEL_DEBUG=imapx:conman evolution &>imapx.log

Also, what are the versions of the glib2 and glib-networking packages, please?
   $ rpm -q glib2 glib-networking

Thanks in advance.

Comment 29 Milan Crha 2016-05-25 12:09:23 UTC

I fixed two other memory leaks from your log upstream, with commit 1965f43 [1] in eds master for 3.21.3+ and commit 4f2f41a in eds gnome-3-20 for 3.20.3+.

[1] https://git.gnome.org/browse/evolution-data-server/commit/?id=1965f43

Comment 30 Michael H. Warfield 2016-05-25 14:29:19 UTC

(In reply to Milan Crha from comment #28)
> Thanks for the update. The most leaking part seems to be related to the
> connections to a server:
> 
>  458,032 (99,760 direct, 358,272 indirect) bytes in 2,494 blocks are
> definitely lost in loss record 30,172 of 30,241
>     at 0xF1D4C88: g_type_create_instance (in
> /usr/lib64/libgobject-2.0.so.0.4600.2)
>     by 0xF1B6B7A: ??? (in /usr/lib64/libgobject-2.0.so.0.4600.2)
>     by 0xF1B8460: g_object_newv (in /usr/lib64/libgobject-2.0.so.0.4600.2)
>     by 0xF1B8D93: g_object_new (in /usr/lib64/libgobject-2.0.so.0.4600.2)
>     by 0xEE88340: g_native_socket_address_new (in
> /usr/lib64/libgio-2.0.so.0.4600.2)
>     by 0xEE9D2D4: g_socket_address_new_from_native (in
> /usr/lib64/libgio-2.0.so.0.4600.2)
>     by 0xEE9AA10: g_socket_receive_message (in
> /usr/lib64/libgio-2.0.so.0.4600.2)
>     by 0xEEC7D3E: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
>     by 0xEE97AE0: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
>     by 0xF43DE39: g_main_context_dispatch (in
> /usr/lib64/libglib-2.0.so.0.4600.2)
>     by 0xF43E1CF: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
>     by 0xF43E4F1: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.4600.2)
> 
> Could you install debuginfo for the glib2 and the glib-networking packages,
> then rerun valgrind with the below command, please? It'll log in more detail
> (longer backtraces) too.

I had the glib2-debuginfo installed (but not before the run I posted) but didn't yet have the glib-networking-debuginfo installed.  I've got that now and preparing a new run.

ITMT, I got 3.20.2 built.  I basically dropped back to plan B and installed the srpms, then hand edited the spec files to remove the Koji inserted depends and resolved any other dependencies until it built.  Got Evolution-data-server and Evolution built and installed (with a little forcing around gnome-shell - bah).

Runs both with and without valgrind seem to parallel the 3.18.5 cases.  I'm not sure if there's much of a noticeble difference between the two.  Memory still increase, though, without valgrind, I only saw it reach around 60%, but the run was shorter.  I'm running 3.20.2 now.

>    $ G_SLICE=always-malloc valgrind --leak-check=full --num-callers=40 \
>          evolution &>log.txt

> By the way, do you have any issues with the connection to the server,
> causing the evolution to reconnect to the server often?

Funny you should mention that.  Define "often".  I'm currently actively connecting to 3 servers.

wittsend.com (mine at my colo-site).
malamber.org (my son's also at my colo-site)
uucg.org (hosted at DreamHost)

I occasionally connect to these servers (but have not during these runs) directly but generally just pull them into the WittsEnd server using fetchmail:

gmail.com (generally accessed using fetchmail from my central server)
samba.org (I'm a Samba Team developer / team member / security consultant)
att.net (ISP account - generally accessed using fetchmail)
charter.net (ISP account - generally accessed using fetchmail)

All are accessed over imaps.

Over the last couple of months I've been seeing connection failures to the uucg.org server (at DreamHost) with the errors "failed TLS negotiation".  This last week (actually after posting to this initially) I was upgrading "malamber.org" to F23 and was getting some connection failures, "connection timeout", during that upgrade process (not unexpected).  So, yeah, I've been having some errors, though not particularly frequent per se and all three servers appear to be rock solid, ATM.

> I'd guess there is
> happening something with the connection. If you run evolution with a
> debugging for the IMAPx connection manager, then it'll show whether it needs
> reconnect or not. I would not mix such log with the valgrind log, thus
> rather run it separately. It also doesn't have any performance impact, thus
> it can be logged during your usual work. The command is:
>    $ CAMEL_DEBUG=imapx:conman evolution &>imapx.log

I will do that.

> Also, what are the versions of the glib2 and glib-networking packages,
> please?
>    $ rpm -q glib2 glib-networking

[mhw@canyon ~]$ rpm -qa glib2\* glib-network\*
glib2-debuginfo-2.46.2-1.fc23.x86_64
glib2-devel-2.46.2-1.fc23.x86_64
glib-networking-2.46.1-1.fc23.x86_64
glib-networking-debuginfo-2.46.1-1.fc23.x86_64
glib2-doc-2.46.2-1.fc23.noarch
glib2-2.46.2-1.fc23.i686
glib2-2.46.2-1.fc23.x86_64

> Thanks in advance.

I'll post the logs as they become available.

Thanks!

Comment 31 Michael H. Warfield 2016-05-26 18:29:26 UTC

Ok, results from the latest runs are on the away and will be attached shortly.

All runs were with 3.20.2 now.

I ran the imapx.log run first.  That ran for a few hours.  Memory usage topped over 60% and it eventually crashed.

The Valgrind run with debugging symbols for glib2, glib-networking, and gnutls followed.  That followed the same trajectory in memory usage and crashed overnight with a console message saying simply "Killed".

Since you asked about server connections, disconnections, and reconnections, I snatched the Dovecot imap log messages from my WittsEnd server, Romulus.  That will also be attached.

Worth noting...  During the two runs, I did NOT receive any warnings about "TLS failure to negotiate" or "Unable to connect".  However, there do appear to be some similar errors in the imapx log like this:

-- 
[imapx:L] Removed connection 0x559d93480380 (server:0x559d940cff00) due to error: Error performing NOOP: TLS connection closed unexpectedly
-- 

I presume a routine retry recovered successfully and I didn't see an error in the Evolution UI.

Files to follow...

Comment 32 Michael H. Warfield 2016-05-26 18:31:15 UTC

Created attachment 1162236 [details]
Debug log file for imapx

Log file during the debugging run...

CAMEL_DEBUG=imapx:conman evolution &>imapx.log

Comment 33 Michael H. Warfield 2016-05-26 18:33:18 UTC

Created attachment 1162237 [details]
Debug log from the Valgrind run w/ glib2, glib-networking, and gnutls symbols

Second Valgrind run.  This time with the debuginfo for glib2, glib-networking, and gnutls.

Comment 34 Michael H. Warfield 2016-05-26 18:34:19 UTC

Created attachment 1162238 [details]
Server log for Dovecot imap service from Remus.WittsEnd.com

Comment 35 Milan Crha 2016-05-27 07:21:29 UTC

Thanks for the update. If I read the dovecot log properly, then there are mixed login attempts of two users. One uses IPv6 (it might be you), the other one uses IPv4. The log doesn't show any reason why the connection was closed, but seeing the time stamps, the connection lasted for an hour or longer, which is pretty good.

The IMAPx log shows "TLS connection closed unexpectedly", which refers to [1]. If you read the description there, then it is suspicious behaviour from their (glib/gio) side. The dovecot log doesn't show anything suspicious, though.

I see from the valgrind log that the evolution-ews also requires rebuild, against updated evolution-data-server and evolution. The other runtime warning:
> No provider available for protocol 'imap4'
is also interesting, because 'imap4' is a very old IMAP implementation, which was rather for testing, than for a production usage. If it's it, of course.

Finally, the log shows this as the most leaking part:
 118,864 (25,840 direct, 93,024 indirect) bytes in 646 blocks are definitely lost in loss record 25,430 of 25,505
    at 0xF1E9C88: g_type_create_instance (gtype.c:1853)
    by 0xF1CBB7A: g_object_new_internal (gobject.c:1779)
    by 0xF1CD460: g_object_newv (gobject.c:1926)
    by 0xF1CDD93: g_object_new (gobject.c:1619)
    by 0xEE9D340: g_native_socket_address_new (gnativesocketaddress.c:150)
    by 0xEEB22D4: g_socket_address_new_from_native (gsocketaddress.c:303)
    by 0xEEAFA10: cache_recv_address (gsocket.c:4333)
    by 0xEEAFA10: g_socket_receive_message (gsocket.c:4574)
    by 0xEEDCD3E: read_netlink_messages (gnetworkmonitornetlink.c:324)
    by 0xEEACAE0: socket_source_dispatch (gsocket.c:3284)
    by 0xF452E39: g_main_dispatch (gmain.c:3154)
    by 0xF452E39: g_main_context_dispatch (gmain.c:3769)
    by 0xF4531CF: g_main_context_iterate.isra.29 (gmain.c:3840)
    by 0xF4534F1: g_main_loop_run (gmain.c:4034)

This comes from the GNetworkMonitor implementation, which is responsible to notify the application about network changes (when the machine connects/disconnects to the Internet, when new routes are available and so on.
I moved this part upstream as [2].

If you want to check what the GNetworkMonitor things about your connection, then get a test program from bug #1302658 comment #4 and just run it for some time and see what it does, as it'll print any network changes on the console. When you'll keep it use the default, which is probably netlink, then you'll see what the kernel detects on your network connection.

[1] https://developer.gnome.org/gio/2.46/gio-TLS-Overview.html#G-TLS-ERROR-EOF:CAPS
[2] https://bugzilla.gnome.org/show_bug.cgi?id=766933

Comment 36 Fedora End Of Life 2016-11-24 16:04:10 UTC

This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 37 Fedora End Of Life 2016-12-20 19:26:12 UTC

Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.