1914917 – [abrt] evolution: camel_imapx_folder_set_mailbox(): evolution killed by SIGSEGV

Bug 1914917 - [abrt] evolution: camel_imapx_folder_set_mailbox(): evolution killed by SIGSEGV

Summary: [abrt] evolution: camel_imapx_folder_set_mailbox(): evolution killed by SIGSEGV

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	evolution-data-server
Sub Component:
Version:	33
Hardware:	x86_64
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Milan Crha
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	https://retrace.fedoraproject.org/faf...
Whiteboard:	abrt_hash:a25e1de1dcf5f563530cab1bc5c...
Duplicates (1):	1925422 (view as bug list)
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2021-01-11 13:30 UTC by Tadej Janež
Modified:	2021-02-15 01:31 UTC (History)
CC List:	8 users (show)
Fixed In Version:	evolution-data-server-3.39.2 evolution-data-server-3.38.4 evolution-data-server-3.38.4-1.fc33
Clone Of:
Environment:
Last Closed:	2021-02-08 11:08:27 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
File: backtrace (147.48 KB, text/plain) 2021-01-11 13:31 UTC, Tadej Janež	no flags	Details
File: core_backtrace (65.39 KB, text/plain) 2021-01-11 13:31 UTC, Tadej Janež	no flags	Details
File: cpuinfo (2.40 KB, text/plain) 2021-01-11 13:31 UTC, Tadej Janež	no flags	Details
File: dso_list (8.05 KB, text/plain) 2021-01-11 13:31 UTC, Tadej Janež	no flags	Details
File: environ (2.45 KB, text/plain) 2021-01-11 13:31 UTC, Tadej Janež	no flags	Details
File: exploitable (82 bytes, text/plain) 2021-01-11 13:31 UTC, Tadej Janež	no flags	Details
File: limits (1.29 KB, text/plain) 2021-01-11 13:31 UTC, Tadej Janež	no flags	Details
File: maps (63.98 KB, text/plain) 2021-01-11 13:31 UTC, Tadej Janež	no flags	Details
File: mountinfo (2.85 KB, text/plain) 2021-01-11 13:31 UTC, Tadej Janež	no flags	Details
File: open_fds (7.22 KB, text/plain) 2021-01-11 13:31 UTC, Tadej Janež	no flags	Details
File: proc_pid_status (1.35 KB, text/plain) 2021-01-11 13:31 UTC, Tadej Janež	no flags	Details
Output of coredumpctl debug <pid> for an incomplete coredump (14.29 KB, text/plain) 2021-01-12 11:37 UTC, Tadej Janež	no flags	Details
valgrind-2021-01-13T10:13:33+01:00.txt (5.82 KB, text/plain) 2021-01-13 09:28 UTC, Tadej Janež	no flags	Details
Output of coredumpctl debug <pid> for a complete coredump (29.77 KB, text/plain) 2021-01-13 11:58 UTC, Tadej Janež	no flags	Details
valgrind-2021-01-14T09:55:31+01:00.txt (8.27 KB, text/plain) 2021-01-14 09:41 UTC, Tadej Janež	no flags	Details
valgrind-2021-01-14T09:55:31+01:00.txt (9.40 KB, text/plain) 2021-01-14 09:52 UTC, Tadej Janež	no flags	Details
Output of coredumpctl debug 3351 that is truncated (17.68 KB, text/plain) 2021-01-25 13:20 UTC, Tadej Janež	no flags	Details
Output of "coredumpctl debug 52305" (26.27 KB, text/plain) 2021-01-27 21:01 UTC, Tadej Janež	no flags	Details
Backtraces for 0x7f8f440d7c20 corresponding to "coredumpctl debug 52305" (47.07 KB, text/plain) 2021-01-27 21:05 UTC, Tadej Janež	no flags	Details
Output of "coredumpctl debug 80768" (29.10 KB, text/plain) 2021-01-27 21:28 UTC, Tadej Janež	no flags	Details
Backtraces for 0x560a04460520 corresponding to "coredumpctl debug 80768" (47.07 KB, text/plain) 2021-01-27 21:43 UTC, Tadej Janež	no flags	Details
Backtraces for 0x7fffb80072f0 (52.30 KB, text/plain) 2021-02-02 09:28 UTC, Tadej Janež	no flags	Details
Output of gdb "t a a bt" for crash in comment 61 (38.53 KB, text/plain) 2021-02-02 10:13 UTC, Tadej Janež	no flags	Details
Show Obsolete (1) View All

Description Tadej Janež 2021-01-11 13:30:55 UTC

Description of problem:
I don't know how this happened, but it happened after updating Evolution from 3.38.1 to 3.38.3.

Version-Release number of selected component:
evolution-3.38.3-1.fc33

Additional info:
reporter:       libreport-2.14.0
backtrace_rating: 4
cgroup:         0::/user.slice/user-1000.slice/user/app-gnome-org.gnome.Evolution-6392.scope
cmdline:        evolution
crash_function: camel_imapx_folder_set_mailbox
executable:     /usr/bin/evolution
journald_cursor: s=3bb169c398ff4d10a3ec0ddf2ce10026;i=870c6;b=e49bb94f53a74be5aac9c28906a527bf;m=17c72433a;t=5b89fba000099;x=970f97ed7511d44c
kernel:         5.9.16-200.fc33.x86_64
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            1000

Truncated backtrace:
Thread no. 1 (6 frames)
 #0 camel_imapx_folder_set_mailbox at /usr/src/debug/evolution-data-server-3.38.3-1.fc33.x86_64/src/camel/providers/imapx/camel-imapx-folder.c:1371
 #1 camel_imapx_folder_list_mailbox at /usr/src/debug/evolution-data-server-3.38.3-1.fc33.x86_64/src/camel/providers/imapx/camel-imapx-folder.c:1462
 #2 imapx_synchronize_sync at /usr/src/debug/evolution-data-server-3.38.3-1.fc33.x86_64/src/camel/providers/imapx/camel-imapx-folder.c:845
 #3 camel_folder_synchronize_sync at /usr/src/debug/evolution-data-server-3.38.3-1.fc33.x86_64/src/camel/camel-folder.c:4109
 #4 session_job_thread at /usr/src/debug/evolution-data-server-3.38.3-1.fc33.x86_64/src/camel/camel-session.c:192
 #6 g_thread_proxy at ../glib/gthread.c:820

Potential duplicate: bug 1678440

Comment 1 Tadej Janež 2021-01-11 13:31:00 UTC

Created attachment 1746249 [details]
File: backtrace

Comment 2 Tadej Janež 2021-01-11 13:31:02 UTC

Created attachment 1746250 [details]
File: core_backtrace

Comment 3 Tadej Janež 2021-01-11 13:31:03 UTC

Created attachment 1746251 [details]
File: cpuinfo

Comment 4 Tadej Janež 2021-01-11 13:31:05 UTC

Created attachment 1746252 [details]
File: dso_list

Comment 5 Tadej Janež 2021-01-11 13:31:06 UTC

Created attachment 1746253 [details]
File: environ

Comment 6 Tadej Janež 2021-01-11 13:31:08 UTC

Created attachment 1746254 [details]
File: exploitable

Comment 7 Tadej Janež 2021-01-11 13:31:09 UTC

Created attachment 1746255 [details]
File: limits

Comment 8 Tadej Janež 2021-01-11 13:31:11 UTC

Created attachment 1746256 [details]
File: maps

Comment 9 Tadej Janež 2021-01-11 13:31:12 UTC

Created attachment 1746257 [details]
File: mountinfo

Comment 10 Tadej Janež 2021-01-11 13:31:14 UTC

Created attachment 1746258 [details]
File: open_fds

Comment 11 Tadej Janež 2021-01-11 13:31:16 UTC

Created attachment 1746259 [details]
File: proc_pid_status

Comment 12 Milan Crha 2021-01-11 13:45:26 UTC

Thanks for a bug report. I see from the backtrace that the crashing thread is accessing a CamelFolder, which is freeing in Thread 10. That should not happen for sure.

Comment 13 Milan Crha 2021-01-11 17:03:42 UTC

I tried to read the code, suspecting some ref/unref imbalance, but I didn't notice anything obvious. I'm afraid getting this fixed without a reproducer might be close to impossible, though finding the reproducer will be even harder, due to catching the part of the code which does the invalid unref and seeing it crashing at the same/similar place is not easy, due to the nature of the problem. That's only my guess.

Comment 14 Tadej Janež 2021-01-11 20:24:23 UTC

Milan, thanks for your quick response!

I've been on 3.38.1 for a while and haven't encountered this, so I would go with a hypothesis that it is related to a change between 3.38.1 and 3.38.3.

Looking at the diff at https://gitlab.gnome.org/GNOME/evolution/-/compare/3.38.1...3.38.3, there are 62 commits (some are just translation changes), so I don't know how easy it would be to pinpoint potential changes that broke this.

Comment 15 Milan Crha 2021-01-12 09:19:36 UTC

The change might be rather in the evolution-data-server, where the IMAP code resides (provided by libcamel). Nonetheless, I see odd reports like that yours for some time now, but I've never been able to find out which place does the ref/unref imbalance - if that's the reason at all. I do not believe this had been anything new, you've been only lucky not triggering the conditions.

Comment 16 Tadej Janež 2021-01-12 11:33:46 UTC

(In reply to Milan Crha from comment #15)
> The change might be rather in the evolution-data-server, where the IMAP code
> resides (provided by libcamel).

Oh, yes, I forgot about that. Indeed, the offending code is probably in evolution-data-server.

> Nonetheless, I see odd reports like that
> yours for some time now, but I've never been able to find out which place
> does the ref/unref imbalance - if that's the reason at all. I do not believe
> this had been anything new, you've been only lucky not triggering the
> conditions.

Curiously, Evolution just crashed again for me, but unfortunately, I don't have a usable back trace to show to you.

Somehow the coredump is not complete, I suspect due to ABRT's MaxCrashReportsSize (default to 5000MB) being exceeded.

Here are ABRT's logs:
[tadej@toronto ~]$ sudo journalctl -b -u abrt*.service
-- Logs begin at Sun 2020-11-29 21:15:41 CET, end at Tue 2021-01-12 12:10:09 CET. --
Jan 11 12:31:18 toronto systemd[1]: Starting ABRT Automated Bug Reporting Tool...
Jan 11 12:31:19 toronto systemd[1]: Started ABRT Automated Bug Reporting Tool.
Jan 11 12:31:19 toronto systemd[1]: Started Creates ABRT problems from coredumpctl messages.
Jan 11 12:31:19 toronto systemd[1]: Started ABRT kernel log watcher.
Jan 11 12:31:19 toronto systemd[1]: Condition check resulted in Harvest vmcores for ABRT being skipped.
Jan 11 12:31:19 toronto systemd[1]: Started ABRT Xorg log watcher.
Jan 11 12:31:20 toronto abrt-dump-journal-oops[1326]: abrt-dump-journal-oops: Found oopses: 2
Jan 11 12:31:20 toronto abrt-dump-journal-oops[1326]: abrt-dump-journal-oops: Creating problem directories
Jan 11 12:31:20 toronto abrtd[1299]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ccpp-2021-01-11-12:25:01.509800-6701'
Jan 11 12:31:20 toronto abrt-notification[3612]: System encountered a non-fatal error in kvasprintf()
Jan 11 12:31:21 toronto abrt-server[4327]: Deleting problem directory oops-2021-01-11-12:31:20-1326-1 (dup of oops-2021-01-11-12:31:20-1326-0)
Jan 11 12:31:21 toronto abrt-notification[4334]: System encountered a non-fatal error in kvasprintf()
Jan 11 12:31:22 toronto abrt-dump-journal-oops[1326]: Reported 2 kernel oopses to Abrt
Jan 11 14:17:25 toronto abrtd[1299]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'oops-2021-01-11-12:31:20-1326-0'
Jan 11 14:17:26 toronto abrt-notification[52187]: Process 6392 (evolution) crashed in camel_imapx_folder_set_mailbox()
Jan 12 12:04:50 toronto abrt-dump-journal-core[1325]: Failed to obtain all required information from journald
Jan 12 12:05:00 toronto abrtd[1299]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ccpp-2021-01-11-14:17:23.577377-6392'
Jan 12 12:05:00 toronto abrt-server[97798]: Error: No segments found in coredump './coredump'
Jan 12 12:05:01 toronto abrt-server[97798]: Can't open file 'core_backtrace' for reading: No such file or directory
Jan 12 12:05:01 toronto abrt-notification[97847]: Process 53703 (evolution) crashed in ??()

Using coredumpctl revealed a bit more information (I'll attach the full output separately):
... truncated ...
Reading symbols from /usr/bin/evolution...
Reading symbols from /usr/lib/debug/usr/bin/evolution-3.38.3-1.fc33.x86_64.debug...
BFD: warning: /var/tmp/coredump-O1UJGW is truncated: expected core file size >= 2743730176, found: 2147483648
... truncated ...

I'm guessing it doesn't make sense to try to rescue this coredump, but I'll bump ABRT's MaxCrashReportsSize to see if that helps save the whole coredump if/when it happens again.

Comment 17 Tadej Janež 2021-01-12 11:37:20 UTC

Created attachment 1746608 [details]
Output of coredumpctl debug <pid> for an incomplete coredump

Comment 18 Milan Crha 2021-01-12 13:35:29 UTC

Sadly the coredumpctl output is full of unresolved symbols (those question marks), thus unusable as well. I usually do not need the coredump, the only useful thing is the backtrace for me. I'm surprised 5GB is not enough for the ABRT, because 5GB is a quite a lot.

Maybe you are able to reproduce this somehow. Could you install debuginfo packages, like this:

$ sudo dnf install glib2-debuginfo gtk3-debuginfo evolution-data-server-debuginfo evolution-debuginfo --enablerepo=updates-debuginfo

Make sure the installed version matches precisely the version of the binary packages. Eventually use `dnf update...`. Then run Evoltuion under valgrind for some time, please? The command would be like this:

$ export GIGACAGE_ENABLED=0
$ G_SLICE=always-malloc valgrind --show-leak-kinds=definite --num-callers=30 --leak-check=no --aspace-minaddr=0x100000000 evolution &>log.txt

This will make evolution run very slowly, due to all the memory checking. It can eventually reveal some memory issues. The valgrind can report about them, instead of crashing the application, thus even it'll not crash the log.txt file can contain some relevant information. I do not know whether it'll help, though, and I know it's painful to run under valgrind, but if you could keep it running over the lunch or something, then maybe it'll help.

By the way, I guess your usual work, before evo crashes, is mainly in the Mail, read and replying to the emails, right? Something from what you do triggers the conditions, thus I'm trying to limit the places where this can happen and maybe a little bit narrow the steps to reproduce the crash.

Comment 19 Tadej Janež 2021-01-13 09:25:01 UTC

(In reply to Milan Crha from comment #18)
> Sadly the coredumpctl output is full of unresolved symbols (those question
> marks), thus unusable as well.

Yeah, I thought so.

> I'm surprised 5GB is not enough for the ABRT, because 5GB is a quite a lot.

He he, I guess my Evolution instance needs lots of memory, hence the large size of the core dumps :).

> 
> Maybe you are able to reproduce this somehow. Could you install debuginfo
> packages, like this:
> 
>    $ sudo dnf install glib2-debuginfo gtk3-debuginfo
> evolution-data-server-debuginfo evolution-debuginfo
> --enablerepo=updates-debuginfo
> 
> Make sure the installed version matches precisely the version of the binary
> packages. Eventually use `dnf update...`. Then run Evoltuion under valgrind
> for some time, please? The command would be like this:
> 
>    $ export GIGACAGE_ENABLED=0
>    $ G_SLICE=always-malloc valgrind --show-leak-kinds=definite
> --num-callers=30 --leak-check=no --aspace-minaddr=0x100000000 evolution
> &>log.txt
> 
> This will make evolution run very slowly, due to all the memory checking. It
> can eventually reveal some memory issues. The valgrind can report about
> them, instead of crashing the application, thus even it'll not crash the
> log.txt file can contain some relevant information. I do not know whether
> it'll help, though, and I know it's painful to run under valgrind, but if
> you could keep it running over the lunch or something, then maybe it'll help.

Hmm... I tried running it under Valgrind and it crashed before it even opened...

I'll attach the log. Can you figure out what is going on?

> By the way, I guess your usual work, before evo crashes, is mainly in the
> Mail, read and replying to the emails, right? Something from what you do
> triggers the conditions, thus I'm trying to limit the places where this can
> happen and maybe a little bit narrow the steps to reproduce the crash.

Indeed, I basically only use the Mail part. The only other things I have is I sync Google Contacts and Google Calendars.

Comment 20 Tadej Janež 2021-01-13 09:28:03 UTC

Created attachment 1746967 [details]
valgrind-2021-01-13T10:13:33+01:00.txt

Output of:

G_SLICE=always-malloc valgrind --show-leak-kinds=definite --num-callers=30 --leak-check=no --aspace-minaddr=0x100000000 evolution &>valgrind-$(date --iso-8601=seconds).txt

Comment 21 Tadej Janež 2021-01-13 11:58:47 UTC

Created attachment 1747036 [details]
Output of coredumpctl debug <pid> for a complete coredump

Milan, I just got another crash, but this time with a complete coredump.

AFAICT, I have all the debuginfo packages installed and "coredumpctl debug" drops me at a GDB prompt. I've attached its output.

Let me know what GDB commands to run to extract more info.

Comment 22 Milan Crha 2021-01-13 13:44:04 UTC

> I'll attach the log. Can you figure out what is going on?

I see it comes from WebkitGTK code, but I'd expect it doesn't make so when it also claims:


   Warning: disabling gigacage because GIGACAGE_ENABLED=0!

few lines above.

@mcatanza , I see this (see attachment 1746967 [details]) problem reported by other users too, but I do not see it here. Do you have any idea how to fix it, thus one can run evolution under valgrind, please?

----------------------------------------------------------------

> Output of coredumpctl debug <pid> for a complete coredump

I guess the imapx_summary variable is NULL there. That can be another exhibition of the same problem, just striking in a different place.

Comment 23 Michael Catanzaro 2021-01-13 16:36:36 UTC

(In reply to Milan Crha from comment #22)
> > I'll attach the log. Can you figure out what is going on?
> 
> I see it comes from WebkitGTK code, but I'd expect it doesn't make so when
> it also claims:
> 
> 
>    Warning: disabling gigacage because GIGACAGE_ENABLED=0!
> 
> few lines above.

That's good. Gigacage is never going to work under valgrind since it requires 80+ GB of address space and valgrind needs to mirror all of that in resident memory.

I guess the problem here is:

==129718== Warning: unimplemented fcntl command: 1033

** (evolution:129718): WARNING **: 10:15:40.602: Failed to seal memfd: Invalid argument
==129718== Warning: unimplemented fcntl command: 1033

** (evolution:129718): WARNING **: 10:15:43.598: Failed to seal memfd: Invalid argument

** (evolution:129718): ERROR **: 10:15:43.598: Failed to write memfd

Maybe we have to add special support in valgrind for that? There are instructions at https://www.valgrind.org/docs/manual/dist.readme-missing.html, but it looks complicated.

Anyway, workaround is to just disable the sandbox with WEBKIT_FORCE_SANDBOX=0 when valgrinding. You also want to disable bmalloc with Malloc=1 when valgrinding WebKit, just as you want to disable the slice allocator with G_SLICE=always-malloc when valgrinding any glib application. Then --track-origins=yes helps to find sources of memory corruption. So try something like:

$ WEBKIT_FORCE_SANDBOX=0 GIGACAGE_ENABLED=0 G_SLICE=always-malloc Malloc=1 valgrind --track-origins=yes evolution

Comment 24 Milan Crha 2021-01-14 08:15:59 UTC

Does the memfd things come from the sanbox? I did not think of it. Good to know. And all the other things. Thanks for sharing those.

Comment 25 Tadej Janež 2021-01-14 09:39:49 UTC

(In reply to Michael Catanzaro from comment #23)
> 
> Anyway, workaround is to just disable the sandbox with
> WEBKIT_FORCE_SANDBOX=0 when valgrinding. You also want to disable bmalloc
> with Malloc=1 when valgrinding WebKit, just as you want to disable the slice
> allocator with G_SLICE=always-malloc when valgrinding any glib application.
> Then --track-origins=yes helps to find sources of memory corruption. So try
> something like:
> 
> $ WEBKIT_FORCE_SANDBOX=0 GIGACAGE_ENABLED=0 G_SLICE=always-malloc Malloc=1
> valgrind --track-origins=yes evolution

Michael, thanks for your help in debugging this.

I've left Evolution running through valgrind for a little more than 30 minutes. I'll attach the logs.

Comment 26 Tadej Janež 2021-01-14 09:41:11 UTC

Created attachment 1747340 [details]
valgrind-2021-01-14T09:55:31+01:00.txt

Output of:

WEBKIT_FORCE_SANDBOX=0 GIGACAGE_ENABLED=0 G_SLICE=always-malloc Malloc=1 valgrind --track-origins=yes evolution &>valgrind-$(date --iso-8601=seconds).txt

Comment 27 Tadej Janež 2021-01-14 09:52:04 UTC

Created attachment 1747341 [details]
valgrind-2021-01-14T09:55:31+01:00.txt


Output of:

WEBKIT_FORCE_SANDBOX=0 GIGACAGE_ENABLED=0 G_SLICE=always-malloc Malloc=1 valgrind --track-origins=yes evolution &>valgrind-$(date --iso-8601=seconds).txt

(updated to include logs after exiting Evolution)

Comment 28 Michael Catanzaro 2021-01-14 14:39:53 UTC

(In reply to Milan Crha from comment #24)
> Does the memfd things come from the sanbox? I did not think of it. Good to
> know. And all the other things. Thanks for sharing those.

I'm sure the sandbox isn't the *only* use of memfds, but it seems to be the only use that valgrind cannot handle. So yes.

Comment 29 Milan Crha 2021-01-15 10:29:55 UTC

Thanks. The only thing it shows is this thing from the WebKitGTK:

==168123== Thread 30 ReceiveQueue:
==168123== Syscall param sendmsg(msg.msg_iov[1]) points to uninitialised byte(s)
==168123==    at 0x8F5CE8D: __libc_sendmsg (sendmsg.c:28)
==168123==    by 0x8F5CE8D: sendmsg (sendmsg.c:25)
==168123==    by 0x5F77BB5: IPC::Connection::sendOutputMessage(IPC::UnixMessage&) (ConnectionUnix.cpp:519)
==168123==    .....

which I thought had been addressed some time ago. It doesn't show anything else. I guess the conditions for the crash were not met when running under valgrind. It can happen also due to the delays caused by all the memory checking, as some issues trigger only with certain thread interleaving and/or "proper" timing.

> (evolution:168123): evolution-util-CRITICAL **: 10:07:42.182: e_filter_option_add:
> assertion 'find_option (option, value) == NULL' failed
> 
> (evolution:168123): evolution-util-CRITICAL **: 10:07:42.192: e_filter_option_add:
> assertion 'find_option (option, value) == NULL' failed

The above two runtime warnings are suspicious. I do not think they are related to the crash, it's just that the code expects something, but it doesn't match. It looks like being related to filters, which means that possibly one of the filters.xml, searches.xml or vfolders.xml in ~/.config/evolution/mail contains one or more rules, which has set invalid value for the option filter (these are combo boxes, where the user can pick one value of many).

Comment 30 Tadej Janež 2021-01-15 11:39:00 UTC

(In reply to Milan Crha from comment #29)
> I guess the conditions for the crash were not met when running under
> valgrind. It can happen also due to the delays caused by all the memory
> checking, as some issues trigger only with certain thread interleaving
> and/or "proper" timing.

Yes, I understand. Running it under valgrid is also only an option when I'm away from the system because it is too slow for normal daily use.

> > (evolution:168123): evolution-util-CRITICAL **: 10:07:42.182: e_filter_option_add:
> > assertion 'find_option (option, value) == NULL' failed
> > 
> > (evolution:168123): evolution-util-CRITICAL **: 10:07:42.192: e_filter_option_add:
> > assertion 'find_option (option, value) == NULL' failed
> 
> The above two runtime warnings are suspicious. I do not think they are
> related to the crash, it's just that the code expects something, but it
> doesn't match. It looks like being related to filters, which means that
> possibly one of the filters.xml, searches.xml or vfolders.xml in
> ~/.config/evolution/mail contains one or more rules, which has set invalid
> value for the option filter (these are combo boxes, where the user can pick
> one value of many).

Huh, I admin I haven't touched these in a long long while.

Filters are empty because I do all my email filtering server side (Sieve).

And I took this opportunity to delete all search folders and saved searches since I don't use these 2 features anymore.

Comment 31 Tadej Janež 2021-01-15 11:40:37 UTC

As far as the crashing frequency goes, it is at least once a day after the upgrade:

$ coredumpctl list | grep evolution
Mon 2021-01-11 14:17:23 CET    6392  1000  1000  11 present   /usr/bin/evolution
Tue 2021-01-12 12:04:57 CET   53703  1000  1000  11 truncated /usr/bin/evolution
Wed 2021-01-13 10:02:19 CET  101453  1000  1000  11 truncated /usr/bin/evolution
Wed 2021-01-13 11:33:36 CET  131872  1000  1000  11 present   /usr/bin/evolution
Wed 2021-01-13 14:19:33 CET  140450  1000  1000  11 present   /usr/bin/evolution
Thu 2021-01-14 12:50:32 CET  171146  1000  1000  11 present   /usr/bin/evolution
Fri 2021-01-15 11:13:05 CET  179259  1000  1000  11 present   /usr/bin/evolution

Comment 32 Michael Catanzaro 2021-01-15 17:17:36 UTC

(In reply to Milan Crha from comment #29)
> Thanks. The only thing it shows is this thing from the WebKitGTK:
> 
> ==168123== Thread 30 ReceiveQueue:
> ==168123== Syscall param sendmsg(msg.msg_iov[1]) points to uninitialised
> byte(s)
> ==168123==    at 0x8F5CE8D: __libc_sendmsg (sendmsg.c:28)
> ==168123==    by 0x8F5CE8D: sendmsg (sendmsg.c:25)
> ==168123==    by 0x5F77BB5:
> IPC::Connection::sendOutputMessage(IPC::UnixMessage&)
> (ConnectionUnix.cpp:519)
> ==168123==    .....
> 
> which I thought had been addressed some time ago.

That's https://bugs.webkit.org/show_bug.cgi?id=146729#c27. Your help was invaluable there... but sadly something else is still wrong. :/

It's surely not related to this bug.

Comment 33 Milan Crha 2021-01-18 16:40:07 UTC

I've a patch for glib2, which can add some debugging functions related to ref/unref, but they provide the output on exit, which is not when the application crashes. I can tweak it a bit. It may slow down things too, but it'll be more targeted. Can I try to provide you a test build of glib2 and evolution-data-server with it, please?

Comment 34 Tadej Janež 2021-01-19 11:14:14 UTC

(In reply to Milan Crha from comment #33)
> I've a patch for glib2, which can add some debugging functions related to
> ref/unref, but they provide the output on exit, which is not when the
> application crashes. I can tweak it a bit. It may slow down things too, but
> it'll be more targeted. Can I try to provide you a test build of glib2 and
> evolution-data-server with it, please?

That's sounds like a more reasonable approach to try to track this down than running Evolution with valgrind.

I'll be to run this custom glib2 and evolution-data-server build.

And yes, perhaps try to tweak so it would also provide output on an app crash.

Comment 35 Milan Crha 2021-01-19 16:56:29 UTC

Thanks, I managed to create a COPR repo with the packages. I do not know whether it's a good idea to enable it, because it replaces the glib2 package (which is core for GNOME/gtk+ applications), thus I'd rather suggest to use only the packages from there:
https://copr.fedorainfracloud.org/coprs/mcrha/imapx-debug/packages/
Even though there might not be any difference in it.

Let's start with prerequisites, to make things simpler:

a) I suppose you use the RPM based Evolution, that is, the Help->About in it says "3.38.3 (3.38.3-1.fc33)". If not, remove any flatpak/Flathub variants first, please.

b) replace the start binary, to enable logging even when Evolution is run from a terminal.
b1) my PATH contains ~/.local/bin (as it's in the .bashrc), thus I created that directory and
    made in it a file named "evolution" (quotes for clarity only) with this content:

#!/bin/bash

FILESUFFIX=`date +%Y%m%d-%H%M%S`

/usr/bin/evolution "$@" &>$HOME/evo-$FILESUFFIX.log

    Then I mark it as executable:
    $ chmod a+x ~/.local/bin/evolution

b2) test it works, by running Evolution from the desktop shortcut as before. There should be a new evo-DATE-TIME.log file in your home directory.

c) download the built packages - make sure you install also the debuginfo + debugsource packages, thus the backtraces are usable. I'd do it as this from a terminal:
   $ mkdir /tmp/pkgs
   $ cd /tmp/pkgs
   $ wget https://copr-be.cloud.fedoraproject.org/results/mcrha/imapx-debug/fedora-33-x86_64/01888076-glib2/glib2-2.66.4-1.imapxdebug.1.fc33.x86_64.rpm \
       https://copr-be.cloud.fedoraproject.org/results/mcrha/imapx-debug/fedora-33-x86_64/01888076-glib2/glib2-debuginfo-2.66.4-1.imapxdebug.1.fc33.x86_64.rpm \
       https://copr-be.cloud.fedoraproject.org/results/mcrha/imapx-debug/fedora-33-x86_64/01888076-glib2/glib2-debugsource-2.66.4-1.imapxdebug.1.fc33.x86_64.rpm \
       https://copr-be.cloud.fedoraproject.org/results/mcrha/imapx-debug/fedora-33-x86_64/01888092-evolution-data-server/evolution-data-server-3.38.3-1.imapxdebug.1.fc33.x86_64.rpm \
       https://copr-be.cloud.fedoraproject.org/results/mcrha/imapx-debug/fedora-33-x86_64/01888092-evolution-data-server/evolution-data-server-debuginfo-3.38.3-1.imapxdebug.1.fc33.x86_64.rpm \
       https://copr-be.cloud.fedoraproject.org/results/mcrha/imapx-debug/fedora-33-x86_64/01888092-evolution-data-server/evolution-data-server-debugsource-3.38.3-1.imapxdebug.1.fc33.x86_64.rpm \
       https://copr-be.cloud.fedoraproject.org/results/mcrha/imapx-debug/fedora-33-x86_64/01888092-evolution-data-server/evolution-data-server-langpacks-3.38.3-1.imapxdebug.1.fc33.noarch.rpm

   In case you've installed more packages (-devel, -doc, ...) download them as well. They are stored in subdirectories here:
   https://copr-be.cloud.fedoraproject.org/results/mcrha/imapx-debug/fedora-33-x86_64/

d) Install the downloaded packages - still in /tmp/pkgs run:

   $ sudo dnf update ./*.rpm
   $ sudo dnf install ./*debuginfo*.rpm ./*debugsource*.rpm

e) install debuginfo & debugsource packages for the evolution itself too, thus the backtraces from that code are usable:

   $ sudo dnf install evolution-debuginfo --enablerepo=updates-debuginfo

   In case you've it installed already, but of an older version, then use:

   $ sudo dnf update evolution-debuginfo --enablerepo=updates-debuginfo

f) make sure the packages are updated as expected, that is, the versions of the respective packages match:

   $ rpm -qa | egrep "glib2|evolution" | sort

g) with all of this prepared you might be ready to go. Simply start Evolution as usual. It'll be slightly slower, but should be usable (it is here). It can use significantly more memory, due to remembering all the backtraces where the object had been ref-ed and unref-ed.

Once Evolution will crash for you, save the latest log from your home directory. The gdb may show what folder it was about, whole memory address you can look for in the log (possibly at the very end of the log). The parts between:

 ------ backtraces for MEMORYADDRESS begin -------
 ------ backtraces for MEMORYADDRESS end -------

are those I'm interested in. I do not want a whole log, it tracks ref/unref calls for every CamelIMAPXFolder object created during the life time of the application, which can be eventually a lot, when there is used a real Trash and a real Junk folder for the IAMPx account and none of the affected folders are part of any search folder.

For example your coredumpctl output at comment #21 has this as the end:

   #0  0x00007fdaf19b4044 in camel_imapx_folder_set_mailbox (folder=0x55b90788a6c0, mailbox=0x55b9004bf660)
       at /usr/src/debug/evolution-data-server-3.38.3-1.fc33.x86_64/src/camel/providers/imapx/camel-imapx-folder.c:1371

The folder=0x55b90788a6c0 contains the address, which is most likely the culprit here, thus the MEMORYADDRESS above would be 0x55b90788a6c0 . It can be in the new log multiple times, it's when the memory is reused, thus the block nearest the end of the log is the one to look on.

If anything goes wrong when you've it installed you can always:

   $ sudo dnf downgrade glib2 evolution-data-server

and remove ~/.local/bin/evolution file.

Comment 36 Milan Crha 2021-01-21 07:44:19 UTC

I just added an evolution build to the COPR with commit [1], which helps significantly when changing folders in Evolution, both performance wise and the log size wise. Install it please.

The packages are here:
https://download.copr.fedorainfracloud.org/results/mcrha/imapx-debug/fedora-33-x86_64/01890387-evolution/

[1] https://gitlab.gnome.org/GNOME/evolution/-/commit/fcbab20f85db1228e0c9b6c222663d3dae5d5928

Comment 37 Tadej Janež 2021-01-21 12:04:39 UTC

Milan, thanks for providing these detailed instructions!

(In reply to Milan Crha from comment #35)
> 
> Let's start with prerequisites, to make things simpler:
> 
> b) replace the start binary, to enable logging even when Evolution is run
> from a terminal.
> b1) my PATH contains ~/.local/bin (as it's in the .bashrc), thus I created
> that directory and
>     made in it a file named "evolution" (quotes for clarity only) with this
> content:
> 
> #!/bin/bash
> 
> FILESUFFIX=`date +%Y%m%d-%H%M%S`
> 
> /usr/bin/evolution "$@" &>$HOME/evo-$FILESUFFIX.log
> 
>     Then I mark it as executable:
>     $ chmod a+x ~/.local/bin/evolution

Note that ~/local/bin is listed after /usr/bin in PATH (which I think is a standard security practice to prevent a user's programs from replacing system binaries), so I installed your wrapper Bash script in /usr/local/bin.


> b2) test it works, by running Evolution from the desktop shortcut as before.
> There should be a new evo-DATE-TIME.log file in your home directory.
> 
> c) download the built packages - make sure you install also the debuginfo +
> debugsource packages, thus the backtraces are usable. I'd do it as this from
> a terminal:

I did these steps a little differently:

$ sudo dnf copr enable mcrha/imapx-debug
[sudo] password for tadej: 
Enabling a Copr repository. Please note that this repository is not part
of the main distribution, and quality may vary.

The Fedora Project does not exercise any power over the contents of
this repository beyond the rules outlined in the Copr FAQ at
<https://docs.pagure.org/copr.copr/user_documentation.html#what-i-can-build-in-copr>,
and packages are not held to any quality or security level.

Please do not file bug reports about these packages in Fedora
Bugzilla. In case of problems, contact the owner of this repository.

Do you really want to enable copr.fedorainfracloud.org/mcrha/imapx-debug? [y/N]: y
Repository successfully enabled.

$ sudo dnf --disablerepo=* --enablerepo=copr:copr.fedorainfracloud.org:mcrha:imapx-debug update
Last metadata expiration check: 0:00:08 ago on Thu 21 Jan 2021 12:59:32 PM CET.
Dependencies resolved.
=======================================================================================================================================
 Package                             Arch     Version                         Repository                                          Size
=======================================================================================================================================
Upgrading:
 evolution                           x86_64   3.38.3-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   3.7 M
 evolution-data-server               x86_64   3.38.3-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   2.3 M
 evolution-data-server-debuginfo     x86_64   3.38.3-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   6.0 M
 evolution-data-server-debugsource   x86_64   3.38.3-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   1.9 M
 evolution-data-server-langpacks     noarch   3.38.3-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   1.4 M
 evolution-debuginfo                 x86_64   3.38.3-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   8.8 M
 evolution-debugsource               x86_64   3.38.3-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   2.2 M
 evolution-langpacks                 noarch   3.38.3-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   5.8 M
 glib2                               x86_64   2.66.4-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   2.7 M
 glib2-debuginfo                     x86_64   2.66.4-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   3.8 M
 glib2-debugsource                   x86_64   2.66.4-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   2.4 M
 glib2-devel                         x86_64   2.66.4-1.imapxdebug.1.fc33      copr:copr.fedorainfracloud.org:mcrha:imapx-debug   469 k

Transaction Summary
=======================================================================================================================================
Upgrade  12 Packages

Total download size: 41 M
Is this ok [y/N]: y

... trimmed ...

> 
> g) with all of this prepared you might be ready to go. Simply start
> Evolution as usual. It'll be slightly slower, but should be usable (it is
> here). It can use significantly more memory, due to remembering all the
> backtraces where the object had been ref-ed and unref-ed.
> 
> Once Evolution will crash for you, save the latest log from your home
> directory. The gdb may show what folder it was about, whole memory address
> you can look for in the log (possibly at the very end of the log). The parts
> between:
> 
>  ------ backtraces for MEMORYADDRESS begin -------
>  ------ backtraces for MEMORYADDRESS end -------
> 
> are those I'm interested in. I do not want a whole log, it tracks ref/unref
> calls for every CamelIMAPXFolder object created during the life time of the
> application, which can be eventually a lot, when there is used a real Trash
> and a real Junk folder for the IAMPx account and none of the affected
> folders are part of any search folder.
> 
> For example your coredumpctl output at comment #21 has this as the end:
> 
>    #0  0x00007fdaf19b4044 in camel_imapx_folder_set_mailbox
> (folder=0x55b90788a6c0, mailbox=0x55b9004bf660)
>        at
> /usr/src/debug/evolution-data-server-3.38.3-1.fc33.x86_64/src/camel/
> providers/imapx/camel-imapx-folder.c:1371
> 
> The folder=0x55b90788a6c0 contains the address, which is most likely the
> culprit here, thus the MEMORYADDRESS above would be 0x55b90788a6c0 . It can
> be in the new log multiple times, it's when the memory is reused, thus the
> block nearest the end of the log is the one to look on.

Ok, I'll try to run Evolution like this and report back when I encounter a crash.

Comment 38 Milan Crha 2021-01-21 13:04:44 UTC

(In reply to Tadej Janež from comment #37)
> Note that ~/local/bin is listed after /usr/bin in PATH (which I think is a
> standard security practice to prevent a user's programs from replacing
> system binaries), so I installed your wrapper Bash script in /usr/local/bin.

I saw this on a fresh user created in Fedora 33. Some things may changed there for some reason. It doesn't matter for us, as long as you can make it running.

> I did these steps a little differently:

That's okay, I chose a way to not adding the COPR. You can pick a way which works the best for you.

> Ok, I'll try to run Evolution like this and report back when I encounter a
> crash.

Thanks. I hope it'll be possible to reproduce it with the added checks and all of that.

Comment 39 Tadej Janež 2021-01-25 08:28:53 UTC

(In reply to Milan Crha from comment #38)
> 
> Thanks. I hope it'll be possible to reproduce it with the added checks and
> all of that.

I've been running Evolution as normally for the past 4 days and I haven't gotten a single crash.

To be fair, the crashes were less frequent even before switching to the custom glib2, evolution-data-server and evolution packages from your Copr repo.

As you can see, it only happened once during the week of Jan 18 - Jan 24:
$ coredumpctl list | grep evolution
Mon 2021-01-11 14:17:23 CET    6392  1000  1000  11 missing   /usr/bin/evolution
Tue 2021-01-12 12:04:57 CET   53703  1000  1000  11 missing   /usr/bin/evolution
Wed 2021-01-13 10:02:19 CET  101453  1000  1000  11 missing   /usr/bin/evolution
Wed 2021-01-13 11:33:36 CET  131872  1000  1000  11 missing   /usr/bin/evolution
Wed 2021-01-13 14:19:33 CET  140450  1000  1000  11 missing   /usr/bin/evolution
Thu 2021-01-14 12:50:32 CET  171146  1000  1000  11 missing   /usr/bin/evolution
Fri 2021-01-15 11:13:05 CET  179259  1000  1000  11 missing   /usr/bin/evolution
Fri 2021-01-15 13:38:16 CET  210832  1000  1000  11 missing   /usr/bin/evolution
Wed 2021-01-20 14:32:31 CET  217119  1000  1000  11 missing   /usr/bin/evolution

(whereas it as a lot more frequent in the week from Jan 11 - Jan 17)

Nonetheless, Evolution has been logging the backtraces steadily. These are the log file sizes:
$ ls -alh
total 2.3G
drwxr-xr-x. 2 tadej tadej 4.0K Jan 24 19:52 .
drwxr-xr-x. 3 tadej tadej 4.0K Jan 21 12:50 ..
-rw-r--r--. 1 tadej tadej  887 Jan 21 13:00 evo-20210121-125510.log
-rw-r--r--. 1 tadej tadej 1.8G Jan 24 15:13 evo-20210121-130504.log
-rw-r--r--. 1 tadej tadej 218M Jan 24 19:28 evo-20210124-153616.log
-rw-r--r--. 1 tadej tadej 255M Jan 25 09:20 evo-20210124-195223.log

Could we extract anything useful from the backtraces even if Evolution doesn't crash?

Comment 40 Milan Crha 2021-01-25 10:07:38 UTC

> Could we extract anything useful from the backtraces even if Evolution doesn't crash?

I'm afraid not. Trying to analyse 1.8GB of backtraces is unbearable. Even the smaller logs are useless without the crash, which helps to limit where to look.

The thing is that the folders are created/freed/ref-ed/unref-ed a lot of times, from various places. Trying to guess which one does a wrong thing in that is close to impossible. I tried that by code reading, several times, but I did not spot anything with that.

If you do not mind and you can use Evolution even with these changes, I'd like to ask you to keep it using. Maybe it'll strike before a new version of either Evolution or glib2 is released.

Comment 41 Tadej Janež 2021-01-25 13:14:29 UTC

(In reply to Milan Crha from comment #40)
> > Could we extract anything useful from the backtraces even if Evolution doesn't crash?
> 
> I'm afraid not. Trying to analyse 1.8GB of backtraces is unbearable. Even
> the smaller logs are useless without the crash, which helps to limit where
> to look.
> 
> The thing is that the folders are created/freed/ref-ed/unref-ed a lot of
> times, from various places. Trying to guess which one does a wrong thing in
> that is close to impossible. I tried that by code reading, several times,
> but I did not spot anything with that.

Ok, understood.

> If you do not mind and you can use Evolution even with these changes, I'd
> like to ask you to keep it using. Maybe it'll strike before a new version of
> either Evolution or glib2 is released.

Yes, I'm happy to run this as I can practically use Evolution as normal.

Comment 42 Tadej Janež 2021-01-25 13:20:10 UTC

So, Evolution actually crashed while I was away today.

Unfortunately, coredumpctl reports it is truncated:

Mon 2021-01-25 11:02:33 CET    3351  1000  1000  11 truncated /usr/bin/evolution

I'll attach the coredumpctl debug output.

I have all the backtraces in the log file. But I'm guessing it is not usable without a full coredump to view the "folder=0x55b90788a6c0" part.

I'm wondering why the core dump is truncated and how to avoid that?

GNU gdb (GDB) Fedora 10.1-2.fc33

... trimmed...

Reading symbols from /usr/bin/evolution...
Reading symbols from /usr/lib/debug/usr/bin/evolution-3.38.3-1.imapxdebug.1.fc33.x86_64.debug...
BFD: warning: /var/tmp/coredump-ksdFfr is truncated: expected core file size >= 2664153088, found: 2147483648

... trimmed ...

Comment 43 Tadej Janež 2021-01-25 13:20:59 UTC

Created attachment 1750509 [details]
Output of coredumpctl debug 3351 that is truncated

Comment 44 Michael Catanzaro 2021-01-25 15:53:54 UTC

systemd-coredump truncates dumps larger than 2 GB by default. This is a bad default. I would change this to 4 GB or 8 GB in /etc/systemd/coredump.conf:

ProcessSizeMax=8G
ExternalSizeMax=8G

Then I'm not sure what is necessary for that change to become active, so I would reboot for safety.

Sadly there's no way to recover the truncated core dump. All you can do is hope for Evolution to crash again after making this change.

Comment 45 Tadej Janež 2021-01-25 16:32:08 UTC

Thanks for the tip, Michael!

I've set ProcessSizeMax and ExternalSizeMax to 8G and I hope Evolution will crash again.

Comment 46 Milan Crha 2021-01-25 18:35:33 UTC

I guess it needs so much memory for the core file (also) because of the backtaces being stored in the memory for the ref/unref debugging. You are right the truncated backtrace is not usable, it's full of question marks.

Nonetheless, the saved backtraces can be interesting, the last one or two blocks between " ------ backtraces for XXXXX begin -------" and " ------ backtraces for XXXXX end -------", may eventually cover the just freed folder - if it's this part, where the code crashed this time.

Comment 47 Tadej Janež 2021-01-27 20:50:16 UTC

Good news,

I've got 3 crashes in the last 2 days:

$ coredumpctl list | grep evolution

... trimmed ...

Tue 2021-01-26 11:51:32 CET   52305  1000  1000  11 present   /usr/bin/evolution
Tue 2021-01-26 16:47:41 CET   80768  1000  1000  11 present   /usr/bin/evolution
Wed 2021-01-27 14:00:13 CET  112946  1000  1000  11 present   /usr/bin/evolution

I'll try to extract and attach the backtraces.

Comment 48 Tadej Janež 2021-01-27 21:01:45 UTC

Created attachment 1751389 [details]
Output of "coredumpctl debug 52305"

Comment 49 Tadej Janež 2021-01-27 21:05:44 UTC

Created attachment 1751410 [details]
Backtraces for 0x7f8f440d7c20 corresponding to "coredumpctl debug 52305"

The last block of backtraces for address 0x7f8f440d7c20 for "coredumpctl debug 52305".

Comment 50 Tadej Janež 2021-01-27 21:28:56 UTC

Created attachment 1751417 [details]
Output of "coredumpctl debug 80768"

Comment 51 Tadej Janež 2021-01-27 21:43:49 UTC

Created attachment 1751431 [details]
Backtraces for 0x560a04460520 corresponding to "coredumpctl debug 80768"

Comment 52 Milan Crha 2021-01-28 09:10:33 UTC

Thanks a lot. Unfortunately, I'm such an idiot, the backtraces are printed too late, they do not cover the crash. I suppose you did not find them at the very end of the log, which can, partially, explain it.

Do you still have those coredump files and a gdb access to them? The backtraces are stored there, but it's pretty hard to get to them (gdb doesn't want to find imapx_debug_ref_unref_dump_backtraces_for_object() function here for me). I'll create a new build, which will make this work properly/better.

I'm so sorry I didn't notice this earlier.

Comment 53 Tadej Janež 2021-01-28 09:30:57 UTC

(In reply to Milan Crha from comment #52)
> Thanks a lot. Unfortunately, I'm such an idiot, the backtraces are printed
> too late, they do not cover the crash. I suppose you did not find them at
> the very end of the log, which can, partially, explain it.

He he, no worries.

> Do you still have those coredump files and a gdb access to them?

Yes, I do.

> The
> backtraces are stored there, but it's pretty hard to get to them (gdb
> doesn't want to find imapx_debug_ref_unref_dump_backtraces_for_object()
> function here for me). I'll create a new build, which will make this work
> properly/better.

Ok, cool.

Will you be able to extract any useful info from existing core dumps and/or logged backtraces?

Comment 54 Milan Crha 2021-01-28 09:38:44 UTC

I thought you'd just call:

(gdb) p imapx_debug_ref_unref_dump_backtraces_for_object(folder)

from the coredump, but the function is hidden for my gdb version (maybe due to optimizations or something), thus I'm building a version of the eds, which will make it available, thus the next time it'll crash you can do just this in gdb and the out of it will be accurate. I'll let you know once I've the change tested and ready.

Comment 55 Milan Crha 2021-01-28 10:53:18 UTC

I disabled the prints of the backtraces of the "correct" calls, thus things should be slightly quicker and the stdout will be empty. Once it crashes again and you get to the coredump gdb prompt, then, supposing it'll be at the same place, just execute this gdb command:

    (gdb) call imapx_debug_ref_unref_dump_backtraces_for_object(folder)

that will generate a file in /tmp with the backtraces for that object. It tells what it did or didn't do after the function is done.

The updated COPR package is evolution-data-server-3.38.3-1.imapxdebug.2.fc33 (the .2 at the end). Make sure you update also the debuginfo/debugsource packages with it, to have the backtraces usable.

Thank you for your help with this.

Comment 56 Tadej Janež 2021-01-28 12:30:47 UTC

(In reply to Milan Crha from comment #55)
> I disabled the prints of the backtraces of the "correct" calls, thus things
> should be slightly quicker and the stdout will be empty. Once it crashes
> again and you get to the coredump gdb prompt, then, supposing it'll be at
> the same place, just execute this gdb command:
> 
>     (gdb) call imapx_debug_ref_unref_dump_backtraces_for_object(folder)
> 
> that will generate a file in /tmp with the backtraces for that object. It
> tells what it did or didn't do after the function is done.

Ok, great.

> The updated COPR package is evolution-data-server-3.38.3-1.imapxdebug.2.fc33
> (the .2 at the end). Make sure you update also the debuginfo/debugsource
> packages with it, to have the backtraces usable.

I think this should be sufficient?

$ sudo dnf --refresh --disablerepo=* --enablerepo=copr:copr.fedorainfracloud.org:mcrha:imapx-debug update
Copr repo for imapx-debug owned by mcrha                                                                                                                      5.3 kB/s | 3.3 kB     00:00    
Copr repo for imapx-debug owned by mcrha                                                                                                                      181 kB/s | 156 kB     00:00    
Dependencies resolved.
==============================================================================================================================================================================================
 Package                                           Architecture           Version                                      Repository                                                        Size
==============================================================================================================================================================================================
Upgrading:
 evolution-data-server                             x86_64                 3.38.3-1.imapxdebug.2.fc33                   copr:copr.fedorainfracloud.org:mcrha:imapx-debug                 2.3 M
 evolution-data-server-debuginfo                   x86_64                 3.38.3-1.imapxdebug.2.fc33                   copr:copr.fedorainfracloud.org:mcrha:imapx-debug                 6.0 M
 evolution-data-server-debugsource                 x86_64                 3.38.3-1.imapxdebug.2.fc33                   copr:copr.fedorainfracloud.org:mcrha:imapx-debug                 1.9 M
 evolution-data-server-langpacks                   noarch                 3.38.3-1.imapxdebug.2.fc33                   copr:copr.fedorainfracloud.org:mcrha:imapx-debug                 1.4 M

Transaction Summary
==============================================================================================================================================================================================
Upgrade  4 Packages

Total download size: 12 M
Is this ok [y/N]:


> Thank you for your help with this.

No problem, thank you for debugging this.

Comment 57 Milan Crha 2021-01-28 13:04:49 UTC

> I think this should be sufficient?

Yup, it looks correct. Thanks.

Comment 58 Tadej Janež 2021-02-01 10:11:29 UTC

Hey Milan,

I've got a core dump using evolution-data-server-3.38.3-1.imapxdebug.2.fc33.x86_64.

But it seems I'm unable to do "call imapx_debug_ref_unref_dump_backtraces_for_object(folder)":

$ coredumpctl debug 192088

... trimmed ...

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/evolution'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fe610811894 in camel_imapx_folder_set_mailbox (folder=0x7fe5e81baf40, mailbox=0x7fe5e61e7270)
    at /usr/src/debug/evolution-data-server-3.38.3-1.imapxdebug.2.fc33.x86_64/src/camel/providers/imapx/camel-imapx-folder.c:1593
1593		if (uidvalidity > 0 && uidvalidity != imapx_summary->validity)
[Current thread is 1 (Thread 0x7fe5e37fe640 (LWP 226724))]
(gdb) call imapx_debug_ref_unref_dump_backtraces_for_object(folder)
You can't do that without a process to debug.
(gdb)

Comment 59 Milan Crha 2021-02-01 10:35:26 UTC

Err, I've not been aware gdb cannot call functions when it has opened a core file, though it kind of makes sense. The old good bug-buddy could keep the process opened and one could attach to it and do anything there, but it's gone for years.

I guess the easiest will be to simply run evolution under gdb and once the GUI freezes check the terminal with it, because it can mean gdb stopped due to some issue. It can be this or another issue detected by gdb. I'd simply "continue" (or "c") command in there in case it's stopped for other reason. When it stops in this camel_imapx_folder_set_mailbox(), just call the function as before, which will work this time, because the process is still alive.

With steps:
a) open a terminal
b) execute: gdb evolution --ex r --ex "call call imapx_debug_ref_unref_dump_backtraces_for_object(folder)"

which will run evolution and will call the function, which can fail in case gdb stops in a different function, otherwise it'll print the backtrace to a file.

Comment 60 Tadej Janež 2021-02-01 12:33:31 UTC

(In reply to Milan Crha from comment #59)
> Err, I've not been aware gdb cannot call functions when it has opened a core
> file, though it kind of makes sense.

Yes, that is unfortunate...

> I guess the easiest will be to simply run evolution under gdb and once the
> GUI freezes check the terminal with it, because it can mean gdb stopped due
> to some issue. It can be this or another issue detected by gdb. I'd simply
> "continue" (or "c") command in there in case it's stopped for other reason.
> When it stops in this camel_imapx_folder_set_mailbox(), just call the
> function as before, which will work this time, because the process is still
> alive.
> 
> With steps:
> a) open a terminal
> b) execute: gdb evolution --ex r --ex "call call
> imapx_debug_ref_unref_dump_backtraces_for_object(folder)"
> 
> which will run evolution and will call the function, which can fail in case
> gdb stops in a different function, otherwise it'll print the backtrace to a
> file.

Ok, I've started Evolution via:

gdb /usr/bin/evolution --ex r --ex "call call imapx_debug_ref_unref_dump_backtraces_for_object(folder)"

and I hope the crash will occur again.

Comment 61 Tadej Janež 2021-02-02 09:26:54 UTC

Ok, the crash occured at the same point and the backtraces extraction was successful:

Thread 2087 "pool-evolution" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe68aff640 (LWP 296668)]
0x00007fffdc154894 in camel_imapx_folder_set_mailbox (folder=0x7fffb80072f0, mailbox=0x7ffe1c30f9e0)
    at /usr/src/debug/evolution-data-server-3.38.3-1.imapxdebug.2.fc33.x86_64/src/camel/providers/imapx/camel-imapx-folder.c:1593
1593		if (uidvalidity > 0 && uidvalidity != imapx_summary->validity)
No symbol "call" in current context.
(gdb) call call imapx_debug_ref_unref_dump_backtraces_for_object(folder)
No symbol "call" in current context.
(gdb) call imapx_debug_ref_unref_dump_backtraces_for_object(folder)
imapx_debug_ref_unref_dump_backtraces_for_object: Backtraces saved to '/tmp/bt-0x7fffb80072f0.txt'
(gdb)

Comment 62 Tadej Janež 2021-02-02 09:28:24 UTC

Created attachment 1754288 [details]
Backtraces for 0x7fffb80072f0

/tmp/bt-0x7fffb80072f0.txt

Comment 63 Tadej Janež 2021-02-02 09:30:37 UTC

Milan, let me know if the data is good so I can restart Evolution (I still have gdb open in case you need something else).

Comment 64 Milan Crha 2021-02-02 09:38:52 UTC

Could you run also "t a a bt", just to see the state of the application, please? It should be quit the same as the initial backtrace, +/- the move to the other place while it had been creating the backtraces. I'm interested in threads with the folder pointer only, that is with the 0x7fffb80072f0 mentioned. After that you can safely quit gdb. It'll take me some time to pair the 70 backtraces together.

Comment 65 Tadej Janež 2021-02-02 10:13:54 UTC

Created attachment 1754320 [details]
Output of gdb "t a a bt" for crash in comment 61

(In reply to Milan Crha from comment #64)
> Could you run also "t a a bt", just to see the state of the application,
> please? It should be quit the same as the initial backtrace, +/- the move to
> the other place while it had been creating the backtraces. I'm interested in
> threads with the folder pointer only, that is with the 0x7fffb80072f0
> mentioned.

I've attached all the threads since it didn't seem all that much (300 lines of output).

Comment 66 Tadej Janež 2021-02-02 10:15:46 UTC

(In reply to Milan Crha from comment #64)
> After that you can safely quit gdb. It'll take me some time to
> pair the 70 backtraces together.

Ok, understood. Please, let me know when you figure it out.

Comment 67 Milan Crha 2021-02-02 10:43:26 UTC

Thanks a lot.

I think I see it. It's a timing issue. There is scheduled a synchronize of the local changes to the server, which intentionally doesn't hold its own reference to the folder, when the last reference to the folder is dropped. Until the scheduled synchronize is saved, the summary is saved to the disk, which can take some time, between which the thread execution can be switched to another thread, like to the main and then to the scheduled one, which added a reference to the folder, but the problem is that the folder is partly disposed, which invalidates some internal properties.

My change will be:
a) cancel the scheduled synchronize as the first thing in the dispose() of the CamelFolder, not as the last thing
b) use GWeakRef for this schedule, which would avoid the a) on its own, but better to have there both changes.

I'll let you know when I've a test build ready in the COPR. Thank you again for your help on this.

Comment 68 Tadej Janež 2021-02-02 10:48:21 UTC

Cool, great to see you figured it out!

Yeah, let me know when you have new builds ready and I can test.

Comment 69 Milan Crha 2021-02-02 11:12:48 UTC

There's built evolution-data-server-3.38.3-1.imapxdebug.4.fc33 (.4 at the end) now, which contains only the fix. There is no backtrace logging, thus it should be with no extra CPU usage and slowness.

Please, use it for few days, say till the end of this week, and if it'll not cause any harm I'll commit it for 3.38.4, which is to be released at the end of the next week. Thanks in advance.

Comment 70 Tadej Janež 2021-02-02 11:35:02 UTC

(In reply to Milan Crha from comment #69)
> There's built evolution-data-server-3.38.3-1.imapxdebug.4.fc33 (.4 at the
> end) now, which contains only the fix. There is no backtrace logging, thus
> it should be with no extra CPU usage and slowness.

That was quick! Thanks, Milan!

> Please, use it for few days, say till the end of this week, and if it'll not
> cause any harm I'll commit it for 3.38.4, which is to be released at the end
> of the next week. Thanks in advance.

Agreed. I'll report back at the beginning of the next week.

Comment 71 Milan Crha 2021-02-05 09:23:04 UTC

*** Bug 1925422 has been marked as a duplicate of this bug. ***

Comment 72 Tadej Janež 2021-02-08 09:30:57 UTC

Hi Milan,

(In reply to Tadej Janež from comment #70)
> 
> > Please, use it for few days, say till the end of this week, and if it'll not
> > cause any harm I'll commit it for 3.38.4, which is to be released at the end
> > of the next week. Thanks in advance.
> 
> Agreed. I'll report back at the beginning of the next week.

just a quick update: I've been running Evolution as normally and haven't had any crashes so far.

Comment 73 Milan Crha 2021-02-08 10:51:31 UTC

(In reply to Tadej Janež from comment #72)
> just a quick update: I've been running Evolution as normally and haven't had
> any crashes so far.

Thanks, that's a very good news. I'll commit the change to the sources.

Comment 74 Milan Crha 2021-02-08 11:08:27 UTC

Created commit [1] for 3.39.2+
Created commit [2] for 3.38.4+

The release is planned on this Friday, it'll get to the Fedora repositories shortly afterwards it.

Thank you a lot for your help with this.

[1] https://gitlab.gnome.org/GNOME/evolution-data-server/commit/9991f16b1
[2] https://gitlab.gnome.org/GNOME/evolution-data-server/commit/1f1017492

Comment 75 Tadej Janež 2021-02-09 21:32:25 UTC

Thank you Milan for debugging this and finally implementing a fix!

Comment 76 Fedora Update System 2021-02-12 12:55:58 UTC

FEDORA-2021-bbb2a0f065 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-bbb2a0f065

Comment 77 Fedora Update System 2021-02-13 01:39:17 UTC

FEDORA-2021-bbb2a0f065 has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-bbb2a0f065`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-bbb2a0f065

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 78 Fedora Update System 2021-02-15 01:31:15 UTC

FEDORA-2021-bbb2a0f065 has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.