Bug 1249402 - [abrt] On The Web calendar crash on local cache load/save
Summary: [abrt] On The Web calendar crash on local cache load/save
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: evolution-data-server
Version: 23
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Milan Crha
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:ed9319c817a8d969be5c32a0da7...
Depends On:
Blocks:
 
Reported: 2015-08-02 14:06 UTC by Christian Stadelmann
Modified: 2015-11-19 07:41 UTC
CC: 2 users

Fixed In Version: evolution-data-server-3.18.3
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-19 07:41:24 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
File: backtrace (47.32 KB, text/plain), 2015-08-02 14:06 UTC, Christian Stadelmann
File: core_backtrace (4.47 KB, text/plain), 2015-08-02 14:06 UTC, Christian Stadelmann
File: dso_list (7.63 KB, text/plain), 2015-08-02 14:06 UTC, Christian Stadelmann
File: limits (1.29 KB, text/plain), 2015-08-02 14:06 UTC, Christian Stadelmann
File: maps (38.16 KB, text/plain), 2015-08-02 14:06 UTC, Christian Stadelmann
File: namespaces (85 bytes, text/plain), 2015-08-02 14:06 UTC, Christian Stadelmann
File: open_fds (608 bytes, text/plain), 2015-08-02 14:06 UTC, Christian Stadelmann
File: proc_pid_status (1.00 KB, text/plain), 2015-08-02 14:06 UTC, Christian Stadelmann
A script to start valgrind (418 bytes, application/x-shellscript), 2015-09-07 08:05 UTC, Christian Stadelmann
valgrind log generated by script from comment #12 (70.06 KB, text/plain), 2015-09-14 07:58 UTC, Christian Stadelmann
output of gdb attached to valgrind (10.56 KB, text/plain), 2015-09-18 16:34 UTC, Christian Stadelmann
gdb backtrace of SIGTRAP occurrences as requested in comment #23 (352.12 KB, text/plain), 2015-09-29 13:59 UTC, Christian Stadelmann
another valgrind log (1.88 KB, text/plain), 2015-10-11 14:32 UTC, Christian Stadelmann
some more valgrind logs on F23 / eds 3.18.1 (1.66 KB, application/x-xz), 2015-11-04 09:26 UTC, Christian Stadelmann


Links
GNOME Bugzilla 702127

Description Christian Stadelmann 2015-08-02 14:06:06 UTC
Description of problem:
This happened right after boot + login while the network was not up (yet). I didn't notice any misbehaviour.

Version-Release number of selected component:
evolution-data-server-3.16.4-1.fc22

Additional info:
reporter:       libreport-2.6.2
backtrace_rating: 4
cmdline:        /usr/libexec/evolution-calendar-factory-subprocess --factory webcal --bus-name org.gnome.evolution.dataserver.Subprocess.Backend.Calendarx1865x2 --own-path /org/gnome/evolution/dataserver/Subprocess/Backend/Calendar/1865/2
crash_function: strlen
executable:     /usr/libexec/evolution-calendar-factory-subprocess
kernel:         4.1.3-201.fc22.x86_64
runlevel:       N 5
type:           CCpp

Truncated backtrace:
Thread no. 1 (7 frames)
 #0 strlen at ../sysdeps/x86_64/strlen.S:106
 #1 icalmemory_strdup_and_quote at /usr/src/debug/libical-1.0.1/src/libical/icalvalue.c:302
 #2 icalvalue_as_ical_string_r at /usr/src/debug/libical-1.0.1/src/libical/icalvalue.c:950
 #4 icalproperty_as_ical_string_r at /usr/src/debug/libical-1.0.1/src/libical/icalproperty.c:507
 #5 icalcomponent_as_ical_string_r at /usr/src/debug/libical-1.0.1/src/libical/icalcomponent.c:347
 #8 cal_backend_store_save_cache_now at e-cal-backend-store.c:350
 #9 cal_backend_store_save_cache_timeout_cb at e-cal-backend-store.c:386

Comment 1 Christian Stadelmann 2015-08-02 14:06:10 UTC
Created attachment 1058512 [details]
File: backtrace

Comment 2 Christian Stadelmann 2015-08-02 14:06:11 UTC
Created attachment 1058513 [details]
File: core_backtrace

Comment 3 Christian Stadelmann 2015-08-02 14:06:12 UTC
Created attachment 1058514 [details]
File: dso_list

Comment 4 Christian Stadelmann 2015-08-02 14:06:13 UTC
Created attachment 1058515 [details]
File: limits

Comment 5 Christian Stadelmann 2015-08-02 14:06:15 UTC
Created attachment 1058516 [details]
File: maps

Comment 6 Christian Stadelmann 2015-08-02 14:06:16 UTC
Created attachment 1058517 [details]
File: namespaces

Comment 7 Christian Stadelmann 2015-08-02 14:06:18 UTC
Created attachment 1058518 [details]
File: open_fds

Comment 8 Christian Stadelmann 2015-08-02 14:06:19 UTC
Created attachment 1058519 [details]
File: proc_pid_status

Comment 9 Milan Crha 2015-08-03 06:38:45 UTC
Thanks for the bug report. The backtrace shows a crash in one of your On The Web calendars while it was saving changes to the local cache. It crashed somewhere in the libical library, which usually happens when some memory corruption is involved, such as a use-after-free (code wrote data into memory which was already freed and reused for other data, or a similar issue).

Comment 10 Christian Stadelmann 2015-08-12 06:45:28 UTC
I ran into an issue today which behaved similarly (crash with SIGSEGV on login): https://bugzilla.redhat.com/show_bug.cgi?id=1203600#c13 ; it could be a duplicate.

Anyway, is there something I can do to help debugging? If it's a memory corruption issue, valgrind should be able to find it, right? So I could just rename the evolution-calendar-factory-subprocess binary, create a script with the original name, and have it start valgrind. Will I break Fedora/Evolution with that? Any better ideas?

Comment 11 Milan Crha 2015-08-12 08:42:12 UTC
The subprocess is run multiple times; the one which crashed for you was a webcal, that is, an On The Web calendar. One of them is causing the crash. Replacing the subprocess with a script which runs the original binary under valgrind will work, just make sure you pass all the arguments on to the original binary. As you'll have both the script and the original binary, you can always revert the debugging setup by putting the original binary back. Note that your changes will be overwritten by any evolution-data-server package update/reinstall. Make sure you run valgrind with G_SLICE=always-malloc exported, so GLib's GSlice allocator will not make trouble. The command can look like:
   G_SLICE=always-malloc valgrind ...
Also always log to a new file, so the different subprocesses will not fight over who writes the log file content.

One problem can be that the slowdown caused by all the memory checking in valgrind lets the network connection be established before the calendar tries to update its content, which may or may not be a problem for reproducibility. Also note that valgrind can catch some memory issues and avoid the crash, while still logging the issue.
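
A minimal sketch of such a wrapper script, assuming the original binary was renamed with an "-original" suffix (the actual script used here is the one attached in comment #12, which may differ):

   #!/bin/bash
   # Replaces /usr/libexec/evolution-calendar-factory-subprocess.
   # G_SLICE=always-malloc keeps GLib's GSlice allocator out of valgrind's way.
   export G_SLICE=always-malloc
   # %p expands to the PID, so each subprocess gets its own log file.
   exec valgrind --leak-check=no --track-origins=yes \
        --log-file=/tmp/eds-valgrind-%p.log \
        /usr/libexec/evolution-calendar-factory-subprocess-original "$@"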

Comment 12 Christian Stadelmann 2015-09-07 08:05:16 UTC
Created attachment 1070854 [details]
A script to start valgrind

OK, so for future reference, I did this:
1. Renamed /usr/libexec/evolution-calendar-factory-subprocess to /usr/libexec/evolution-calendar-factory-subprocess-original
2. Created a new executable bash file /usr/libexec/evolution-calendar-factory-subprocess (attached)
3. Every time I log in, I autostart this command in a terminal:

sleep 20s && gdb /usr/libexec/evolution-calendar-factory-subprocess-original -ex "target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=`pidof -s memcheck-amd64-linux`"

4. I just need to press "c" + Enter in the terminal window.

Although I have logged in and rebooted several times, it hasn't recurred (yet).

Comment 13 Milan Crha 2015-09-07 09:52:00 UTC
Thanks for the update. I would skip the leak check; it's only for memory which wasn't freed and doesn't help with memory errors (as far as I know). I suppose you also looked into the log to see whether valgrind caught the crash.

Comment 14 Christian Stadelmann 2015-09-07 15:53:05 UTC
Oh, thanks for the hint. Yes, I looked into the log. So far it has only shown meta-information and entries like

12,168 bytes in 169 blocks are possibly lost in loss record 3,143 of 3,168
+ Stacktrace

I'll remove leak checking; that may give a speedup.
As far as I understood, I would see if it crashed, because gdb is attached to valgrind.

Comment 15 Milan Crha 2015-09-07 16:40:11 UTC
(In reply to Christian Stadelmann from comment #14)
> As far as I understood, I would see if it crashed, because gdb is attached
> to valgrind.

Not necessarily. Valgrind can, in certain cases, avoid the crash and only log the memory issue. It depends on the actual memory issue.

The backtrace mentions cal_backend_store_save_cache_timeout_cb; search your logs for it, as it can show up there as well.

Comment 16 Christian Stadelmann 2015-09-14 07:58:28 UTC
Created attachment 1073120 [details]
valgrind log generated by script from comment #12

I got the first useful log file yesterday, see attachment. I was unable to debug it though, so there is no detailed info.

Comment 17 Milan Crha 2015-09-14 12:27:05 UTC
Thanks for the update. It looks like the issue happens because your On The Web calendar sets quite a lot of time zones. I do not see anything obvious after a brief reading of the libical code, but I'll try to investigate this further and let you know when I know more.

Comment 18 Milan Crha 2015-09-15 07:58:31 UTC
I tried to reproduce it here by creating a large .ics file (~4MB) with quite a lot of events in different time zones, but it didn't strike here. Valgrind is also mute on it. I didn't do anything special; I just pointed evolution at the .ics file through an On The Web calendar and let it open it. I see the events in the UI, thus it was opened properly. Maybe the time zones are just a consequence. I'll keep the calendar set up and enabled in my Evolution, even though it will not change.

Comment 19 Christian Stadelmann 2015-09-15 15:56:56 UTC
I don't think this is a time zone issue. I am in CEST (or CET in winter) and all calendars I use are in my time zone. I think (but am not sure) that all events are in CET/CEST too.

Does evolution use the same evolution-calendar-factory-subprocess process when I open evolution and manually refresh the calendar? In that case I don't get a crash or any message in valgrind's log files.

The update interval for my calendars is set to multiple days, so it looks like evolution-calendar-factory-subprocess breaks every time a specific calendar is updated automatically on login.

I am sending you my calendar configuration (web calendar sources from ~/.config/evolution/sources ) via email.

Comment 20 Milan Crha 2015-09-16 09:19:38 UTC
Thanks for the .source files. Can I add those calendars into my evolution, please? They all seem to be public and non-authenticated, but I'd rather ask. I'm also in the CEST/CET time zone, so I would be able to verify which time zones the calendars use.

Evolution keeps using the same subprocess (and backend) as long as the calendar is left open.

The calendars update on open, then only after the Refresh time elapses. Does that mean nothing wrong shows up in the logs when you right-click the calendar in the Calendar view and pick Refresh there, but if you keep evolution (or the desktop session) running for a day and the Refresh time comes, the errors do show up in the log? That seems to be important and I missed it.

Comment 21 Christian Stadelmann 2015-09-17 07:10:11 UTC
(In reply to Milan Crha from comment #20)
> Thanks for the .source files. Can I add those calendars into my evolution,
> please? They all seem to be public and non-authenticated, but I'd rather
> ask. I'm also in the CEST/CET time zone, so I would be able to verify which
> time zones the calendars use.
Yes, that is ok. And thanks for asking and caring about privacy!

> The calendars update on open, then only after the Refresh time elapses.
Does that mean they update on every login (since they are started with gnome-shell)? This is not clear from the UI; I'd expect them to only update after the refresh interval elapses. That's why I set the refresh interval to a few days – for privacy. So this was meaningless?

> Does that mean nothing wrong shows up in the logs when you right-click the
> calendar in the Calendar view and pick Refresh there, but if you keep
> evolution (or the desktop session) running for a day and the Refresh time
> comes, the errors do show up in the log? That seems to be important and I
> missed it.
Yes, manually picking Refresh from all calendar context menus in evolution doesn't crash evolution-calendar-factory-subprocess.
As you may have noted, I set the refresh interval to a few days, but I typically log in on one affected machine one to five times a day, and evolution-calendar-factory-subprocess crashes about once a week.
I thought e-c-f-s would update all calendars only after the refresh interval elapses; if the interval elapses while it is not running, it would refresh the affected calendar on the next start (on login). If this is true, one of these calendars should crash e-c-f-s on every automatic update, but never on manual update. I set the refresh interval to 5…20 days which approximately matches the crash interval.

Comment 22 Christian Stadelmann 2015-09-18 16:34:58 UTC
Created attachment 1075006 [details]
output of gdb attached to valgrind

Could this be an issue in gnutls? With gdb attached to valgrind I am getting SIGTRAP multiple times. Gdb breaks there and waits for me to tell it to continue. See attached output of gdb for details.

Comment 23 Milan Crha 2015-09-21 12:37:56 UTC
(In reply to Christian Stadelmann from comment #21)
> Does that mean they update on every login (since they are started with
> gnome-shell)?

Yes, that is true.

> This is not clear from the UI; I'd expect them to only update
> after the refresh interval elapses. That's why I set the refresh interval to
> a few days – for privacy. So this was meaningless?

I'd not call it meaningless: if the calendar is left open, it uses that Refresh interval. The reason to update on open is that you get up-to-date content, not stale content (of course, it means the refresh interval you've set is ignored on open, though you might know why you set that interval and accept the fact that your calendar can show stale data). Please note that On The Web calendars support ETags, which the server can return; the calendar uses them to recognize changes, and if nothing changed, the calendar simply skips its update.
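
As an illustration of that ETag mechanism (a hypothetical calendar URL; the backend does the equivalent internally via HTTP headers):

   # First fetch: note the ETag header the server returns.
   curl -sI https://example.com/calendar.ics | grep -i '^etag'
   # Later fetch: send the stored ETag back ("abc123" is a placeholder value);
   # a 304 response means the calendar is unchanged and the update is skipped.
   curl -s -o /dev/null -w '%{http_code}\n' \
        -H 'If-None-Match: "abc123"' https://example.com/calendar.ics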

> If this is true, one of these calendars should crash e-c-f-s on every
> automatic update, but never on manual update. I set the refresh interval to
> 5…20 days which approximately matches the crash interval.

It also depends on whether you suspend the machine or log out of GNOME Shell and power the machine off.

(In reply to Christian Stadelmann from comment #22)
> Could this be an issue in gnutls?

I cannot tell from the backtrace. Instead of 'c' use 'bt' and then 'c', to see exactly where the SIGTRAP was caught.
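
A sketch of how those backtraces could be captured with less typing, assuming the setup from comment #12 (the log file name is arbitrary):

   sleep 20s && gdb /usr/libexec/evolution-calendar-factory-subprocess-original \
       -ex "target remote | vgdb --pid=$(pidof -s memcheck-amd64-linux)" \
       -ex "set pagination off" \
       -ex "set logging file sigtrap-bt.log" \
       -ex "set logging on"
   # then, at each SIGTRAP stop:
   (gdb) bt
   (gdb) c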

Comment 24 Christian Stadelmann 2015-09-29 13:59:45 UTC
Created attachment 1078355 [details]
gdb backtrace of SIGTRAP occurrences as requested in comment #23

I am seeing quite a lot of these messages caused by evolution-calendar-factory-subprocess(-original) lately:

GLib-CRITICAL **: g_variant_new_string: assertion 'g_utf8_validate (string, -1, NULL)' failed
GLib-CRITICAL **: g_variant_ref_sink: assertion 'value != NULL' failed
GLib-CRITICAL **: g_variant_new_string: assertion 'g_utf8_validate (string, -1, NULL)' failed
GLib-CRITICAL **: g_variant_ref_sink: assertion 'value != NULL' failed
GLib-CRITICAL **: g_variant_new_string: assertion 'g_utf8_validate (string, -1, NULL)' failed
GLib-CRITICAL **: g_variant_ref_sink: assertion 'value != NULL' failed
GLib-CRITICAL **: g_variant_new_string: assertion 'g_utf8_validate (string, -1, NULL)' failed
GLib-CRITICAL **: g_variant_ref_sink: assertion 'value != NULL' failed
GLib-CRITICAL **: g_variant_ref: assertion 'value != NULL' failed
GLib-CRITICAL **: g_variant_unref: assertion 'value != NULL' failed
GLib-CRITICAL **: g_variant_unref: assertion 'value != NULL' failed
GLib-CRITICAL **: g_variant_unref: assertion 'value != NULL' failed
GLib-CRITICAL **: g_variant_unref: assertion 'value != NULL' failed

Besides that I haven't run into any crashes any more.

I caught some of the SIGTRAP backtraces and attached them. I have more files like that, but I doubt they would be of much more use.

Comment 25 Milan Crha 2015-09-29 15:12:44 UTC
Thanks for the update. I see the SIGTRAP might be harmless here. One pattern I see from your set of backtraces is that there's always one thread in dl-open.c or dl-close.c when the SIGTRAP is issued. The threads are loading/unloading dynamic modules for the application, both directly (camel modules) and indirectly (proxy modules provided by some other library).

Those g_variant runtime warnings are new to me.

> Besides that I haven't run into any crashes any more.

Did you update to more recent evolution-data-server, glib2 or any other similar library recently? Or maybe the running under gdb/valgrind "cured" it.

Comment 26 Christian Stadelmann 2015-09-29 15:40:11 UTC
(In reply to Milan Crha from comment #25)
> Thanks for the update. I see the SIGTRAP might be harmless here. One pattern
> I see from your set of backtraces is that there's always one thread in
> dl-open.c or dl-close.c when the SIGTRAP is issued. The threads are
> loading/unloading dynamic modules for the application, both directly (camel
> modules) and indirectly (proxy modules provided by some other library).
> 
> Those g_variant runtime warnings are new to me.

Warnings started on 2015-09-14 right after the first boot. No evolution-related package updates before that.

> > Besides that I haven't run into any crashes any more.
> 
> Did you update to a more recent evolution-data-server, glib2, or any other
> similar library recently? Or maybe running under gdb/valgrind "cured" it.

Yes, I am running F22 updates-testing on this machine. For example, glib2 was updated on 2015-09-24 from glib2-2.44.1-1.fc22.x86_64 to glib2-2.44.1-2.fc22.x86_64 with this single change: http://pkgs.fedoraproject.org/cgit/glib2.git/commit/?h=f22&id=bdab866334343601ca6b92e23029b4bd78ca6de6

No updates to evolution* packages since the last crash. The last crash was on 2015-09-13, reported by valgrind. Valgrind didn't cure it; see comment #16.

Comment 27 Christian Stadelmann 2015-10-11 14:32:01 UTC
Created attachment 1081798 [details]
another valgrind log

I ran into another crash (SIGSEGV) in e-c-f-s last week; see the attached valgrind log. I didn't notice the crash myself, so I can't provide more detailed information.

Comment 28 Milan Crha 2015-10-12 07:04:30 UTC
Thanks for the update. The valgrind log shows one backtrace, some free_mem() call inside g_cond_wait().

Something is clearly going wrong with memory in the On The Web calendar backend, but I still do not know what, or what the conditions to trigger it are. Such memory issues can strike back in various places and possibly go completely overlooked at times.

Comment 29 Christian Stadelmann 2015-11-04 09:26:48 UTC
Created attachment 1089478 [details]
some more valgrind logs on F23 / eds 3.18.1

Since updating to F23 I get more errors logged by valgrind. See the attached files.

Comment 30 Milan Crha 2015-11-05 07:28:26 UTC
Thanks for the update. If I read the logs properly, valgrind only logged the crash itself, not anything right before it with some "explanation" of what could have gone wrong. The logged crashes are in low-level libraries, which can also mean that the crash was detected there, not necessarily caused there.

I'll update to Fedora 23 soon; maybe I'll have more luck with the reproducer there.

Comment 31 Christian Stadelmann 2015-11-14 10:45:34 UTC
Since upgrading to F23 I can't attach gdb to valgrind any more, even though valgrind is running and correctly logging to a file. If you run into this issue too when executing
(gdb) target remote | vgdb
and get
"Remote communication error.  Target disconnected.: Connection reset by peer."
you should probably make sure that vgdb and valgrind have the same environment. You may have to manually set the HOSTNAME or USER environment variables.
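
For example (a sketch; which variables actually differ depends on how the session autostarts valgrind):

   HOSTNAME=$(hostname) USER=$(id -un) \
   gdb /usr/libexec/evolution-calendar-factory-subprocess-original \
       -ex "target remote | vgdb --pid=$(pidof -s memcheck-amd64-linux)"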


This "invalid read of size 4" keeps recurring (circa once a week), but I haven't been able to create a full backtrace for it.

==4548== Thread 18 pool:
==4548== Invalid read of size 4
==4548==    at 0xCAD42EB: g_dbus_connection_call_internal (gdbusconnection.c:5785)
==4548==    by 0xCAE0831: g_dbus_proxy_call_internal (gdbusproxy.c:2719)
==4548==    by 0xCAE20B5: g_dbus_proxy_call (gdbusproxy.c:2959)
==4548==    by 0x5313784: e_dbus_source_proxy_set_property (e-dbus-source.c:1630)
==4548==    by 0xCD98157: object_set_property (gobject.c:1420)
==4548==    by 0xCD98157: g_object_set_valist (gobject.c:2163)
==4548==    by 0xCD9882B: g_object_set (gobject.c:2273)
==4548==    by 0x50A782C: e_source_set_connection_status (e-source.c:3515)
==4548==    by 0x2CB8DA0C: cal_backend_http_load (e-cal-backend-http.c:619)
==4548==    by 0x2CB8E673: begin_retrieval_cb (e-cal-backend-http.c:799)
==4548==    by 0xCA87D17: g_task_thread_pool_thread (gtask.c:1283)
==4548==    by 0xD0441CD: g_thread_pool_thread_proxy (gthreadpool.c:307)
==4548==    by 0xD043834: g_thread_proxy (gthread.c:778)
==4548==  Address 0x3148abc0 is 16 bytes inside a block of size 32 free'd
==4548==    at 0x4C29D6A: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4548==    by 0xD0225ED: g_free (gmem.c:189)
==4548==    by 0xD039D9C: g_slice_free1 (gslice.c:1112)
==4548==    by 0xCA873FC: g_task_finalize (gtask.c:633)
==4548==    by 0xCD955AE: g_object_unref (gobject.c:3179)
==4548==    by 0xCAD45B3: g_dbus_connection_call_done (gdbusconnection.c:5707)
==4548==    by 0xCA87582: g_task_return_now (gtask.c:1104)
==4548==    by 0xCA875B8: complete_in_idle_cb (gtask.c:1118)
==4548==    by 0xD01CE39: g_main_dispatch (gmain.c:3154)
==4548==    by 0xD01CE39: g_main_context_dispatch (gmain.c:3769)
==4548==    by 0xD01D1CF: g_main_context_iterate.isra.29 (gmain.c:3840)
==4548==    by 0xD01D4F1: g_main_loop_run (gmain.c:4034)
==4548==    by 0x109911: main (evolution-calendar-factory-subprocess.c:213)
==4548==  Block was alloc'd at
==4548==    at 0x4C28C50: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4548==    by 0xD0224D8: g_malloc (gmem.c:94)
==4548==    by 0xD039622: g_slice_alloc (gslice.c:1007)
==4548==    by 0xD039CBD: g_slice_alloc0 (gslice.c:1032)
==4548==    by 0xCAD425C: g_dbus_connection_call_internal (gdbusconnection.c:5766)
==4548==    by 0xCAE0831: g_dbus_proxy_call_internal (gdbusproxy.c:2719)
==4548==    by 0xCAE20B5: g_dbus_proxy_call (gdbusproxy.c:2959)
==4548==    by 0x5313784: e_dbus_source_proxy_set_property (e-dbus-source.c:1630)
==4548==    by 0xCD98157: object_set_property (gobject.c:1420)
==4548==    by 0xCD98157: g_object_set_valist (gobject.c:2163)
==4548==    by 0xCD9882B: g_object_set (gobject.c:2273)
==4548==    by 0x50A782C: e_source_set_connection_status (e-source.c:3515)
==4548==    by 0x2CB8DA0C: cal_backend_http_load (e-cal-backend-http.c:619)
==4548==

Comment 32 Milan Crha 2015-11-19 07:41:24 UTC
(In reply to Christian Stadelmann from comment #31)
> This "invalid read of size 4" keeps recurring (circa once a week), but I
> haven't been able to create a full backtrace for it.

Right, that's a known issue (and one ignored by the GLib developers):
https://bugzilla.gnome.org/show_bug.cgi?id=748263

I still didn't manage to reproduce the crash itself, but I think I figured out what was going wrong and committed a change to evolution-data-server as part of an upstream bug report:
https://bugzilla.gnome.org/show_bug.cgi?id=702127

I made a test build with the change included. I'd like to ask you to install it and give it a try for a week or so, to see whether you can still reproduce the crash with it. I suppose you won't, but it needs verification. It would also be good to capture the console output of the respective calendar factory, to see whether anything is reported there. After you update the packages, the background evolution processes should be restarted, so they'll start using the changes. The test build is available here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=11901355
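
One way to install it, assuming the x86_64 RPMs from that koji task were downloaded into the current directory (exact package names may differ):

   # update from the locally downloaded test packages
   sudo dnf update ./evolution-data-server-*.rpm
   # make sure the background factories restart and pick up the change
   pkill -f evolution-calendar-factory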

Thanks a lot for your help with this.

