Description of problem: This happened right after boot + login while the network was not up (yet). I didn't notice any misbehaviour. Version-Release number of selected component: evolution-data-server-3.16.4-1.fc22 Additional info: reporter: libreport-2.6.2 backtrace_rating: 4 cmdline: /usr/libexec/evolution-calendar-factory-subprocess --factory webcal --bus-name org.gnome.evolution.dataserver.Subprocess.Backend.Calendarx1865x2 --own-path /org/gnome/evolution/dataserver/Subprocess/Backend/Calendar/1865/2 crash_function: strlen executable: /usr/libexec/evolution-calendar-factory-subprocess kernel: 4.1.3-201.fc22.x86_64 runlevel: N 5 type: CCpp Truncated backtrace: Thread no. 1 (7 frames) #0 strlen at ../sysdeps/x86_64/strlen.S:106 #1 icalmemory_strdup_and_quote at /usr/src/debug/libical-1.0.1/src/libical/icalvalue.c:302 #2 icalvalue_as_ical_string_r at /usr/src/debug/libical-1.0.1/src/libical/icalvalue.c:950 #4 icalproperty_as_ical_string_r at /usr/src/debug/libical-1.0.1/src/libical/icalproperty.c:507 #5 icalcomponent_as_ical_string_r at /usr/src/debug/libical-1.0.1/src/libical/icalcomponent.c:347 #8 cal_backend_store_save_cache_now at e-cal-backend-store.c:350 #9 cal_backend_store_save_cache_timeout_cb at e-cal-backend-store.c:386
Created attachment 1058512 [details] File: backtrace
Created attachment 1058513 [details] File: core_backtrace
Created attachment 1058514 [details] File: dso_list
Created attachment 1058515 [details] File: limits
Created attachment 1058516 [details] File: maps
Created attachment 1058517 [details] File: namespaces
Created attachment 1058518 [details] File: open_fds
Created attachment 1058519 [details] File: proc_pid_status
Thanks for a bug report. The backtrace shows a crash in one of your On The Web calendars, when it was saving changes to the local cache. It crashed somewhere in libical library, which usually happens when there's involved some memory corruption, like a use-after-free (some code wrote data into a memory which was already freed, effectively overwriting memory used for another data, or a similar issue).
I ran into an issue today which behaved similarly (crash with segv on login): https://bugzilla.redhat.com/show_bug.cgi?id=1203600#c13 , could be a duplicate. Anyways is there something to help debugging? If its a memory corruption issue, valgrind should be able to find it, right? So I could just rename the evolution-calendar-factory-subprocess binary, create a script with the original name and start valgrind. Will I break Fedora/Evolution with that? Any better ideas?
The subprocess is run multiple times, the one which crashed for you was a webcal, which is On The Web calendar. One of them causing the crash. Replacing the subprocess with a script which would run the original subprocess under valgrind will work, just make sure you'll pass all the arguments to the original binary. As you'll have the script and the original binary you can always revert the debugging by returning back the original binary. Your changes will be overwritten back with any evolution-data-server package updates/reinstalls. Make sure you'll run valgrind with G_SLICE=always-malloc exported, thus the GLib's GSlice will not make trouble. The command can look like: G_SLICE=always-malloc valgrind ... also always log to a new file, thus the different subprocesses will not fight who writes the log file content. One problem can be that the slowness caused by the all memory checking in valgrind will cause the network connection to be established before the calendar will try to update its content, which may or may not be a problem (in a meaning for the reproducibility). Also note that valgrind can catch some memory issues and it avoids the crash, while it'll still log about the issue.
Created attachment 1070854 [details] A script to start valgrind Ok, so for future reference I did this: 1. I renamed /usr/libexec/evolution-calendar-factory-subprocess to /usr/libexec/evolution-calendar-factory-subprocess-original 2. created a new executable bash file /usr/libexec/evolution-calendar-factory-subprocess (attached) 3. Every time I log in I autostart this command in a terminal: sleep 20s && gdb /usr/libexec/evolution-calendar-factory-subprocess-original -ex "target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=`pidof -s memcheck-amd64-linux`" 4. I just need to press "c" + Enter in the terminal window. Although I logged in and rebooted some times it didn't reoccur (yet).
Thanks for the update. I would skip the leak check, it only for memory which wasn't freed, it doesn't help with memory errors (as far as I know). I suppose you also looked into the log, to see whether valgrind catch the crash.
Oh, thanks for the hint. Yes, I looked into the log. Only showed meta-information and things like 12,168 bytes in 169 blocks are possibly lost in loss record 3,143 of 3,168 + Stacktrace so far. I'll remove leak checking, that may give a speedup. As far as I understood I would see if it crashed because gdb is attached to valgrind.
(In reply to Christian Stadelmann from comment #14) > As far as I understood I would see if it crashed because gdb is attached to > valgrind. Not necessarily. Valgrind can, in certain cases, avoid the crash and log about the memory issue only. It depends on the actual memory issue. The backtrace mentions cal_backend_store_save_cache_timeout_cb, search your logs for it, as it can be there as well.
Created attachment 1073120 [details] valgrind log generated by script from comment #12 I got the first useful log file yesterday, see attachment. I was unable to debug it though, so there is no detailed info.
Thanks for the update. It looks like the issue happens because your On The Web calendar has set many timezones. I do not see anything obvious after a brief reading of the libical code, but I'll try to investigate this further and let you know when I know more information.
I tried to reproduce it here, by creating a large .ics file (~4MB) with quite many events with different time zones, but it didn't strike here. Valgrind is also mute on it. I didn't do anything special, I just pointer evolution into the .ics file through an On The Web calendar and let it open it. I see the events in the UI, thus it was opened properly. Maybe the timezones are just a consequence. I'll keep the calendar set and enabled in my Evolution, even though it will not change.
I don't think this is a time zone issue. I am in CEST (or CET in winter) and all calendars I use are in my timezone. I think (but am not sure) all events are in CET/CEST too. Does evolution use the same evolution-calendar-factory-subprocess process when I open evolution and manually refresh the calendar? In this case I don't get a crash or any message from valgrind's log files. The update interval for my calendars is set to multiple days, so it looks like evolution-calendar-factory-subprocess breaks every time I update a specific calendar automatically on login. I am sending you my calendar configuration (web calendar sources from ~/.config/evolution/sources ) via email.
Thanks for the .source files. Can I add those calendars into my evolution, please? They all seem to be public, non-authenticated, but I rather ask. I'm also in CEST/CET time zone, I would be able to verify what time zones the calendars use. Evolution keeps using the same subprocess (and backend) as long as the calendar is left opened. The calendars update on open, then only after the Refresh time elapses. Does it mean that there is shown nothing wrong in the logs if you right-click the calendar in the Calendar view and pick Refresh there, but if you keep evolution (or the desktop session) running for a day, and the Refresh time comes, then it also shows the errors in the log? That seems to be important and I missed it.
(In reply to Milan Crha from comment #20) > Thanks for the .source files. Can I add those calendars into my evolution, > please? They all seem to be public, non-authenticated, but I rather ask. I'm > also in CEST/CET time zone, I would be able to verify what time zones the > calendars use. Yes, that is ok. And thanks for asking and caring about privacy! > The calendars update on open, then only after the Refresh time elapses. Does that mean they update on every login (since they are started with gnome-shell)? This is not clear from the UI, I'd expect them to only update after the refresh interval elapses. That's why I set the refresh interval to a few days – for privacy. So this was meaningless? > Does > it mean that there is shown nothing wrong in the logs if you right-click the > calendar in the Calendar view and pick Refresh there, but if you keep > evolution (or the desktop session) running for a day, and the Refresh time > comes, then it also shows the errors in the log? That seems to be important > and I missed it. Yes, manually picking refresh from all calendar context menus in evolution doesn't crash evolution-calendar-factory-subprocess. As you may have noted I set the refresh interval to a few days, but I typically login on one affected machine one to five times a day. And evolution-calendar-factory-subprocess crashes about once a week. I thought e-c-f-s would update all calendars only after the refresh interval elapses. If the interval elapses while it is not running, it would refresh the affected calendar on next start (on login). If this is true, one of these calendars should crash e-c-f-s on every automatic update, but never on manual update. I set the refresh interval to 5…20 days which approximately matches the crash interval.
Created attachment 1075006 [details] output of gdb attached to valgrind Could this be an issue in gnutls? With gdb attached to valgrind I am getting SIGTRAP multiple times. Gdb breaks there and waits for me to tell it to continue. See attached output of gdb for details.
(In reply to Christian Stadelmann from comment #21) > Does that mean they update on every login (since they are started with > gnome-shell)? Yes, that is true. > This is not clear from the UI, I'd expect them to only update > after the refresh interval elapses. That's why I set the refresh interval to > a few days – for privacy. So this was meaningless? I'd not call it meaningless, if the calendar is left opened, then it uses that Refresh interval. The reason to update after open is that you get up-to-date content, not a stale content (of course, it means the refresh interval you've set is ignored on open, while you might know why you set that interval and accept the fact that your calendar can show stale data). Please note that On The Web calendars are capable of the ETag-s, which the server can return and the calendar uses it to recognize changes on the calendar, and, if it didn't change, the calendar simply skips its update. > If this is true, one of these calendars should crash e-c-f-s on every > automatic update, but never on manual update. I set the refresh interval to > 5…20 days which approximately matches the crash interval. It also depends whether you suspend the machine or rather log out of the GNOME Shell and power off the machine. (In reply to Christian Stadelmann from comment #22) > Could this be an issue in gnutls? I cannot tell from the backtrace. Inst5ead of 'c' use 'bt' and then 'c', to see where exactly the SIGTRAP had been catch.
Created attachment 1078355 [details] gdb backtrace of SIGTRAP occurrences as requested in comment #23 I am seeing quite many of these messages caused by evolution-calendar-subprocess(-original) lately: GLib-CRITICAL **: g_variant_new_string: assertion 'g_utf8_validate (string, -1, NULL)' failed GLib-CRITICAL **: g_variant_ref_sink: assertion 'value != NULL' failed GLib-CRITICAL **: g_variant_new_string: assertion 'g_utf8_validate (string, -1, NULL)' failed GLib-CRITICAL **: g_variant_ref_sink: assertion 'value != NULL' failed GLib-CRITICAL **: g_variant_new_string: assertion 'g_utf8_validate (string, -1, NULL)' failed GLib-CRITICAL **: g_variant_ref_sink: assertion 'value != NULL' failed GLib-CRITICAL **: g_variant_new_string: assertion 'g_utf8_validate (string, -1, NULL)' failed GLib-CRITICAL **: g_variant_ref_sink: assertion 'value != NULL' failed GLib-CRITICAL **: g_variant_ref: assertion 'value != NULL' failed GLib-CRITICAL **: g_variant_unref: assertion 'value != NULL' failed GLib-CRITICAL **: g_variant_unref: assertion 'value != NULL' failed GLib-CRITICAL **: g_variant_unref: assertion 'value != NULL' failed GLib-CRITICAL **: g_variant_unref: assertion 'value != NULL' failed Besides that I haven't run into any crashes any more. I catched some of the SIGTRAP backtraces and attached them. I have more files like that, but I doubt they will be of more use.
Thanks for the update. I see the SIGTRAP might be harmless here. One pattern I see from your set of backtraces is that there's always one thread being in dl-open.c or dl-close.c, when the SIGTRAP is issued. The threads are loading/unloading dynamic modules for the application, both directly (camel modules), then indirectly (proxy modules provided by some other library). Those g_variant runtime warning are new to me. > Besides that I haven't run into any crashes any more. Did you update to more recent evolution-data-server, glib2 or any other similar library recently? Or maybe the running under gdb/valgrind "cured" it.
(In reply to Milan Crha from comment #25) > Thanks for the update. I see the SIGTRAP might be harmless here. One pattern > I see from your set of backtraces is that there's always one thread being in > dl-open.c or dl-close.c, when the SIGTRAP is issued. The threads are > loading/unloading dynamic modules for the application, both directly (camel > modules), then indirectly (proxy modules provided by some other library). > > Those g_variant runtime warning are new to me. Warnings started on 2015-09-14 right after the first boot. No evolution-related package updates before that. > > Besides that I haven't run into any crashes any more. > > Did you update to more recent evolution-data-server, glib2 or any other > similar library recently? Or maybe the running under gdb/valgrind "cured" it. Yes, I am running on F22 updates-testing on this machine. There is e.g. glib2 on 2015-09-24 version glib2-2.44.1-1.fc22.x86_64 to glib2-2.44.1-2.fc22.x86_64 with this single change: http://pkgs.fedoraproject.org/cgit/glib2.git/commit/?h=f22&id=bdab866334343601ca6b92e23029b4bd78ca6de6 No updates to evolution* packages since last crash. Last crash was on 2015-09-13, reported by valgrind. Valgrind didn't cure it, see comment #16
Created attachment 1081798 [details] another valgrind log I ran into another crash (SIGSEGV) in e-c-f-s last week, see attached valgrind log. I didn't notice that crash so I can't provide more detailled information.
Thanks for the update. The valgrind log shows one backtrace, some free_mem() call inside g_cond_wait(). There is clearly happening something wrong with the memory in the On The Web calendar backend, but I still do not know what, or what the conditions to trigger it are. Such memory issues can strike back in various places, possibly sometimes completely overlooked.
Created attachment 1089478 [details] some more valgrind logs on F23 / eds 3.18.1 Since Updating to F23 I get more errors logged from valgrind. See attached files.
Thanks for the update. If I read the logs properly, then valgrind logged about the crash itself, not right before it with some "explanation" what could go wrong. The logged crashes are in low-level libraries, which can also mean that the crash was detected there, not necessarily happened there. I'll update to Fedora 23 soon, maybe I'll be more lucky with the reproducer there.
Since upgrading to F23 I can't attach gdb to valgrind any more. valgrind is running and correctly logging to a file. If you run into this issue too when executing (gdb) target remote | vgdb and get "Remote communication error. Target disconnected.: Connection reset by peer." you should probably make sure that vgdb and valgrind have the same environment. you may have to manually set the HOSTNAME or USER environment variables. This "invalid read of size 4" keeps recurring (circa once a week), but I haven't been able to create a full backtrace for it. ==4548== Thread 18 pool: ==4548== Invalid read of size 4 ==4548== at 0xCAD42EB: g_dbus_connection_call_internal (gdbusconnection.c:5785) ==4548== by 0xCAE0831: g_dbus_proxy_call_internal (gdbusproxy.c:2719) ==4548== by 0xCAE20B5: g_dbus_proxy_call (gdbusproxy.c:2959) ==4548== by 0x5313784: e_dbus_source_proxy_set_property (e-dbus-source.c:1630) ==4548== by 0xCD98157: object_set_property (gobject.c:1420) ==4548== by 0xCD98157: g_object_set_valist (gobject.c:2163) ==4548== by 0xCD9882B: g_object_set (gobject.c:2273) ==4548== by 0x50A782C: e_source_set_connection_status (e-source.c:3515) ==4548== by 0x2CB8DA0C: cal_backend_http_load (e-cal-backend-http.c:619) ==4548== by 0x2CB8E673: begin_retrieval_cb (e-cal-backend-http.c:799) ==4548== by 0xCA87D17: g_task_thread_pool_thread (gtask.c:1283) ==4548== by 0xD0441CD: g_thread_pool_thread_proxy (gthreadpool.c:307) ==4548== by 0xD043834: g_thread_proxy (gthread.c:778) ==4548== Address 0x3148abc0 is 16 bytes inside a block of size 32 free'd ==4548== at 0x4C29D6A: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==4548== by 0xD0225ED: g_free (gmem.c:189) ==4548== by 0xD039D9C: g_slice_free1 (gslice.c:1112) ==4548== by 0xCA873FC: g_task_finalize (gtask.c:633) ==4548== by 0xCD955AE: g_object_unref (gobject.c:3179) ==4548== by 0xCAD45B3: g_dbus_connection_call_done (gdbusconnection.c:5707) ==4548== by 0xCA87582: g_task_return_now (gtask.c:1104) ==4548== by 0xCA875B8: complete_in_idle_cb (gtask.c:1118) ==4548== by 0xD01CE39: g_main_dispatch (gmain.c:3154) ==4548== by 0xD01CE39: g_main_context_dispatch (gmain.c:3769) ==4548== by 0xD01D1CF: g_main_context_iterate.isra.29 (gmain.c:3840) ==4548== by 0xD01D4F1: g_main_loop_run (gmain.c:4034) ==4548== by 0x109911: main (evolution-calendar-factory-subprocess.c:213) ==4548== Block was alloc'd at ==4548== at 0x4C28C50: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==4548== by 0xD0224D8: g_malloc (gmem.c:94) ==4548== by 0xD039622: g_slice_alloc (gslice.c:1007) ==4548== by 0xD039CBD: g_slice_alloc0 (gslice.c:1032) ==4548== by 0xCAD425C: g_dbus_connection_call_internal (gdbusconnection.c:5766) ==4548== by 0xCAE0831: g_dbus_proxy_call_internal (gdbusproxy.c:2719) ==4548== by 0xCAE20B5: g_dbus_proxy_call (gdbusproxy.c:2959) ==4548== by 0x5313784: e_dbus_source_proxy_set_property (e-dbus-source.c:1630) ==4548== by 0xCD98157: object_set_property (gobject.c:1420) ==4548== by 0xCD98157: g_object_set_valist (gobject.c:2163) ==4548== by 0xCD9882B: g_object_set (gobject.c:2273) ==4548== by 0x50A782C: e_source_set_connection_status (e-source.c:3515) ==4548== by 0x2CB8DA0C: cal_backend_http_load (e-cal-backend-http.c:619) ==4548==
(In reply to Christian Stadelmann from comment #31) > This "invalid read of size 4" keeps recurring (circa once a week), but I > haven't been able to create a full backtrace for it. Right, that's a known (and ignored by the GLib developers) issue: https://bugzilla.gnome.org/show_bug.cgi?id=748263 I still didn't manage to reproduce the crash itself, but I think I figured what was going wrong and committed a change to the evolution-data-server as part of an upstream bug report: https://bugzilla.gnome.org/show_bug.cgi?id=702127 I made a test build with the change included. I'd like to ask you to install it and give it a try for a week or so, whether you'll be able to reproduce the crash with it. I suppose you'll not, but it needs a verification. It'll be good to catch console output of the respective calendar factory, to see whether anything will be claimed there. After you update the packages the background evolution processes should be restarted, thus they'll start to use the changes. The test build is available here: http://koji.fedoraproject.org/koji/taskinfo?taskID=11901355 Thanks a lot for your help with this.