Description of problem:
Random crashes in realloc in different programs starting with glibc-2.28.9000-19.

Some coredumpctl samples I have here:

One nautilus coredump:

Stack trace of thread 17746:
#0 0x00007f5881fb2ac9 _int_free (libc.so.6)
#1 0x00007f5881fb4eaf _int_realloc (libc.so.6)
#2 0x00007f5881fb622b __GI___libc_realloc (libc.so.6)
#3 0x00007f5882fc505e g_realloc (libglib-2.0.so.0)
#4 0x00007f5882fe21f7 g_string_maybe_expand (libglib-2.0.so.0)
#5 0x00007f5882fe254a g_string_insert_len (libglib-2.0.so.0)
#6 0x00007f5882faa2ae g_build_path_va (libglib-2.0.so.0)
#7 0x00007f5882fab739 g_build_filename_va (libglib-2.0.so.0)
#8 0x00007f588247cb5a get_thumbnail_attributes (libgio-2.0.so.0)
#9 0x00007f588247eea4 _g_local_file_info_get (libgio-2.0.so.0)
#10 0x00007f588247945b g_local_file_query_info (libgio-2.0.so.0)
#11 0x00007f58823e0e08 query_info_async_thread (libgio-2.0.so.0)
#12 0x00007f5882425a07 g_task_thread_pool_thread (libgio-2.0.so.0)
#13 0x00007f5882fe8e93 g_thread_pool_thread_proxy (libglib-2.0.so.0)
#14 0x00007f5882fe848a g_thread_proxy (libglib-2.0.so.0)
#15 0x00007f58820fd583 start_thread (libpthread.so.0)
#16 0x00007f588202c083 __clone (libc.so.6)

One evolution:

Stack trace of thread 16382:
#0 0x00007f40cc9beac9 _int_free (libc.so.6)
#1 0x00007f40cc9c0eaf _int_realloc (libc.so.6)
#2 0x00007f40cc9c222b __GI___libc_realloc (libc.so.6)
#3 0x00007f40d040305e g_realloc (libglib-2.0.so.0)
#4 0x00007f40d04201f7 g_string_maybe_expand (libglib-2.0.so.0)
#5 0x00007f40d042054a g_string_insert_len (libglib-2.0.so.0)
#6 0x00007f40d03e82ae g_build_path_va (libglib-2.0.so.0)
#7 0x00007f40d03e9739 g_build_filename_va (libglib-2.0.so.0)
#8 0x00007f40d0539ee3 data_cache_expire (libcamel-1.2.so.62)
#9 0x00007f40d053a188 data_cache_path (libcamel-1.2.so.62)
#10 0x00007f40d053aa1c camel_data_cache_get (libcamel-1.2.so.62)
#11 0x00007f40c42e17c2 imapx_get_message_cached (libcamelimapx.so)
#12 0x00007f40d055d0cd camel_folder_get_message_sync (libcamel-1.2.so.62)
#13 0x00007f40d055d784 folder_get_message_thread (libcamel-1.2.so.62)
#14 0x00007f40cfa08a07 g_task_thread_pool_thread (libgio-2.0.so.0)
#15 0x00007f40d0426e93 g_thread_pool_thread_proxy (libglib-2.0.so.0)
#16 0x00007f40d042648a g_thread_proxy (libglib-2.0.so.0)
#17 0x00007f40d04d6583 start_thread (libpthread.so.0)
#18 0x00007f40cca38083 __clone (libc.so.6)

One liferea:

Stack trace of thread 24158:
#0 0x00007f4ee348bac9 _int_free (libc.so.6)
#1 0x00007f4ee348deaf _int_realloc (libc.so.6)
#2 0x00007f4ee348f22b __GI___libc_realloc (libc.so.6)
#3 0x00007f4e502a43fb n/a (p11-kit-trust.so)
#4 0x00007f4e502a446a n/a (p11-kit-trust.so)
#5 0x00007f4e502a4115 n/a (p11-kit-trust.so)
#6 0x00007f4e502a54a1 n/a (p11-kit-trust.so)
#7 0x00007f4e502a8be1 n/a (p11-kit-trust.so)
#8 0x00007f4e5016622c find_cert_cb (libgnutls.so.30)
#9 0x00007f4e5016b8e3 _pkcs11_traverse_tokens (libgnutls.so.30)
#10 0x00007f4e5016d95b gnutls_pkcs11_crt_is_known (libgnutls.so.30)
#11 0x00007f4e501afde6 _gnutls_pkcs11_verify_crt_status (libgnutls.so.30)
#12 0x00007f4e501bfda9 gnutls_x509_trust_list_verify_crt2 (libgnutls.so.30)
#13 0x00007f4e501bffa9 gnutls_x509_trust_list_verify_crt (libgnutls.so.30)
#14 0x00007f4e502e2f73 g_tls_database_gnutls_verify_chain (libgiognutls.so)
#15 0x00007f4e502dfc93 verify_peer_certificate (libgiognutls.so)
#16 0x00007f4e502dffda async_handshake_thread (libgiognutls.so)
#17 0x00007f4ee3974a07 g_task_thread_pool_thread (libgio-2.0.so.0)
#18 0x00007f4ee37ace93 g_thread_pool_thread_proxy (libglib-2.0.so.0)
#19 0x00007f4ee37ac48a g_thread_proxy (libglib-2.0.so.0)
#20 0x00007f4ee35d6583 start_thread (libpthread.so.0)
#21 0x00007f4ee3505083 __clone (libc.so.6)

I don't have a reproducer. All of these use threads.
Ugh, sorry about that. Do you have a backtrace with debugging information? Thanks.
Here is an excerpt from the evolution crash.

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f40cc9beac9 in _int_free (av=av@entry=0x7f4008000020, p=p@entry=0x7f4008001ff0, have_lock=have_lock@entry=1) at malloc.c:4243
4243        if (tmp == e)
[Current thread is 1 (Thread 0x7f404d3f7700 (LWP 16382))]
Missing separate debuginfos, use: dnf debuginfo-install enchant2-2.2.3-5.fc30.x86_64 libnghttp2-1.34.0-1.fc30.x86_64 libtool-ltdl-2.4.6-27.fc30.x86_64 libxcrypt-4.4.0-1.fc30.x86_64 opensc-0.19.0-3.fc30.x86_64 pcsc-lite-libs-1.8.24-1.fc30.x86_64 sssd-client-2.0.0-5.fc30.x86_64 webkit2gtk3-2.22.3-2.fc30.x86_64 webkit2gtk3-jsc-2.22.3-2.fc30.x86_64 woff2-1.0.2-4.fc29.x86_64 xfconf-4.13.6-2.fc30.x86_64 yajl-2.1.0-11.fc29.x86_64
(gdb) bt full
#0 0x00007f40cc9beac9 in _int_free (av=av@entry=0x7f4008000020, p=p@entry=0x7f4008001ff0, have_lock=have_lock@entry=1) at malloc.c:4243
        tmp = 0x1
        tc_idx = 254
        e = 0x7f4008002000
        size = 4096
        fb = <optimized out>
        nextchunk = <optimized out>
        nextsize = <optimized out>
        nextinuse = <optimized out>
        prevsize = <optimized out>
        bck = <optimized out>
        fwd = <optimized out>
        __PRETTY_FUNCTION__ = "_int_free"
#1 0x00007f40cc9c0eaf in _int_realloc (av=av@entry=0x7f4008000020, oldp=oldp@entry=0x7f4008001f60, oldsize=oldsize@entry=80, nb=nb@entry=144) at malloc.c:4710
        newp = 0x7f4008001f60
        newsize = 4240
        newmem = <optimized out>
        next = 0x7f4008001fb0
        remainder = 0x7f4008001ff0
        remainder_size = 4096
        copysize = <optimized out>
        ncopies = <optimized out>
        s = <optimized out>
        d = <optimized out>
        __PRETTY_FUNCTION__ = "_int_realloc"
        nextsize = <optimized out>
#2 0x00007f40cc9c222b in __GI___libc_realloc (oldmem=0x7f4008001f70, bytes=bytes@entry=128) at malloc.c:3301
        ar_ptr = 0x7f4008000020
        nb = 144
        newp = <optimized out>
        hook = <optimized out>
        oldp = 0x7f4008001f60
        oldsize = 80
        __PRETTY_FUNCTION__ = "__libc_realloc"
#3 0x00007f40d040305e in g_realloc (mem=0x7f4008001f70, n_bytes=128) at gmem.c:164
        newmem = <optimized out>
#4 0x00007f40d04201f7 in g_string_maybe_expand (string=0x7f40ac601440, len=<optimized out>) at gstring.c:102
#5 0x00007f40d042054a in g_string_insert_len (string=0x7f40ac601440, pos=<optimized out>, val=0x7f400800411b "2549505", len=<optimized out>) at gstring.c:476
        pos = <optimized out>
        string = 0x7f40ac601440
        __func__ = "g_string_insert_len"
        len = <optimized out>
        val = 0x7f400800411b "2549505"
        __func__ = "g_string_insert_len"
#6 0x00007f40d03e82ae in g_build_path_va (separator=separator@entry=0x7f40d044bedf "/", first_element=first_element@entry=0x7f404d3f66b0 "/home/yaneti/.cache/evolution/mail/0/folders/INBOX/cur/00", args=args@entry=0x7f404d3f64d0, str_array=str_array@entry=0x0) at gfileutils.c:1766
        element = <optimized out>
        start = <optimized out>
        end = 0x7f4008004122 ""
        result = 0x7f40ac601440
        separator_len = <optimized out>
        is_first = 0
        have_leading = 1
        single_element = 0x0
        next_element = 0x0
        last_trailing = 0x7f4008004122 ""
        i = 0
#7 0x00007f40d03e9739 in g_build_filename_va (str_array=0x0, args=0x7f404d3f64d0, first_argument=<optimized out>) at gfileutils.c:2069
        str = <optimized out>
        str = <optimized out>
        args = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7f404d3f65b0, reg_save_area = 0x7f404d3f64f0}}
.....
(In reply to Yanko Kaneti from comment #2)
> Here is an excerpt from the evolution crash.
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007f40cc9beac9 in _int_free (av=av@entry=0x7f4008000020,
> p=p@entry=0x7f4008001ff0, have_lock=have_lock@entry=1) at malloc.c:4243
> 4243        if (tmp == e)

This is in the new double-free checking code:

4229        /* Check to see if it's already in the tcache.  */
4230        tcache_entry *e = (tcache_entry *) chunk2mem (p);
4231
4232        /* This test succeeds on double free.  However, we don't 100%
4233           trust it (it also matches random payload data at a 1 in
4234           2^<size_t> chance), so verify it's not an unlikely coincidence
4235           before aborting.  */
4236        if (__glibc_unlikely (e->key == tcache && tcache))
4237          {
4238            tcache_entry *tmp;
4239            LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
4240            for (tmp = tcache->entries[tc_idx];
4241                 tmp;
4242                 tmp = tmp->next)
4243              if (tmp == e)
4244                malloc_printerr ("free(): double free detected in tcache 2");
4245            /* If we get here, it was a coincidence.  We've wasted a few
4246               cycles, but don't abort.  */
4247          }

I will try to create a reproducer, using random cross-thread reallocs, and revert the upstream patch in rawhide later today.
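To make the two-step logic of this check easier to follow, here is a simplified C model of it. All names (`model_entry`, `model_tcache`, `model_put`, `model_is_double_free`, `MODEL_TCACHE_BINS`) are hypothetical stand-ins, not glibc's real definitions. The model shows why a matching key is treated only as a hint that is then confirmed by walking the bin, and it assumes `tc_idx` has already been validated as a bin index (enforced here with `assert`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for glibc's tcache types. */
#define MODEL_TCACHE_BINS 64

typedef struct model_entry
{
  struct model_entry *next;  /* singly linked list of cached chunks */
  void *key;                 /* set to the owning tcache while cached */
} model_entry;

typedef struct
{
  model_entry *entries[MODEL_TCACHE_BINS];
} model_tcache;

/* Put E on bin TC_IDX and mark it with the tcache's address. */
static void
model_put (model_tcache *tc, model_entry *e, size_t tc_idx)
{
  assert (tc_idx < MODEL_TCACHE_BINS);
  e->key = tc;
  e->next = tc->entries[tc_idx];
  tc->entries[tc_idx] = e;
}

/* A matching key is only a hint (user data can hold the same bit
   pattern), so confirm by walking the bin before reporting.
   Precondition (assumed, as in the model): TC_IDX is a valid bin. */
static bool
model_is_double_free (model_tcache *tc, model_entry *e, size_t tc_idx)
{
  assert (tc_idx < MODEL_TCACHE_BINS);
  if (e->key != tc)
    return false;                 /* fast path: clearly not cached */
  for (model_entry *tmp = tc->entries[tc_idx]; tmp != NULL; tmp = tmp->next)
    if (tmp == e)
      return true;                /* genuinely already in the tcache */
  return false;                   /* key matched by coincidence */
}
```

For example, after `model_put (&tc, &a, 3)`, `model_is_double_free (&tc, &a, 3)` returns true, while an entry `b` whose payload merely happens to contain `&tc` returns false because it is not found on the list.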
I also see a GCC crash when rebuilding glibc itself, which could be related. GCC is not multi-threaded. I filed an untag request with releng: https://pagure.io/releng/issue/7928
I managed to obtain the core file. The backtrace looks very similar, and the process was NOT multi-threaded.

#10 0xf7c4faf0 in _int_free (av=av@entry=0xf7d787a0 <main_arena>, p=p@entry=0xabd7958, have_lock=have_lock@entry=1) at malloc.c:4243
#11 0xf7c51cb0 in _int_realloc (av=av@entry=0xf7d787a0 <main_arena>, oldp=oldp@entry=0xabd77e8, oldsize=oldsize@entry=352, nb=368) at malloc.c:4710
#12 0xf7c52c75 in __GI___libc_realloc (oldmem=0xabd77f0, bytes=352) at malloc.c:3292

(gdb) print tmp
$2 = (tcache_entry *) 0xd

The process is not multithreaded, which is why TLS does not work in GDB:

(gdb) print tcache
Cannot find thread-local storage for LWP 17812, shared library /lib/libc.so.6:
Cannot find thread-local variables on this target

But assuming that e->key *is* the address of the tcache, we have:

(gdb) print e->key->entries[tc_idx]
$28 = (tcache_entry *) 0xd
(gdb) print tc_idx
$31 = 109

That appears to be the issue: the index is larger than TCACHE_MAX_BINS (64), so tcache->entries[tc_idx] reads past the end of the entries array. We need to move the check for tc_idx < mp_.tcache_bins before the double-free check.
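The arithmetic behind the out-of-range index, and the reordering proposed here, can be sketched in C. This is a hypothetical model, not glibc's code: `model_csize2tidx` uses the same formula as glibc's `csize2tidx` macro, the constants assume x86_64 (MALLOC_ALIGNMENT = 16, MINSIZE = 32, TCACHE_MAX_BINS = 64), and the `tcache_bins` parameter stands in for `mp_.tcache_bins`. With those values, the 4096-byte remainder chunk from the evolution backtrace maps to index 254, matching `tc_idx = 254` in that trace and far past the last valid bin (63).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed x86_64 values; other targets (e.g. i386) differ. */
#define MODEL_MALLOC_ALIGNMENT 16
#define MODEL_MINSIZE          32
#define MODEL_TCACHE_MAX_BINS  64

typedef struct model_entry
{
  struct model_entry *next;
  void *key;                 /* owning tcache while the chunk is cached */
} model_entry;

typedef struct
{
  model_entry *entries[MODEL_TCACHE_MAX_BINS];
} model_tcache;

/* Same formula as glibc's csize2tidx: chunk size -> tcache bin index. */
static size_t
model_csize2tidx (size_t chunk_size)
{
  return (chunk_size - MODEL_MINSIZE + MODEL_MALLOC_ALIGNMENT - 1)
         / MODEL_MALLOC_ALIGNMENT;
}

/* Double-free check with the bounds test moved first, so entries[tc_idx]
   is only read for a valid bin.  TCACHE_BINS plays the role of
   mp_.tcache_bins, which never exceeds the array size. */
static bool
model_checked_double_free (model_tcache *tc, model_entry *e,
                           size_t chunk_size, size_t tcache_bins)
{
  assert (tcache_bins <= MODEL_TCACHE_MAX_BINS);
  size_t tc_idx = model_csize2tidx (chunk_size);

  if (tc == NULL || tc_idx >= tcache_bins)
    return false;            /* not a tcache-sized chunk: skip the scan */

  if (e->key != tc)
    return false;            /* key does not match: not cached */

  for (model_entry *tmp = tc->entries[tc_idx]; tmp != NULL; tmp = tmp->next)
    if (tmp == e)
      return true;           /* confirmed double free */
  return false;              /* key matched by coincidence */
}
```

In this model, a 4096-byte chunk whose stale payload happens to equal the tcache address is rejected by the bounds test and never indexes `entries`. With the original ordering, the loop would instead start from `entries[254]`, memory past the array, which would be consistent with the garbage `tmp` values (0x1, 0xd) seen in gdb.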
I posted what I believe is the fix upstream: https://sourceware.org/ml/libc-alpha/2018-11/msg00577.html
The upstream fix was incorporated in glibc-2.28.9000-21.fc30.