Bug 1674280

Summary: glibc: Invalid LIBC_PROBE in __pthread_timedjoin_ex can cause SIGSEGV
Product: [Fedora] Fedora
Reporter: Igor Raits <igor.raits>
Component: glibc
Assignee: Florian Weimer <fweimer>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: aoliva, arjun.is, codonell, dj, fweimer, igor.raits, jistone, law, mfabian, pfrankli, rth, rust-sig, siddhesh, TicoTimo
Target Milestone: ---
Target Release: ---
Hardware: armv7l
OS: Unspecified
Whiteboard:
Fixed In Version: glibc-2.29-8.fc30
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-04-04 17:31:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1196181

Description Igor Raits 2019-02-10 18:20:12 UTC
(gdb) bt full
#0  0xb6f81af0 in __pthread_timedjoin_ex () from /lib/libpthread.so.0
No symbol table info available.
#1  0x005edeac in std::sys::unix::thread::Thread::join ()
No symbol table info available.
#2  0x004af7b4 in <std::thread::JoinInner<T>>::join (self=0xbefff398) at /builddir/build/BUILD/rustc-1.32.0-src/src/libstd/thread/mod.rs:1298
No locals.
#3  <std::thread::JoinHandle<T>>::join (self=...) at /builddir/build/BUILD/rustc-1.32.0-src/src/libstd/thread/mod.rs:1431
No locals.
#4  0x00491e40 in build_script_build::codegen::main () at build.rs:40
        handle = <optimized out>
        output = <optimized out>
        input = <optimized out>
        manifest_dir = <optimized out>
#5  0x0040f790 in std::rt::lang_start::{{closure}} () at /builddir/build/BUILD/rustc-1.32.0-src/src/libstd/rt.rs:74
        main = <optimized out>
#6  0x005fc270 in std::panicking::try::do_call ()
No symbol table info available.
#7  0x006051d0 in __rust_maybe_catch_panic ()
No symbol table info available.
#8  0x005fb808 in std::panic::catch_unwind ()
No symbol table info available.
#9  0x005ee45c in std::rt::lang_start_internal ()
No symbol table info available.
#10 0x00429edc in main ()
No symbol table info available.
#11 0xb6e1587c in __libc_start_main () from /lib/libc.so.6
No symbol table info available.
#12 0x00405034 in _start ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

rust-1.32.0-2.fc30.armv7hl

Comment 1 Igor Raits 2019-02-10 19:20:19 UTC
(gdb) bt full
#0  __GI___pthread_timedjoin_ex (threadid=3067290560, thread_return=0x0, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:104
        pd = 0xb6d323c0
        self = <optimized out>
        result = <optimized out>
#1  0x0069171c in std::sys::unix::thread::Thread::join ()
No symbol table info available.
#2  0x005640f8 in <std::thread::JoinInner<T>>::join (self=0xbef25428) at /builddir/build/BUILD/rustc-1.32.0-src/src/libstd/thread/mod.rs:1298
No locals.
#3  <std::thread::JoinHandle<T>>::join (self=...) at /builddir/build/BUILD/rustc-1.32.0-src/src/libstd/thread/mod.rs:1431
No locals.
#4  0x00533cb0 in build_script_build::codegen::main () at build.rs:40
        handle = <optimized out>
        output = <optimized out>
        input = <optimized out>
        manifest_dir = <optimized out>
#5  0x005337b8 in std::rt::lang_start::{{closure}} () at /builddir/build/BUILD/rustc-1.32.0-src/src/libstd/rt.rs:74
        main = <optimized out>
#6  0x0069fadc in std::panicking::try::do_call ()
No symbol table info available.
#7  0x006a89f0 in __rust_maybe_catch_panic ()
No symbol table info available.
#8  0x0069f074 in std::panic::catch_unwind ()
No symbol table info available.
#9  0x00691cd0 in std::rt::lang_start_internal ()
No symbol table info available.
#10 0x004eaa14 in main ()
No symbol table info available.
#11 0xb6d4a87c in __libc_start_main (main=0xbef25644, argc=-1226317824, argv=0xb6d4a87c <__libc_start_main+268>, init=<optimized out>, fini=0x6b1d18 <__libc_csu_fini>, rtld_fini=0xb6f13390 <_dl_fini>, stack_end=0xbef25644) at libc-start.c:308
        self = <optimized out>
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-462516321, -330796193, 7019704, 0, 4874224, 0, 0, 0, 7335336, 0 <repeats 33 times>, -1091414460, -1226317824, -1091414185, -1091414452, -1226317824, 1, -1091414460, -1227577600, 0, -1091414568, -1226444948, 61765110, 1, 0, 4, -1225598304, 1, -1091414460, -1225848560, -1225707192, -1, -1225570024}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0xb6f13298 <_dl_init+124>}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#12 0x004a6034 in _start ()

Comment 2 Josh Stone 2019-02-11 18:00:45 UTC
These backtraces just show the main thread trying to join -- waiting for another thread to complete. It's likely that the actual fault is in one of those other threads -- can you try "thread apply all bt"?

Comment 3 Josh Stone 2019-02-11 18:43:57 UTC
Discussion on IRC clarified that this is cssparser's build script, not rustc itself. And while that does start its own thread, GDB has no knowledge of it after the SIGSEGV.

FWIW, the script is using a pretty large stack for that thread:

        // We have stack overflows on Servo's CI.
        let handle = Builder::new().stack_size(128 * 1024 * 1024).spawn(move || {
            match_byte::expand(&input, &output);
        }).unwrap();

Comment 4 Igor Raits 2019-02-11 19:36:28 UTC
<mock-chroot> sh-4.4# cat > t.c << EOF
> #include <pthread.h>
> 
> void *x(void *data) {}
> 
> int main(void)
> {
>   pthread_t thread;
>   pthread_attr_t thread_attr;
>   pthread_attr_setstacksize (&thread_attr, 128 * 1024 * 1024);
>   pthread_create (&thread, &thread_attr, x, NULL);
>   pthread_join (thread, NULL);
> }
> EOF
<mock-chroot> sh-4.4# gcc t.c -lpthread
<mock-chroot> sh-4.4# ./a.out 
Segmentation fault

Comment 5 Igor Raits 2019-02-11 19:43:58 UTC
<mock-chroot> sh-4.4# cat t.c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

void *x(void *data)
{
  return NULL;
}

int
main (void)
{
  pthread_t thread;
  pthread_attr_t thread_attr;

  if (pthread_attr_init (&thread_attr))
    perror ("pthread_attr_init");
  if (pthread_attr_setstacksize (&thread_attr, 128 * 1024 * 1024))
    perror ("pthread_attr_setstacksize");
  if (pthread_create (&thread, &thread_attr, x, NULL))
    perror ("pthread_create");
  if (pthread_join (thread, NULL))
    perror ("pthread_join");

  return 0;
}
<mock-chroot> sh-4.4# gcc t.c -lpthread -g -Wall && ./a.out
Segmentation fault

Comment 6 Igor Raits 2019-02-11 19:50:06 UTC
(gdb) r
Starting program: /a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[New Thread 0xb6e54460 (LWP 6060)]
[Thread 0xb6e54460 (LWP 6060) exited]

Thread 1 "a.out" received signal SIGSEGV, Segmentation fault.
__GI___pthread_timedjoin_ex (threadid=3068478560, thread_return=0x0, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:104
104       LIBC_PROBE (pthread_join_ret, 3, threadid, result, pd->result);
(gdb) t a a bt full

Thread 1 (Thread 0xb6ff8ac0 (LWP 6057)):
#0  __GI___pthread_timedjoin_ex (threadid=3068478560, thread_return=0x0, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:104
        pd = 0xb6e54460
        self = <optimized out>
        result = <optimized out>
#1  0x00010640 in main () at t.c:22
        thread = 3068478560
        thread_attr = {__size = '\000' <repeats 13 times>, "\020\000\000\000\000\000\000\000\000\000\b", '\000' <repeats 11 times>, __align = 0}


<mock-chroot> sh-4.4# rpm -q glibc gcc binutils
glibc-2.29-6.fc30.armv7hl
gcc-9.0.1-0.4.fc30.armv7hl
binutils-2.31.1-21.fc30.armv7hl

Comment 7 Igor Raits 2019-02-11 19:52:26 UTC
The same binary (compiled inside the chroot) works fine on an F29 system (glibc-2.28-26.fc29.armv7hl).

Comment 8 Igor Raits 2019-02-11 20:03:31 UTC
Calling pthread_attr_setstacksize (&thread_attr, 41938960), or using any larger value, makes the program segfault.
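
A sketch of how the cut-off could be narrowed down, assuming the reproducer from comment 5 is wrapped in a helper that takes the stack size as a parameter. The names noop and spawn_and_join are hypothetical and are not part of the original test case:

```c
#include <pthread.h>
#include <stddef.h>

/* Thread body: returns immediately, as in the reproducer.  */
static void *
noop (void *data)
{
  (void) data;
  return NULL;
}

/* Hypothetical helper: create a thread with the given stack size and
   join it.  Returns 0 on success, otherwise the first pthread error
   code encountered.  */
int
spawn_and_join (size_t stack_size)
{
  pthread_t thread;
  pthread_attr_t attr;
  int err;

  err = pthread_attr_init (&attr);
  if (err != 0)
    return err;

  err = pthread_attr_setstacksize (&attr, stack_size);
  if (err == 0)
    err = pthread_create (&thread, &attr, noop, NULL);

  /* The attribute object is no longer needed once the thread exists.  */
  pthread_attr_destroy (&attr);
  if (err != 0)
    return err;

  return pthread_join (thread, NULL);
}
```

On an affected glibc, calling spawn_and_join with 41938960 or more would be expected to crash inside pthread_join instead of returning, while smaller values should return 0. Link with -lpthread as in the original reproducer.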

Comment 9 Florian Weimer 2019-02-11 20:34:27 UTC
(In reply to Igor Gnatenko from comment #8)
> Calling pthread_attr_setstacksize (&thread_attr, 41938960), or using any
> larger value, makes the program segfault.

This is the cut-off point where the stack (and the thread descriptor) is freed to stay within the thread stack cache limit:

      /* Free the TCB.  */
      __free_tcb (pd);
    }
  else
    pd->joinid = NULL;

  LIBC_PROBE (pthread_join_ret, 3, threadid, result, pd->result);

With the fix for bug 1196181, we need to load pd->result into a register on Arm, so the probe now actually dereferences pd, which may already have been freed at this point.  But this probe is buggy on all architectures.  It should use result, not pd->result.
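
The ordering problem can be sketched outside glibc as follows. This is a simplified model, not glibc's code: struct descr, probe_join_ret, last_probe_value and join_and_release are hypothetical names standing in for the thread descriptor, the probe, its recorded argument, and the join path. It shows one way to avoid the use-after-free, namely copying the value into a local before the descriptor is released and passing only that copy to the probe:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the thread descriptor.  */
struct descr
{
  void *result;   /* value returned by the thread function */
};

/* Hypothetical stand-in for the probe: records the value it was given.  */
static void *last_probe_value;

static void
probe_join_ret (void *value)
{
  last_probe_value = value;
}

/* Simplified join path.  If free_descr is nonzero the descriptor is
   released, mirroring the case where the stack is not kept in the
   stack cache.  */
void *
join_and_release (struct descr *pd, int free_descr)
{
  /* Copy the value while pd is still valid.  */
  void *saved = pd->result;

  if (free_descr)
    free (pd);

  /* Passing pd->result here instead would read freed memory
     whenever free_descr is nonzero.  */
  probe_join_ret (saved);
  return saved;
}
```

With this ordering the probe argument no longer depends on whether the descriptor was freed, so the large-stack and small-stack cases behave the same.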

Comment 10 Florian Weimer 2019-02-11 20:53:16 UTC
glibc-2.29-7.fc30 should have a fix for this once the build completes.

Comment 11 Igor Raits 2019-02-11 20:53:53 UTC
(In reply to Florian Weimer from comment #10)
> glibc-2.29-7.fc30 should have a fix for this once the build completes.

Thanks a lot!

Comment 12 Florian Weimer 2019-02-11 21:22:56 UTC
The patch is slightly buggy (the probe value isn't correct), but it will make the crash go away.

Comment 13 Florian Weimer 2019-02-19 07:48:22 UTC
Official upstream fix applied in glibc-2.29-8.fc30.