Created attachment 616132 [details] t.c , test case to exercise the issue Description of problem: It seems calling exit() anywhere in a multithreaded program can cause a race condition leading glibc to access invalid memory. Version-Release number of selected component (if applicable): glibc-2.15-56.fc17.x86_64 How reproducible: Random Steps to Reproduce: Compile the attached test case, gcc -g t.c -pthread Run it in a loop under valgrind: while [ true ]; do echo test | valgrind --error-exitcode=2 ./a.out || break done This eventually leads to valgrind complaining: ==7234== Invalid read of size 4 ==7234== at 0x34A7275FC8: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so) ==7234== by 0x34A7275EA1: new_do_write (in /usr/lib64/libc-2.15.so) ==7234== by 0x34A7276D44: _IO_do_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so) ==7234== by 0x34A7278DB6: _IO_flush_all_lockp (in /usr/lib64/libc-2.15.so) ==7234== by 0x34A7278F07: _IO_cleanup (in /usr/lib64/libc-2.15.so) ==7234== by 0x34A7238BBF: __run_exit_handlers (in /usr/lib64/libc-2.15.so) ==7234== by 0x34A7238BF4: exit (in /usr/lib64/libc-2.15.so) ==7234== by 0x34A722173B: (below main) (in /usr/lib64/libc-2.15.so) ==7234== Address 0x542f2e0 is 0 bytes inside a block of size 568 free'd ==7234== at 0x4A079AE: free (vg_replace_malloc.c:427) ==7234== by 0x34A726B11C: fclose@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so) ==7234== by 0x40087C: writer (t.c:22) ==7234== by 0x34A7607D13: start_thread (in /usr/lib64/libpthread-2.15.so) ==7234== by 0x34A72F167C: clone (in /usr/lib64/libc-2.15.so) Which indicates that: * main() returns, and starts running the exit handler to close all FILE* * the writer() thread, still running, wakes up, closes a FILE* * the exit handler tries to access the closed FILE*, which is now invalid/free()'d
That's basically what's happening. For reasons unknown, Uli changed the exit handlers to bypass all stdio related locking back in 2004 or so. I would guess this is to avoid exit hanging if another thread was holding the file's lock. What I don't understand from the code is why we're avoiding the list_all_lock which protects the list of open streams. That lock is totally internal and as far as I can tell is always properly released. Changing the exit code to honor the list_all_lock (but continue to not lock the underlying streams) fixes this race. But it's something that will need to be discussed upstream.
(In reply to comment #1) > That's basically what's happening. For reasons unknown, Uli changed the > exit handlers to bypass all stdio related locking back in 2004 or so. > > I would guess this is to avoid exit hanging if another thread was holding > the file's lock. Yeah, some other thread might be holding the lock across waiting for something that is never going to happen, such as reading from pipe using stdio when nothing ever writes to that pipe. Then exit would never succeed.
Actually, we can block on list_all_lock as well. Consider 3 threads. One blocks doing IO on an event that will never happen, this grabs the stream's lock and blocks. Thread #2 tries to fclose that same stream, which will grab the list_all_lock, then block on the stream's lock. Then thread #3 calls exit, if we try to honor the list_all_lock, then exit will block as well.
Moving up upstream bugzilla (#14622). If/when it's fixed upstream we'll pick up the fix via our usual merging procedures.