Bug 859726 - _IO_cleanup exit handlers not thread safe
Summary: _IO_cleanup exit handlers not thread safe
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 17
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Jeff Law
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-09-23 15:01 UTC by Nils Olav Selåsdal
Modified: 2016-11-24 15:55 UTC (History)
6 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2012-09-25 16:27:19 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
t.c , test case to exercise the issue (578 bytes, text/x-csrc)
2012-09-23 15:01 UTC, Nils Olav Selåsdal
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Sourceware 14622 0 None None None Never

Description Nils Olav Selåsdal 2012-09-23 15:01:00 UTC
Created attachment 616132 [details]
t.c , test case to exercise the issue

Description of problem:

It seems calling exit() anywhere in a multithreaded program can cause a race condition leading glibc to access invalid memory.

Version-Release number of selected component (if applicable):

glibc-2.15-56.fc17.x86_64

How reproducible:
Random

Steps to Reproduce:

Compile the attached test case, gcc -g t.c -pthread

Run it in a loop under valgrind:

while [ true ]; do 
  echo test | valgrind --error-exitcode=2 ./a.out  || break  
done  


This eventually leads to valgrind complaining:

==7234== Invalid read of size 4
==7234==    at 0x34A7275FC8: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7275EA1: new_do_write (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7276D44: _IO_do_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7278DB6: _IO_flush_all_lockp (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7278F07: _IO_cleanup (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7238BBF: __run_exit_handlers (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7238BF4: exit (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A722173B: (below main) (in /usr/lib64/libc-2.15.so)
==7234==  Address 0x542f2e0 is 0 bytes inside a block of size 568 free'd
==7234==    at 0x4A079AE: free (vg_replace_malloc.c:427)
==7234==    by 0x34A726B11C: fclose@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
==7234==    by 0x40087C: writer (t.c:22)
==7234==    by 0x34A7607D13: start_thread (in /usr/lib64/libpthread-2.15.so)
==7234==    by 0x34A72F167C: clone (in /usr/lib64/libc-2.15.so)


Which indicates that:


* main() returns, and starts running the exit handler to close all FILE*
* the writer() thread, still running, wakes up, closes a FILE*
* the exit handler tries to access the closed FILE*, which is now invalid/free()'d

Comment 1 Jeff Law 2012-09-25 13:21:41 UTC
That's basically what's happening.  For reasons unknown, Uli changed the exit handlers to bypass all stdio related locking back in 2004 or so. 

I would guess this is to avoid exit hanging if another thread was holding the file's lock.

What I don't understand from the code is why we're avoiding the list_all_lock which protects the list of open streams.  That lock is totally internal and as far as I can tell is always properly released.

Changing the exit code to honor the list_all_lock (but continue to not lock the underlying streams) fixes this race.  But it's something that will need to be discussed upstream.

Comment 2 Jakub Jelinek 2012-09-25 13:27:12 UTC
(In reply to comment #1)
> That's basically what's happening.  For reasons unknown, Uli changed the
> exit handlers to bypass all stdio related locking back in 2004 or so. 
> 
> I would guess this is to avoid exit hanging if another thread was holding
> the file's lock.

Yeah, some other thread might be holding the lock across waiting for something that is never going to happen, such as reading from pipe using stdio when nothing ever writes to that pipe.  Then exit would never succeed.

Comment 3 Jeff Law 2012-09-25 15:36:45 UTC
Actually, we can block on list_all_lock as well.  Consider 3 threads.  One blocks doing IO on an event that will never happen, this grabs the stream's lock and blocks.  Thread #2 tries to fclose that same stream, which will grab the list_all_lock, then block on the stream's lock.  Then thread #3 calls exit, if we try to honor the list_all_lock, then exit will block as well.

Comment 4 Jeff Law 2012-09-25 16:27:19 UTC
Moving up upstream bugzilla (#14622).  If/when it's fixed upstream we'll pick up the fix via our usual merging procedures.


Note You need to log in before you can comment on or make changes to this bug.