Bug 859726

Summary: _IO_cleanup exit handlers not thread safe
Product: [Fedora] Fedora Reporter: Nils Olav Selåsdal <nos>
Component: glibcAssignee: Jeff Law <law>
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 17CC: fweimer, jakub, law, pfrankli, schwab, spoyarek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-09-25 16:27:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
t.c , test case to exercise the issue none

Description Nils Olav Selåsdal 2012-09-23 15:01:00 UTC
Created attachment 616132 [details]
t.c , test case to exercise the issue

Description of problem:

It seems calling exit() anywhere in a multithreaded program can cause a race condition leading glibc to access invalid memory.

Version-Release number of selected component (if applicable):

glibc-2.15-56.fc17.x86_64

How reproducible:
Random

Steps to Reproduce:

Compile the attached test case, gcc -g t.c -pthread

Run it in a loop under valgrind:

while [ true ]; do 
  echo test | valgrind --error-exitcode=2 ./a.out  || break  
done  


This eventually leads to valgrind complaining:

==7234== Invalid read of size 4
==7234==    at 0x34A7275FC8: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7275EA1: new_do_write (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7276D44: _IO_do_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7278DB6: _IO_flush_all_lockp (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7278F07: _IO_cleanup (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7238BBF: __run_exit_handlers (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7238BF4: exit (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A722173B: (below main) (in /usr/lib64/libc-2.15.so)
==7234==  Address 0x542f2e0 is 0 bytes inside a block of size 568 free'd
==7234==    at 0x4A079AE: free (vg_replace_malloc.c:427)
==7234==    by 0x34A726B11C: fclose@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
==7234==    by 0x40087C: writer (t.c:22)
==7234==    by 0x34A7607D13: start_thread (in /usr/lib64/libpthread-2.15.so)
==7234==    by 0x34A72F167C: clone (in /usr/lib64/libc-2.15.so)


Which indicates that:


* main() returns, and starts running the exit handler to close all FILE*
* the writer() thread, still running, wakes up, closes a FILE*
* the exit handler tries to access the closed FILE*, which is now invalid/free()'d

Comment 1 Jeff Law 2012-09-25 13:21:41 UTC
That's basically what's happening.  For reasons unknown, Uli changed the exit handlers to bypass all stdio related locking back in 2004 or so. 

I would guess this is to avoid exit hanging if another thread was holding the file's lock.

What I don't understand from the code is why we're avoiding the list_all_lock which protects the list of open streams.  That lock is totally internal and as far as I can tell is always properly released.

Changing the exit code to honor the list_all_lock (but continue to not lock the underlying streams) fixes this race.  But it's something that will need to be discussed upstream.

Comment 2 Jakub Jelinek 2012-09-25 13:27:12 UTC
(In reply to comment #1)
> That's basically what's happening.  For reasons unknown, Uli changed the
> exit handlers to bypass all stdio related locking back in 2004 or so. 
> 
> I would guess this is to avoid exit hanging if another thread was holding
> the file's lock.

Yeah, some other thread might be holding the lock across waiting for something that is never going to happen, such as reading from pipe using stdio when nothing ever writes to that pipe.  Then exit would never succeed.

Comment 3 Jeff Law 2012-09-25 15:36:45 UTC
Actually, we can block on list_all_lock as well.  Consider 3 threads.  One blocks doing IO on an event that will never happen, this grabs the stream's lock and blocks.  Thread #2 tries to fclose that same stream, which will grab the list_all_lock, then block on the stream's lock.  Then thread #3 calls exit, if we try to honor the list_all_lock, then exit will block as well.

Comment 4 Jeff Law 2012-09-25 16:27:19 UTC
Moving up upstream bugzilla (#14622).  If/when it's fixed upstream we'll pick up the fix via our usual merging procedures.