Bug 151502

Summary:	pthread_cancel crashes (segv) on invalid thread ID
Product:	Red Hat Enterprise Linux 3	Reporter:	Patrick Quaid <pquaid>
Component:	glibc	Assignee:	Jakub Jelinek <jakub>
Status:	CLOSED NOTABUG	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.0	CC:	fweimer, jizhq76, M8R-7fin56, stsp2
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2005-03-20 15:14:45 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Patrick Quaid 2005-03-18 18:38:31 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
pthread_cancel crashes (ie gets a SEGV) if you pass in an invalid thread ID.  It ought to return ESRCH.

This program demonstrates the problem:
-------------------------------------
#include <pthread.h>

pthread_t pt;

int main(int argc, char * argv[]) {
    pthread_cancel(pt);
    return 0;
}
-------------------------------------

pthread_t is, in this context, an integer, and 'pt' is initialized to zero.  But pthread_cancel immediately calls INVALID_TD_P(pt), which in turn (if I read the debugging options correctly) is defined as "__builtin_expect ((pd)->tid <= 0, 0)".

In other words, pthread_cancel always dereferences the thread ID passed in, which causes lots of invalid values to crash the program.  Passing in an invalid thread ID is certainly a bug, but it has a well-defined behavior and it shouldn't cause a crash.

This happens in NPTL in EL 3.0 and in Fedora 1 and 3.  It happens on i386 and x64-64.  It did not happen in AS 2.1, Red Hat 7.2 or other distributions using LinuxThreads.


Version-Release number of selected component (if applicable):
glibc-2.3.2-95.30

How reproducible:
Always

Steps to Reproduce:
1. Compile the program above, using something like 'gcc example.c -o example -lpthread'
2. Run it.

  

Actual Results:  Segmentation violation

Expected Results:  pthread_cancel() should have returned ESRCH.

Additional info:

Comment 1 Jakub Jelinek 2005-03-20 15:14:45 UTC

See POSIX:
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_cancel.html
ERRORS

The pthread_cancel() function may fail if:
[ESRCH]
No thread could be found corresponding to that specified by the given thread ID.

The important word here is `may'.  In NPTL where thread ID is a pointer to
thread's structure rather than a small index into an array of threads,
making sure ESRCH is returned for all invalid values would mean quite a big
overhead just to make broken programs happy.  As the standard doesn't require
this, NPTL just doesn't do that checking.
It is similar to passing invalid FILE * argument to stdio functions, there the
program will crash rather than give you an error.

Comment 2 Patrick Quaid 2005-03-21 21:53:46 UTC

I don't seem to recall any descriptions in POSIX that explicitly describe a
segfault as an allowable implementation, so I guess your argument is that a 
segfault is implicitly acceptable behavior for *any* POSIX function in place of 
*any* of the defined errors.

OK, that's put a little strongly.  It's certainly a quality of implementation 
issue, at the very least.  AIX and Solaris, not to mention all earlier versions 
of Red Hat, seem to run the test program without surprises.

The difference between this case and the stdio case is that pthread_t is 
entirely opaque.  FILE is opaque too, but (crucially) FILE* is not.  It's 
easy to initialize a FILE* to NULL and know whether or not it's valid; I know
of no analogous way to use pthread_t portably.

Passing garbage to pthread_cancel() is not exactly central to our design, but we
ran across this innocently enough, in a case somewhat like the example program 
in which we had not yet initialized a thread before shutting down due to 
unrelated errors.

In the current implementation, passing in 0 or other values outside the address 
space cause a segfault.  But even passing in a "once valid" pthread_t can cause 
a segfault if the thread is long dead and joined: the original thread structure 
is allocated through malloc(), which calls mmap(), and then some time after the 
thread is dead, the memory is returned to the system with munmap().  Is that 
still possible?  Is it still a good implementation?

And if you start and kill off a thread, and then later that memory is reused for
some other purpose, it certainly seems to me that you'll write to that memory 
area as long as it still passes that minimal "invalid" test noted above.  Is it 
not valid to call pthread_cancel on a thread that's no longer running?  How 
would you suggest that programs avoid that race?  Is it still a good 
implementation?

If a thread ends and a new thread is created, might the new thread reuse the 
same thread structure?  In that case, I guess you would inadvertently cancel the
wrong thread, again in the case in which the cancel raced with the exit.

I agree that from a standard lawyer's point of view, the implementation may be 
compliant.  But the implementation is still lousy.

Comment 3 chaingun 2008-10-17 02:25:33 UTC

According the description of the man-page of pthread_kill():
#http://www.opengroup.org/onlinepubs/009695399/functions/pthread_kill.html
-------------------------------------------------------------------------------
    The pthread_kill() function shall fail if:

    [ESRCH]
        No thread could be found corresponding to that specified by the given thread ID.
    [EINVAL]
        The value of the sig argument is an invalid or unsupported signal number.

    The pthread_kill() function shall not return an error code of [EINTR].
-------------------------------------------------------------------------------

if the thread-id passed to pthread_kill() is invalid, pthread_kill() will crash as same as pthread_cancel() [segment fault].

Is there anyone can give some advise?
thx a lot.

Comment 4 Anonymous account 2010-03-12 00:18:00 UTC

Frankly, this interpretation by Jakub Jelinek is absolutely ludicrous. It's an excuse for lazy programming, founded on a (convenient?) inability to read the specification correctly!

Jakub, which part of this statement from the specification don't you understand:

"Upon successful completion, the function shall return a value of zero. Otherwise, the function shall return an error number."

This leaves NO room for your interpretation, Jakub! The function is given two choices: it SHALL RETURN zero, or it SHALL return an error number. Unfortunately for your twisted logic, Jakub, crashing means the function doesn't return anything (because it never returns)! And (coming before the section you quote, which was "The pthread_cancel() function may fail if"), this previous statement makes clear the meaning of "may fail if". It is limiting the cases that may fail, and the error codes that may be returned, not giving you license to do whatever you feel like.

If you're intellectually honest, Jakub, I suggest you re-open this bug. It is a very reasonable case that code wants to cancel a thread if it hasn't already exited, and your current interpretation leaves no safe way to accomplish this.

Comment 5 Stas Sergeev 2015-03-13 13:38:10 UTC

There is a proposal pending to resolve this:
http://austingroupbugs.net/view.php?id=599

For PTHREAD_NULL these function will (likely)
return ESRCH just as you desire.