142551 – getrusage(RUSAGE_SELF) doesn't count threads

Summary: getrusage(RUSAGE_SELF) doesn't count threads

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Ernie Petrides
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-12-10 16:17 UTC by Johan Walles
Modified:	2007-11-30 22:07 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-05-17 03:18:59 UTC
Target Upstream Version:
Embargoed:

Attachments
Repro case (1.41 KB, text/plain) 2004-12-10 16:18 UTC, Johan Walles

Description Johan Walles 2004-12-10 16:17:20 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; sv-SE; rv:1.6) Gecko/20040113

Description of problem:
The man page for getrusage() says that:

"
getrusage  returns  the  current  resource  usages, for a who of
either RUSAGE_SELF or RUSAGE_CHILDREN.  The former asks for resources
used  by the  current  process
"

I have a program (that I'll soon attach) that:
1. Starts a spinning thread.
2. Waits five secs.
3. Prints (in the main thread) how much CPU time getrusage() thinks
the process has consumed.
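
The three steps above can be sketched roughly as follows. This is not the attached repro case, only a minimal illustration; the names spinner, cpu_time_ms and spin_and_measure_ms are made up for this sketch, and it assumes a C11 compiler with <stdatomic.h>.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* Flag the main thread clears to stop the spinner (hypothetical name). */
static atomic_int keep_spinning;

/* Step 1: the spinning thread burns CPU until told to stop. */
static void *spinner(void *arg)
{
    (void)arg;
    while (atomic_load(&keep_spinning))
        ;
    return NULL;
}

/* User + system CPU time in ms, as reported by getrusage(RUSAGE_SELF). */
static long cpu_time_ms(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return (long)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000L
         + (long)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1000L;
}

/*
 * Start the spinner, sleep for `seconds` (step 2), and return how many
 * ms of CPU time the calling thread sees via getrusage() (step 3).
 * Returns -1 on error.
 */
long spin_and_measure_ms(unsigned seconds)
{
    pthread_t tid;
    long before, after;

    atomic_store(&keep_spinning, 1);
    before = cpu_time_ms();
    if (before < 0 || pthread_create(&tid, NULL, spinner, NULL) != 0)
        return -1;

    sleep(seconds);
    after = cpu_time_ms();

    atomic_store(&keep_spinning, 0);
    pthread_join(tid, NULL);
    return after < 0 ? -1 : after - before;
}
```

Built with something like "gcc -pthread", calling spin_and_measure_ms(5) should return a value near 0 if getrusage() only counts the calling thread, and a value near 5000 if it counts the whole process, assuming the spinner gets a CPU to itself.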



Version-Release number of selected component (if applicable):
glibc-2.3.2-95.27

How reproducible:
Always

Steps to Reproduce:
1. Run the repro case.


Actual Results:
main(): Starting spinner thread...
main(): CPU time 0: 0ms
main(): CPU time 1: 0ms
main(): getrusare(RUSAGE_SELF) says I used 0ms of the last 5000ms
while spinning


Expected Results:
main(): Starting spinner thread...
main(): CPU time 0: 0ms
main(): CPU time 1: 5000ms
main(): getrusare(RUSAGE_SELF) says I used 5000ms of the last 5000ms
while spinning


Additional info:
Please nag me if I forget to attach the repro case.

Comment 1 Johan Walles 2004-12-10 16:18:09 UTC

Created attachment 108325
Repro case

Comment 2 Johan Walles 2004-12-10 16:20:59 UTC

It seems as if I forgot to state the actual problem.  Duh...

The problem is that getrusage(RUSAGE_SELF) returns values appropriate
for the current thread only, not for the whole process as the man page
says it should.

Comment 3 Jakub Jelinek 2004-12-10 20:10:55 UTC

Without kernel help this is really hard to do.

Comment 4 Roland McGrath 2004-12-10 22:13:09 UTC

This has already been changed in the upstream kernel as of 2.6.9, and that
change will be in RHEL4.  The behavior of getrusage with regard to NPTL threads
is a known limitation in RHEL3 and we do not anticipate changing the
well-understood semantics of RHEL3 system calls in a bug-fix update.  This and
several other kernel issues regarding POSIX semantics of system calls in the
presence of multiple threads under NPTL are being addressed in RHEL4.

Comment 5 Johan Walles 2004-12-10 22:23:27 UTC

Do you know of any good way to either probe for the current semantics or to
request certain semantics of the getrusage() syscall?

Or is what I'm doing in the repro case the best way to probe?

Comment 6 Roland McGrath 2004-12-10 22:35:16 UTC

To my knowledge, no kernel that doesn't report a version number of 2.6.9 or
higher has the new semantics.  So you could just do a version test, though that
is always in principle less reliable than an empirical feature test.
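
A version test along these lines might look like the sketch below. The function names are hypothetical, and the 2.6.9 threshold is taken from the comment above; a vendor kernel that backports or omits the change would make the check wrong, which is the reliability caveat already mentioned.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Return 1 if a release string such as "2.6.9-5.EL" denotes version
 * 2.6.9 or later, 0 otherwise (including unparsable strings).
 */
int release_at_least_2_6_9(const char *release)
{
    int major = 0, minor = 0, patch = 0;

    /* Require at least "major.minor"; a missing patch level counts as 0. */
    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return 0;
    if (major != 2)
        return major > 2;
    if (minor != 6)
        return minor > 6;
    return patch >= 9;
}

/* Apply the check to the running kernel's release string from uname(). */
int running_kernel_is_2_6_9_or_later(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;
    return release_at_least_2_6_9(u.release);
}
```

The parsing is split from the uname() call so the comparison can be exercised with fixed strings.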

Since the fixed getrusage also reports threads that have died, there is a
different approach to the test you could take that would not be subject to false
results in unusual situations of scheduling and the like, i.e. not timing
dependent.  That is, create a thread that chews a little and samples itself with
getrusage to make sure some progress has happened, then dies.  Then create a
second thread that immediately calls getrusage.  In an old kernel, the new
thread will see almost no time on its counters, less than the total seen by the
first thread; in a new kernel, it will always see a total at least as high as
the sample the first thread took.
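
One possible reading of that two-thread test, as a sketch only: the function names and the 20 ms progress threshold are arbitrary choices made here, not anything specified above.

```c
#include <pthread.h>
#include <sys/resource.h>
#include <sys/time.h>

/* User + system CPU time in microseconds from getrusage(RUSAGE_SELF). */
static long self_cpu_us(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return (long)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000L
         + (long)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec);
}

/* First thread: chew CPU until getrusage() shows progress, record it, die. */
static void *burner(void *out)
{
    volatile unsigned long sink = 0;
    long start = self_cpu_us();
    long now = start;

    while (start >= 0 && now >= 0 && now - start < 20000) {
        for (unsigned long i = 0; i < 1000000UL; i++)
            sink += i;
        now = self_cpu_us();
    }
    *(long *)out = now;
    return NULL;
}

/* Second thread: sample getrusage() immediately and exit. */
static void *sampler(void *out)
{
    *(long *)out = self_cpu_us();
    return NULL;
}

/*
 * Returns 1 if RUSAGE_SELF appears to include time from threads that
 * have already died (process-wide semantics), 0 if it appears to be
 * per-thread, and -1 on error.
 */
int getrusage_counts_threads(void)
{
    pthread_t t;
    long first = -1, second = -1;

    if (pthread_create(&t, NULL, burner, &first) != 0)
        return -1;
    pthread_join(t, NULL);

    if (pthread_create(&t, NULL, sampler, &second) != 0)
        return -1;
    pthread_join(t, NULL);

    if (first < 0 || second < 0)
        return -1;
    return second >= first;
}
```

Because the second thread only starts after the first has been joined, the comparison does not depend on how long either thread happened to be scheduled.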

Comment 11 Rik van Riel 2005-05-17 02:32:54 UTC

This change is guaranteed to break applications.

I say "guaranteed" because the semantics change from RHEL3 to RHEL4 did break
some of the benchmarks Shak has run, so clearly there is code out there that
relies on the current behaviour in RHEL3.

Strictly speaking, RHEL3 behaviour may be wrong, but we know for a fact that
code exists that relies on it.

Comment 12 Ernie Petrides 2005-05-17 03:18:59 UTC

The consensus seems to be that this should be closed as WONTFIX.

The correct behavior is already in RHEL4.
