Bug 240349
Summary: | wait4()'s rusage sometimes wrong when threads present across exec*() | |
---|---|---|---
Product: | Red Hat Enterprise Linux 4 | Reporter: | Matt Evans <matt>
Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr>
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 4.5 | CC: | bugproxy, dave, jbaron, neilc
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | RHBA-2007-0791 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2007-11-15 16:27:15 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Matt Evans
2007-05-16 16:57:08 UTC
Created attachment 154850 [details]
C source for test case
I've easily reproduced this on my x86_64 test machine. (Thanks for the test program, although the wait() syscall at the end should have an additional arg.)

By experimenting with the reproducer, I've further discovered that the bogus cpu usage times are evident in the data returned by both wait4() and getrusage() syscalls, which suggests that it's the kernel's propagation of "rusage" data that is faulty. This data is calculated from fields in the task's "signal_struct", so another possibility is that there's a problem in the handling of this thread-shared structure during exec() of a multi-threaded process.

Also, if the reproducer is changed to allow the sub-thread to complete before the exec() is initiated, the problem disappears. This is another indicator of a possible "signal_struct" handling problem during an exec() syscall. I'm continuing to investigate.

The problem is in de_thread(). When the "signal_struct" is still being used (by another thread) during an exec(), a new one is allocated. Many of the fields in the new one are not initialized, including all of the resource counter stats. Thus, whatever garbage is in these memory locations of the newly allocated "signal_struct" ends up being added into the getrusage() data.

I'll implement and test a patch, which I think only needs to alter de_thread().

Created attachment 155849 [details]
patch for fixing de_thread() to properly init signal_struct fields
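
The attached patch itself is not reproduced in this report. As a rough user-space illustration of the failure mode described above (a freshly allocated structure whose counters are never initialized), the following sketch may help. Everything in it is an assumption made for illustration: `struct sig_counters`, `alloc_counters()` and `total_utime()` are hypothetical names, the field list is only loosely modeled on the resource counters mentioned in the analysis, and the `0xAA` fill merely simulates stale memory in a deterministic way. It is not kernel code and not the actual fix.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the resource counters that the
 * analysis says live in the thread-shared "signal_struct". */
struct sig_counters {
    unsigned long utime, stime;     /* CPU time of threads that already exited */
    unsigned long cutime, cstime;   /* CPU time of reaped children */
    unsigned long min_flt, maj_flt; /* page-fault counts */
};

/* Allocate a replacement structure, as happens when the old one is still
 * in use by another thread. The allocator does not clear memory, so the
 * 0xAA fill stands in for whatever garbage was left there.
 * With init_fields != 0, every counter is explicitly reset to zero. */
static struct sig_counters *alloc_counters(int init_fields)
{
    struct sig_counters *s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;
    memset(s, 0xAA, sizeof(*s));    /* simulated stale contents */
    if (init_fields) {
        s->utime = s->stime = 0;
        s->cutime = s->cstime = 0;
        s->min_flt = s->maj_flt = 0;
    }
    return s;
}

/* rusage-style total: accumulated counter plus the live thread's own time. */
static unsigned long total_utime(const struct sig_counters *s,
                                 unsigned long live_utime)
{
    return s->utime + live_utime;
}
```

With `alloc_counters(1)` the total equals the live thread's time alone; with `alloc_counters(0)` the stale bytes are added in, which mirrors the bogus values seen through wait4() and getrusage().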
The attached patch should fix this problem. Here are my test results:
# time fork_time_bug 100
0.022u 0.079s 0:00.10 90.0% 0+0k 0+0io 0pf+0w
# time fork_time_bug 100
0.026u 0.075s 0:00.10 90.0% 0+0k 0+0io 0pf+0w
# time fork_time_bug 100
0.026u 0.074s 0:00.10 90.0% 0+0k 0+0io 0pf+0w
# time fork_time_bug 100
0.030u 0.067s 0:00.10 90.0% 0+0k 0+0io 0pf+0w
# time fork_time_bug 100
0.024u 0.070s 0:00.10 90.0% 0+0k 0+0io 0pf+0w
#
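
The `time` figures above come from the same rusage data that wait4() returns to the parent. For anyone wanting to check a kernel without the original test case (attachment 154850, not reproduced here), a parent-side sanity check could look roughly like the sketch below. The helper names `child_cpu_usec()` and `cpu_time_plausible()` are hypothetical, and the sketch deliberately leaves out the part that triggers the bug: in the real reproducer the child has a second thread still running when it calls exec().

```c
#define _DEFAULT_SOURCE             /* for wait4() under strict -std modes */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Convert a struct timeval to microseconds. */
static long long tv_to_usec(struct timeval tv)
{
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Fork, exec `path` in the child, reap it with wait4(), and return the
 * child's user + system CPU time in microseconds, or -1 on failure. */
static long long child_cpu_usec(const char *path)
{
    struct rusage ru;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl(path, path, (char *)NULL);
        _exit(127);                 /* reached only if exec failed */
    }
    if (wait4(pid, &status, 0, &ru) != pid)
        return -1;
    return tv_to_usec(ru.ru_utime) + tv_to_usec(ru.ru_stime);
}

/* A process cannot consume more CPU time than wall-clock time multiplied
 * by the number of CPUs; values beyond that bound indicate bad rusage. */
static int cpu_time_plausible(long long cpu_usec, long long wall_usec,
                              long ncpus)
{
    return cpu_usec >= 0 && cpu_usec <= wall_usec * ncpus;
}
```

A caller would time the `child_cpu_usec()` call with gettimeofday() and pass the elapsed interval, together with the CPU count, to `cpu_time_plausible()`.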
The patch above was posted for internal review on 31-May-2007.

This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.

committed in stream U6 build 55.14. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/

------- Comment From suzukikp.com 2007-09-05 10:52 EDT-------
I have verified the kernel-2.6.9-56 to contain the patch attached here as "linux-2.6.9-signal-coredump.patch".

Antonio,

Can we close this ?

Thanks

Suzuki

------- Comment From rosalesa.com 2007-09-06 13:01 EDT-------
(In reply to comment #16)
> I have verified the kernel-2.6.9-56 to contain the patch attached here as
> "linux-2.6.9-signal-coredump.patch".
>
> Antonio,
>
> Can we close this ?
>
> Thanks
>
> Suzuki

As this patch was previously tested to resolve this issue, and the patch has been confirmed to be in the kernel-2.6.9-56 source I am marking this bug as verified, and closing. -thanks.

------- Comment From bugzilla 2007-10-03 22:35 EST-------
User petrides's account has been disabled, requested by HC

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html