Bug 971352 - strace hangs while tracing parent+child communicating via pipe
strace hangs while tracing parent+child communicating via pipe
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: strace (Show other bugs)
7.0
Unspecified Unspecified
unspecified Severity unspecified
: beta
: ---
Assigned To: Jeff Law
Michael Petlan
:
Depends On:
Blocks: 987469 992965
  Show dependency treegraph
 
Reported: 2013-06-06 06:49 EDT by Jan Stancek
Modified: 2015-07-09 21:12 EDT (History)
7 users (show)

See Also:
Fixed In Version: strace-4.8-3.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 992965 (view as bug list)
Environment:
Last Closed: 2014-06-13 06:56:57 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
strace_hangs.c (1002 bytes, text/plain)
2013-06-06 06:50 EDT, Jan Stancek
no flags Details
experimental patch v1 (563 bytes, text/plain)
2013-06-12 09:47 EDT, Jan Stancek
no flags Details
experimental patch 2 v1 (on top of strace-4.8-2.el7) (440 bytes, text/plain)
2013-07-09 05:12 EDT, Jan Stancek
no flags Details

  None (edit)
Description Jan Stancek 2013-06-06 06:49:49 EDT
Description of problem:
"gcc strace_hangs.c; strace -F -tt ./a.out" hangs while tracing 2 processes, which communicate via pipe. (source attached in comment below)

1. hangs on select which is also timed
[pid 11039] 06:17:08.751691 <... close resumed> ) = 0
[pid 11038] 06:17:08.751821 <... read resumed> "\0", 1) = 1
[pid 11039] 06:17:08.751849 exit_group(0) = ?
[pid 11038] 06:17:08.751932 select(4, [3], NULL, NULL, {4, 994872} <unfinished ...>
[pid 11039] 06:17:08.752071 +++ exited with 0 +++

2. hangs on read from pipe, which has write end closed
[pid 11046] 06:18:00.169609 close(4 <unfinished ...>
[pid 11045] 06:18:00.169684 select(4, [3], NULL, NULL, {4, 994561} <unfinished ...>
[pid 11046] 06:18:00.169737 <... close resumed> ) = 0
[pid 11045] 06:18:00.169802 <... select resumed> ) = 1 (in [3], left {4, 994560})
[pid 11046] 06:18:00.169872 exit_group(0) = ?
[pid 11045] 06:18:00.169958 read(3,  <unfinished ...>
[pid 11046] 06:18:00.170040 +++ exited with 0 +++


Version-Release number of selected component (if applicable):
strace-4.7-10.el7.x86_64
3.10.0-0.rc4.59.el7.x86_64
RHEL-7.0-20130529.n.1

How reproducible:
50%, seems to happen more with 1CPU kvm guest

Steps to Reproduce:
1. gcc strace_hangs.c
2. strace -F -tt ./a.out

Actual results:
strace hangs

Expected results:
strace can complete tracing both parent and child

Additional info:
I couldn't reproduce it with strace-4.5.19-1.6.el6 on top of RHEL-7.0-20130529.n.1.
Comment 1 Jan Stancek 2013-06-06 06:50:32 EDT
Created attachment 757606 [details]
strace_hangs.c
Comment 4 Jan Stancek 2013-06-12 09:47:37 EDT
Created attachment 760168 [details]
experimental patch v1

With this patch on top of strace-4.7-10.el7.x86_64 I can no longer trigger the hang.
Comment 7 Jan Stancek 2013-07-09 05:12:15 EDT
Created attachment 770864 [details]
experimental patch 2 v1 (on top of strace-4.8-2.el7)

One more issue, that caught my eye is that register data is updated in collect_stopped_tcbs() in every iteration of while loop and then same (last) register data is used for all tcbs processed in handle_stopped_tcbs().

Trivial program doing just fork() returned sporadically 0 from clone():
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f3068aeca10) = 0
Comment 9 Jeff Law 2013-07-12 15:48:00 EDT
Jan,

That looks like a pretty reasonable change; Denys made a change to when and how strace retrieves the register file.  It's pretty obvious that we're using stale data.

I'm on the road right now, I'll integrate your patch into rhel7 & dts shortly.

Thanks!
Comment 11 Michael Petlan 2014-04-01 10:55:44 EDT
Tested with strace-4.8-7.el7 and no hang occurred.

VERIFIED.
Comment 12 Ludek Smid 2014-06-13 06:56:57 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Note You need to log in before you can comment on or make changes to this bug.