Bug 971352

Summary: strace hangs while tracing parent+child communicating via pipe
Product: Red Hat Enterprise Linux 7 Reporter: Jan Stancek <jstancek>
Component: straceAssignee: Jeff Law <law>
Status: CLOSED CURRENTRELEASE QA Contact: Michael Petlan <mpetlan>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0CC: jarod, jburke, kmcmartin, mpetlan, ohudlick, pfrankli, pkis
Target Milestone: beta   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: strace-4.8-3.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 992965 (view as bug list) Environment:
Last Closed: 2014-06-13 10:56:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 987469, 992965    
Attachments:
Description Flags
strace_hangs.c
none
experimental patch v1
none
experimental patch 2 v1 (on top of strace-4.8-2.el7) none

Description Jan Stancek 2013-06-06 10:49:49 UTC
Description of problem:
"gcc strace_hangs.c; strace -F -tt ./a.out" hangs while tracing 2 processes, which communicate via pipe. (source attached in comment below)

1. hangs on select which is also timed
[pid 11039] 06:17:08.751691 <... close resumed> ) = 0
[pid 11038] 06:17:08.751821 <... read resumed> "\0", 1) = 1
[pid 11039] 06:17:08.751849 exit_group(0) = ?
[pid 11038] 06:17:08.751932 select(4, [3], NULL, NULL, {4, 994872} <unfinished ...>
[pid 11039] 06:17:08.752071 +++ exited with 0 +++

2. hangs on read from pipe, which has write end closed
[pid 11046] 06:18:00.169609 close(4 <unfinished ...>
[pid 11045] 06:18:00.169684 select(4, [3], NULL, NULL, {4, 994561} <unfinished ...>
[pid 11046] 06:18:00.169737 <... close resumed> ) = 0
[pid 11045] 06:18:00.169802 <... select resumed> ) = 1 (in [3], left {4, 994560})
[pid 11046] 06:18:00.169872 exit_group(0) = ?
[pid 11045] 06:18:00.169958 read(3,  <unfinished ...>
[pid 11046] 06:18:00.170040 +++ exited with 0 +++


Version-Release number of selected component (if applicable):
strace-4.7-10.el7.x86_64
3.10.0-0.rc4.59.el7.x86_64
RHEL-7.0-20130529.n.1

How reproducible:
50%, seems to happen more with 1CPU kvm guest

Steps to Reproduce:
1. gcc strace_hangs.c
2. strace -F -tt ./a.out

Actual results:
strace hangs

Expected results:
strace can complete tracing both parent and child

Additional info:
I couldn't reproduce it with strace-4.5.19-1.6.el6 on top of RHEL-7.0-20130529.n.1.

Comment 1 Jan Stancek 2013-06-06 10:50:32 UTC
Created attachment 757606 [details]
strace_hangs.c

Comment 4 Jan Stancek 2013-06-12 13:47:37 UTC
Created attachment 760168 [details]
experimental patch v1

With this patch on top of strace-4.7-10.el7.x86_64 I can no longer trigger the hang.

Comment 7 Jan Stancek 2013-07-09 09:12:15 UTC
Created attachment 770864 [details]
experimental patch 2 v1 (on top of strace-4.8-2.el7)

One more issue, that caught my eye is that register data is updated in collect_stopped_tcbs() in every iteration of while loop and then same (last) register data is used for all tcbs processed in handle_stopped_tcbs().

Trivial program doing just fork() returned sporadically 0 from clone():
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f3068aeca10) = 0

Comment 9 Jeff Law 2013-07-12 19:48:00 UTC
Jan,

That looks like a pretty reasonable change; Denys made a change to when and how strace retrieves the register file.  It's pretty obvious that we're using stale data.

I'm on the road right now, I'll integrate your patch into rhel7 & dts shortly.

Thanks!

Comment 11 Michael Petlan 2014-04-01 14:55:44 UTC
Tested with strace-4.8-7.el7 and no hang occurred.

VERIFIED.

Comment 12 Ludek Smid 2014-06-13 10:56:57 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.