Bug 1368074 - [devexp_public_987] Hang on "extracting tar stream" when doing an incremental build using the s2i tool
Summary: [devexp_public_987] Hang on "extracting tar stream" when doing an incremental build using the s2i tool
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Build
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Gabe Montero
QA Contact: Wang Haoran
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-18 09:58 UTC by Dongbo Yan
Modified: 2016-09-19 13:53 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-19 13:53:25 UTC
Target Upstream Version:



Description Dongbo Yan 2016-08-18 09:58:55 UTC
Description of problem:
When doing an incremental build with the s2i tool, the build process hangs on "extracting tar stream".

Version-Release number of selected component (if applicable):
s2i v1.1.1-30-g2da09db

How reproducible:
Always

Steps to Reproduce:
1. Install the s2i tool
 $ git clone https://github.com/openshift/source-to-image.git
 $ cd source-to-image
 $ export PATH=$PATH:${path_to}/source-to-image/_output/go/bin
 $ hack/build-go.sh
2. Fork the repo https://github.com/openshift/nodejs-ex to your own GitHub account
3. Do a clean build
 $ s2i build https://github.com/dongboyan77/nodejs-ex-origin registry.access.redhat.com/openshift3/nodejs-010-rhel7 test-app --loglevel=5
4. Make some change in the source repo, then do an incremental build
 $ s2i build https://github.com/dongboyan77/nodejs-ex-origin registry.access.redhat.com/openshift3/nodejs-010-rhel7 test-app --loglevel=5 --incremental=true

Actual results:
The build process hangs:
I0818 03:35:39.519147   63642 tar.go:344] Creating directory /tmp/s2i286877388/upload/artifacts/.npm/vary
I0818 03:35:39.519218   63642 tar.go:344] Creating directory /tmp/s2i286877388/upload/artifacts/.npm/vary/1.1.0
I0818 03:35:39.519280   63642 tar.go:344] Creating directory /tmp/s2i286877388/upload/artifacts/.npm/vary/1.1.0/package
I0818 03:35:39.519338   63642 tar.go:353] Creating directory /tmp/s2i286877388/upload/artifacts/.npm/vary/1.1.0/package
I0818 03:35:39.519353   63642 tar.go:404] Creating /tmp/s2i286877388/upload/artifacts/.npm/vary/1.1.0/package/package.json
I0818 03:35:39.519393   63642 tar.go:414] Extracting/writing /tmp/s2i286877388/upload/artifacts/.npm/vary/1.1.0/package/package.json
I0818 03:35:39.519451   63642 tar.go:353] Creating directory /tmp/s2i286877388/upload/artifacts/.npm/vary/1.1.0
I0818 03:35:39.519484   63642 tar.go:404] Creating /tmp/s2i286877388/upload/artifacts/.npm/vary/1.1.0/package.tgz
I0818 03:35:39.519528   63642 tar.go:414] Extracting/writing /tmp/s2i286877388/upload/artifacts/.npm/vary/1.1.0/package.tgz
I0818 03:35:39.519583   63642 tar.go:383] Done extracting tar stream

Expected results:
The build completes successfully.

Additional info:

Comment 1 Gabe Montero 2016-08-18 14:00:08 UTC
@DongboYan I was not able to reproduce. After producing test-app from my fork (https://github.com/gabemontero/nodejs-ex), then making the change https://github.com/gabemontero/nodejs-ex/commit/ef37fb65d91a266166ddea54eb18333d0c4f9b5a, and then running

 s2i build https://github.com/gabemontero/nodejs-ex registry.access.redhat.com/openshift3/nodejs-010-rhel7  test-app --loglevel=5 --incremental=true

it completes successfully for me.

Assuming you can still reproduce, let's start with you providing me the full set of --loglevel=5 logs, and I'll see what I can determine.  If need be, I'll create debug versions of s2i for you.

Also, as we progress, if the problem persists, I'll be keeping an eye on whether it is related to the change in the underlying docker client. If it turns out this is another example of an existing bug, I'll let you know and we can go ahead and accept the trello card.

thanks

Comment 2 Gabe Montero 2016-08-18 14:13:23 UTC
I also tried 

s2i build https://github.com/dongboyan77/nodejs-ex-origin registry.access.redhat.com/openshift3/nodejs-010-rhel7 test-app --loglevel=5 --incremental=true

and it ran successfully for me.

Comment 3 Gabe Montero 2016-08-19 14:00:21 UTC
OK, I found the ec2 instance @DongboYan mentioned in the trello card, and logged into it via 

ssh ec2-user.amazonaws.com

I built test-app with my fork of nodejs-ex

I then committed / pushed a change (to the README file)

I then re-ran with --incremental=true, no hang.

@DongboYan - do you see any glaring differences in the steps executed?

We can continue to try to get me an env to repro on my own, but in the interim, please repro yourself and send me the entire --loglevel=5 trace. I'll see what can be deciphered from that while also trying to repro myself.

Comment 4 Dongbo Yan 2016-08-19 16:20:22 UTC
Hi, this is the build log: http://pastebin.test.redhat.com/404389

Comment 5 Gabe Montero 2016-08-19 20:02:35 UTC
OK ... analyzing the log output and trying more iterations, I can now reproduce using https://github.com/dongboyan77/nodejs-ex-origin as the source. I had made attempts against that repo before. Not sure yet why the hang is happening now, but based on the further analysis below, I fear it is simply a timing issue.

Differences I'm seeing so far:

1) @DongboYan's fork had a .sti dir with custom environment, assemble, and save-artifacts files. I've replicated those in my fork. Still no repro.

The contents of the save-artifacts are:

#!/bin/sh
pushd ${HOME} >/dev/null

#if [ -d .npm ];then
  #tar zcf /tmp/npm.tgz  .npm
  tar cf - .npm
#fi

popd >/dev/null

2) In comparing the runs from @DongboYan and mine against my repo, I now notice a difference starting here in my run:

I0819 15:21:00.190345   14570 sti.go:437] /usr/bin/env: /tmp/scripts/save-artifacts: No such file or directory

So my save-artifacts fails and the build moves on to assemble, whereas @DongboYan's does not fail: the tarring starts, but then either the tar in his save-artifacts script never exits, or something is amiss in that container. The log says "Done extracting tar stream", and a docker ps -a in another window shows the container has exited.

3) However, I also took a thread dump. While WaitContainer had exited, the attach container / hijacked container code borrowed from k8s had not. With the mutex protection now around the data race detection from the unit tests that was added during merge, the s2i RunContainer call never exits.

This is why @DongboYan saw the error now but had not when he ran against the s2i binary I provided him.


The relevant snippet from the thread dump:

goroutine 63 [chan receive, 11 minutes]:
github.com/openshift/source-to-image/pkg/docker.(*stiDocker).holdHijackedConnection(0xc820336790, 0x1704800, 0x0, 0x0, 0x7f5e26b84000, 0xc8202b2118, 0x7f5e26b84000, 0xc8202b2128, 0x7f5e26c1b8d8, 0xc820030108, ...)
	/home/gmontero/go/src/github.com/openshift/source-to-image/_output/local/go/src/github.com/openshift/source-to-image/pkg/docker/docker.go:858 +0x24c
github.com/openshift/source-to-image/pkg/docker.(*stiDocker).RunContainer.func4.1(0xc820336790, 0xc8202b2138, 0xc8201f0340, 0xc820445380, 0xc8204cc800, 0x40, 0xc8203266d0, 0xc8203266c0)
	/home/gmontero/go/src/github.com/openshift/source-to-image/_output/local/go/src/github.com/openshift/source-to-image/pkg/docker/docker.go:978 +0x672
created by github.com/openshift/source-to-image/pkg/docker.(*stiDocker).RunContainer.func4
	/home/gmontero/go/src/github.com/openshift/source-to-image/_output/local/go/src/github.com/openshift/source-to-image/pkg/docker/docker.go:979 +0x2dd


The snippet from the code:

	select {
	case err := <-receiveStdout:
		return err
	case <-stdinDone:
		if outputStream != nil || errorStream != nil {
			return <-receiveStdout
		}
	}
	return nil

l. 858 is the `return <-receiveStdout` call

I would interpret that as: it is trying to return, but first wants to pull any last bit out of the receiveStdout channel. If there is nothing left there, it blocks.

I want to confer with the devexp team to help confirm my interpretation, and I'll start experimenting with some tweaks, but in hindsight this return, which can potentially block, seems problematic.
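
To make the pattern concrete, here is a minimal, self-contained Go sketch (mine, not the s2i code itself) of the shape described above: once stdinDone fires, a bare receive on receiveStdout blocks forever if the stdout copier never sends a result. A timeout stands in for "forever" so the sketch terminates when run.

	package main

	import (
		"fmt"
		"time"
	)

	func main() {
		receiveStdout := make(chan error) // nothing is ever sent on this channel
		stdinDone := make(chan struct{})
		close(stdinDone) // pretend the stdin copy already finished

		select {
		case err := <-receiveStdout:
			fmt.Println("stdout copy finished:", err)
		case <-stdinDone:
			// The problematic shape: drain receiveStdout with a bare receive.
			// If the copier never sends, this receive blocks forever.
			select {
			case err := <-receiveStdout:
				fmt.Println("drained:", err)
			case <-time.After(2 * time.Second):
				fmt.Println("still blocked after 2s -- this is the hang")
			}
		}
	}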

Comment 6 Gabe Montero 2016-08-19 20:37:59 UTC
I think I want to convert `return <-receiveStdout` into a select with a default ... I have conferred with Ben and he agrees.
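
To illustrate the idea (a sketch of my reading of it, not necessarily the exact diff I will submit), the drain becomes a select with a default branch: return a pending result if the copier already reported one, otherwise return immediately instead of blocking. The trade-off is that a result sent after the drain runs is dropped, which seems acceptable once stdin is done.

	package main

	import (
		"errors"
		"fmt"
	)

	// drainStdout mirrors the proposed shape: take a pending result if one is
	// already there, but never block waiting for one.
	func drainStdout(receiveStdout chan error) error {
		select {
		case err := <-receiveStdout:
			return err
		default:
			return nil
		}
	}

	func main() {
		empty := make(chan error)    // copier never reported anything
		ready := make(chan error, 1) // copier already reported a result
		ready <- errors.New("copy failed")

		fmt.Println(drainStdout(empty)) // <nil> -- returns immediately, no hang
		fmt.Println(drainStdout(ready)) // copy failed
	}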

Comment 7 Gabe Montero 2016-08-19 20:43:56 UTC
hang no longer occurring for me ... submitting PR

Comment 9 openshift-github-bot 2016-08-22 15:50:47 UTC
Commit pushed to master at https://github.com/openshift/source-to-image

https://github.com/openshift/source-to-image/commit/1fec24c6de5832e1635dde341bcd4573dbfc4451
s2i: intermittent hang

Bug 1368074
Bugzilla link https://bugzilla.redhat.com/show_bug.cgi?id=1368074
Detailed explanation
A timing window existed, seen first with incremental builds and the new engine-api/k8s docker client, where the attach container would never return because of a blocking channel call

Comment 10 Gabe Montero 2016-08-22 16:54:21 UTC
OK @DongboYan - please rebuild s2i with the latest from the repo and verify the fix.

Thanks

Comment 11 Dongbo Yan 2016-08-23 05:50:32 UTC
Verified, this bug is fixed.

s2i v1.1.1-34-g9b4c172

