Bug 1502017 - oc new-app gives "goroutine stack exceeds byte limit"
Summary: oc new-app gives "goroutine stack exceeds byte limit"
Keywords:
Status: CLOSED DUPLICATE of bug 1422378
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ---
: 3.6.z
Assignee: Gabe Montero
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-10-13 17:50 UTC by Steven Walter
Modified: 2020-12-14 10:30 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-19 15:32:28 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Steven Walter 2017-10-13 17:50:53 UTC
Description of problem:
After upgrade of Openshift to current v3.5.5.31.24, when trying to use oc new-app --template=..., we get the following error:

[ta-1u17-admin-mja@fr0-ose004 ~]$ oc new-app --template=my-php-template --param="MYSQL_USER=test"  --param="MYSQL_DATABASE=test"
--> Deploying template "openshift/my-php-template" to project example
. . .
runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow


Version-Release number of selected component (if applicable):
v3.5.5.31.24

How reproducible:
Unconfirmed.

Customer can deploy it fine in the web console.

Customer has seen it in some oc versions and not others:

Customer tested with an oc v3.2  on windows and there the same command works...
But the same oc new-app command fail on their RHEL 7.4 when osing oc v3.3 also.

It might only affect certain templates (will attach the affected template)

Steps to Reproduce:
1. Save template attached to bz
2. oc create -f template
3. oc new-app --template=my-php-template --param="MYSQL_USER=test"  --param="MYSQL_DATABASE=test"

Comment 3 Ben Parees 2017-10-13 19:33:51 UTC
I was not able to recreate this but looking at the code it seems like it may have to do with other imagestreams that are defined in your project.

Can you send the output of:

oc get is -n openshift -o yaml

and

oc get is -o yaml  # in the project where you're running new-app

?

Comment 10 Marc Jadoul 2017-10-16 08:59:57 UTC
Note we have this on any 
* with this template
* On linux with oc v3.5.5.31.24... not on Windows with OC 3.3.

Comment 11 Ben Parees 2017-10-16 09:11:03 UTC
I think the circular reference checking was added after 3.3 which is why you likely do not see the issue with a 3.3 client.

Comment 13 Ruben Romero Montes 2017-10-16 09:48:25 UTC
In the `xxxdockerimages` namespace we indeed have tags pointing to the same imageStream

- apiVersion: v1
  kind: ImageStream
  metadata:
    annotations:
      openshift.io/generated-by: OpenShiftNewBuild
    creationTimestamp: 2016-12-16T17:40:38Z
    generation: 3
    labels:
      build: xx-php56-image
    name: xx-php56-image
    namespace: xxdockerimages
    resourceVersion: "51901661"
    selfLink: /oapi/v1/namespaces/xxdockerimages/imagestreams/xx-php56-image
    uid: c2163564-c3b6-11e6-a417-68b5996d6ec0
  spec:
    tags:
    - annotations: null
      from:
        kind: ImageStreamImage
        name: xx-php56-image-dev@sha256:aa223e31bfb0044e63eba0e726670d2bde450fe330957f9ea2cd3f127e3612a0
        namespace: xxdockerimages
      generation: 2
      importPolicy: {}
      name: DEV
      referencePolicy:
        type: Source
    - annotations: null
      from:
        kind: ImageStreamImage
        name: xx-php56-image@sha256:a1335ae5d18a1c9f9b3bfc6a5b314ed9f82cc8b24aade3bca4aecbd9d1a2c27f
        namespace: xxdockerimages
      generation: 3
      importPolicy: {}
      name: PROD
      referencePolicy:
        type: Source

Comment 14 Ruben Romero Montes 2017-10-16 09:51:03 UTC
I've just seen they're defined as ImageStreamImages instead of ImageStreamTags

Comment 15 Marc Jadoul 2017-10-16 13:18:49 UTC
So ... is the problem the way I declared the IS and the tags?
I have 2 build, each creating their own IS. 
Then the idea was, after validation it s working, tag it as DEV or PROD when manual test are OK. 
But it thus seems it is a problem to both "build" and then "tag" to the same IS?

I did this:
oc tag  xxxdockerimages/xxx-php56-image-dev:latest xxx-php-image-dev:DEV -n xxxdockerimages
oc tag --alias xxxdockerimages/xxx-php56-image-dev:DEV xxx-php56-image:DEV -n openshift
oc tag  xxxdockerimages/xxx-php56-image:latest xxx-php56-image:PROD -n xxxdockerimages
oc tag --alias xxxdockerimages/xxx-php56-image:latest xxx-php56-image:PROD -n openshift


At least this is the intent... should I delete all tags and try again?

Regards,

Marc

Comment 16 Marc Jadoul 2017-10-16 14:13:16 UTC
I recreated tags PROD and DEV after deleting them with -d. It didn't fix anything.

But then, I modified the template to point directly to xxx-php56-image:PROD in xxxdockerimages instead of pointing to alias in openshift

Then it works...

?

Comment 17 Ben Parees 2017-10-16 19:38:04 UTC
Gabe's more likely to have time to chase this than I am.  Gabe see above+attachments for some sample imagestreams/imagestreamimage references that are tripping a recursive loop in the circular reference logic.

Given your recent foray into that code, perhaps you can determine exactly what about this customer's imagestreamtag definitions is tripping us up and

1) offer them a way to fix their imagestreamtags
2) fix new-app so it doesn't do an inifinite recursion.

Comment 18 Gabe Montero 2017-10-17 14:30:01 UTC
OK, some housekeeping first.  Based on the line numbers, the earliest commit level that lines up with the thread dump is 9871f5001ba3947eef6bad88d7bd0fb0fa4038e2 from Feb 10 of this year, 2017.

And yes, as Ben alluded to, even at the latest commit level, as part of looking at another newapp bug, I'm coming to the conclusion that newapp's circular reference detection is vulnerable to recursive loops introduced by actual circular references.

As part of the fixes I'm constructing (haven't decided yet to do separate ones or a single one), I minimally plan on employing https://github.com/openshift/origin/blob/master/pkg/image/apis/image/helper.go#L383-L409 prior to follow followRefToDockerImage's recursion to avoid the recursion.  I'm also eyeing nuking the recursion altogether, but don't know if I'll go that far yet.

At first blush I do agree with Ben that there is a circular reference here that 3.3 would not have caught.

I need to study the template and imagestreams more to provide workarounds / fixes to the IST defs.

Will report back as I progress on both fronts.

Comment 19 Ben Parees 2017-10-17 15:26:06 UTC
Can they work around this for now by using oc process | oc create -f -?  That will not do the circular reference checking that is blowing up the stack.

Comment 21 Gabe Montero 2017-10-18 12:34:58 UTC
PR https://github.com/openshift/origin/pull/16843 

should handle the stack overflow, as well as introduce additional tracing to assist in explaining any circular references found

Conceivably this change will get in the 3.7 release, but its availability is still a few weeks out.  Marc - can you wait that long?  And would you be able to leverage a 3.7 client (even if your master is still at 3.5) when it is available?

Also, Marc's #Comment 16, about switching from ImageStreamImage to ImageStreamTag .... I finally realized something.

The commit level he is using (9871f5001ba3947eef6bad88d7bd0fb0fa4038e2, thanks to the the line numbers in the stack trace) is missing a fix wrt ImageStreamImage.

In particular, commit 5b464bfd8b4542212be93c688a40a744dbbb686f which I introduced in March of 2017 added fixes to the circular checks for ImageStreamImages.

Since Ben and I are running with that fix, that is why our repro attempts are not seeing this.  Undoubtedly that would assist Marc here.

The latest 3.5.x release appears to be v3.5.5.31.37-1, but looking at the relevant code blob/v3.5.5.31.37-1/pkg/generate/app/cmd/newapp.go it is missing the fix.  So a new 3.5.x change would be needed.

Specifically, backporting 5b464bfd8b4542212be93c688a40a744dbbb686f, either in lieu of my PR, on in conjunction with my PR, is an option.

Marc - do you have any preferences?

thanks

Comment 22 Ben Parees 2017-10-18 14:41:35 UTC
> In particular, commit 5b464bfd8b4542212be93c688a40a744dbbb686f which I introduced in March of 2017 added fixes to the circular checks for ImageStreamImages.

Gabe, wouldn't that mean that fix is in 3.6?  In which case they can try a 3.6 client.

Comment 23 Marc Jadoul 2017-10-18 14:42:26 UTC
Hello,

As indicated, we found a temporary solution by pointing directly on the image stream in the project xxxdockerimages were it is build.

The alias was only an attempt to "simplify" but is not absolutely required. And the alias is what seems to trigger the loop. 

See comment 15.

Therefor we can wait. Do as you prefer.

Marc

Comment 24 Gabe Montero 2017-10-18 18:08:46 UTC
Thanks for the feedback Marc.

With Marc satisfied with the "ImageStreamTag vs. ImageStreamImage workaround", and with Ben's noting that 5b464bfd8b4542212be93c688a40a744dbbb686f should be in 3.6 as well, I'm going to propose that we close this but out with the referenced PR's merging into master/3.7.

Ben / others let me know if you disagree.

thanks

Comment 25 Ben Parees 2017-10-18 19:32:50 UTC
yeah, set this as target release 3.7.0 and move it to modified once your fixes merge into origin.

Comment 26 Gabe Montero 2017-10-19 15:32:28 UTC
Reviewed a few things ...

1) the customer's willingness to use workaround (so no need to backport 5b464bfd8b4542212be93c688a40a744dbbb686f to 3.5 ... it is in 3.6

2) that circular dependency would be apparent regardless of imagestreamimage or imagestreamtag usage

3) that Ben and I could not repro with use of imagestreamimage with 5b464bfd8b4542212be93c688a40a744dbbb686f change present

4) after further review of 5b464bfd8b4542212be93c688a40a744dbbb686f and its associated bug https://bugzilla.redhat.com/show_bug.cgi?id=1422378

5) he and I agreed we should close as a dup

So no further activity to be taken on this one.

*** This bug has been marked as a duplicate of bug 1422378 ***


Note You need to log in before you can comment on or make changes to this bug.