Description of problem: After upgrade of Openshift to current v3.5.5.31.24, when trying to use oc new-app --template=..., we get the following error: [ta-1u17-admin-mja@fr0-ose004 ~]$ oc new-app --template=my-php-template --param="MYSQL_USER=test" --param="MYSQL_DATABASE=test" --> Deploying template "openshift/my-php-template" to project example . . . runtime: goroutine stack exceeds 1000000000-byte limit fatal error: stack overflow Version-Release number of selected component (if applicable): v3.5.5.31.24 How reproducible: Unconfirmed. Customer can deploy it fine in the web console. Customer has seen it in some oc versions and not others: Customer tested with an oc v3.2 on windows and there the same command works... But the same oc new-app command fail on their RHEL 7.4 when osing oc v3.3 also. It might only affect certain templates (will attach the affected template) Steps to Reproduce: 1. Save template attached to bz 2. oc create -f template 3. oc new-app --template=my-php-template --param="MYSQL_USER=test" --param="MYSQL_DATABASE=test"
I was not able to recreate this but looking at the code it seems like it may have to do with other imagestreams that are defined in your project. Can you send the output of: oc get is -n openshift -o yaml and oc get is -o yaml # in the project where you're running new-app ?
Note we have this on any * with this template * On linux with oc v3.5.5.31.24... not on Windows with OC 3.3.
I think the circular reference checking was added after 3.3 which is why you likely do not see the issue with a 3.3 client.
In the `xxxdockerimages` namespace we indeed have tags pointing to the same imageStream - apiVersion: v1 kind: ImageStream metadata: annotations: openshift.io/generated-by: OpenShiftNewBuild creationTimestamp: 2016-12-16T17:40:38Z generation: 3 labels: build: xx-php56-image name: xx-php56-image namespace: xxdockerimages resourceVersion: "51901661" selfLink: /oapi/v1/namespaces/xxdockerimages/imagestreams/xx-php56-image uid: c2163564-c3b6-11e6-a417-68b5996d6ec0 spec: tags: - annotations: null from: kind: ImageStreamImage name: xx-php56-image-dev@sha256:aa223e31bfb0044e63eba0e726670d2bde450fe330957f9ea2cd3f127e3612a0 namespace: xxdockerimages generation: 2 importPolicy: {} name: DEV referencePolicy: type: Source - annotations: null from: kind: ImageStreamImage name: xx-php56-image@sha256:a1335ae5d18a1c9f9b3bfc6a5b314ed9f82cc8b24aade3bca4aecbd9d1a2c27f namespace: xxdockerimages generation: 3 importPolicy: {} name: PROD referencePolicy: type: Source
I've just seen they're defined as ImageStreamImages instead of ImageStreamTags
So ... is the problem the way I declared the IS and the tags? I have 2 build, each creating their own IS. Then the idea was, after validation it s working, tag it as DEV or PROD when manual test are OK. But it thus seems it is a problem to both "build" and then "tag" to the same IS? I did this: oc tag xxxdockerimages/xxx-php56-image-dev:latest xxx-php-image-dev:DEV -n xxxdockerimages oc tag --alias xxxdockerimages/xxx-php56-image-dev:DEV xxx-php56-image:DEV -n openshift oc tag xxxdockerimages/xxx-php56-image:latest xxx-php56-image:PROD -n xxxdockerimages oc tag --alias xxxdockerimages/xxx-php56-image:latest xxx-php56-image:PROD -n openshift At least this is the intent... should I delete all tags and try again? Regards, Marc
I recreated tags PROD and DEV after deleting them with -d. It didn't fix anything. But then, I modified the template to point directly to xxx-php56-image:PROD in xxxdockerimages instead of pointing to alias in openshift Then it works... ?
Gabe's more likely to have time to chase this than I am. Gabe see above+attachments for some sample imagestreams/imagestreamimage references that are tripping a recursive loop in the circular reference logic. Given your recent foray into that code, perhaps you can determine exactly what about this customer's imagestreamtag definitions is tripping us up and 1) offer them a way to fix their imagestreamtags 2) fix new-app so it doesn't do an inifinite recursion.
OK, some housekeeping first. Based on the line numbers, the earliest commit level that lines up with the thread dump is 9871f5001ba3947eef6bad88d7bd0fb0fa4038e2 from Feb 10 of this year, 2017. And yes, as Ben alluded to, even at the latest commit level, as part of looking at another newapp bug, I'm coming to the conclusion that newapp's circular reference detection is vulnerable to recursive loops introduced by actual circular references. As part of the fixes I'm constructing (haven't decided yet to do separate ones or a single one), I minimally plan on employing https://github.com/openshift/origin/blob/master/pkg/image/apis/image/helper.go#L383-L409 prior to follow followRefToDockerImage's recursion to avoid the recursion. I'm also eyeing nuking the recursion altogether, but don't know if I'll go that far yet. At first blush I do agree with Ben that there is a circular reference here that 3.3 would not have caught. I need to study the template and imagestreams more to provide workarounds / fixes to the IST defs. Will report back as I progress on both fronts.
Can they work around this for now by using oc process | oc create -f -? That will not do the circular reference checking that is blowing up the stack.
PR https://github.com/openshift/origin/pull/16843 should handle the stack overflow, as well as introduce additional tracing to assist in explaining any circular references found Conceivably this change will get in the 3.7 release, but its availability is still a few weeks out. Marc - can you wait that long? And would you be able to leverage a 3.7 client (even if your master is still at 3.5) when it is available? Also, Marc's #Comment 16, about switching from ImageStreamImage to ImageStreamTag .... I finally realized something. The commit level he is using (9871f5001ba3947eef6bad88d7bd0fb0fa4038e2, thanks to the the line numbers in the stack trace) is missing a fix wrt ImageStreamImage. In particular, commit 5b464bfd8b4542212be93c688a40a744dbbb686f which I introduced in March of 2017 added fixes to the circular checks for ImageStreamImages. Since Ben and I are running with that fix, that is why our repro attempts are not seeing this. Undoubtedly that would assist Marc here. The latest 3.5.x release appears to be v3.5.5.31.37-1, but looking at the relevant code blob/v3.5.5.31.37-1/pkg/generate/app/cmd/newapp.go it is missing the fix. So a new 3.5.x change would be needed. Specifically, backporting 5b464bfd8b4542212be93c688a40a744dbbb686f, either in lieu of my PR, on in conjunction with my PR, is an option. Marc - do you have any preferences? thanks
> In particular, commit 5b464bfd8b4542212be93c688a40a744dbbb686f which I introduced in March of 2017 added fixes to the circular checks for ImageStreamImages. Gabe, wouldn't that mean that fix is in 3.6? In which case they can try a 3.6 client.
Hello, As indicated, we found a temporary solution by pointing directly on the image stream in the project xxxdockerimages were it is build. The alias was only an attempt to "simplify" but is not absolutely required. And the alias is what seems to trigger the loop. See comment 15. Therefor we can wait. Do as you prefer. Marc
Thanks for the feedback Marc. With Marc satisfied with the "ImageStreamTag vs. ImageStreamImage workaround", and with Ben's noting that 5b464bfd8b4542212be93c688a40a744dbbb686f should be in 3.6 as well, I'm going to propose that we close this but out with the referenced PR's merging into master/3.7. Ben / others let me know if you disagree. thanks
yeah, set this as target release 3.7.0 and move it to modified once your fixes merge into origin.
Reviewed a few things ... 1) the customer's willingness to use workaround (so no need to backport 5b464bfd8b4542212be93c688a40a744dbbb686f to 3.5 ... it is in 3.6 2) that circular dependency would be apparent regardless of imagestreamimage or imagestreamtag usage 3) that Ben and I could not repro with use of imagestreamimage with 5b464bfd8b4542212be93c688a40a744dbbb686f change present 4) after further review of 5b464bfd8b4542212be93c688a40a744dbbb686f and its associated bug https://bugzilla.redhat.com/show_bug.cgi?id=1422378 5) he and I agreed we should close as a dup So no further activity to be taken on this one. *** This bug has been marked as a duplicate of bug 1422378 ***