Version-Release number of selected component (if applicable):
Dev_preview_PROD
OpenShift Master: v3.2.1.10-1-g668ed0a
Kubernetes Master: v1.2.0-36-g4a3f9c5

How reproducible:
Sometimes

Steps to Reproduce:
1. oc new-app https://github.com/openshift/nodejs-ex
2. Check the build process.

Actual results:
2. In dev_preview_PROD env, sometimes build would fail to push image to registry, some build logs are listed below:
...
I0809 02:01:45.667105 1 sti.go:334] Successfully built bingli-prod/nodejs-ex-4:4b18d114
I0809 02:01:45.697901 1 cleanup.go:23] Removing temporary directory /tmp/s2i-build339799339
I0809 02:01:45.697921 1 fs.go:156] Removing directory '/tmp/s2i-build339799339'
I0809 02:01:45.710856 1 sti.go:268] Using provided push secret for pushing 172.30.47.227:5000/bingli-prod/nodejs-ex:latest image
I0809 02:01:45.710877 1 sti.go:272] Pushing 172.30.47.227:5000/bingli-prod/nodejs-ex:latest image ...
I0809 02:02:15.839267 1 sti.go:277] Registry server Address:
I0809 02:02:15.839292 1 sti.go:278] Registry server User Name: serviceaccount
I0809 02:02:15.839299 1 sti.go:279] Registry server Email: serviceaccount
I0809 02:02:15.839306 1 sti.go:284] Registry server Password: <<non-empty>>
F0809 02:02:15.839315 1 builder.go:204] Error: build error: Failed to push image. Response from registry is: Error parsing HTTP response: unexpected end of JSON input: ""

Additional info:
[user3@bingli ~]$ oc get build
NAME          TYPE      FROM          STATUS     STARTED          DURATION
nodejs-ex-1   Source    Git@0e748ed   Failed     26 minutes ago   1m1s
nodejs-ex-2   Source    Git@0e748ed   Failed     23 minutes ago   1m11s
nodejs-ex-3   Source    Git@0e748ed   Complete   14 minutes ago   2m26s
nodejs-ex-4   Source    Git@0e748ed   Failed     10 minutes ago   1m9s

[user3@bingli ~]$ oc get secret
NAME                       TYPE                                  DATA      AGE
builder-dockercfg-d4and    kubernetes.io/dockercfg               1         24d
builder-token-bas4d        kubernetes.io/service-account-token   3         24d
builder-token-tbhw7        kubernetes.io/service-account-token   3         24d
default-dockercfg-dm8j3    kubernetes.io/dockercfg               1         24d
default-token-fhqmy        kubernetes.io/service-account-token   3         24d
default-token-uwg2y        kubernetes.io/service-account-token   3         24d
deployer-dockercfg-1zw07   kubernetes.io/dockercfg               1         24d
deployer-token-p4cws       kubernetes.io/service-account-token   3         24d
deployer-token-v5xqh       kubernetes.io/service-account-token   3         24d

[user3@bingli ~]$ oc get bc nodejs-ex -o json
{
    "kind": "BuildConfig",
    "apiVersion": "v1",
    "metadata": {
        "name": "nodejs-ex",
        "namespace": "bingli-prod",
        "selfLink": "/oapi/v1/namespaces/bingli-prod/buildconfigs/nodejs-ex",
        "uid": "5af208b6-5df4-11e6-a1a5-0e3d364e19a5",
        "resourceVersion": "96695839",
        "creationTimestamp": "2016-08-09T05:44:36Z",
        "labels": {
            "app": "nodejs-ex"
        },
        "annotations": {
            "openshift.io/generated-by": "OpenShiftNewApp"
        }
    },
    "spec": {
        "triggers": [
            {
                "type": "GitHub",
                "github": {
                    "secret": "2Yi6C0MPUa_Ei_fo8d0d"
                }
            },
            {
                "type": "Generic",
                "generic": {
                    "secret": "ZUpbr3uIAjqDIwI5opyu"
                }
            },
            {
                "type": "ConfigChange"
            },
            {
                "type": "ImageChange",
                "imageChange": {
                    "lastTriggeredImageID": "registry.access.redhat.com/rhscl/nodejs-4-rhel7:latest"
                }
            }
        ],
        "runPolicy": "Serial",
        "source": {
            "type": "Git",
            "git": {
                "uri": "https://github.com/openshift/nodejs-ex"
            }
        },
        "strategy": {
            "type": "Source",
            "sourceStrategy": {
                "from": {
                    "kind": "ImageStreamTag",
                    "namespace": "openshift",
                    "name": "nodejs:4"
                }
            }
        },
        "output": {
            "to": {
                "kind": "ImageStreamTag",
                "name": "nodejs-ex:latest"
            }
        },
        "resources": {},
        "postCommit": {}
    },
    "status": {
        "lastVersion": 4
    }
}
Could this registry failure be a result of https://bugzilla.redhat.com/show_bug.cgi?id=1364870 ?
*** Bug 1365855 has been marked as a duplicate of this bug. ***
*** Bug 1366326 has been marked as a duplicate of this bug. ***
(In reply to Steve Speicher from comment #2)
> Could this registry failure be a result of
> https://bugzilla.redhat.com/show_bug.cgi?id=1364870 ?

We think it is. We should get that fix ASAP to verify that. Also, the registry logs indicate a timeout when connecting to the OpenShift API server, which might be an infra issue or a sign that the API server is overloaded.

Is this consistently reproducible or a flake?
It can be reproduced easily in the online production environment. I think I'm not the only one who hit this issue, because there are several duplicate bugs about it :)
I talked to Alex, and it seems the problem is an I/O timeout when contacting the API server to verify the OpenShift user. Alex will try to put together a fix where we retry if we hit the I/O timeout. We also need to do a better job of error reporting when this happens, so the builder can retry the push if we hit a capacity problem.