Description of problem:
In dev-preview-int, I hit this problem 2/3 of the times I tried creating a cakephp-mysql-example app:

Pushing image 172.30.113.38:5000/dakinitest3/cakephp-mysql-example:latest ...
error: build error: Failed to push image: unauthorized: authentication required

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a new project
2. Create a new app. 'oc new-app cakephp-mysql-example -n <project>'
3. Check build to see if it's in Error state, and review build logs.

Actual results:

Expected results:

Additional info:
Here's what I did to reproduce the issue:

[root@dev-preview-int-master-d41bf ~]# oadm new-project dakinitest3
Created project dakinitest3
[root@dev-preview-int-master-d41bf ~]# oc new-app cakephp-mysql-example -n dakinitest3
[root@dev-preview-int-master-d41bf ~]# oc get events -w -n dakinitest3
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
2016-09-07T21:01:53Z 2016-09-07T21:01:53Z 1 mysql DeploymentConfig Warning FailedRetry {deployments-controller } mysql-1: About to stop retrying mysql-1: couldn't create deployer pod for dakinitest3/mysql-1: pods "mysql-1-deploy" is forbidden: service account dakinitest3/deployer was not found, retry after the service account is created
2016-09-07T21:01:53Z 2016-09-07T21:01:53Z 2 mysql DeploymentConfig Warning FailedRetry {deployments-controller } mysql-1: About to stop retrying mysql-1: couldn't create deployer pod for dakinitest3/mysql-1: pods "mysql-1-deploy" is forbidden: service account dakinitest3/deployer was not found, retry after the service account is created
2016-09-07T21:01:53Z 2016-09-07T21:01:53Z 1 mysql DeploymentConfig Normal DeploymentCreated {deploymentconfig-controller } Created new deployment "mysql-1" for version 1
2016-09-07T21:02:05Z 2016-09-07T21:02:05Z 1 cakephp-mysql-example-1-build Pod Normal Scheduled {default-scheduler } Successfully assigned cakephp-mysql-example-1-build to ip-172-31-14-248.ec2.internal
2016-09-07T21:02:08Z 2016-09-07T21:02:08Z 1 cakephp-mysql-example-1-build Pod spec.containers{sti-build} Normal Pulling {kubelet ip-172-31-14-248.ec2.internal} pulling image "registry.ops.openshift.com/openshift3/ose-sti-builder:v3.3.0.26"
2016-09-07T21:02:08Z 2016-09-07T21:02:08Z 1 cakephp-mysql-example-1-build Pod spec.containers{sti-build} Normal Pulled {kubelet ip-172-31-14-248.ec2.internal} Successfully pulled image "registry.ops.openshift.com/openshift3/ose-sti-builder:v3.3.0.26"
2016-09-07T21:02:09Z 2016-09-07T21:02:09Z 1 cakephp-mysql-example-1-build Pod spec.containers{sti-build} Normal Created {kubelet ip-172-31-14-248.ec2.internal} Created container with docker id fb3b01c0f3e5
2016-09-07T21:02:10Z 2016-09-07T21:02:10Z 1 cakephp-mysql-example-1-build Pod spec.containers{sti-build} Normal Started {kubelet ip-172-31-14-248.ec2.internal} Started container with docker id fb3b01c0f3e5
2016-09-07T21:02:40Z 2016-09-07T21:02:40Z 1 cakephp-mysql-example-1-build Pod spec.containers{sti-build} Normal Killing {kubelet ip-172-31-14-248.ec2.internal} Killing container with docker id fb3b01c0f3e5: Need to kill pod.

[root@dev-preview-int-master-d41bf ~]# oc get pods -n dakinitest3
NAME                            READY     STATUS    RESTARTS   AGE
cakephp-mysql-example-1-build   0/1       Error     0          48s

[root@dev-preview-int-master-d41bf ~]# oc logs -n dakinitest3 cakephp-mysql-example-1-build
Pulling image "registry.ops.openshift.com/rhscl/php-56-rhel7@sha256:c985bd02e50b3ceecd2c7197190c541132c6986f9a4225e12410274aa150782c" ...
Pulling image "registry.ops.openshift.com/rhscl/php-56-rhel7@sha256:c985bd02e50b3ceecd2c7197190c541132c6986f9a4225e12410274aa150782c" ...
Cloning "https://github.com/openshift/cakephp-ex.git" ...
Commit: 701d706b7f2b50ee972d0bf76990042f6c0cda5c (Merge pull request #42 from bparees/recreate)
Author: Ben Parees <bparees.github.com>
Date: Mon Aug 22 14:44:49 2016 -0400
---> Installing application source...
Pushing image 172.30.113.38:5000/dakinitest3/cakephp-mysql-example:latest ...
error: build error: Failed to push image: unauthorized: authentication required
It seems that the dockercfg secret is taking a while to get created (or attached to the builder service account). When the build is generated, the secret is not yet attached to the builder service account. However, it doesn't happen every time. I used the following script to prove this. Eventually you get a project that took too long to assign a dockercfg secret to the builder service account.

#!/bin/bash
set -e

index=1
prefix="build-debug-"

# Keep creating projects until one takes more than 30 seconds to get
# a dockercfg secret attached to its builder service account.
while true; do
  echo "Creating ${prefix}${index}"
  oc new-project "${prefix}${index}"
  sacount=1
  while true; do
    # Done as soon as the builder SA references a dockercfg secret.
    if oc get sa builder -o yaml | grep -q dockercfg; then
      break
    fi
    sleep 1
    sacount=$((sacount+1))
    if (( sacount > 30 )); then
      echo "Took too long"
      oc delete project "${prefix}${index}"
      exit 1
    fi
  done
  oc delete project "${prefix}${index}"
  index=$((index+1))
done
*** Bug 1363585 has been marked as a duplicate of this bug. ***
I hit this problem as well. Though, in my case, the builds continuously failed with this error over a period of 2-3 days before finally succeeding. I can no longer reproduce this issue now. Though, when the builds were failing, I was able to verify that the SA and the secrets are all present and being correctly referenced by the builder pod.
Abhishek, the secret may exist, but if it's not associated with the builder service account we currently don't resolve it in the build generator. Could it be that it took a really long time to associate it with the service account?
That's a good point - I did query the SA and the secrets but cannot find the output pastebin - would have been nice to be able to confirm that.
I'm not seeing any evidence of the controllers running on dev-preview-int. I'm not seeing uid annotations added to namespaces, service accounts created, or namespaces getting cleaned up (they stay in Terminating forever). All of those are controller functions.
I think this is the same root cause as BZ1374569
The only question is: how did it auto-correct itself earlier today (a couple of hours back)?
We had a config management issue that caused the masters and controllers to shut down in INT. The config files were corrupted by our Ansible 1.9->2.2 upgrade. The issue has been fixed and the master-controllers are back online.
Michal: Is there any additional debugging you want to do or should I move this bug over to QE?
Is this just an artifact of the master controllers crashing in INT, as captured by Bug 1377483?
I hadn't considered that, but it could be. Let's check back after the controller crashes are resolved.
Now that the controller crashes have been handled, let's retest this.
I've hit this issue several times in dev-preview-stg 3.3.0.32: the "builder-dockercfg" secret failed to be mounted to the builder pod, and the image push failed with the error "Failed to push image: unauthorized: authentication required". It seems less serious than before, though. It only occurred when I created a new project and created a new app using the default template; when I tried to rebuild with the same buildconfig, it succeeded. BTW, the STG env is a little overloaded; I don't know whether that is the cause. My project is "bingli808".
Reducing the severity as this seems to be related to the short (expected) delay in creating the secret/SA and the situation auto-corrects itself.
I can still reproduce this issue in STG 3.3.0.33.
Bing Li: How long is the delay in creating the SA/secret? Are we talking about a few seconds/minutes or hours?
(In reply to Abhishek Gupta from comment #18)
> Bing Li: How long is the delay in creating the SA/secret? Are we talking
> about a few seconds/minutes or hours?

It's about 2 minutes in my test; after that, a rebuild succeeds.
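For the short-delay case described here, a minimal sketch of a wait helper, built on the same `oc get sa builder -o yaml | grep dockercfg` check as the script in comment 1. The function name `wait_for_builder_dockercfg` and its arguments are hypothetical, and the 120-second default only mirrors the "about 2 minutes" reported above. It covers only the secret-attachment delay, not other causes of the push failure.

```bash
#!/bin/bash
# Hypothetical helper: block until the builder service account in a project
# references a dockercfg secret, or give up after a timeout (in seconds).
wait_for_builder_dockercfg() {
  local project=$1 timeout=${2:-120} waited=0
  until oc get sa builder -n "$project" -o yaml 2>/dev/null | grep -q dockercfg; do
    if (( waited >= timeout )); then
      echo "builder dockercfg secret not attached after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Possible usage, assuming the project was just created: `wait_for_builder_dockercfg dakinitest3 && oc new-app cakephp-mysql-example -n dakinitest3`.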
In dev-preview-stg, I can still reproduce this issue even about 2 hours after the project was created. The "builder-dockercfg" secret wasn't mounted to the builder pod, but the SA and secrets all look fine; they were all created 2 hours ago:

$ oc get build
NAME                     TYPE      FROM          STATUS    STARTED         DURATION
dancer-mysql-example-1   Source    Git@009ce0f   Failed    8 minutes ago   3m1s

$ oc get pod
NAME                           READY     STATUS    RESTARTS   AGE
dancer-mysql-example-1-build   0/1       Error     0          8m
database-1-t0z5f               1/1       Running   0          8m

$ oc get secret
NAME                       TYPE                                  DATA      AGE
builder-dockercfg-q3hae    kubernetes.io/dockercfg               1         2h
builder-token-lhwjl        kubernetes.io/service-account-token   3         2h
builder-token-uovvq        kubernetes.io/service-account-token   3         2h
default-dockercfg-ry7at    kubernetes.io/dockercfg               1         2h
default-token-kdht4        kubernetes.io/service-account-token   3         2h
default-token-rjwp3        kubernetes.io/service-account-token   3         2h
deployer-dockercfg-kuhi5   kubernetes.io/dockercfg               1         2h
deployer-token-a6fdw       kubernetes.io/service-account-token   3         2h
deployer-token-eyrg2       kubernetes.io/service-account-token   3         2h

$ oc get sa
NAME       SECRETS   AGE
builder    2         2h
default    2         2h
deployer   2         2h

$ oc logs dancer-mysql-example-1-build
......
Pushing image 172.30.46.234:5000/bingli726/dancer-mysql-example:latest ...
error: build error: Failed to push image: unauthorized: authentication required
Can we retest in devpreview INT now that we have 3.3.1 installed there?
I can still reproduce this "failing to push image" issue about 1/10 of the time in INT v3.3.1.1, even after the "builder-dockercfg" secret was created and attached to the builder service account. Using the script from comment 1, I didn't see a long delay before the dockercfg secret was attached to the builder service account.
(In reply to Bing Li from comment #26)
> I can still reproduce this "failing to push image" issue about 1/10 of the
> times in INT v3.3.1.1, even after the "builder-dockercfg" secret was created
> and attached to the builder service account.
>
> And using the script from comment 1, I didn't met a long delay for the
> dockercfg secret attaching to the builder service account.

Can you please provide output of 'oc get secrets'?
(In reply to Michal Fojtik from comment #27)
> (In reply to Bing Li from comment #26)
> > I can still reproduce this "failing to push image" issue about 1/10 of the
> > times in INT v3.3.1.1, even after the "builder-dockercfg" secret was created
> > and attached to the builder service account.
> >
> > And using the script from comment 1, I didn't met a long delay for the
> > dockercfg secret attaching to the builder service account.
>
> Can you please provide output of 'oc get secrets'?

Never mind, I found it. Can you also send the output of `oc get sa/builder -o yaml`?
I reproduced this issue today after I created 10+ new projects and then created new builds:

$ oc get sa/builder -o yaml
apiVersion: v1
imagePullSecrets:
- name: builder-dockercfg-fb42n
kind: ServiceAccount
metadata:
  creationTimestamp: 2016-10-18T02:07:10Z
  name: builder
  namespace: bingli745
  resourceVersion: "3528129"
  selfLink: /api/v1/namespaces/bingli745/serviceaccounts/builder
  uid: 9417eda9-94d7-11e6-816f-0a30d6fc46dc
secrets:
- name: builder-token-e6hsl
- name: builder-dockercfg-fb42n

$ oc get pod ot565es-1-build -o yaml
......
    volumeMounts:
    - mountPath: /var/run/docker.sock
      name: docker-socket
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: builder-token-e6hsl
      readOnly: true
  dnsPolicy: ClusterFirst
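The pod output above lists only the docker socket and the service account token under volumeMounts. A rough way to check a build pod for a push-secret mount is sketched below. It assumes the push secret is mounted under /var/run/secrets/openshift.io/push, which is my recollection of Origin's build pod layout and is not shown anywhere in this bug, so treat the path as an assumption. The function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: reads pod YAML on stdin and succeeds only if a
# volumeMount at the (assumed) push secret path is present.
pod_has_push_secret_mount() {
  grep -q 'mountPath:[[:space:]]*/var/run/secrets/openshift.io/push'
}
```

Possible usage: `oc get pod ot565es-1-build -o yaml | pod_has_push_secret_mount || echo "no push secret mounted"`.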
This should be fixed by https://github.com/openshift/ose/pull/411
All the code in that PR is already in 3.3.x. https://github.com/openshift/origin/pull/11394/ might be more relevant.
Moving this bug to ON_QA as DevPreview INT is on 3.4.0.x right now.
We can still reproduce this issue in online dev-preview-int 3.4:

$ oc logs gfagtretrah-1-build
Pulling image "registry.ops.openshift.com/rhscl/php-56-rhel7@sha256:9256701bdb510a52ba4b44447ef2e9445ed8b019fb109a0a2f44b8222debccff" ...
Pulling image "registry.ops.openshift.com/rhscl/php-56-rhel7@sha256:9256701bdb510a52ba4b44447ef2e9445ed8b019fb109a0a2f44b8222debccff" ...
Cloning "https://github.com/openshift/cakephp-ex.git" ...
Commit: fbf2cfe396fb29ce94fa52152a38d0b9599c359c (Merge pull request #47 from luciddreamz/master)
Author: Ben Parees <bparees.github.com>
Date: Mon Oct 24 16:54:37 2016 -0400
---> Installing application source...
Pushing image 172.30.93.130:5000/bingli702/gfagtretrah:latest ...
error: build error: Failed to push image: unauthorized: authentication required
(In reply to Bing Li from comment #35)
> We can still reproduce this issue in online dev-preview-int 3.4:
>
> $ oc logs gfagtretrah-1-build
> Pulling image
> "registry.ops.openshift.com/rhscl/php-56-rhel7@sha256:
> 9256701bdb510a52ba4b44447ef2e9445ed8b019fb109a0a2f44b8222debccff"
> ...
> Pulling image
> "registry.ops.openshift.com/rhscl/php-56-rhel7@sha256:
> 9256701bdb510a52ba4b44447ef2e9445ed8b019fb109a0a2f44b8222debccff"
> ...
> Cloning "https://github.com/openshift/cakephp-ex.git" ...
> Commit: fbf2cfe396fb29ce94fa52152a38d0b9599c359c (Merge pull request #47
> from
> luciddreamz/master)
> Author: Ben Parees <bparees.github.com>
> Date: Mon Oct 24 16:54:37 2016 -0400
>
> ---> Installing application source...
>
>
> Pushing image 172.30.93.130:5000/bingli702/gfagtretrah:latest ...
> error: build error: Failed to push image: unauthorized: authentication
> required

Can I see the output of "$ oc get secret" in devpreview when this happens? Also, would it be possible to increase BUILD_LOGLEVEL to 5 and provide that build output?
Created attachment 1217565
Logs of "failing to push image"

Reproduced this issue again in a project which was created 1 day ago in INT v3.4.0.21. Here are the logs and some other info. Please check. Thanks!
OK, so the secret seems not to be mounted to the builder:

I1105 13:37:47.130261 1 cfg.go:112] Using Docker authentication configuration in '/root/.docker/config.json'
I1105 13:37:47.130296 1 cfg.go:56] Problem accessing /root/.docker/config.json: stat /root/.docker/config.json: no such file or directory
I1105 13:37:47.130322 1 sti.go:283] No push secret provided

But the SA secret seems to be present and populated with dockercfg data, and it is even attached to the builder Pod (according to the logs).

Ben, it seems like this might indicate a problem in the builder when parsing the correct secret for push.
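A small sketch for spotting the same symptom in other level-5 build logs. It only searches for the "No push secret provided" message quoted above; other builder versions may word it differently, and the function name `check_build_log` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: reads a build log on stdin and flags the case where
# the builder reported that it had no push secret.
check_build_log() {
  if grep -q 'No push secret provided'; then
    echo "builder started without a push secret"
    return 1
  fi
  echo "no missing-push-secret message found"
}
```

Possible usage: `oc logs gfagtretrah-1-build | check_build_log`. A clean result does not prove the secret was mounted; it only means this particular message was absent.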
Michal, so from the logs I can see what's happening... basically when the build controller is trying to create the build for the first time, the image stream doesn't exist yet. This is recorded in the status of the build object (which you can see in the log from Bing Li). The build controller will keep retrying to schedule the build until it succeeds. The problem is now that the image stream exists, the correct secret is no longer assigned to the pod, because resolving the secret happened when the build was generated and not while the controller is trying to create the pod. What we need to do in the controller is that when we finally find the output image stream, we should associate the right secret with the build as well. I am assigning this one to myself.
This ties back to the issue we've discussed previously (and i've tried to fix previously) that imagestreams must be resolved within the user's context which makes it difficult to retry them within the build controller. This is related to https://bugzilla.redhat.com/show_bug.cgi?id=1333030#c2 which is another bug where we fail to push because the service account secret was still being created when the build ran. This bug might be easier to fix since we can just move the secret resolution logic into the buildcontroller, after imagestream resolution has succeeded, I think? Regardless it's not a regression so it can be marked upcoming release if it's not easily fixable.
We should do the right thing and impersonate the service account when resolving the imagestream/secrets in the controller. I chatted with deads and he confirmed this is possible with an impersonate header. However, it may be too risky of a change for the current release. I'll go ahead and label this UpcomingRelease.
Cesar: Can you please spell out the workaround for the user? What exactly should users try/do if they hit this bug?
Abhishek, the workaround is to start the build manually once the output ImageStream exists.
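A minimal sketch of that workaround, assuming the output ImageStream name is known in advance. The function name, argument order, and 60-second default timeout are hypothetical; `oc get is` and `oc start-build` are the only commands it relies on.

```bash
#!/bin/bash
# Hypothetical helper: wait until the output ImageStream exists in the
# current project, then start a build from the given BuildConfig.
start_build_when_imagestream_exists() {
  local bc=$1 is=$2 timeout=${3:-60} waited=0
  until oc get is "$is" >/dev/null 2>&1; do
    if (( waited >= timeout )); then
      echo "imagestream ${is} still missing after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  oc start-build "$bc"
}
```

Possible usage: `start_build_when_imagestream_exists cakephp-mysql-example cakephp-mysql-example`.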
> We should do the right thing and impersonate the service account when resolving
> the imagestream/secrets in the controller. I chatted with deads and he
> confirmed this is possible with an impersonate header. However, it may be too
> risky of a change for the current release.

that's an understatement :) doing it right ultimately means fully reworking our build api so we're using spec/status correctly and preventing users from updating status, among other things. it also means figuring out how to deal with resolution for triggered builds which have no "user" associated with them. (this is the same stuff i ran into when trying to do imagestream resolution in the controller instead of in the instantiate endpoint).

I don't think we can realistically take it on until we're working on the v2 api and have the freedom to really fix our buildconfig objects properly. But yes i agree that's the direction we want to go to fix this properly.
I hit this issue again in OCP v3.4.0.23 after I created a new project and then started a new build:

Pushing image 172.31.100.64:5000/bingli2/nodejs-mongodb-example:latest ...
error: build error: Failed to push image: unauthorized: authentication required
@Bing: did subsequent build requests succeed? how did you create the build? how did you start the build? how soon after creating the project did you create+start the build? We are aware of (and understand) the timing issues if you create a project and then immediately create/launch a build in that project. if you have a situation where repeated efforts to build the same thing fail, then we need: level 5 build logs, a dump of the build object, build pod object, secrets, and service accounts in the project.
This is affecting me in the current developer preview. I just tried (twice) to build a new project from a pristine fork of https://github.com/dfwperl/dancer-ex and I ran into the same error both times.

Log excerpt from openshift web console build log:

Pushing image 172.30.47.227:5000/dancertest2/dancer-mysql-example:latest ...
error: build error: Failed to push image: unauthorized: authentication required
Tommy, just to clarify ... did you try starting the build twice on the same project and it failed, or did you create a new project 2 times and the first build failed each time?

If you can reproduce, can you please include:

the json/yaml of the build
the json/yaml of the build pod
the json/yaml of the builder sa
the json/yaml of secrets in the project

and the steps you followed to start the build (did you instantiate a template, use new-app, etc).
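For anyone gathering this data, a rough collection sketch. The function name and output file names are hypothetical, and it assumes the build pod is named `<build>-build`, as in the examples in this bug. It deliberately records only the secret listing rather than full secret YAML, because the latter contains credentials; dump individual secrets by hand if they are explicitly requested.

```bash
#!/bin/bash
# Hypothetical helper: dump the objects requested above for one build in the
# current project into a directory that can be attached to the bug.
collect_build_debug() {
  local build=$1 outdir=${2:-build-debug}
  mkdir -p "$outdir"
  oc get build "$build" -o yaml > "$outdir/build.yaml"
  oc get pod "${build}-build" -o yaml > "$outdir/build-pod.yaml"
  oc get sa builder -o yaml > "$outdir/builder-sa.yaml"
  # Names and types only; secret contents are credentials.
  oc get secrets > "$outdir/secrets.txt"
  oc logs "${build}-build" > "$outdir/build.log"
}
```

Possible usage: `collect_build_debug php-1 ./php-1-debug`.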
I worked with Tommy on this, second builds were fine. It's the known issue of starting a build before the secrets are in place.
This problem also exists in ded-int-gcp. Information: OpenShift Master: v3.3.1.3 Kubernetes Master: v1.3.0+52492b4
The problem exists in dev-preview-int too. Version: openshift v3.4.0.38 kubernetes v1.4.0+776c994
I seem to hit this in: OpenShift Master: v3.4.0.39 Kubernetes Master: v1.4.0+776c994
Created attachment 1243615
yaml output on failed build

Output from:
oc get bc/php -o yaml
oc get bc/php2 -o yaml
oc get builds/php-1 -o yaml
oc get builds/php-2 -o yaml
oc get builds/php2-1 -o yaml
oc get is/php -o yaml
oc get is/php2 -o yaml
oc get dc/php -o yaml
oc get dc/php2 -o yaml
oc get routes/ab-php -o yaml
oc get routes/php -o yaml
oc get routes/php2 -o yaml
oc get svc/php -o yaml
oc get svc/php2 -o yaml
oc get po/php-1-build -o yaml
oc get po/php-2-build -o yaml
oc get po/php2-1-build -o yaml
oc describe pod php-1-build
[root@ocpmaster ~]# oc get secrets
NAME                       TYPE                                  DATA      AGE
builder-dockercfg-vn2j2    kubernetes.io/dockercfg               1         26m
builder-token-73ulb        kubernetes.io/service-account-token   4         26m
builder-token-pbihs        kubernetes.io/service-account-token   4         26m
default-dockercfg-2p1g1    kubernetes.io/dockercfg               1         26m
default-token-upr6g        kubernetes.io/service-account-token   4         26m
default-token-vfgkx        kubernetes.io/service-account-token   4         26m
deployer-dockercfg-ifj4d   kubernetes.io/dockercfg               1         26m
deployer-token-8giy4       kubernetes.io/service-account-token   4         26m
deployer-token-bve7s       kubernetes.io/service-account-token   4         26m

[root@ocpmaster ~]# oc logs -f po/php-1-build
Cloning "https://github.com/mglantz/ocp-php.git" ...
Commit: 6518f16750cca61ee87160cc9492265601bfea02 (Significant update, version 9.5, breaks all)
Author: Magnus Glantz <open.grieves>
Date: Fri Jan 13 11:05:06 2017 +0100
---> Installing application source...
Pushing image 172.30.72.114:5000/test/php:latest ...
error: build error: Failed to push image: unauthorized: authentication required
Hm, after waiting some time, it worked. Uploading the same debug output from when it worked. It was a brand new installation of 3.4, so perhaps not everything had come up yet.
Created attachment 1243617
yaml output on successful build
Yes, verified. Sorry for the noise. The issue was that the registry had not deployed fully at the point of the app build. Re-installed the cluster and replicated the issue.
This also exists in dev-preview-stg 3.5.5.5.
Can someone summarize what scenario this bug is for at this point? There are a few scenarios where pushes can fail:

1) the secret never got created in the project
2) the build was created before the target imagestream was created
3) unknown reasons we need to investigate

If it's (1), this bug needs to be transferred to a more appropriate team.
If it's (2), it's a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1443163
If it's (3), I guess we still need more master logs, build logs, build json, and pod json.
I ran the script from comment #1 but I'm no longer able to reproduce this. I don't think this is an issue anymore. The only time I get 'unauthorized: authentication required' is when doing external pushes, which is covered by https://bugzilla.redhat.com/show_bug.cgi?id=1439614