Description of problem:
After deleting the secret, restarting a build that uses the secret to access a private repo leaves the build blocked in Pending status indefinitely.

Version-Release number of selected component (if applicable):
openshift v0.5-202-gdf30dfa
kubernetes v0.16.2-338-gc07896e

How reproducible:
Always

Steps to Reproduce:
1. Generate an SSH key and upload the public key to GitHub
   $ ssh-keygen
   $ cat ~/.ssh/id_rsa.pub
2. Create a new project
   $ osadm new-project test
3. Create a secret
   $ cat secret.json
   {
     "apiVersion": "v1beta3",
     "kind": "Secret",
     "metadata": {
       "name": "mysecret"
     },
     "data": {
       "ssh-privatekey": "<<< place here the result of base64 -w 0 ~/.ssh/id_rsa >>>"
     }
   }
   $ osc create -f secret.json -n test
4. Edit application-template-stibuild.json and add the secret reference to it as below:
   $ cd /data/src/github.com/openshift/origin/examples/sample-app
   $ vim application-template-stibuild.json
   {
     "apiVersion": "v1beta1",
     "kind": "BuildConfig",
     "metadata": {
       "name": "ruby-sample-build",
     ...
     "source": {
       "git": {
         "uri": "git:openshift/ruby-hello-world.git"
       },
       "sourceSecretName": "mysecret",
       "type": "Git"
     },
   }
5. Submit the application template for processing and create the application using the processed template:
   $ osc process -n test -f application-template-stibuild.json | osc create -n test -f -
6. Start a build and check the build result
   $ osc start-build $buildConfig -n test
   $ osc get build -n test
7. Delete the secret
   $ osc delete secret mysecret -n test
8. Restart the build and check the build result
   $ osc start-build $buildConfig -n test
   $ osc get build -n test
   $ osc build-logs $buildname -n test

Actual results:
8. The build ruby-sample-build-2 stays in Pending status
   $ osc get build -n test
   NAME                  TYPE   STATUS     POD
   ruby-sample-build-1   STI    Complete   ruby-sample-build-1
   ruby-sample-build-2   STI    Pending    ruby-sample-build-2
   $ osc build-logs ruby-sample-build-2 -n test
   Error from server: timed out waiting for build

Expected results:
8. The build should fail, and the build logs should report that the secret cannot be found.

Additional info:
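Step 3 of the reproduction asks for the base64-encoded private key to be pasted into secret.json by hand. A minimal sketch of generating that object instead, assuming the same v1beta3 Secret layout shown in the steps; the function name build_secret and the placeholder key bytes are illustrative only and are not part of any OpenShift tooling:

```python
import base64
import json


def build_secret(name: str, private_key: bytes) -> dict:
    """Return a Secret object shaped like the secret.json in step 3."""
    # b64encode emits a single unwrapped line, matching `base64 -w 0`.
    encoded = base64.b64encode(private_key).decode("ascii")
    return {
        "apiVersion": "v1beta3",
        "kind": "Secret",
        "metadata": {"name": name},
        "data": {"ssh-privatekey": encoded},
    }


# Placeholder bytes; in practice read the contents of ~/.ssh/id_rsa instead.
secret = build_secret("mysecret", b"placeholder private key bytes")
print(json.dumps(secret, indent=2))
```

The printed JSON could be saved as secret.json and passed to the osc create command from step 3.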
The problem is with k8s, see this issue https://github.com/GoogleCloudPlatform/kubernetes/issues/8178
If an incorrect pull secret name is used in the build strategy, the build stays pending and no warning is visible anywhere except in openshift.log:

E0515 05:26:06.229295 1220 pod_workers.go:108] Error syncing pod 29ba3c6c-fac2-11e4-9a9c-22000ba092c3, skipping: secrets "pull123" not found
E0515 05:26:08.995529 1220 secret.go:117] Couldn't get secret wzheng1/pull123
E0515 05:26:08.995561 1220 kubelet.go:1036] Unable to mount volumes for pod "ruby-sample-build-2_wzheng1": secrets "pull123" not found; skipping pod
E0515 05:26:09.203327 1220 pod_workers.go:108] Error syncing pod 29ba3c6c-fac2-11e4-9a9c-22000ba092c3, skipping: secrets "pull123" not found
E0515 05:26:12.012156 1220 secret.go:117] Couldn't get secret wzheng1/pull123
E0515 05:26:12.012193 1220 kubelet.go:1036] Unable to mount volumes for pod "ruby-sample-build-2_wzheng1": secrets "pull123" not found; skipping pod

Here is my build strategy:

{
  "strategy": {
    "stiStrategy": {
      "from": {
        "kind": "DockerImage",
        "name": "docker.io/wzheng/ruby-20-centos7:latest"
      },
      "pullSecretName": "pull123"    ----this secret doesn't exist
    },
    "type": "STI"
  }
}
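Since the only hint at this point is in openshift.log, a small filter can pull the affected pod and secret names out of the kubelet messages. This is a rough sketch that assumes the message wording shown in the log excerpt above; the function name missing_secrets and the regular expression are illustrative, and the wording may differ in other kubelet versions:

```python
import re

# Matches the kubelet.go message quoted in the log excerpt.
MISSING_SECRET = re.compile(
    r'Unable to mount volumes for pod "([^"]+)": secrets "([^"]+)" not found'
)


def missing_secrets(log_lines):
    """Return sorted, de-duplicated (pod, secret) pairs found in the log."""
    found = set()
    for line in log_lines:
        match = MISSING_SECRET.search(line)
        if match:
            found.add((match.group(1), match.group(2)))
    return sorted(found)


# Two lines copied from the excerpt; only the second one names the pod.
SAMPLE = [
    "E0515 05:26:08.995529 1220 secret.go:117] Couldn't get secret wzheng1/pull123",
    'E0515 05:26:08.995561 1220 kubelet.go:1036] Unable to mount volumes for pod '
    '"ruby-sample-build-2_wzheng1": secrets "pull123" not found; skipping pod',
]

for pod, secret in missing_secrets(SAMPLE):
    print(f"pod {pod} is waiting for missing secret {secret}")
```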
Maciej, is this something you have been working on re: build failure retries?
Michal, I need to sync with Paul about the investigation he mentioned in the k8s issue.
The outcome of the discussion in [1] is that a pod hanging endlessly while it waits for the secret is the expected behavior. To give end users some kind of solution, we'll show pod events (which contain the information about the missing/wrong secret) alongside build events [3] when doing osc describe build, as discussed in [2]. Additionally, where we have control over the objects, we'll fail the build after 30 mins.

[1] https://github.com/GoogleCloudPlatform/kubernetes/issues/8178
[2] https://github.com/openshift/origin/issues/2269
[3] https://github.com/openshift/origin/pull/2220
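To make the 30-minute rule concrete, here is a minimal sketch of the check it implies. It is not the code from the origin pull request; the function name should_fail_build, the status string comparison, and the timestamps are assumptions made for illustration:

```python
from datetime import datetime, timedelta, timezone

# The limit mentioned in this comment; everything else here is hypothetical.
PENDING_TIMEOUT = timedelta(minutes=30)


def should_fail_build(status: str, created: datetime, now: datetime) -> bool:
    """True when a build has been Pending for at least the timeout."""
    return status == "Pending" and (now - created) >= PENDING_TIMEOUT


created = datetime(2015, 5, 26, 13, 2, 12, tzinfo=timezone.utc)

# Pending for under a minute: keep waiting.
print(should_fail_build("Pending", created, created + timedelta(seconds=58)))
# Pending for more than two hours: past the limit, so the build would be failed.
print(should_fail_build("Pending", created, created + timedelta(hours=2)))
```

A controller would run a check like this periodically and move the build to a failed state when it returns True.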
(In reply to Maciej Szulik from comment #5)
> The result of the discussion from [1] is this is the expected situation for
> pod to hang endlessly waiting for the secret. To provide some kind of
> solution for end users we'll show pod events (which contain the information
> about missing/wrong secret) when doing osc describe build along with build
> events [3], as discussed in [2]. Additionally where we have control over
> objects we'll fail the build after 30 mins.
>
> [1] https://github.com/GoogleCloudPlatform/kubernetes/issues/8178
> [2] https://github.com/openshift/origin/issues/2269
> [3] https://github.com/openshift/origin/pull/2220

There is no pod event in the build description when an incorrect secret is used:

[fedora@ip-10-229-66-143 sample-app]$ osc describe builds ruby-sample-build-2
Name: ruby-sample-build-2
Created: 5 minutes ago
Labels: buildconfig=ruby-sample-build,name=ruby-sample-build,template=application-template-stibuild
Build Config: ruby-sample-build
Status: Pending
Duration: waiting for 5m7s
Build Pod: ruby-sample-build-2
Strategy: Source
Image Reference: DockerImage docker.io/wzheng/ruby-20-centos7:latest
Pull Secret Name: newsecret123
Incremental Build: yes
Source Type: Git
URL: git://github.com/openshift/ruby-hello-world.git
Output to: origin-ruby-sample:latest
Output Spec: <none>
No events.
Tested against latest master (commit id: 54aed090d8ad32e228cb601a9695c00198061af8), and I got the following result:

[vagrant@openshiftdev origin]$ osc describe bc ruby-sample-build
Name: ruby-sample-build
Created: 3 minutes ago
Labels: name=ruby-sample-build,template=application-template-stibuild
Latest Version: 1
Strategy: Source
Image Reference: ImageStreamTag ruby-20-centos7:latest
Incremental Build: yes
Source Type: Git
URL: git://github.com/openshift/ruby-hello-world.git
Source Secret: some-secret
Output to: origin-ruby-sample:latest
Output Spec: <none>
Webhook Github: https://localhost:8443/osapi/v1beta3/namespaces/test/buildconfigs/ruby-sample-build/webhooks/secret101/github
Webhook Generic: https://localhost:8443/osapi/v1beta3/namespaces/test/buildconfigs/ruby-sample-build/webhooks/secret101/generic
Image Repository Trigger - LastTriggeredImageID: openshift/ruby-20-centos7:latest
Builds:
  Name                  Status    Duration             Creation Time
  ruby-sample-build-1   pending   waiting for 3m27s    2015-05-26 13:02:12 +0000 UTC

[vagrant@openshiftdev origin]$ osc describe build ruby-sample-build-1
Name: ruby-sample-build-1
Created: 58 seconds ago
Labels: buildconfig=ruby-sample-build,name=ruby-sample-build,template=application-template-stibuild
Build Config: ruby-sample-build
Status: Pending
Duration: waiting for 58s
Build Pod: ruby-sample-build-1
Strategy: Source
Image Reference: DockerImage openshift/ruby-20-centos7:latest
Incremental Build: yes
Source Type: Git
URL: git://github.com/openshift/ruby-hello-world.git
Source Secret: some-secret
Output to: origin-ruby-sample:latest
Output Spec: <none>
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Reason  Message
  Tue, 26 May 2015 13:02:12 +0000  Tue, 26 May 2015 13:02:12 +0000  1  {scheduler }  scheduled  Successfully assigned ruby-sample-build-1 to openshiftdev.local
  Tue, 26 May 2015 13:02:12 +0000  Tue, 26 May 2015 13:03:02 +0000  6  {kubelet openshiftdev.local}  failedMount  Unable to mount volumes for pod "ruby-sample-build-1_test": secrets "some-secret" not found
  Tue, 26 May 2015 13:02:12 +0000  Tue, 26 May 2015 13:03:02 +0000  6  {kubelet openshiftdev.local}  failedSync  Error syncing pod, skipping: secrets "some-secret" not found
Works now, thanks!

openshift v0.5.2.0-176-gc386339
kubernetes v0.17.0-441-g6b6b47a

ruby-sample-build-1
Created: 7 hours ago
Labels: buildconfig=ruby-sample-build,name=ruby-sample-build,template=application-template-stibuild
Build Config: ruby-sample-build
Status: Pending
Duration: waiting for 7h0m57s
Build Pod: ruby-sample-build-1
Strategy: Source
Image Reference: DockerImage docker.io/wzheng/ruby-20-centos7:latest
Pull Secret Name: newsecret
Incremental Build: yes
Source Type: Git
URL: git://github.com/openshift/ruby-hello-world.git
Output to: origin-ruby-sample:latest
Output Spec: <none>
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Reason  Message
  Tue, 26 May 2015 23:01:46 -0700  Tue, 26 May 2015 23:01:46 -0700  1  {scheduler }  scheduled  Successfully assigned ruby-sample-build-1 to minion2.cluster.local
  Tue, 26 May 2015 23:01:46 -0700  Tue, 26 May 2015 23:04:54 -0700  9  {kubelet minion2.cluster.local}  failedMount  Unable to mount volumes for pod "ruby-sample-build-1_test": secrets "newsecret" not found
  Tue, 26 May 2015 23:02:08 -0700  Tue, 26 May 2015 23:04:54 -0700  8  {kubelet minion2.cluster.local}  failedSync  Error syncing pod, skipping: secrets "newsecret" not found
After deleting the secret referenced as the source secret, the build stays pending for 2h and does not fail after 30 mins. The build should fail, as the code is designed to do: https://github.com/openshift/origin/pull/2220
Reopening this bug to track this issue. Tested in devenv-fedora_2389.

$ oc describe builds ruby-sample-build-3
Name: ruby-sample-build-3
Created: 2 hours ago
Labels: app=test,buildconfig=ruby-sample-build,name=ruby-sample-build,template=application-template-stibuild
Annotations: openshift.io/build.number=3
Build Config: ruby-sample-build
Status: Pending
Duration: waiting for 2h34m36s
Build Pod: ruby-sample-build-3-build
Strategy: Source
Image Reference: DockerImage openshift/ruby-20-centos7@sha256:720cae28b6a001172ec9a1683b10be5b9f9c9e97cb5f62c27349e351cd0bb088
Source Type: Git
URL: https://github.com/openshift/ruby-hello-world.git
Source Secret: mysecret
Output to: ImageStreamTag origin-ruby-sample:latest
Push Secret: builder-dockercfg-x2f18
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Reason  Message
  2h  0s  928  {kubelet ip-172-18-3-198}  failedMount  Unable to mount volumes for pod "ruby-sample-build-3-build_xiuwang": secrets "mysecret" not found
  2h  0s  928  {kubelet ip-172-18-3-198}  failedSync  Error syncing pod, skipping: secrets "mysecret" not found
This is working as expected, see #5. I'm closing the issue.
@maciej Could you change this bug back to on_qa? The issue in comment #1 was a real bug and has been fixed; I just reopened an old bug. Thanks!
Done.
Verified as per comment #5.