Bug 1220979 - Build keeps pending if using either incorrect sourcesecretname or incorrect pullsecretname
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Build
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Maciej Szulik
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-05-13 02:17 UTC by DeShuai Ma
Modified: 2015-11-23 21:14 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-23 21:14:25 UTC
Target Upstream Version:
Embargoed:



Description DeShuai Ma 2015-05-13 02:17:27 UTC
Description of problem:
After deleting the secret and then restarting a build that uses it to access a private repo, the build stays in Pending status indefinitely.

Version-Release number of selected component (if applicable):
openshift v0.5-202-gdf30dfa
kubernetes v0.16.2-338-gc07896e

How reproducible:
Always

Steps to Reproduce:
1. Generate an SSH key and upload the public key to GitHub
$ ssh-keygen
$ cat ~/.ssh/id_rsa.pub

2. Create a new project
$ osadm new-project test

3. Create a secret
$ cat secret.json
{
   "apiVersion": "v1beta3",
   "kind": "Secret",
   "metadata": {
      "name": "mysecret"
   },
   "data": {
      "ssh-privatekey": "<<< place here the result of base64 -w 0 ~/.ssh/id_rsa >>>"
   }
}
$ osc create -f secret.json -n test
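The "ssh-privatekey" value in step 3 must be the base64 encoding of the private key file, which is what base64 -w 0 produces (a single unwrapped line). A minimal Python sketch that builds the same manifest, assuming the caller reads the key bytes from disk; make_ssh_secret is a hypothetical helper, not part of osc:

```python
import base64


def make_ssh_secret(name: str, private_key: bytes) -> dict:
    """Build a v1beta3 Secret manifest holding an SSH private key.

    Secret data values are base64-encoded strings. base64.b64encode
    emits no line breaks, matching the output of `base64 -w 0`.
    """
    return {
        "apiVersion": "v1beta3",
        "kind": "Secret",
        "metadata": {"name": name},
        "data": {
            # Encode the raw key bytes and store them as ASCII text.
            "ssh-privatekey": base64.b64encode(private_key).decode("ascii"),
        },
    }
```

For example, passing the bytes of ~/.ssh/id_rsa and writing the result with json.dump would reproduce the secret.json from step 3.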

4. Edit application-template-stibuild.json and add the secret reference to the BuildConfig as below:
$ cd /data/src/github.com/openshift/origin/examples/sample-app
$ vim application-template-stibuild.json
{
   "apiVersion": "v1beta1",
   "kind": "BuildConfig",
   "metadata": {
     "name": "ruby-sample-build",
...
     "source": {
       "git": {
         "uri": "git@github.com:openshift/ruby-hello-world.git"
       },
       "sourceSecretName": "mysecret",
       "type": "Git"
     },
}

5. Submit the application template for processing and create the application using the processed template:
$ osc process -n test -f application-template-stibuild.json | osc create -n test -f -

6. Start a build and check the build result
$ osc start-build $buildConfig -n test
$ osc get build -n test

7. Delete the secret
$ osc delete secret mysecret -n test

8. Restart build and check the build result
$ osc start-build $buildConfig -n test
$ osc get build -n test
$ osc build-logs $buildname -n test

Actual results:
8. The build ruby-sample-build-2 stays in Pending status indefinitely:
$ osc get build -n test
NAME                  TYPE      STATUS     POD
ruby-sample-build-1   STI       Complete   ruby-sample-build-1
ruby-sample-build-2   STI       Pending    ruby-sample-build-2
$ osc build-logs ruby-sample-build-2 -n test
Error from server: timed out waiting for build

Expected results:
8. The build should fail, and the build logs should report that the secret cannot be found.

Additional info:

Comment 1 Maciej Szulik 2015-05-13 20:14:04 UTC
The problem is in Kubernetes; see this issue: https://github.com/GoogleCloudPlatform/kubernetes/issues/8178

Comment 2 Wenjing Zheng 2015-05-15 05:55:54 UTC
If an incorrect pull secret name is used in the build strategy, the build stays Pending and no warning is shown anywhere except in openshift.log:
E0515 05:26:06.229295    1220 pod_workers.go:108] Error syncing pod 29ba3c6c-fac2-11e4-9a9c-22000ba092c3, skipping: secrets "pull123" not found
E0515 05:26:08.995529    1220 secret.go:117] Couldn't get secret wzheng1/pull123
E0515 05:26:08.995561    1220 kubelet.go:1036] Unable to mount volumes for pod "ruby-sample-build-2_wzheng1": secrets "pull123" not found; skipping pod
E0515 05:26:09.203327    1220 pod_workers.go:108] Error syncing pod 29ba3c6c-fac2-11e4-9a9c-22000ba092c3, skipping: secrets "pull123" not found
E0515 05:26:12.012156    1220 secret.go:117] Couldn't get secret wzheng1/pull123
E0515 05:26:12.012193    1220 kubelet.go:1036] Unable to mount volumes for pod "ruby-sample-build-2_wzheng1": secrets "pull123" not found; skipping pod

Here is my build strategy (the pull secret "pull123" does not exist):
{
  "strategy": {
    "stiStrategy": {
      "from": {
        "kind": "DockerImage",
        "name": "docker.io/wzheng/ruby-20-centos7:latest"
      },
      "pullSecretName": "pull123"
    },
    "type": "STI"
  }
}
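Both failures in this report come from a secret name that the build references but that does not exist in the project. A minimal sketch of a pre-flight check for that, assuming the build definition is available as a dict whose "source" and "strategy" sections follow the layout shown above, and that the project's existing secret names were fetched separately; missing_secrets is hypothetical and not an osc feature:

```python
from collections.abc import Iterable


def missing_secrets(build: dict, existing: Iterable[str]) -> list[str]:
    """Return secret names referenced by the build that do not exist.

    Only the two fields discussed in this bug are checked:
    source.sourceSecretName and strategy.stiStrategy.pullSecretName.
    """
    referenced = [
        build.get("source", {}).get("sourceSecretName"),
        build.get("strategy", {}).get("stiStrategy", {}).get("pullSecretName"),
    ]
    known = set(existing)
    # Skip unset fields (None or empty) and keep the order of appearance.
    return [name for name in referenced if name and name not in known]
```

Failing the build up front when this list is non-empty would give the "can't find secret" message the reporter expected, instead of a pod that waits forever.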

Comment 3 Michal Fojtik 2015-05-15 07:42:51 UTC
Maciej, is this something you have been working on re: build failure retries?

Comment 4 Maciej Szulik 2015-05-15 09:31:36 UTC
Michal, I need to sync with Paul about the investigation he mentioned in the k8s issue.

Comment 5 Maciej Szulik 2015-05-22 11:58:01 UTC
The conclusion of the discussion in [1] is that it is expected behavior for a pod to wait indefinitely for a missing secret. To give end users some visibility, osc describe build will show the pod events (which contain information about the missing or wrong secret) alongside the build events [3], as discussed in [2]. Additionally, where we have control over the objects, we'll fail the build after 30 minutes.

[1] https://github.com/GoogleCloudPlatform/kubernetes/issues/8178
[2] https://github.com/openshift/origin/issues/2269
[3] https://github.com/openshift/origin/pull/2220
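The 30-minute failure rule above comes down to comparing how long a build has been Pending against a fixed deadline. A minimal sketch of that decision, assuming the build's creation time marks the start of the wait; PENDING_TIMEOUT and should_fail_pending_build are hypothetical names, and this is not the implementation from [3]:

```python
from datetime import datetime, timedelta

# Assumed deadline, taken from the "fail the build after 30 mins" plan.
PENDING_TIMEOUT = timedelta(minutes=30)


def should_fail_pending_build(status: str, created: datetime, now: datetime) -> bool:
    """Return True when a build has been Pending longer than PENDING_TIMEOUT.

    `created` and `now` must both be naive or both timezone-aware,
    otherwise the subtraction raises TypeError.
    """
    return status == "Pending" and now - created > PENDING_TIMEOUT
```

A controller would run this check periodically and mark matching builds as failed; builds in any other status are left alone.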

Comment 6 Wenjing Zheng 2015-05-26 09:59:17 UTC
(In reply to Maciej Szulik from comment #5)

There are no pod events in the build description when an incorrect secret is used:
[fedora@ip-10-229-66-143 sample-app]$ osc describe builds ruby-sample-build-2
Name:			ruby-sample-build-2
Created:		5 minutes ago
Labels:			buildconfig=ruby-sample-build,name=ruby-sample-build,template=application-template-stibuild
Build Config:		ruby-sample-build
Status:			Pending
Duration:		waiting for 5m7s
Build Pod:		ruby-sample-build-2
Strategy:		Source
Image Reference:	DockerImage docker.io/wzheng/ruby-20-centos7:latest
Pull Secret Name:	newsecret123
Incremental Build:	yes
Source Type:		Git
URL:			git://github.com/openshift/ruby-hello-world.git
Output to:		origin-ruby-sample:latest
Output Spec:		<none>
No events.

Comment 7 Maciej Szulik 2015-05-26 13:06:03 UTC
I tested against the latest master (commit id: 54aed090d8ad32e228cb601a9695c00198061af8) and got the following result:

[vagrant@openshiftdev origin]$ osc describe bc ruby-sample-build
Name:                   ruby-sample-build
Created:                3 minutes ago
Labels:                 name=ruby-sample-build,template=application-template-stibuild
Latest Version:         1
Strategy:               Source
Image Reference:        ImageStreamTag ruby-20-centos7:latest
Incremental Build:      yes
Source Type:            Git
URL:                    git://github.com/openshift/ruby-hello-world.git
Source Secret:          some-secret
Output to:              origin-ruby-sample:latest
Output Spec:            <none>
Webhook Github:         https://localhost:8443/osapi/v1beta3/namespaces/test/buildconfigs/ruby-sample-build/webhooks/secret101/github
Webhook Generic:        https://localhost:8443/osapi/v1beta3/namespaces/test/buildconfigs/ruby-sample-build/webhooks/secret101/generic
Image Repository Trigger
- LastTriggeredImageID: openshift/ruby-20-centos7:latest
Builds:
  Name                  Status          Duration                Creation Time
  ruby-sample-build-1   pending         waiting for 3m27s       2015-05-26 13:02:12 +0000 UTC

[vagrant@openshiftdev origin]$ osc describe build ruby-sample-build-1
Name:                   ruby-sample-build-1
Created:                58 seconds ago
Labels:                 buildconfig=ruby-sample-build,name=ruby-sample-build,template=application-template-stibuild
Build Config:           ruby-sample-build
Status:                 Pending
Duration:               waiting for 58s
Build Pod:              ruby-sample-build-1
Strategy:               Source
Image Reference:        DockerImage openshift/ruby-20-centos7:latest
Incremental Build:      yes
Source Type:            Git
URL:                    git://github.com/openshift/ruby-hello-world.git
Source Secret:          some-secret
Output to:              origin-ruby-sample:latest
Output Spec:            <none>
Events:
  FirstSeen                             LastSeen                        Count   From                            SubobjectPath   Reason          Message
  Tue, 26 May 2015 13:02:12 +0000       Tue, 26 May 2015 13:02:12 +0000 1       {scheduler }                                    scheduled       Successfully assigned ruby-sample-build-1 to openshiftdev.local
  Tue, 26 May 2015 13:02:12 +0000       Tue, 26 May 2015 13:03:02 +0000 6       {kubelet openshiftdev.local}                    failedMount     Unable to mount volumes for pod "ruby-sample-build-1_test": secrets "some-secret" not found
  Tue, 26 May 2015 13:02:12 +0000       Tue, 26 May 2015 13:03:02 +0000 6       {kubelet openshiftdev.local}                    failedSync      Error syncing pod, skipping: secrets "some-secret" not found
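The failedMount and failedSync events are where the missing secret is actually named. A small sketch that extracts those names from event messages, assuming the kubelet keeps the secrets "<name>" not found wording seen here (message text is not a stable interface); missing_secret_names is a hypothetical helper:

```python
import re

# Matches the kubelet wording seen in these events, e.g.
#   secrets "some-secret" not found
_MISSING_SECRET = re.compile(r'secrets "([^"]+)" not found')


def missing_secret_names(messages: list[str]) -> set[str]:
    """Collect the secret names reported as missing in event messages."""
    names: set[str] = set()
    for message in messages:
        # findall returns the captured secret name for every match.
        names.update(_MISSING_SECRET.findall(message))
    return names
```

Deduplicating into a set matters because the same failure is repeated many times (the Count column reaches 928 in a later comment).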

Comment 8 Wenjing Zheng 2015-05-27 06:21:46 UTC
Works now, thanks! Tested with openshift v0.5.2.0-176-gc386339 and kubernetes v0.17.0-441-g6b6b47a:
Name:                   ruby-sample-build-1
Created:                7 hours ago
Labels:                 buildconfig=ruby-sample-build,name=ruby-sample-build,template=application-template-stibuild
Build Config:           ruby-sample-build
Status:                 Pending
Duration:               waiting for 7h0m57s
Build Pod:              ruby-sample-build-1
Strategy:               Source
Image Reference:        DockerImage docker.io/wzheng/ruby-20-centos7:latest
Pull Secret Name:       newsecret
Incremental Build:      yes
Source Type:            Git
URL:                    git://github.com/openshift/ruby-hello-world.git
Output to:              origin-ruby-sample:latest
Output Spec:            <none>
Events:
  FirstSeen                             LastSeen                        Count   From                            SubobjectPath   Reason          Message
  Tue, 26 May 2015 23:01:46 -0700       Tue, 26 May 2015 23:01:46 -0700 1       {scheduler }                                    scheduled       Successfully assigned ruby-sample-build-1 to minion2.cluster.local
  Tue, 26 May 2015 23:01:46 -0700       Tue, 26 May 2015 23:04:54 -0700 9       {kubelet minion2.cluster.local}                 failedMount     Unable to mount volumes for pod "ruby-sample-build-1_test": secrets "newsecret" not found
  Tue, 26 May 2015 23:02:08 -0700       Tue, 26 May 2015 23:04:54 -0700 8       {kubelet minion2.cluster.local}                 failedSync      Error syncing pod, skipping: secrets "newsecret" not found

Comment 9 XiuJuan Wang 2015-09-24 06:04:02 UTC
After deleting the secret referenced as the SourceSecret, the build stays Pending for over 2 hours; it does not fail after 30 minutes.

The build should fail, as designed in:
https://github.com/openshift/origin/pull/2220

Reopening this bug to track this issue. Tested in devenv-fedora_2389.

$ oc describe  builds  ruby-sample-build-3
Name:			ruby-sample-build-3
Created:		2 hours ago
Labels:			app=test,buildconfig=ruby-sample-build,name=ruby-sample-build,template=application-template-stibuild
Annotations:		openshift.io/build.number=3
Build Config:		ruby-sample-build
Status:			Pending
Duration:		waiting for 2h34m36s
Build Pod:		ruby-sample-build-3-build
Strategy:		Source
Image Reference:	DockerImage openshift/ruby-20-centos7@sha256:720cae28b6a001172ec9a1683b10be5b9f9c9e97cb5f62c27349e351cd0bb088
Source Type:		Git
URL:			https://github.com/openshift/ruby-hello-world.git
Source Secret:		mysecret
Output to:		ImageStreamTag origin-ruby-sample:latest
Push Secret:		builder-dockercfg-x2f18
Events:
  FirstSeen	LastSeen	Count	From				SubobjectPath	Reason		Message
  2h		0s		928	{kubelet ip-172-18-3-198}			failedMount	Unable to mount volumes for pod "ruby-sample-build-3-build_xiuwang": secrets "mysecret" not found
  2h		0s		928	{kubelet ip-172-18-3-198}			failedSync	Error syncing pod, skipping: secrets "mysecret" not found

Comment 10 Maciej Szulik 2015-09-24 13:14:00 UTC
This is working as expected; see comment #5. I'm closing the issue.

Comment 11 XiuJuan Wang 2015-09-25 00:56:59 UTC
@maciej Could you change this bug back to ON_QA? The issue in comment #1 is a real bug and has been fixed; I just reopened an old bug.
Thanks!

Comment 12 Maciej Szulik 2015-09-25 19:53:06 UTC
Done.

Comment 13 Wenjing Zheng 2015-09-28 10:25:59 UTC
Verified per comment #5.

