During online testing, when attempting to verify deployments, a timing issue occurs where, if we attempt to retrieve the RC before it is created, we get:

ERROR: Build step failed with exception
com.openshift.restclient.OpenShiftException: Could not get resource frontend-prod-1 in namespace gmontero-online-hackday: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"replicationcontrollers \"frontend-prod-1\" not found","reason":"NotFound","details":{"name":"frontend-prod-1","kind":"replicationcontrollers"},"code":404}
    at com.openshift.internal.restclient.DefaultClient.createOpenShiftException(DefaultClient.java:482)
    at com.openshift.internal.restclient.DefaultClient.get(DefaultClient.java:306)
    at com.openshift.jenkins.plugins.pipeline.IOpenShiftPlugin.getLatestReplicationController(IOpenShiftPlugin.java:64)
    at com.openshift.jenkins.plugins.pipeline.OpenShiftDeploymentVerifier.coreLogic(OpenShiftDeploymentVerifier.java:101)
    at com.openshift.jenkins.plugins.pipeline.IOpenShiftPlugin.doItCore(IOpenShiftPlugin.java:97)
    at com.openshift.jenkins.plugins.pipeline.IOpenShiftPlugin.doIt(IOpenShiftPlugin.java:111)
    at com.openshift.jenkins.plugins.pipeline.OpenShiftBaseStep.perform(OpenShiftBaseStep.java:89)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
    at hudson.model.Build$BuildExecution.build(Build.java:205)
    at hudson.model.Build$BuildExecution.doRun(Build.java:162)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
    at hudson.model.Run.execute(Run.java:1738)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:410)
Caused by: com.openshift.internal.restclient.http.NotFoundException: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"replicationcontrollers \"frontend-prod-1\" not found","reason":"NotFound","details":{"name":"frontend-prod-1","kind":"replicationcontrollers"},"code":404}
    at com.openshift.internal.restclient.http.UrlConnectionHttpClient.createException(UrlConnectionHttpClient.java:230)
    at com.openshift.internal.restclient.http.UrlConnectionHttpClient.request(UrlConnectionHttpClient.java:165)
    at com.openshift.internal.restclient.http.UrlConnectionHttpClient.request(UrlConnectionHttpClient.java:141)
    at com.openshift.internal.restclient.http.UrlConnectionHttpClient.get(UrlConnectionHttpClient.java:103)
    at com.openshift.internal.restclient.DefaultClient.get(DefaultClient.java:302)
    ... 14 more
Caused by: java.io.FileNotFoundException: https://openshift.default.svc.cluster.local/api/v1/namespaces/gmontero-online-hackday/replicationcontrollers/frontend-prod-1
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1836)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
    at com.openshift.internal.restclient.http.UrlConnectionHttpClient.request(UrlConnectionHttpClient.java:161)
    ... 17 more

We need to ensure that getLatestReplicationController() catches exceptions from the rest client and simply returns null, so that the higher-level retry logic can operate correctly.
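To illustrate the shape of the change, here is a minimal sketch, assuming approximate openshift-restclient-java names (IClient, ResourceKind, IReplicationController) and a simplified signature; it is not the exact plugin code (see the commit in the next comment for the real change):

    import com.openshift.restclient.IClient;
    import com.openshift.restclient.ResourceKind;
    import com.openshift.restclient.model.IReplicationController;

    public class DeploymentVerifierSketch {

        // Fetch the RC for the latest deployment, but treat any rest client failure
        // (for example the OpenShiftException wrapping the 404 above) as "not created
        // yet" and return null, so the caller's existing retry/poll loop can simply
        // try again on its next pass instead of failing the whole build step.
        public static IReplicationController getLatestReplicationController(
                IClient client, String namespace, String rcName) {
            try {
                return client.get(ResourceKind.REPLICATION_CONTROLLER, rcName, namespace);
            } catch (Throwable t) {
                return null;
            }
        }
    }

The important part is simply that a failed lookup no longer propagates out of the step; the verifier's retry logic sees null and polls again until its timeout.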
Fix pushed to openshift/jenkins-plugin with commit https://github.com/openshift/jenkins-plugin/commit/1c7c55083cdafe9db2f3074e4ef707186545b06b. A Jenkins image update will probably start within the week anyway because of upcoming work.
RHEL Jenkins images with v1.0.10 of the plugin, which has the fix for this bug, are now available on brew-pulp.
@Gabe Montero, could you add steps on how to reproduce? I am a little confused about how to verify the bug, thanks.
@Wen Wang - unfortunately, this is a tricky timing window that will be hard to reproduce. When I found it and then locally verified my fix, there were timing issues during the Online Hackathon, given the heavy usage, which delayed the actual creation of ReplicationControllers when a deployment was initiated and the plugin had a "Verify OpenShift Deployment" step. If you can somehow replicate that delay and run "Verify OpenShift Deployment" with the "Allow for verbose logging during this build step plug-in" option turned on, you should see an exception like the one in the description, but retries should occur, and if the deployment is ultimately successful, "Verify OpenShift Deployment" will still report success. My best guess at artificially manufacturing this delay is to define a DeploymentConfig with a pre lifecycle hook that sleeps for, say, 60 seconds (something like the fragment below). It is not my area of expertise, but if I'm reading the code right, that could mimic this delay. Then create a Jenkins job which deploys this DeploymentConfig and then attempts to verify the deployment, with verbose logging, so you can see that the exception occurs initially but the ReplicationController is ultimately created and the verify succeeds.
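To sketch that suggestion concretely (the container name below is just a placeholder for whatever container the DC's pod template defines, and, as noted, I have not confirmed that a pre hook actually delays RC creation), the strategy fragment in the DeploymentConfig would look roughly like:

    "strategy": {
        "type": "Recreate",
        "recreateParams": {
            "pre": {
                "failurePolicy": "Ignore",
                "execNewPod": {
                    "containerName": "nodejs-helloworld",
                    "command": [ "/bin/sleep", "60" ]
                }
            }
        }
    }

A Rolling strategy takes the same pre hook under rollingParams. The failurePolicy of Ignore is just so the hook itself cannot abort the deployment; the only goal is to inject a delay before the pods come up.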
Created attachment 1161707 [details] config verify openshift deployment
Tested with openshift3/jenkins-1-rhel7 8fe7d109f5bd. My env is:

[root@dhcp-128-91 build]# oc get dc
NAME            REVISION   REPLICAS   TRIGGERED BY
frontend        1          1          config,image(origin-nodejs-sample:latest)
frontend-prod   0          1          config,image(origin-nodejs-sample:prod)
jenkins         1          1          config,image(jenkins:latest)

[root@dhcp-128-91 build]# oc get rc
NAME         DESIRED   CURRENT   AGE
frontend-1   1         1         6m
jenkins-1    1         1         12m

[root@dhcp-128-91 build]# oc get pods
NAME               READY     STATUS      RESTARTS   AGE
frontend-1-build   0/1       Completed   0          9m
frontend-1-nts1f   1/1       Running     0          6m
jenkins-1-yxomb    1/1       Running     0          12m

When I configure "verify openshift deployment" as in the attachment and build the job, it fails with this error: http://pastebin.test.redhat.com/377590

And one question: which command is this build step equivalent to in the background?
Created attachment 1161725 [details] verify Openshift build UI
and configure "verify openshift deployment" like attachment 1161725 [details], build success, but dc ,rc and pod have no change.pls see console output:http://pastebin.test.redhat.com/377596
The plugin reacted correctly in my opinion. You never started a frontend-prod deployment; the `oc get rc` and `oc get pods` output confirms that. I could see making the message a bit clearer when a deployment is not available, but I don't think we should gate this bugzilla's verification on that. Add a "Tag OpenShift Image" step prior to "Verify OpenShift Deployment", where you tag origin-nodejs-sample:latest to origin-nodejs-sample:prod. Also, with the screenshots you posted, unless you scale the deployment out to 3 before running the verify, that failure will be noted. Lastly, I saw no indication that you attempted the sabotage I described in Comment #5. If that is too involved and you simply want to do some regression testing to make sure I did not break the typical mainline path, I am OK with that. I just wanted to confirm that is what you were thinking.
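As an aside, and only roughly (the plugin talks to the OpenShift REST API rather than shelling out, so treat this as an approximation), the "Tag OpenShift Image" step amounts to:

    $ oc tag origin-nodejs-sample:latest origin-nodejs-sample:prod

and "Verify OpenShift Deployment" is comparable to polling the latest ReplicationController for the DeploymentConfig, e.g. repeatedly checking:

    $ oc get rc frontend-prod-1

until its deployment has completed or the step's timeout expires. Hopefully that also answers the earlier question about what these steps map to in the background.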
@Gabe Montero, I verified as below:

1. Configured "Tag OpenShift Image" to set origin-nodejs-sample to the new tag prod:

# oc get is
NAME                   DOCKER REPO                                       TAGS          UPDATED
nodejs-010-rhel7       172.30.153.230:5000/wewang/nodejs-010-rhel7
origin-nodejs-sample   172.30.153.230:5000/wewang/origin-nodejs-sample   prod,latest   19 seconds ago

2. So there is the rc frontend-prod-1:

# oc get rc
NAME              DESIRED   CURRENT   AGE
frontend-1        1         1         2h
frontend-prod-1   1         1         2m
jenkins-1         1         1         2h

3. Configured "verify openshift deployment" and built the job; the build completed, with "Verify OpenShift Deployment" reporting success: deployment "frontend-prod-1" has completed with status: [Complete]. Console output: http://pastebin.test.redhat.com/378066

4. Also checked with the template below (changed the timeout to 60):

$ oc new-app -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/image/language-image-templates/python-34-rhel7-stibuild.json

Configured "verify openshift deployment" and built the job; the build completed.

Is there anything else I should verify? If not, I will change the status to "verified". I am not sure I totally understand comment 5, so I will wait for your reply before resolving the bug.
Yeah, at this point, let's not worry about Comment #5. I wasn't 100% sure it was a viable sabotage anyway. And as I said before, I was able to try this change in the unstable online env the day I found this. Go ahead and mark this verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:1206