Bug 1196617
Summary: | Proxy error in response on new Jenkins build | ||
---|---|---|---|
Product: | OpenShift Online | Reporter: | Jaspreet Kaur <jkaur> |
Component: | Containers | Assignee: | Dan Mace <dmace> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | libra bugs <libra-bugs> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 2.x | CC: | agrimm, bmeng, dmace, dmcphers, erich, jhonce, jokerman, libra-bugs, mhicks, mmccomas, nicholas_schuetz, wdecoste, wsun |
Target Milestone: | --- | ||
Target Release: | 2.x | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | 906863 | Environment: | |
Last Closed: | 2015-07-07 23:49:30 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 906863 | ||
Bug Blocks: | 1176649 |
Description
Jaspreet Kaur
2015-02-26 11:54:04 UTC
I believe this issue was a symptom of https://bugzilla.redhat.com/show_bug.cgi?id=1134686. I think I've verified that the fix for #1134686 also fixes this bug:

0. Edited the watchman gear state plugin configuration to be aggressive (20s delay) in order to improve my chances of watchman finding the builder, then restarted watchman
1. Created a jbosseap app via rhc
2. Added Jenkins via web console (just to be as close to the reported scenario as possible)
3. Used a git push to invoke a build of jbosseap
4. Observed success after repeated attempts, no relevant watchman activity
5. Reverted patch from #1134686
6. Used a git push to invoke a build of jbosseap
7. Observed the reported client failure coinciding with log entries indicating the watchman restarted the builder
8. Re-applied patch from #1134686, restarted watchman
9. Manually deleted the now defunct/disconnected builder from the Jenkins console
10. Used a git push to invoke a build of jbosseap
11. Observed success after repeated attempts, no relevant watchman activity

I'm moving this issue to ON_QA. Please re-open it if the bug can be reproduced in an environment also containing the fix for #1134686.

Checked on devenv-stage_1138 with the steps described in c#1. Without the patch, the Jenkins build shows the error in the output. After applying the patch, the build completes without errors. Moving the bug to verified.

Jaspreet, the fix for this was rolled out to production. Are you still experiencing the issue?

Yes Dan, the issue is still seen. I tried to git push today. The first push succeeded, but the second push resulted in the same failure as below:

[root@jkaur webapp]# git push
Counting objects: 11, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 421 bytes, done.
Total 6 (delta 3), reused 0 (delta 0)
remote: Executing Jenkins build.
remote:
remote: You can track your build at https://jenkins-jasdomain.rhcloud.com/job/jbosseap-build
remote:
remote: Waiting for build to schedule......Done
remote: Waiting for job to complete................................................./opt/rh/ruby193/root/usr/share/ruby/json/common.rb:155:in `parse': 757: unexpected token at '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> (JSON::ParserError)
remote: <html><head>
remote: <title>502 Proxy Error</title>
remote: </head><body>
remote: <h1>Proxy Error</h1>
remote: <p>The proxy server received an invalid
remote: response from an upstream server.<br />
remote: The proxy server could not handle the request <em><a href="/job/jbosseap-build/2/api/json">GET /job/jbosseap-build/2/api/json</a></em>.<p>
remote: Reason: <strong>Error reading from remote server</strong></p></p>
remote: <hr>
remote: <address>Apache/2.2.15 (Red Hat) Server at jenkins-jasdomain.rhcloud.com Port 443</address>
remote: </body></html>'
remote: from /opt/rh/ruby193/root/usr/share/ruby/json/common.rb:155:in `parse'
remote: from /var/lib/openshift/55029cc3e0b8cd2ae200011c/jenkins-client//bin/jenkins_build:35:in `get_job_info'
remote: from /var/lib/openshift/55029cc3e0b8cd2ae200011c/jenkins-client//bin/jenkins_build:91:in `<main>'
remote: !!!!!!!!
remote: Deployment Halted!
remote: If the build failed before the deploy step, your previous
remote: build is still running. Otherwise, your application may be
remote: partially deployed or inaccessible.
remote: Fix the build and try again.
remote: !!!!!!!!
remote: An error occurred executing 'gear postreceive' (exit code: 1)
remote: Error message: CLIENT_ERROR: Failed to execute: 'control post-receive' for /var/lib/openshift/55029cc3e0b8cd2ae200011c/jenkins-client
remote:
remote: For more details about the problem, try running the command again with the '--trace' option.
To ssh://55029cc3e0b8cd2ae200011c.com/~/git/jbosseap.git/
6708c48..d85377c master -> master

This is an entirely separate issue, and I believe it is related to https://bz.apache.org/bugzilla/show_bug.cgi?id=37770

Unfortunately, the only suggestions there involve turning off keepalives and connection pooling in certain cases, and the performance benefits of those features were the primary reason we switched back to vhosts in OpenShift Online.

I think a more reasonable solution would be to add some error handling and retry logic to the jenkins-client code. Currently, the specific problem the user sees is that get_jobs_info execs curl and then expects the returned data to be valid JSON, regardless of the HTTP status. Of course, the limitation of using curl throughout this code instead of a proper Ruby HTTP client library is that we cannot get both the HTTP status _and_ the content back from the subshell; we'd have to write the content to a file and then read it back in. So maybe we just rescue and pass (or emit a warning) on JSON parsing errors, and leave it at that.

I agree with Andy's assessment in comment #6. The jenkins_build script needs to be converted to use net/http so it can gracefully handle errors.

FYI, I'm nearly done with a significant refactor of jenkins_build to use the Ruby net/http library. Using net/http allows the program to easily retry requests and correctly interpret responses. There's a WIP PR for discussion: https://github.com/openshift/origin-server/pull/6098

Checked on devenv_5475. The Jenkins build now returns a meaningful message when the build fails. I modified the jenkins_build script to simulate the failures:

1. Make get_jobs_info fail:

remote: Executing Jenkins build.
remote:
remote: You can track your build at https://jenkins-bmengdev.dev.rhcloud.com/job/perl1-build
remote:
remote: Couldn't look up the last build number. Jenkins may be inaccessible.
remote: Error: Couldn't list jobs: HTTP 200 OK

2. Make schedule_build fail:

remote: Executing Jenkins build.
remote:
remote: You can track your build at https://jenkins-bmengdev.dev.rhcloud.com/job/perl1-build
remote:
remote: Couldn't schedule build. Jenkins may be inaccessible.
remote: Error: Couldn't schedule build: HTTP 201 Created

Moving the bug to verified. I also ran regression tests for the Jenkins client; no issues found.

This should be released on 4/13.

Hello,

Has this issue been resolved as of the above date, or is more time needed before the release?

Regards,
Jaspreet
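
For reference, the error handling and retry approach discussed in the comments above (check the HTTP status, tolerate non-JSON bodies such as the Apache 502 page, and retry) could look roughly like the sketch below. This is not the code from the PR or from jenkins_build. The helper names `http_get`, `parse_json_response`, and `get_json_with_retry`, along with the `attempts`, `delay`, and `fetcher` parameters, are hypothetical and exist only for illustration. The sketch also uses keyword arguments, so it assumes Ruby 2.0 or later rather than the ruby193 runtime shown in the trace.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper (not the real jenkins_build code): fetch a URL with
# net/http and return [status_code, body] so the caller can see both.
def http_get(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  response = http.request(Net::HTTP::Get.new(uri.request_uri))
  [response.code.to_i, response.body]
end

# Return the parsed JSON only when the status is 2xx and the body is valid
# JSON. Anything else (for example an HTML "502 Proxy Error" page) yields nil
# instead of raising JSON::ParserError.
def parse_json_response(status, body)
  return nil unless (200..299).cover?(status)
  JSON.parse(body.to_s)
rescue JSON::ParserError
  nil
end

# Try the request up to `attempts` times, sleeping `delay` seconds between
# tries. `fetcher` is any callable returning [status, body]; it defaults to
# http_get and can be swapped out for testing.
def get_json_with_retry(url, attempts: 3, delay: 1, fetcher: method(:http_get))
  attempts.times do |i|
    begin
      status, body = fetcher.call(url)
      data = parse_json_response(status, body)
      return data unless data.nil?
      warn "Attempt #{i + 1}: unusable response (HTTP #{status})"
    rescue StandardError => e
      warn "Attempt #{i + 1}: request failed: #{e.message}"
    end
    sleep(delay) if i < attempts - 1
  end
  raise "Couldn't fetch #{url} after #{attempts} attempts"
end
```

With something like this, a call such as `get_json_with_retry("https://jenkins-jasdomain.rhcloud.com/job/jbosseap-build/2/api/json")` would retry through a transient proxy error and raise a readable message only if every attempt fails, rather than dumping the HTML body into a JSON parser backtrace.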