Description of problem:
- Running a Jenkins build in OpenShift Container Platform sometimes results in the build getting stuck indefinitely. As a result, subsequent builds are cancelled with an error similar to the following:

~~~
Error running start-build on at least one item: [buildconfig/<OMITTED>]; {reference={}, err=Uploading directory "/var/lib/jenkins/jobs/Monitoring jobs/jobs/<OMITTED>/workspace" as binary input for the build ...
Unable to connect to the server: net/http: HTTP/1.x transport connection broken: write tcp 1.2.3.4:45792->5.6.7.8:443: write: connection reset by peer, verb=start-build, cmd=oc --server=https://server.example.com --insecure-skip-tls-verify --namespace=<OMITTED> --token=<OMITTED> start-build buildconfig/<OMITTED> --follow --from-dir='/var/lib/jenkins/jobs/Monitoring jobs/jobs/<OMITTED>/workspace' -o=name , out=, status=1}
~~~

An analysis of the master logs showed the following error message:

~~~
Apr 29 14:00:06 omitted.example.com atomic-openshift-node[20057]: E0429 14:00:06.114257 20057 status_manager.go:335] Status update on pod <OMITTED>/<OMITTED>-668-build aborted: terminated container git-clone attempted illegal transition to non-terminated state
~~~

Further analysis has shown that the build gets stuck during the git-clone operation:

~~~
docker_ps_-a:0471ca94f351 d3d2fbc373fb "openshift-git-clo..." 2 hours ago Up 2 hours k8s_git-clone_<OMITTED>-668-build_<OMITTED>_412b6b51-6a76-11e9-b041-00505698544b_0
docker_ps_-a:2fcf04946b3c registry.access.redhat.com/openshift3/ose-pod:v3.11.88 "/usr/bin/pod" 2 hours ago Up 2 hours k8s_POD_<OMITTED>-668-<OMITTED>_412b6b51-6a76-11e9-b041-00505698544b_0
~~~

I will attach more data privately to this Bugzilla.

Version-Release number of selected component (if applicable):
- Red Hat OpenShift Container Platform 3.11.88

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
- Jenkins builds (source strategy) get intermittently stuck at the git-clone operation.

Expected results:
- Jenkins builds complete successfully.

Additional info:
- This issue was initially reported in RHBZ#1705557
Customer is not able to exec or rsh into the pods:

error: unable to upgrade connection: container not found ("docker-build")
Reassigning to Adam, because there might be another issue with the git clone within the builder tool.
Hi Adam, Have you had a chance to review the customer output Ben provided the other day?
Issue summary (since the thread is very long):

An OpenShift build with the `Binary` source type is initiated from a Jenkins pipeline. The build pod's `bsdtar` process appears to hang while waiting for content to be uploaded. At present it is not clear why the build pod does not consider the upload complete; this requires further investigation.

As an immediate workaround, I recommend switching the Jenkins-initiated builds to clone source from a git-compatible repository (GitHub, GitLab, Bitbucket, etc.) [1]. This source type is more fault-tolerant than `Binary` source builds. Note that this workaround is not feasible for all situations.

[1] https://docs.openshift.com/container-platform/3.11/dev_guide/builds/build_inputs.html#source-code
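For illustration only, here is a rough sketch of what switching a build from binary input to a Git source might look like from the command line. The BuildConfig name, namespace, and repository URL are hypothetical placeholders, and whether the patch applies cleanly depends on how the existing BuildConfig is defined. The `OC` variable is an assumption added here so the commands can be printed (`OC=echo`) instead of executed.

```shell
#!/bin/sh
# Sketch only: BuildConfig name, namespace, and repository URL are placeholders.
# OC defaults to the real client; set OC=echo for a dry run that prints the commands.
OC="${OC:-oc}"

# Point an existing BuildConfig at a Git repository instead of binary input.
# In a JSON merge patch, "binary": null removes the binary source stanza.
use_git_source() {
    bc="$1"; ns="$2"; uri="$3"
    "$OC" patch "buildconfig/${bc}" --namespace="${ns}" --type=merge \
        -p "{\"spec\":{\"source\":{\"type\":\"Git\",\"binary\":null,\"git\":{\"uri\":\"${uri}\"}}}}"
}

# Start the build without --from-dir, so nothing is uploaded from the
# Jenkins workspace and the build fetches the source from the repository.
start_git_build() {
    bc="$1"; ns="$2"
    "$OC" start-build "buildconfig/${bc}" --namespace="${ns}" --follow
}
```

In a Jenkins job this would replace the `oc start-build ... --from-dir=...` invocation, for example `use_git_source myapp myproject https://example.com/repo.git` once, followed by `start_git_build myapp myproject` on each run. It only helps when the workspace contents are already available in a repository the cluster can reach.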