Bug 1721847 - Jenkins builds (source strategy) get intermittently stuck at the git-clone operation
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.11.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.11.z
Assignee: Adam Kaplan
QA Contact: wewang
Whiteboard: stale
Reported: 2019-06-19 07:08 UTC by Christian Koep
Modified: 2020-02-06 19:21 UTC
CC List: 19 users

Last Closed: 2020-02-06 18:13:47 UTC




Links
Red Hat Knowledge Base (Solution) 4165121 (last updated 2019-06-19 07:11:47 UTC)

Description Christian Koep 2019-06-19 07:08:40 UTC
Description of problem:
- Running a Jenkins build in OpenShift Container Platform sometimes causes the build to hang indefinitely. As a result, subsequent builds are cancelled with an error similar to the following:

~~~
Error running start-build on at least one item: [buildconfig/<OMITTED>];
{reference={}, err=Uploading directory "/var/lib/jenkins/jobs/Monitoring jobs/jobs/<OMITTED>/workspace" as binary input for the build ...
Unable to connect to the server: net/http: HTTP/1.x transport connection broken: write tcp 1.2.3.4:45792->5.6.7.8:443: write: connection reset by peer, verb=start-build, cmd=oc --server=https://server.example.com --insecure-skip-tls-verify --namespace=<OMITTED> --token=<OMITTED> start-build buildconfig/<OMITTED> --follow --from-dir='/var/lib/jenkins/jobs/Monitoring jobs/jobs/<OMITTED>/workspace' -o=name , out=, status=1}
~~~

An analysis of the master logs showed the following error message:

~~~
Apr 29 14:00:06 omitted.example.com atomic-openshift-node[20057]: E0429 14:00:06.114257   20057 status_manager.go:335] Status update on pod <OMITTED>/<OMITTED>-668-build aborted: terminated container git-clone attempted illegal transition to non-terminated state
~~~

Further analysis has shown that the build gets stuck during the git-clone operation; the `git-clone` container of the build pod is still running after two hours:

~~~
docker_ps_-a:0471ca94f351        d3d2fbc373fb                                                                                                                     "openshift-git-clo..."   2 hours ago           Up 2 hours                                          k8s_git-clone_<OMITTED>-668-build_<OMITTED>_412b6b51-6a76-11e9-b041-00505698544b_0

docker_ps_-a:2fcf04946b3c        registry.access.redhat.com/openshift3/ose-pod:v3.11.88                                                                           "/usr/bin/pod"           2 hours ago           Up 2 hours                                          k8s_POD_<OMITTED>-668-<OMITTED>_412b6b51-6a76-11e9-b041-00505698544b_0
~~~

I will attach more data privately to this Bugzilla.

Version-Release number of selected component (if applicable):
- Red Hat OpenShift Container Platform 3.11.88

How reproducible:
- Intermittent


Actual results:
- Jenkins builds (source strategy) get intermittently stuck at the git-clone operation

Expected results:
- Jenkins builds complete successfully.

Additional info:
- This issue was initially reported in RHBZ#1705557

Comment 36 Caden Marchese 2019-07-18 16:39:45 UTC
The customer is unable to exec or rsh into the pods:

~~~
error: unable to upgrade connection: container not found ("docker-build")
~~~

Comment 76 Ryan Phillips 2019-09-18 13:29:24 UTC
Reassigning to Adam because there may be a separate issue with the git clone step in the builder tool.

Comment 84 Tony Garcia 2019-10-15 21:36:53 UTC
Hi Adam,

Have you had a chance to review the customer output Ben provided the other day?

Comment 86 Adam Kaplan 2019-10-16 13:21:30 UTC
Issue summary (since the thread is very long):

An OpenShift build that uses `Binary` source input is initiated from a Jenkins pipeline (via `oc start-build --from-dir`). The build pod's `bsdtar` process appears to hang while waiting for the uploaded content. At present it is not clear why the build pod does not consider the upload complete; this requires further investigation.

As an immediate workaround, I recommend switching the Jenkins-initiated builds to clone source from a Git-compatible repository (GitHub, GitLab, Bitbucket, etc.) [1]. This source type is more fault-tolerant than `Binary` source builds. Note that this workaround is not feasible in all situations.

[1] https://docs.openshift.com/container-platform/3.11/dev_guide/builds/build_inputs.html#source-code
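To make the Git-source workaround more concrete, here is a minimal sketch that builds a BuildConfig manifest whose `spec.source` is a Git repository rather than `Binary` input, while keeping the Source strategy. The helper name `git_source_buildconfig`, the repository URL, the builder image stream tag, and the output tag are all hypothetical placeholders, not values from this bug. The field names are assumed to follow the `build.openshift.io/v1` BuildConfig API. The printed JSON is meant to be reviewed and could then be passed to `oc apply -f -`.

```python
import json


def git_source_buildconfig(name: str, git_uri: str, builder_istag: str) -> dict:
    """Return a BuildConfig manifest (as a dict) that clones source from Git.

    Hypothetical helper for illustration only; every argument is a placeholder.
    """
    return {
        "apiVersion": "build.openshift.io/v1",
        "kind": "BuildConfig",
        "metadata": {"name": name},
        "spec": {
            # Git source replaces the Binary input that `oc start-build --from-dir` uploads.
            "source": {
                "type": "Git",
                "git": {"uri": git_uri},
            },
            # Keep the Source (S2I) strategy; only the input type changes.
            "strategy": {
                "type": "Source",
                "sourceStrategy": {
                    "from": {"kind": "ImageStreamTag", "name": builder_istag},
                },
            },
            # Push the resulting image to an image stream tag named after the build.
            "output": {
                "to": {"kind": "ImageStreamTag", "name": f"{name}:latest"},
            },
        },
    }


if __name__ == "__main__":
    manifest = git_source_buildconfig(
        name="example-app",                                  # hypothetical
        git_uri="https://git.example.com/example/app.git",   # hypothetical
        builder_istag="example-builder:latest",              # hypothetical
    )
    print(json.dumps(manifest, indent=2))
```

With the BuildConfig pointing at Git, the Jenkins pipeline would start builds with `oc start-build bc/<name> --follow` and no `--from-dir`, so the build pod clones the repository itself and no directory upload is involved.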

