Created attachment 1193884 [details]
Output of oc describe pod/tau-web-dev-gfa-18-z5nvk

Description of problem:
As of yesterday, all my deployments have been timing out due to errors well before even getting to pulling the image. Looking at the events on the failed pod, I see many messages like:

"Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container 8ed16139f51fb937b8b9ce1747f062142bf1ffe7dd2792031617d92536e8cd0c: [8] System error: read parent: connection reset by peer\n"

More detailed output of "oc describe" for the affected pod is attached.

How reproducible:
Consistently reproducible.

Steps to Reproduce:
1. Log in to OpenShift Online as GitHub user 'pjnagel'.
2. Run "oc deploy tau-web-dev-gfa --retry -n tau-dev", or navigate to tau-web-dev-gfa in the web console and click 'Deploy'.

Actual results:
At some point a pod becomes visible in the overview section of the web console. It remains in "Container creating" status for a long time. Clicking on the pod and opening the "Events" tab shows errors as described above (a CLI sketch for inspecting this follows below).

Expected results:
Expected the pod to at least be created and proceed to pulling and running the image.
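For anyone triaging this, a minimal CLI sketch for reproducing and inspecting the stuck pod from a terminal; the pod name is taken from the attachment as an example and will differ between deployments.

  # Trigger a new deployment (retry the failed one)
  oc deploy tau-web-dev-gfa --retry -n tau-dev

  # Watch the deployer and application pods come up
  oc get pods -n tau-dev -w

  # Inspect the stuck pod and the project's events (pod name is an example)
  oc describe pod/tau-web-dev-gfa-18-z5nvk -n tau-dev
  oc get events -n tau-dev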
Note: yesterday, before I started experiencing this bug on this deploymentconfig, I first experienced the bug I just reported as bug 1370056.
Moving this to the containers team, as this seems to be a Docker issue.
After researching the issue, it appears to be caused by a lack of allocated resources. A better error message could help reduce the confusion.
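In case it helps others hitting this, a rough sketch of how one might check whether a node is short on allocatable resources; the node name is a placeholder and this assumes cluster-admin (or node shell) access.

  # List nodes and find the one running the stuck pod
  oc get nodes

  # Compare capacity vs. allocated requests/limits on that node (node name is an example)
  oc describe node node-1.example.com

  # On the node itself, check memory and Docker storage headroom
  free -m
  df -h /var/lib/docker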
The issue should be resolved in docker builds that include https://github.com/projectatomic/docker/commit/9d9f154f20a906820698c34ee3fc4b6c452fe5b8
The docker version that we now have in INT/STG/PROD should have this fix. Moving this to QE to test.
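A quick sketch of how QE might confirm which docker build is running on the nodes before retesting; the output format will vary and no specific package versions are implied here.

  # Check the installed docker package and the running daemon version on a node
  rpm -q docker
  docker version

  # Confirm the daemon is healthy after the update
  systemctl status docker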
Can't reproduce this issue on INT, will verify it.

openshift version
openshift v3.3.1.1+cb482ab-dirty
kubernetes v1.3.0+52492b4
etcd 2.3.0+git
Can't reproduce this issue on STG either.