| Summary: | Deployments unable to start pods due to "connection reset by peer" |
|---|---|
| Product: | OpenShift Online |
| Component: | Deployments |
| Version: | 3.x |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | low |
| Priority: | medium |
| Type: | Bug |
| Reporter: | Pieter Nagel <pieter> |
| Assignee: | Michal Fojtik <mfojtik> |
| QA Contact: | zhou ying <yinzhou> |
| CC: | abhgupta, aos-bugs, jhonce, jokerman, mmccomas |
| Last Closed: | 2017-02-16 22:12:25 UTC |
Note: yesterday, before I started experiencing this bug on this deploymentconfig, I first experienced the bug I reported as bug 1370056.

Moving this to the containers team, as this appears to be a Docker issue.

After researching the issue, it appears to be caused by insufficient resource allocation. A better error message would reduce the confusion. The issue should be resolved in Docker builds that include https://github.com/projectatomic/docker/commit/9d9f154f20a906820698c34ee3fc4b6c452fe5b8

The Docker version we now have in INT/STG/PROD should include this fix. Moving this to QE to test.

Can't reproduce this issue on INT; will verify it.

openshift version
openshift v3.3.1.1+cb482ab-dirty
kubernetes v1.3.0+52492b4
etcd 2.3.0+git

Can't reproduce this issue on STG either.
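Since the triage comment above attributes the failure to insufficient allocated resources, one possible mitigation while waiting for the Docker fix is to give the container an explicit resources stanza on the deploymentconfig. A hedged sketch as a strategic-merge patch; the container name and all request/limit values below are illustrative assumptions, not taken from this report:

```shell
# Sketch only: add explicit resource requests/limits to the first
# container of the deploymentconfig. The container name "tau-web-dev-gfa"
# and the cpu/memory values are placeholders; adjust to your own app.
oc patch dc/tau-web-dev-gfa -n tau-dev -p '{
  "spec": {"template": {"spec": {"containers": [{
    "name": "tau-web-dev-gfa",
    "resources": {
      "requests": {"cpu": "100m", "memory": "256Mi"},
      "limits":   {"cpu": "500m", "memory": "512Mi"}
    }
  }]}}}}'
```

Whether this helps depends on whether the node was actually resource-starved; it does not address the unhelpful error message itself.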
Created attachment 1193884 [details]
Output of `oc describe pod/tau-web-dev-gfa-18-z5nvk`

Description of problem:
As of yesterday, all my deployments have been timing out with errors well before even getting around to pulling the image. Looking at the events on the failed pod, I see many messages like:

"Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container 8ed16139f51fb937b8b9ce1747f062142bf1ffe7dd2792031617d92536e8cd0c: [8] System error: read parent: connection reset by peer\n"

More detailed `oc describe` output for the given pod is attached.

How reproducible:
Consistently reproducible.

Steps to Reproduce:
1. Log in to OpenShift Online as GitHub user 'pjnagel'.
2. Run `oc deploy tau-web-dev-gfa --retry -n tau-dev`, or navigate to tau-web-dev-gfa in the console and click 'Deploy'.

Actual results:
After a while, a pod becomes visible in the overview section of the web console. It remains in "ContainerCreating" status for a long time. Clicking on the pod and opening the "Events" tab shows errors as described above.

Expected results:
Expected the pod to at least be created and proceed to pulling and running the image.
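The failing container ID and the number of failed start attempts can be pulled out of the attached `oc describe` output with a short pipeline. A minimal sketch, assuming the attachment has been saved locally as `describe.txt` (the filename is an assumption; the error text is the one quoted in this report):

```shell
# Scan a saved copy of the `oc describe pod/...` output for the
# docker start failures described in this report.
# (describe.txt is an assumed local filename for attachment 1193884.)

# How many container-start attempts hit the error:
grep -c 'connection reset by peer' describe.txt

# The 64-hex container IDs docker failed to start, deduplicated:
grep -o 'Cannot start container [0-9a-f]\{64\}' describe.txt |
  awk '{print $4}' | sort -u
```

Running `oc describe pod/tau-web-dev-gfa-18-z5nvk -n tau-dev` again after redeploying regenerates equivalent output for a fresh pod.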