Bug 1315672

Summary: api error 500: connection reset by peer
Product: OKD
Reporter: theseaofstars
Component: Containers
Assignee: Jhon Honce <jhonce>
Status: CLOSED DEFERRED
QA Contact: DeShuai Ma <dma>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.x
CC: aos-bugs, mmccomas
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-08 20:40:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description theseaofstars 2016-03-08 12:07:07 UTC
Description of problem:
When starting the router and registry on the OpenShift platform, the pod sometimes fails to start.

Output from "oc get events":

1m 1m 1 docker-registry-1-byw7p Pod FailedSync {kubelet 172.30.5.151} Error syncing pod, skipping: API error (500): Cannot start container bb3c463bce7ac6b82c8c00de18c7a0ebe546a51dc82d29caa2e8ce66d8c97487: [8] System error: read parent: connection reset by peer

1m 1m 1 docker-registry-1-byw7p Pod implicitly required container POD Created {kubelet 172.30.5.151} Created with docker id c8a34eb97b8a
1m 1m 1 docker-registry-1-byw7p Pod implicitly required container POD Failed {kubelet 172.30.5.151} Failed to start with docker id c8a34eb97b8a with error: API error (500): Cannot start container c8a34eb97b8ae8a3873249d3f3dc8c7803dad6a817a544dd5f552c99aa8bca44: [8] System error: read parent: connection reset by peer

1m 1m 1 docker-registry-1-byw7p Pod FailedSync {kubelet 172.30.5.151} Error syncing pod, skipping: API error (500): Cannot start container c8a34eb97b8ae8a3873249d3f3dc8c7803dad6a817a544dd5f552c99aa8bca44: [8] System error: read parent: connection reset by peer

1m 1m 1 docker-registry-1-byw7p Pod implicitly required container POD Created {kubelet 172.30.5.151} Created with docker id b795267195fe

These events repeat over and over, with a new container ID each time.

Output from the system logs on the node where the registry pod was scheduled:

Mar 03 19:09:33 172.30.5.151 atomic-openshift-node[18961]: E0303 19:09:33.043282 18961 manager.go:1867] Failed to create pod infra container: API error (500): Cannot start container a7a3fa839f1aa37aeb6a4bbc43efd85777ee4fd0aa5b9ae39d929b261603900f: [8] System error: read parent: connection reset by peer
Mar 03 19:09:33 172.30.5.151 atomic-openshift-node[18961]: ; Skipping pod "docker-registry-1-bzcd8_default"
Mar 03 19:09:33 172.30.5.151 atomic-openshift-node[18961]: W0303 19:09:33.044803 18961 container.go:326] Failed to create summary reader for "/system.slice/docker-a7a3fa839f1aa37aeb6a4bbc43efd85777ee4fd0aa5b9ae39d929b261603900f.scope": none of the resources are being tracked.
Mar 03 19:09:33 172.30.5.151 atomic-openshift-node[18961]: E0303 19:09:33.047843 18961 pod_workers.go:113] Error syncing pod 4f2ca500-e130-11e5-9894-001a4a576348, skipping: API error (500): Cannot start container a7a3fa839f1aa37aeb6a4bbc43efd85777ee4fd0aa5b9ae39d929b261603900f: [8] System error: read parent: connection reset by peer
Mar 03 19:09:33 172.30.5.151 atomic-openshift-node[18961]: I0303 19:09:33.116273 18961 manager.go:1720] Need to restart pod infra container for "docker-registry-1-bzcd8_default" because it is not found
Mar 03 19:09:38 172.30.5.151 atomic-openshift-node[18961]: E0303 19:09:38.530391 18961 manager.go:1867] Failed to create pod infra container: API error (500): Cannot start container a367f10963b22308c502469007b034ed681ce230bdc20f0519bd2d3736b3a64e: [8] System error: read parent: connection reset by peer
Mar 03 19:09:38 172.30.5.151 atomic-openshift-node[18961]: ; Skipping pod "docker-registry-1-bzcd8_default"
Mar 03 19:09:38 172.30.5.151 atomic-openshift-node[18961]: W0303 19:09:38.531038 18961 container.go:326] Failed to create summary reader for "/system.slice/docker-a367f10963b22308c502469007b034ed681ce230bdc20f0519bd2d3736b3a64e.scope": none of the resources are being tracked.
Mar 03 19:09:38 172.30.5.151 atomic-openshift-node[18961]: E0303 19:09:38.537591 18961 pod_workers.go:113] Error syncing pod 4f2ca500-e130-11e5-9894-001a4a576348, skipping: API error (500): Cannot start container a367f10963b22308c502469007b034ed681ce230bdc20f0519bd2d3736b3a64e: [8] System error: read parent: connection reset by peer

These log entries repeat over and over, with a new container ID each time.

Output from "docker ps -a" on the node where the registry pod was scheduled:
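
To check how many distinct containers hit this failure, the IDs can be pulled out of the node logs or event output with a regular expression. The following is a minimal sketch, assuming the log text has already been saved to a string; the function name `failed_container_ids` is hypothetical and the pattern is based only on the error text quoted in this report.

```python
import re

# Matches the Docker start failure exactly as it appears in the logs in this report.
START_FAILURE = re.compile(
    r"Cannot start container (?P<id>[0-9a-f]+): "
    r"\[8\] System error: read parent: connection reset by peer"
)


def failed_container_ids(log_text):
    """Return the distinct container IDs that failed to start, in first-seen order."""
    seen = []
    for match in START_FAILURE.finditer(log_text):
        container_id = match.group("id")
        if container_id not in seen:
            seen.append(container_id)
    return seen
```

A steadily growing list for the same pod would be consistent with the retry loop described above.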

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b95a82a63af1 openshift3/ose-pod:v3.1.0.4 "/pod" 3 seconds ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_766e8717
1365324a6404 openshift3/ose-pod:v3.1.0.4 "/pod" 13 seconds ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_427270c2
7aeb6c0d70ae openshift3/ose-pod:v3.1.0.4 "/pod" 16 seconds ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_42f570e1
54b31fed70fe openshift3/ose-pod:v3.1.0.4 "/pod" 43 seconds ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_ab82b036
fda0e0bea6af openshift3/ose-pod:v3.1.0.4 "/pod" 53 seconds ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_34e1403f
7da07c481d89 openshift3/ose-pod:v3.1.0.4 "/pod" About a minute ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_f947e181
90741b47fab3 openshift3/ose-pod:v3.1.0.4 "/pod" About a minute ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_b57f9547
1ebcddcaa8f1 openshift3/ose-pod:v3.1.0.4 "/pod" About a minute ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_082ef9a1
2520080b8f6f openshift3/ose-pod:v3.1.0.4 "/pod" About a minute ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_9cc1fe9d
1dab0a779b8d openshift3/ose-pod:v3.1.0.4 "/pod" About a minute ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_8a7bb45b
7408188d4fea openshift3/ose-pod:v3.1.0.4 "/pod" About a minute ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_60b211d1
177686e1f8d5 openshift3/ose-pod:v3.1.0.4 "/pod" 2 minutes ago Created k8s_POD.3d2efd0c_docker-registry-1-bzcd8_default_4f2ca500-e130-11e5-9894-001a4a576348_3dc0fc5a
f9af221cf3d9 openshift3/ose-pod:v3.1.0.4 "/pod" 2 minutes ago Created k8s_POD.3d2efd0c_docker-registry-1-byw7p_default_9c080023-e12c-11e5-9894-001a4a576348_6d7404ca
6bda889a9124 openshift3/ose-pod:v3.1.0.4 "/pod" 2 minutes ago Created k8s_POD.3d2efd0c_docker-registry-1-byw7p_default_9c080023-e12c-11e5-9894-001a4a576348_28c83aac
6fd30d2eba0a openshift3/ose-haproxy-router:v3.1.0.4 "/usr/bin/openshift-r" About an hour ago Up About an hour k8s_router.4dfa44f2_router-default-1-rgkyw_default_602be5fb-e121-11e5-9894-001a4a576348_ef4b35cb
c8f9ab2df9ea openshift3/ose-pod:v3.1.0.4 "/pod" About an hour ago Up About an hour k8s_POD.da21dbf3_router-default-1-rgkyw_default_602be5fb-e121-11e5-9894-001a4a576348_2da51d59

It seems that the ose-pod container is being started repeatedly, but its status never changes to Up, for a reason I do not know.

Version-Release number of selected component (if applicable):

oc version
oc v3.1.0.4-16-g112fcc4
kubernetes v1.1.0-origin-1107-g4c8e6f4

docker version
Client:
Version: 1.8.2-el7
API version: 1.20
Package Version: docker-1.8.2-10.el7.x86_64
Go version: go1.4.2
Git commit: a01dc02/1.8.2
Built:

OS/Arch: linux/amd64

Server:
Version: 1.8.2-el7
API version: 1.20
Package Version:
Go version: go1.4.2
Git commit: a01dc02/1.8.2
Built:

OS/Arch: linux/amd64

How reproducible:

The bug occurs intermittently.

Actual results:

The pod does not start.

Additional info:

The OpenShift platform is deployed on RHEV.

Comment 1 Andy Goldstein 2016-03-08 15:52:48 UTC
At a quick glance, this appears to be https://github.com/docker/docker/issues/14203. Reassigning back to you, Jhon.

Comment 2 Jhon Honce 2016-03-08 20:40:04 UTC
Should be fixed in Docker 1.10.x
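
For anyone checking whether a node is still on an affected release, a version string such as the "1.8.2-el7" reported above can be compared against 1.10. This is only a sketch: the helper names are hypothetical, and the 1.10 threshold comes from the comment above rather than from a verified fix version.

```python
import re


def docker_version_tuple(version):
    """Parse the leading 'major.minor[.patch]' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def at_least_1_10(version):
    """Return True if the version is 1.10.0 or newer (threshold assumed from Comment 2)."""
    return docker_version_tuple(version) >= (1, 10, 0)
```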