Description of problem:
The pod status becomes OutOfDisk on the dedicated env after it has been running for some time.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Create a project
2. Create a pod
3. Check the pod status

Actual results:
[vagrant@ose ~]$ oc get pod -o wide
NAME                    READY     STATUS      RESTARTS   AGE       NODE
beego-example-1-build   0/1       OutOfDisk   0          24s       ip-172-31-5-177.ec2.internal

Expected results:
running

Additional info:
The env cannot build and deploy now.

[vagrant@ose ~]$ oc get event
FIRSTSEEN   LASTSEEN   COUNT   NAME         KIND                    SUBOBJECT   REASON              SOURCE                          MESSAGE
17m         10m        11      database-1   ReplicationController               FailedCreate        {deployer }                     Error creating deployer pod for haowang2/database-1: Pod "database-1-deploy" is forbidden: service account haowang2/deployer was not found, retry after the service account is created
17m         12m        2       database-1   ReplicationController               FailedCreate        {deployer }                     Error creating deployer pod for haowang2/database-1: Internal error occurred: Get http://api.stage.openshift.com/api/v1/namespaces/haowang2: dial tcp 52.5.122.7:80: connection refused
17m         14m        2       database-1   ReplicationController               FailedCreate        {deployer }                     Error creating deployer pod for haowang2/database-1: Internal error occurred: Get http://api.stage.openshift.com/api/v1/namespaces/haowang2: dial tcp 52.72.220.72:80: connection refused
8m          17s        10      database-2   ReplicationController               FailedCreate        {deployer }                     Error creating deployer pod for haowang2/database-2: Pod "database-2-deploy" is forbidden: service account haowang2/deployer was not found, retry after the service account is created
17m         17m        1       database     DeploymentConfig                    DeploymentCreated   {deploymentconfig-controller }  Created new deployment "database-1" for version 1
8m          8m         1       database     DeploymentConfig                    DeploymentCreated   {deploymentconfig-controller }  Created new deployment "database-2" for version 2
The project from the initial report does not exist, and no other pods are showing this status. Comment 1 is related to https://bugzilla.redhat.com/show_bug.cgi?id=1304586

Can you recreate the OutOfDisk error so I can actually look into it?
There are no errors like those in comment #1 and no OutOfDisk error now. However, pods remain in Pending status on the nodes:

[wzheng@localhost ~]$ oc get builds
NAME                 TYPE      FROM      STATUS    STARTED   DURATION
php-sample-build-1   Source    Git       Pending

[wzheng@localhost ~]$ oc get pods -o wide
NAME                       READY     STATUS      RESTARTS   AGE       NODE
database-1-deploy          1/1       Running     0          21m       ip-172-31-5-179.ec2.internal
database-1-kbnn8           1/1       Running     0          21m       ip-172-31-5-179.ec2.internal
database-1-posthook        0/1       Pending     0          21m       ip-172-31-5-180.ec2.internal
database-1-prehook         0/1       Completed   0          21m       ip-172-31-5-179.ec2.internal
php-sample-build-1-build   0/1       Pending     0          19m       ip-172-31-5-180.ec2.internal

[wzheng@localhost ~]$ oc get pods -n wzheng3 -o wide
NAME                       READY     STATUS    RESTARTS   AGE       NODE
php-sample-build-2-build   0/1       Pending   0          19m       ip-172-31-5-179.ec2.internal

[wzheng@localhost ~]$ oc get pods -o wide -n wzheng123
NAME                       READY     STATUS    RESTARTS   AGE       NODE
php-sample-build-3-build   0/1       Pending   0          21m       ip-172-31-5-177.ec2.internal
This is related to docker getting hung. It seems to be happening at a higher rate for us in 3.1.1.6.
During the next hang, please attach strace to the docker process (-f -v -y -yy -s 4096) and log for approximately 5 minutes. Please attach log or forward via email. Thanks.
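A minimal sketch of the capture requested above, in case it helps whoever is on the node during the next hang. Only the strace flags (-f -v -y -yy -s 4096) come from the request; the helper name strace_cmd, the 300-second timeout standing in for "approximately 5 minutes", and the default log path are assumptions for illustration. The helper only prints the command so it can be reviewed before running.

```shell
# Print (do not run) the strace command for a given docker daemon PID.
# Flags -f -v -y -yy -s 4096 are taken from the request above.
# Assumptions: 300 s timeout (about 5 minutes) and the default log path.
strace_cmd() {
    pid="$1"
    out="${2:-/tmp/docker-strace.log}"   # hypothetical default log path
    # timeout sends SIGTERM after 300 s, which makes strace detach and exit.
    printf 'timeout 300 strace -f -v -y -yy -s 4096 -o %s -p %s\n' "$out" "$pid"
}
```

Possible usage as root on the affected node: eval "$(strace_cmd "$(pgrep -o docker)")". Here pgrep -o picks the oldest process whose name matches "docker", which is assumed to be the daemon; check the PID first if other docker processes are running.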
Created attachment 1122835
python build failed

It might be related to working in the web console. I got "the image cannot be retrieved" several times while trying to open:
https://console.stage.openshift.com/console/project/ctrnl/create/fromimage?imageName=python&imageTag=3.4&namespace=ctrnl

At some point it succeeded, but then the build failed with "failed to push image". See the attached console log.
I've tested it with a limited number of runs and have not seen the problem. I will run more tests and mark it as VERIFIED if I still can't reproduce the problem.
I've run more tests today on top of yesterday's and have not seen the issue again. Marking it as VERIFIED.