Bug 1304266 - Pod status stays in Pending on dedicated env
Summary: Pod status stays in Pending on dedicated env
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jhon Honce
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks: OSOPS_V3
 
Reported: 2016-02-03 08:37 UTC by Wang Haoran
Modified: 2016-05-23 15:08 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-23 15:08:33 UTC
Target Upstream Version:
Embargoed:
jhonce: needinfo-


Attachments
python build failed (42.38 KB, text/plain)
2016-02-10 15:54 UTC, Aleksandar Kostadinov

Description Wang Haoran 2016-02-03 08:37:46 UTC
Description of problem:

The pod status becomes OutOfDisk on the dedicated env after running for some time.
Version-Release number of selected component (if applicable):


How reproducible:

Always
Steps to Reproduce:
1. Create a project.
2. Create a pod.
3. Check the pod status (a minimal sketch follows the steps).
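
A minimal reproduction sketch along the lines of the steps above; the project name and image are illustrative, not from this report:

$ oc new-project test-pending            # hypothetical project name
$ oc new-app openshift/hello-openshift   # any simple image is enough for the check
$ oc get pods -o wide                    # the pod should reach Running rather than staying Pending/OutOfDisk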

Actual results:
[vagrant@ose ~]$ oc get pod -o wide
NAME                    READY     STATUS      RESTARTS   AGE       NODE
beego-example-1-build   0/1       OutOfDisk   0          24s       ip-172-31-5-177.ec2.internal

Expected results:
The pod should be in Running status.

Additional info:
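Regarding the OutOfDisk status above: it generally reflects the node's own OutOfDisk condition, so a hedged way to confirm it is to look at the node itself (the grep pattern and the docker storage path are assumptions, not from this report):

$ oc describe node ip-172-31-5-177.ec2.internal | grep -i outofdisk
$ df -h /var/lib/docker    # run on the node to check the actual disk usage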

Comment 1 Wang Haoran 2016-02-04 02:34:45 UTC
The env cannot build or deploy now:
[vagrant@ose ~]$ oc get event
FIRSTSEEN   LASTSEEN   COUNT     NAME         KIND                    SUBOBJECT   REASON              SOURCE                           MESSAGE
17m         10m        11        database-1   ReplicationController               FailedCreate        {deployer }                      Error creating deployer pod for haowang2/database-1: Pod "database-1-deploy" is forbidden: service account haowang2/deployer was not found, retry after the service account is created
17m         12m        2         database-1   ReplicationController               FailedCreate        {deployer }                      Error creating deployer pod for haowang2/database-1: Internal error occurred: Get http://api.stage.openshift.com/api/v1/namespaces/haowang2: dial tcp 52.5.122.7:80: connection refused
17m         14m        2         database-1   ReplicationController               FailedCreate        {deployer }                      Error creating deployer pod for haowang2/database-1: Internal error occurred: Get http://api.stage.openshift.com/api/v1/namespaces/haowang2: dial tcp 52.72.220.72:80: connection refused
8m          17s        10        database-2   ReplicationController               FailedCreate        {deployer }                      Error creating deployer pod for haowang2/database-2: Pod "database-2-deploy" is forbidden: service account haowang2/deployer was not found, retry after the service account is created
17m         17m        1         database     DeploymentConfig                    DeploymentCreated   {deploymentconfig-controller }   Created new deployment "database-1" for version 1
8m          8m         1         database     DeploymentConfig                    DeploymentCreated   {deploymentconfig-controller }   Created new deployment "database-2" for version 2
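
The FailedCreate events above point at a missing deployer service account; a hedged way to check whether it exists in the affected namespace (namespace taken from the events, commands are standard oc):

$ oc get serviceaccounts -n haowang2               # deployer and builder should be listed
$ oc describe serviceaccount deployer -n haowang2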

Comment 2 Wesley Hearn 2016-02-05 21:04:04 UTC
The project from the initial report no longer exists, and there are no other pods showing this.


Comment 1 is related to https://bugzilla.redhat.com/show_bug.cgi?id=1304586

Can you recreate the OutOfDisk error so I can actually look into it?

Comment 3 Wenjing Zheng 2016-02-06 04:01:04 UTC
There are no errors like those in comment #1 and no OutOfDisk error now; however, pods remain in Pending status on the nodes:
[wzheng@localhost ~]$ oc get builds
NAME                 TYPE      FROM      STATUS    STARTED   DURATION
php-sample-build-1   Source    Git       Pending             
[wzheng@localhost ~]$ oc get pods -o wide
NAME                       READY     STATUS      RESTARTS   AGE       NODE
database-1-deploy          1/1       Running     0          21m       ip-172-31-5-179.ec2.internal
database-1-kbnn8           1/1       Running     0          21m       ip-172-31-5-179.ec2.internal
database-1-posthook        0/1       Pending     0          21m       ip-172-31-5-180.ec2.internal
database-1-prehook         0/1       Completed   0          21m       ip-172-31-5-179.ec2.internal
php-sample-build-1-build   0/1       Pending     0          19m       ip-172-31-5-180.ec2.internal
[wzheng@localhost ~]$ oc get pods -n wzheng3 -o wide
NAME                       READY     STATUS      RESTARTS   AGE       NODE
php-sample-build-2-build   0/1       Pending     0          19m       ip-172-31-5-179.ec2.internal
[wzheng@localhost ~]$ oc get pods -o wide -n wzheng123
NAME                       READY     STATUS      RESTARTS   AGE       NODE
php-sample-build-3-build   0/1       Pending     0          21m       ip-172-31-5-177.ec2.internal
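
A hedged next step for the Pending pods above is to check the Events section of the pod description for a scheduling or kubelet reason; the pod name is taken from the output above:

$ oc describe pod php-sample-build-1-build    # look at the Events section at the bottom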

Comment 5 Wesley Hearn 2016-02-08 15:32:26 UTC
This is related to the docker daemon hanging. It seems to be happening at a higher rate for us in 3.1.1.6.

Comment 6 Jhon Honce 2016-02-08 21:19:38 UTC
During the next hang, please attach strace to the docker process (-f -v -y -yy -s 4096) and capture a log for approximately 5 minutes. Please attach the log here or forward it via email. Thanks.
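
A sketch of the requested capture, to be run on the affected node; the systemd-based PID lookup, the timeout wrapper, and the output path are assumptions on my part, not part of the request:

# find the docker daemon PID via systemd, then trace it for roughly 5 minutes
DOCKER_PID=$(systemctl show -p MainPID docker | cut -d= -f2)
sudo timeout 300 strace -f -v -y -yy -s 4096 -p "$DOCKER_PID" -o /tmp/docker-strace.log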

Comment 7 Aleksandar Kostadinov 2016-02-10 15:54:53 UTC
Created attachment 1122835 [details]
python build failed

It might be related: while working in the web console, I got
"the image cannot be retrieved" several times while trying to open:
https://console.stage.openshift.com/console/project/ctrnl/create/fromimage?imageName=python&imageTag=3.4&namespace=ctrnl

At some point it succeeded, but then the build failed with a failure to push the image. See the attached console log.

Comment 11 Peter Ruan 2016-02-12 19:18:10 UTC
I've tested it with my limited runs and have not seen the problem. I will run more tests and will mark this as VERIFIED if I still can't reproduce the problem.

Comment 12 Peter Ruan 2016-02-12 23:48:24 UTC
I've run more tests today on top of yesterday's and have not seen the issue again. Marking it as verified.

