Created attachment 1119566 [details]
openshift web console

Description of problem:

I have been having a lot of trouble with this version. Deployments are very inconsistent. The latest manifestation is three separate deployments for one service, all of them hung in various states. This is after I had been able to deploy successfully without any changes on my end; OpenShift v1.1.1 just randomly starts giving me errors. This is only the latest:

Failed to pull image "172.30.250.187:5000/abecorn/tradeclient@sha256:cbca9d885bf1c23bb518662cc51d61b5365ab321147a59d2be5b86869f50c08e": Driver devicemapper failed to create image rootfs 0dd71c06ef1387030a9c4c05e3cea6727405fce6eda53b5d01cb5ff442440d02: Unknown device e551632771dbc2fa0728e96af65300567243f2413311b76334e3abebbe836e19

Subsequent deployments do not show errors in the event log or anywhere else I can find; they just hang. I did notice that all subsequent builds, including the original failed deployment, keep trying to use the oldest container image ID even though there are many later successful builds and images of that container. I also noticed that every time I start a new deployment, it increments the container count of the oldest failed deployment and creates a new deployment with a container count of 1.

I have attached an image of the mess in my web console, along with the output of:

oc get dc,rc -o yaml

Version-Release number of selected component (if applicable):

OpenShift Origin v1.1.1, RHEL 7.2, Docker 1.8.2

How reproducible:

Deployments will work a few times, but ultimately start failing and hanging.

Steps to Reproduce:
1. Set up a thin Docker storage pool with a free volume group (option B in the prerequisites)
2. Start an OpenShift Origin master and node all-in-one on a RHEL 7.2 machine
3. Build and deploy containers

Actual results:

Expected results:

Additional info:
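For anyone trying to reproduce this, the three steps above map roughly onto the following commands. This is a sketch based on the RHEL 7 / Origin documentation of that era, not taken from this report: docker-vg is a placeholder volume group name, the build config name is hypothetical, and exact flags may differ by version.

```shell
# Step 1: thin-pool Docker storage on a volume group with free space
# (option B: direct LVM thin pool instead of the loopback default).
# docker-vg is a placeholder; substitute the real VG name.
cat > /etc/sysconfig/docker-storage-setup <<'EOF'
VG=docker-vg
EOF
docker-storage-setup          # creates the thin pool and writes the docker storage config
systemctl restart docker

# Step 2: all-in-one Origin master + node
openshift start

# Step 3: build and deploy as usual, e.g. (build config name is hypothetical):
oc start-build tradeclient-build
```

These commands need a RHEL host with a free volume group and root access, so they are a setup sketch rather than something runnable in isolation.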
Created attachment 1119567 [details] oc get dc,rc -o yaml
Created attachment 1119580 [details] Available images and the oldest is always being used for deployments
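A way to cross-check which image the deployments are pinned to versus what the image stream actually contains (a sketch; tradeclient is the service named in this report, and the grep patterns just pull the relevant fields out of the YAML):

```shell
# Images known to the image stream (digests appear under status.tags):
oc get is tradeclient -o yaml | grep dockerImageReference

# Image the deployment config is currently set to deploy:
oc get dc tradeclient -o yaml | grep 'image:'

# Image each replication controller was created with:
oc get rc -o yaml | grep 'image:'
```

If the dc and the newest image stream entry disagree on the sha256 digest, that would match the "oldest image is always used" behavior shown in the attachment.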
I also see a lot of pods in the project when I do: oc get pods. I don't remember seeing so many pods in earlier versions; it seemed oc get pods would only show running pods:

NAME                              READY   STATUS             RESTARTS   AGE
abecornlandingpageservice-1-2jazv 1/1     Running            0          13h
abecornlandingpageservice-1-frbyd 1/1     Running            0          13h
batchservice-1-deploy             0/1     Error              0          5d
batchservice-6-deploy             0/1     Error              0          2d
batchservice-8-deploy             0/1     Error              0          1d
batchservice-9-7mjvz              1/1     Running            0          23h
batchservice-build-1-build        0/1     Error              0          5d
batchservice-build-2-build        0/1     Completed          0          5d
batchservice-build-3-build        0/1     Completed          0          3d
batchservice-build-4-build        0/1     Completed          0          3d
batchservice-build-5-build        0/1     Completed          0          3d
batchservice-build-6-build        0/1     Completed          0          2d
batchservice-build-7-build        0/1     Completed          0          1d
client-base-build-1-build         0/1     Completed          0          6d
client-base-build-2-build         0/1     Completed          0          2d
client-base-build-3-build         0/1     Error              0          2d
client-base-build-4-build         0/1     Completed          0          2d
client-base-build-5-build         0/1     Completed          0          1d
itemrepoclient-build-1-build      0/1     Completed          0          1d
itemrepoclientservice-2-grzzg     1/1     Running            1          1d
itemrepoclientservice-2-zatbp     1/1     Running            1          1d
tradeclient-build-3-build         0/1     Completed          0          1d
tradeclient-build-4-build         0/1     Completed          0          2h
tradeclient-build-5-build         0/1     Completed          0          1h
tradeclientservice-10-hzw2i       0/1     Terminating        0          1h
tradeclientservice-4-2szx0        0/1     Terminating        0          2h
tradeclientservice-4-dfx0e        0/1     Terminating        0          2h
tradeclientservice-4-rinm1        0/1     Terminating        0          1h
tradeclientservice-4-uc7ni        0/1     Terminating        0          1h
tradeclientservice-5-388dq        0/1     Terminating        0          1h
tradeclientservice-7-ezw21        0/1     Terminating        0          1h
tradeclientservice-9-1g1dk        0/1     Terminating        0          1h
tradeservice-1-deploy             0/1     Error              0          23h
tradeservice-2-deploy             0/1     DeadlineExceeded   0          23h
tradeservice-3-deploy             0/1     Error              0          23h
tradeservice-6-bs9c8              1/1     Running            0          2h
tradeservice-build-1-build        0/1     Error              0          1d
tradeservice-build-2-build        0/1     Error              0          1d
tradeservice-build-5-build        0/1     Completed          0          1d
tradeservice-build-6-build        0/1     Completed          0          7h
tradeservice-build-7-build        0/1     Completed          0          2h
tradeservicebase-build-1-build    0/1     Completed          0          7d
tradeservicebase-build-2-build    0/1     Completed          0          3d
tradeservicebase-build-3-build    0/1     Completed          0          3d
tradeservicebase-build-4-build    0/1     Completed          0          2d
tradeservicebase-build-5-build    0/1     Completed          0          1d
wildfly-jdk-8-build-1-build       0/1     Error              0          8d
wildfly-jdk-8-build-10-build      0/1     Completed          0          3d
wildfly-jdk-8-build-11-build      0/1     Completed          0          2d
wildfly-jdk-8-build-12-build      0/1     Completed          0          1d
wildfly-jdk-8-build-2-build       0/1     Error              0          8d
wildfly-jdk-8-build-3-build       0/1     Error              0          8d
wildfly-jdk-8-build-4-build       0/1     Error              0          8d
wildfly-jdk-8-build-5-build       0/1     Error              0          8d
wildfly-jdk-8-build-6-build       0/1     Error              0          8d
wildfly-jdk-8-build-7-build       0/1     Error              0          8d
wildfly-jdk-8-build-8-build       0/1     Error              0          8d
wildfly-jdk-8-build-9-build       0/1     Completed          0          8d
The terminating pods have been stuck in "Terminating" for hours. I have tried deleting them directly, deleting their deployment config, and deleting the associated service. The pods still remain.
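One workaround worth trying before killing OpenShift outright is force-deleting the stuck pods with a zero grace period. A sketch: the live oc calls are commented out since they need a running cluster, and the STATUS-column filter is demonstrated on sample output instead.

```shell
# Select pods whose STATUS (field 3 of `oc get pods` output) is "Terminating".
# Sample output stands in for a live cluster here:
sample='tradeclientservice-4-2szx0   0/1   Terminating   0   2h
tradeservice-6-bs9c8   1/1   Running   0   2h'

stuck=$(echo "$sample" | awk '$3 == "Terminating" {print $1}')
echo "$stuck"    # prints only the Terminating pod name

# Against a live cluster this would be:
#   stuck=$(oc get pods | awk '$3 == "Terminating" {print $1}')
#   for pod in $stuck; do oc delete pod "$pod" --grace-period=0; done
```

Zeroing the grace period skips the graceful-shutdown wait, which is exactly the phase these pods appear to be wedged in.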
The only way to get the pods to clear was to kill OpenShift. After restarting, I ran the deployment again and it worked this time. This happens far too often; if it is something with my machine setup, why does it work after restarting OpenShift? I have been working with OpenShift Origin for a while now, and my previous v1.0.7 setup seemed more reliable. The logs were incredibly slow, builds would get slower over time, and deployments would often kill previous containers before the new containers were up; still, I rarely had pods hang forever, forcing me to manually delete deployment configs and restart OpenShift. The only things I have changed from that setup to this one are moving from RHEL 7.1 to RHEL 7.2 and switching to a thin storage pool instead of the default loopback storage.
As for the pods stuck in the "Terminating" state: do you remember whether you deleted the pods while their image was still being pulled, or after the pull had failed? There are some related bug reports:

https://bugzilla.redhat.com/show_bug.cgi?id=1271198#c8
https://bugzilla.redhat.com/show_bug.cgi?id=1274598#c0

In both of those, the reporters deleted pods while an image pull was in progress.

NOTE: If your original issue is NOT related to the "Terminating" state, we should not discuss it here. But if you think it is related to your issue, please comment on this bug ticket.
I don't think image pulling is part of the problem. I see plenty of messages saying the image was pulled successfully. The image is pulled but the container never starts, and looking at the container logs directly with docker logs shows they are completely empty. The last time I had containers failing to start with blank logs, I needed to set runAsUser to RunAsAny because I use the root user inside my Dockerfiles. I checked that setting, and the restricted SCC is already set to RunAsAny. I only manually delete the pods after the event log shows the image was pulled and the containers supposedly started. After they start, the event log shows the containers are killed for some unknown reason.
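For completeness, the two checks described above can be done like this (a sketch; the container ID placeholder is left as-is, and the grep just surfaces the runAsUser strategy from the SCC YAML):

```shell
# Empty output here is the symptom: the container was started but wrote nothing.
docker logs <container-id>

# Confirm the restricted SCC permits any UID, as needed for root-based images:
oc get scc restricted -o yaml | grep -A1 runAsUser
# per the report, this shows a runAsUser strategy of type RunAsAny
```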
We apologize, however, we do not plan to address this report at this time. The majority of our active development is for the v3 version of OpenShift. If you would like for Red Hat to reconsider this decision, please reach out to your support representative. We are very sorry for any inconvenience this may cause.