Description of problem:

Environment: a Python 2.7, MongoDB application.

If I do a git push that triggers a build via a GitHub webhook, I can see in the console that the build completes; it then sits in the deploy state for a long time (~7 minutes), fails, and the pod colour turns orange. I can work around this by scaling the pod down before doing the git push, waiting for it to build and deploy, and then manually clicking the Scale Up button on the pod.

The pod event messages are below. I believe they are somewhat misleading, however, because when I scale the pod down before pushing, these event messages do not occur:

3:03:36 PM  Warning  Failed mount  Unable to mount volumes for pod "python-app-77-1xmlr_my-app(d82bee7c-69b6-11e6-8f30-12d79454368d)": Could not attach EBS Disk "aws://us-east-1c/vol-abfa5979": Error attaching EBS volume: VolumeInUse: vol-abfa5979 is already attached to an instance status code: 400, request id:   8 times in the last 8 minutes
3:03:36 PM  Warning  Failed sync   Error syncing pod, skipping: Could not attach EBS Disk "aws://us-east-1c/vol-abfa5979": Error attaching EBS volume: VolumeInUse: vol-abfa5979 is already attached to an instance status code: 400, request id:   8 times in the last 8 minutes
2:54:31 PM  Normal   Scheduled     Successfully assigned python-app-77-1xmlr to ip-172-31-54-166.ec2.internal

Version-Release number of selected component (if applicable):
Dev Preview

How reproducible:
I don't know.

Steps to Reproduce:
1. In a Python 2.7, MongoDB application.
2. Log in to the console.
3. Do a git push.
4. Watch what occurs in the console.

Actual results:
The pod deployment fails.

Expected results:
The pod builds, deploys and scales up, while the previous pod is scaled down.

Additional info:

Troubleshooting details:

$ oc get pvc
NAME        STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
mongodb     Bound     pv-aws-1dj3b   4Gi        RWO           17d
pvc-nf0kl   Bound     pv-aws-e1agr   4Gi        RWO           16d

$ oc volume dc --all
deploymentconfigs/mongodb
  pvc/mongodb (allocated 4GiB) as mongodb-data
    mounted at /var/lib/mongodb/data
deploymentconfigs/python-app
  pvc/pvc-nf0kl (allocated 4GiB) as mypythonvolume
    mounted at /opt/app-root/src/static/media_files

$ oc describe quota compute-resources -n my-app
Name:           compute-resources
Namespace:      my-app
Scopes:         NotTerminating
 * Matches all pods that do not have an active deadline.
Resource        Used    Hard
--------        ----    ----
limits.cpu      2       8
limits.memory   2G      4Gi

$ oc get pods
NAME                   READY     STATUS      RESTARTS   AGE
mongodb-11-2ln7o       1/1       Running     0          8d
python-app-24-build    0/1       Error       0          6d
python-app-36-build    0/1       Completed   0          22h
python-app-37-build    0/1       Completed   0          15h
python-app-38-build    0/1       Completed   0          20m
python-app-51-deploy   0/1       Error       0          8d
python-app-76-vfnmx    1/1       Running     0          21m
python-app-77-deploy   0/1       Error       0          17m

Browse > Deployments > python-app > Actions > Set Resource Limits is set to 1GB.
Browse > Deployments > mongodb > Actions > Set Resource Limits is set to 1GB.

Browse > Deployments > python-app > Edit YAML:

  spec:
    strategy:
      type: Rolling
      rollingParams:
        updatePeriodSeconds: 1
        intervalSeconds: 1
        timeoutSeconds: 600
        maxUnavailable: 25%
        maxSurge: 25%
      resources: {}

Browse > Deployments > mongodb > Edit YAML:

  spec:
    strategy:
      type: Recreate
      recreateParams:
        timeoutSeconds: 600
      resources: {}

It is my understanding that for Dev Preview I have 4GB of memory and 10GB of storage available for the whole project.

Extensive troubleshooting around these issues is here:
https://groups.google.com/forum/#!topic/openshift/ZZBe8REFT2E
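For anyone hitting the same VolumeInUse error, a quick way to see where things stand is to check which instance the EBS volume is still attached to and to look at the pod events. This is only a sketch: the volume ID and region are taken from the events above, and the aws command assumes direct AWS CLI access to the account backing the cluster, which Online users generally will not have.

# Hypothetical check from the AWS side: shows the instance vol-abfa5979 is attached to and its attachment state
aws ec2 describe-volumes --region us-east-1 --volume-ids vol-abfa5979 --query 'Volumes[0].Attachments'

# From the OpenShift side, the same story shows up in the pod and namespace events
oc describe pod python-app-77-1xmlr -n my-app
oc get events -n my-app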
Are you able to deploy the Python app if you switch from Rolling to Recreate?
Browse > Deployments > python-app > Actions > Edit YAML:

  spec:
    strategy:
      type: Rolling
      rollingParams:
        updatePeriodSeconds: 1
        intervalSeconds: 1
        timeoutSeconds: 600
        maxUnavailable: 25%
        maxSurge: 25%
      resources: {}

What specifically do you suggest changing this to? I changed it to this:

  spec:
    strategy:
      type: Recreate

and then did a git push and ... wow, it worked! I had tried this before and it hadn't worked (see the forum troubleshooting post above). I will continue monitoring this issue and see whether this new, desired behaviour is consistent. Thank you.
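For reference, the same change can be made from the CLI instead of the web console's YAML editor. This is a sketch assuming the deployment config is named python-app as above; it clears rollingParams so only the Recreate settings remain:

oc patch dc/python-app -p '{"spec":{"strategy":{"type":"Recreate","rollingParams":null,"recreateParams":{"timeoutSeconds":600}}}}'

# confirm the strategy took effect
oc get dc/python-app -o yaml | grep -A3 strategy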
There is an issue[1] in Kubernetes about RWO volumes not working across nodes. When you use a Rolling deployment, the old and new pods of your deployment can be scheduled on different nodes, and both request the same volume, so your deployment gets blocked. Long story short, with RWO volumes, currently only Recreate deployments with a single pod (dc.spec.replicas = 1) work. [1] https://github.com/kubernetes/kubernetes/issues/26567
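To make that constraint concrete, here is a minimal sketch of a DeploymentConfig that stays within it, using the names from the report above; the image name and labels are illustrative, not taken from the actual app:

  apiVersion: v1
  kind: DeploymentConfig
  metadata:
    name: python-app
  spec:
    replicas: 1                  # RWO volume: only one pod may mount it at a time
    selector:
      app: python-app
    strategy:
      type: Recreate             # old pod is torn down before the new one starts,
      recreateParams:            # so the EBS volume can detach and re-attach cleanly
        timeoutSeconds: 600
    template:
      metadata:
        labels:
          app: python-app
      spec:
        containers:
        - name: python-app
          image: python-app:latest          # placeholder image reference
          volumeMounts:
          - name: mypythonvolume
            mountPath: /opt/app-root/src/static/media_files
        volumes:
        - name: mypythonvolume
          persistentVolumeClaim:
            claimName: pvc-nf0kl            # bound to the RWO EBS-backed PV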
https://github.com/openshift/origin/pull/11917 merged recently and should prevent scaling up new pods before the old pods are removed from etcd. I am still not sure about the attach controller bit, but I would expect that once the old pod is removed, the volume detaches from the node and becomes available to attach wherever the new pod lands. The fix will be included in 3.5; I am not sure when Online will move to that version (keep in mind 3.4 has yet to be released).
Confirmed with the latest version of Online using the Recreate strategy; the issue is fixed:

openshift v3.5.5.9
kubernetes v1.5.2+43a9be4

[zhouy@zhouy v3]$ oc get po
NAME                             READY     STATUS      RESTARTS   AGE
django-psql-persistent-1-build   0/1       Completed   0          5m
django-psql-persistent-2-build   0/1       Completed   0          1m
django-psql-persistent-2-nkw93   1/1       Running     0          42s
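For anyone who wants to re-verify, the flow above can be approximated as follows; this is a sketch that assumes the standard django-psql-persistent example template is available in the cluster and that its build config carries the same name:

oc new-app --template=django-psql-persistent
oc start-build django-psql-persistent    # stand-in for the rebuild a git push webhook would trigger
oc get pods -w                           # the new pod should reach Running once the build completes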
Are we only planning to support the Recreate strategy for OpenShift Online v3?