Description of problem:

Environment: a Python 2.7, MongoDB application.

If I do a git push that triggers a build via a GitHub webhook, I can see in the console that the build completes; it then sits in the deploy state for a long time (~7 minutes), fails, and the pod colour turns orange. I can work around this by scaling the pod down before doing the git push, waiting for it to build and deploy, and then manually clicking the Scale Up button on the pod.

The pod event messages are below. I believe they are somewhat misleading, however, because when I scale the pod down before pushing, these event messages do not occur:

3:03:36 PM  Warning  Failed mount  Unable to mount volumes for pod "python-app-77-1xmlr_my-app(d82bee7c-69b6-11e6-8f30-12d79454368d)": Could not attach EBS Disk "aws://us-east-1c/vol-abfa5979": Error attaching EBS volume: VolumeInUse: vol-abfa5979 is already attached to an instance status code: 400, request id:   8 times in the last 8 minutes
3:03:36 PM  Warning  Failed sync   Error syncing pod, skipping: Could not attach EBS Disk "aws://us-east-1c/vol-abfa5979": Error attaching EBS volume: VolumeInUse: vol-abfa5979 is already attached to an instance status code: 400, request id:   8 times in the last 8 minutes
2:54:31 PM  Normal   Scheduled     Successfully assigned python-app-77-1xmlr to ip-172-31-54-166.ec2.internal

Version-Release number of selected component (if applicable):
Dev Preview

How reproducible:
I don't know.

Steps to Reproduce:
1. In a Python 2.7, MongoDB application.
2. Log in to the console.
3. Do a git push.
4. Watch what occurs in the console.

Actual results:
The pod deployment fails.

Expected results:
The pod builds, deploys and scales up, while the previous pod is scaled down.

Additional info:

Troubleshooting details:

$ oc get pvc
NAME        STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
mongodb     Bound     pv-aws-1dj3b   4Gi        RWO           17d
pvc-nf0kl   Bound     pv-aws-e1agr   4Gi        RWO           16d

$ oc volume dc --all
deploymentconfigs/mongodb
  pvc/mongodb (allocated 4GiB) as mongodb-data
    mounted at /var/lib/mongodb/data
deploymentconfigs/python-app
  pvc/pvc-nf0kl (allocated 4GiB) as mypythonvolume
    mounted at /opt/app-root/src/static/media_files

$ oc describe quota compute-resources -n my-app
Name:           compute-resources
Namespace:      my-app
Scopes:         NotTerminating
 * Matches all pods that do not have an active deadline.
Resource        Used    Hard
--------        ----    ----
limits.cpu      2       8
limits.memory   2G      4Gi

$ oc get pods
NAME                   READY     STATUS      RESTARTS   AGE
mongodb-11-2ln7o       1/1       Running     0          8d
python-app-24-build    0/1       Error       0          6d
python-app-36-build    0/1       Completed   0          22h
python-app-37-build    0/1       Completed   0          15h
python-app-38-build    0/1       Completed   0          20m
python-app-51-deploy   0/1       Error       0          8d
python-app-76-vfnmx    1/1       Running     0          21m
python-app-77-deploy   0/1       Error       0          17m

Browse > Deployments > python-app > Actions > Set Resource Limits is set to 1GB.
Browse > Deployments > mongodb > Actions > Set Resource Limits is set to 1GB.

Browse > Deployments > python-app > Edit YAML:

  spec:
    strategy:
      type: Rolling
      rollingParams:
        updatePeriodSeconds: 1
        intervalSeconds: 1
        timeoutSeconds: 600
        maxUnavailable: 25%
        maxSurge: 25%
      resources: {}

Browse > Deployments > mongodb > Edit YAML:

  spec:
    strategy:
      type: Recreate
      recreateParams:
        timeoutSeconds: 600
      resources: {}

It is my understanding that for Dev Preview I have 4GB of memory and 10GB of storage available for the whole project.

Extensive troubleshooting around these issues is here:
https://groups.google.com/forum/#!topic/openshift/ZZBe8REFT2E
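For anyone hitting the same VolumeInUse error, a quick way to see where things stand is to check which instance the EBS volume is still attached to and to look at the pod events. This is only a sketch: the volume ID and region are taken from the events above, and the aws command assumes direct AWS CLI access to the account backing the cluster, which Online users generally will not have.

# Hypothetical check from the AWS side: shows the instance vol-abfa5979 is attached to and its attachment state
aws ec2 describe-volumes --region us-east-1 --volume-ids vol-abfa5979 --query 'Volumes[0].Attachments'

# From the OpenShift side, the same story shows up in the pod and namespace events
oc describe pod python-app-77-1xmlr -n my-app
oc get events -n my-app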
Are you able to deploy the Python app if you switch from Rolling to Recreate?
Browse > Deployments > python-app > Actions > Edit YAML:

  spec:
    strategy:
      type: Rolling
      rollingParams:
        updatePeriodSeconds: 1
        intervalSeconds: 1
        timeoutSeconds: 600
        maxUnavailable: 25%
        maxSurge: 25%
      resources: {}

What specifically do you suggest changing this to? I changed it to this:

  spec:
    strategy:
      type: Recreate

and then did a git push and ... wow, it worked! I had tried this before and it hadn't worked (see the forum troubleshooting post above). I will continue monitoring this issue and see whether this new, desired behaviour is consistent. Thank you.
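For reference, the same change can be made from the CLI instead of the web console's YAML editor. This is a sketch assuming the deployment config is named python-app as above; it clears rollingParams so only the Recreate settings remain:

oc patch dc/python-app -p '{"spec":{"strategy":{"type":"Recreate","rollingParams":null,"recreateParams":{"timeoutSeconds":600}}}}'

# confirm the strategy took effect
oc get dc/python-app -o yaml | grep -A3 strategy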
There is an issue[1] in Kubernetes about RWO volumes not working across nodes. When you use a Rolling deployment, the old and new pods of your deployment can be scheduled on different nodes, and both request the same volume, so your deployment gets blocked. Long story short, with RWO volumes, currently only Recreate deployments with a single pod (dc.spec.replicas = 1) work. [1] https://github.com/kubernetes/kubernetes/issues/26567
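To make that constraint concrete, here is a minimal sketch of a DeploymentConfig that stays within it, using the names from the report above; the image name and labels are illustrative, not taken from the actual app:

  apiVersion: v1
  kind: DeploymentConfig
  metadata:
    name: python-app
  spec:
    replicas: 1                  # RWO volume: only one pod may mount it at a time
    selector:
      app: python-app
    strategy:
      type: Recreate             # old pod is torn down before the new one starts,
      recreateParams:            # so the EBS volume can detach and re-attach cleanly
        timeoutSeconds: 600
    template:
      metadata:
        labels:
          app: python-app
      spec:
        containers:
        - name: python-app
          image: python-app:latest          # placeholder image reference
          volumeMounts:
          - name: mypythonvolume
            mountPath: /opt/app-root/src/static/media_files
        volumes:
        - name: mypythonvolume
          persistentVolumeClaim:
            claimName: pvc-nf0kl            # bound to the RWO EBS-backed PV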
https://github.com/openshift/origin/pull/11917 merged recently and should prevent scaling up new pods before the old pods are removed from etcd. I am still not sure about the attach controller bit, but I would expect that once the old pod is removed, the volume detaches from the node and becomes available to attach wherever the new pod lands. The fix will be included in 3.5; I am not sure when Online will move to that version (keep in mind 3.4 has yet to be released).
Confirmed with the latest version of Online using the Recreate strategy; the issue is fixed:

openshift v3.5.5.9
kubernetes v1.5.2+43a9be4

[zhouy@zhouy v3]$ oc get po
NAME                             READY     STATUS      RESTARTS   AGE
django-psql-persistent-1-build   0/1       Completed   0          5m
django-psql-persistent-2-build   0/1       Completed   0          1m
django-psql-persistent-2-nkw93   1/1       Running     0          42s
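For anyone who wants to re-verify, the flow above can be approximated as follows; this is a sketch that assumes the standard django-psql-persistent example template is available in the cluster and that its build config carries the same name:

oc new-app --template=django-psql-persistent
oc start-build django-psql-persistent    # stand-in for the rebuild a git push webhook would trigger
oc get pods -w                           # the new pod should reach Running once the build completes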
Are we only planning to support the Recreate strategy for OpenShift Online v3?