Bug 1372138 - MongoDB pod re-deployed without any user action [NEEDINFO]
Summary: MongoDB pod re-deployed without any user action
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Deployments
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Abhishek Gupta
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-01 01:55 UTC by bugreport398
Modified: 2017-06-28 08:18 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-28 08:18:54 UTC
Target Upstream Version:
abhgupta: needinfo? (bugreport398)



Description bugreport398 2016-09-01 01:55:41 UTC
Description of problem:

Logged in to the console to see that the MongoDB pod has a failed deployment; however, I haven't made any changes to the pod that would trigger a re-deploy - it has been stable for 20 days.

Console overview says the deploy was from:

10 hours ago from image change 

But, as stated, no image change was made.  

The Python app deployment has also failed; however, I did a git push before stopping work last night and don't remember checking whether the rebuild/deploy was successful, so that may have caused the Python app deployment to fail.

I have re-deployed the Python app, but I am just getting these error events:

Unable to mount volumes for pod "python-app-92-xw0d5_my-app(4d43470a-6fe5-11e6-8f30-12d79454368d)": Could not attach EBS Disk "aws://us-east-1c/vol-abfa5979": Error attaching EBS volume: VolumeInUse: vol-abfa5979 is already attached to an instance status code: 400, request id: 

Version-Release number of selected component (if applicable):

Dev preview.  

How reproducible:

I don't know.  

Steps to Reproduce:

1. Login to console.  

Actual results:

Failed deploy for MongoDB and Python pods.  

Expected results:

Normally functioning pods.  


Additional info:

Comment 1 bugreport398 2016-09-01 02:02:04 UTC
After a few tries at re-deploying the app, it finally worked, but when I scaled the pod up it just got stuck in the "pulling" state (grey circle indicator).

Comment 2 bugreport398 2016-09-01 02:02:39 UTC
The last comment is in regard to the Python app.

Comment 3 bugreport398 2016-09-01 02:08:25 UTC
I deleted the failed mongo deployment (as the previous one was up and running), and again a re-deployment has been triggered without me doing anything.  :/

Comment 4 bugreport398 2016-09-01 02:12:29 UTC
Both pods have failed and I am unable to get them back up.

Comment 5 bugreport398 2016-09-01 08:54:16 UTC
This seems so strange, I scale down Python and MongoDB pods, and then just look at the console for about 5 minutes, and MongoDB pod deployment #11 just scales down and MongoDB pod deployment #12 (a new deployment) gets deployed without me doing anything. (This has happened a couple of times, and each time I delete MongoDB pod deployment #12).  I can't scale up either the MongoDB or Python pod from 0 to 1.

Comment 6 Michal Fojtik 2016-09-01 09:45:50 UTC
(In reply to bugreport398 from comment #5)
> This seems so strange, I scale down Python and MongoDB pods, and then just
> look at the console for about 5 minutes, and MongoDB pod deployment #11 just
> scales down and MongoDB pod deployment #12 (a new deployment) gets deployed
> without me doing anything. (This has happened a couple of times, and each
> time I delete MongoDB pod deployment #12).  I can't scale up either the
> MongoDB or Python pod from 0 to 1.

OK, there are a couple of things grouped in this bug:

1) You see "Could not attach EBS Disk" because your Python app deployment config is using the "Rolling" strategy but has a ReadWriteOnce (RWO) persistent volume attached. What happens is that the rolling strategy starts a new Pod when a new deployment occurs but does not scale down the old Pod first... so at that point there are two Pods running which have the same volume defined.
To fix this, you can switch to the "Recreate" strategy (see the sketch after this list).

2) "gets deployed without me doing anything" if you removed the replication controller that deployment config created, deployment config will see that as it does not have the latest version running where it should and will re-create it for you. That is expected.

3) "however I haven't made any changes to the pod that would trigger a re-deploy" this is probably caused by pushing a new version of the MongoDB image into Docker Registry. Because you have ImageChangeTrigger defined, it means it will be automatically re-deployed when a new image is available. You can remove this behavior by removing the "automatic: true" from the trigger (edit YAML in web console or `oc edit dc/foo`).

Comment 7 Michail Kargakis 2016-09-01 09:49:00 UTC
The Mongo deployment probably started because an updated image was pushed to the Mongo ImageStream. If you don't want automatic deployments to happen on image updates, edit your Mongo DeploymentConfig and remove "automatic: true" from its ImageChangeTrigger.

https://docs.openshift.org/latest/dev_guide/deployments.html#image-change-trigger

With this approach, you will need to manually re-enable the trigger when you wish to update to the latest Mongo image. From the CLI it should be `oc set triggers dc/mongodb --auto`; from the console, you will need to add "automatic: true" back to the DeploymentConfig. In 1.3.1 we will enable manual deployments without the need to enable/disable triggers.
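
A rough sketch of that workflow from the CLI, assuming your oc client already supports `oc set triggers` (on older clients you would edit the trigger via `oc edit dc/mongodb` instead):

# Show the triggers currently defined on the DeploymentConfig.
oc set triggers dc/mongodb

# Disable automatic image-change deployments (sets automatic: false).
oc set triggers dc/mongodb --manual

# Re-enable them when you want to pick up new MongoDB images again.
oc set triggers dc/mongodb --auto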

> I deleted the failed mongo deployment (as the previous one was up and running), and again a re-deployment has been triggered without me doing anything.  :/

You are not supposed to manipulate ReplicationControllers owned by DeploymentConfigs. We lag in our docs, but the notice is being added in https://github.com/openshift/openshift-docs/pull/2694. Long story short, if you delete your latest RC, the DC controller will notice and create a new one (see the sketch below).
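
As an illustration, if the goal is simply to stop the pods rather than remove the deployment, scaling the DC to zero avoids fighting the controller (a sketch using the standard oc client):

# Scale the latest deployment down to zero instead of deleting its RC;
# the DC controller keeps the RC but runs no pods.
oc scale dc/mongodb --replicas=0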

Can you post the output of the following commands from your project?

oc get events
oc get all -o yaml

Comment 8 bugreport398 2016-09-01 12:10:58 UTC
@Michal Fojtik, both MongoDB and Python were already set to the Recreate strategy (not Rolling):

Browse > Deployments > mongodb > Edit YAML:

spec:
  strategy:
    type: Recreate
    recreateParams:
      timeoutSeconds: 600
    resources: {  }

Browse > Deployments > python-app > Edit YAML:

spec:
  strategy:
    type: Recreate
    recreateParams:
      timeoutSeconds: 600
    resources: {  }

@Michail Kargakis

This seems likely:  

"The Mongo deployment started probably because an updated image was pushed to the Mongo ImageStream".  

Based on these two bits of feedback (thanks), I am thinking that the errors occurred because resources weren't available to build, deploy or scale both pods up at once. (I still cannot scale either pod up from 0 - the Python behaviour is reminiscent of that described in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1369644.)

I am scratching my head though, because I think it should be able to handle the current configuration, i.e.:

Browse > Deployments > mongodb > Set Resource Limits:

Memory:  1GB

Browse > Deployments > python-app > Set Resource Limits:

Memory:  1GB

oc get pvc
NAME        STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
mongodb     Bound     pv-aws-1dj3b   4Gi        RWO           17d
pvc-nf0kl   Bound     pv-aws-e1agr   4Gi        RWO           16d

oc volume dc --all 
deploymentconfigs/mongodb
  pvc/mongodb (allocated 4GiB) as mongodb-data
    mounted at /var/lib/mongodb/data
deploymentconfigs/python-app
  pvc/pvc-nf0kl (allocated 4GiB) as mypythonvolume
    mounted at /opt/app-root/src/static/media_files

oc describe quota compute-resources -n my-app
Name:           compute-resources
Namespace:      my-app
Scopes:         NotTerminating
 * Matches all pods that do not have an active deadline.
Resource        Used    Hard
--------        ----    ----
limits.cpu      0       8
limits.memory   0       4Gi



Below is the requested output for `oc get events`.

Is there a way to minimise the output of `oc get all -o yaml` so that only the required information is displayed? It is currently very lengthy and includes `tokens` and `secrets` fields - I'm not sure whether it is OK to post that here.


oc get events
FIRSTSEEN   LASTSEEN   COUNT     NAME                  KIND      SUBOBJECT                    TYPE      REASON        SOURCE                                    MESSAGE
18m         18m        1         python-app-57-build   Pod                                    Normal    Scheduled     {default-scheduler }                      Successfully assigned python-app-57-build to ip-172-31-54-158.ec2.internal
18m         18m        1         python-app-57-build   Pod       spec.containers{sti-build}   Normal    Pulling       {kubelet ip-172-31-54-158.ec2.internal}   pulling image "openshift3/ose-sti-builder:v3.2.1.7"
18m         18m        1         python-app-57-build   Pod       spec.containers{sti-build}   Normal    Pulled        {kubelet ip-172-31-54-158.ec2.internal}   Successfully pulled image "openshift3/ose-sti-builder:v3.2.1.7"
18m         18m        1         python-app-57-build   Pod       spec.containers{sti-build}   Normal    Created       {kubelet ip-172-31-54-158.ec2.internal}   Created container with docker id 23c106a2a94e
18m         18m        1         python-app-57-build   Pod       spec.containers{sti-build}   Normal    Started       {kubelet ip-172-31-54-158.ec2.internal}   Started container with docker id 23c106a2a94e
2h          48m        82        python-app-93-4gnk7   Pod                                    Warning   FailedMount   {kubelet ip-172-31-54-168.ec2.internal}   Unable to mount volumes for pod "python-app-93-4gnk7_my-app(77f4b738-7027-11e6-8f30-12d79454368d)": Could not attach EBS Disk "aws://us-east-1c/vol-abfa5979": Error attaching EBS volume: VolumeInUse: vol-abfa5979 is already attached to an instance
            status code: 400, request id: 
2h          48m       82        python-app-93-4gnk7   Pod                 Warning   FailedSync   {kubelet ip-172-31-54-168.ec2.internal}   Error syncing pod, skipping: Could not attach EBS Disk "aws://us-east-1c/vol-abfa5979": Error attaching EBS volume: VolumeInUse: vol-abfa5979 is already attached to an instance
            status code: 400, request id: 
49m         49m       1         python-app-93          ReplicationController                                 Normal    SuccessfulDelete    {replication-controller }                 Deleted pod: python-app-93-4gnk7
15m         15m       1         python-app-94-deploy   Pod                                                   Normal    Scheduled           {default-scheduler }                      Successfully assigned python-app-94-deploy to ip-172-31-54-168.ec2.internal
15m         15m       1         python-app-94-deploy   Pod                     spec.containers{deployment}   Normal    Pulling             {kubelet ip-172-31-54-168.ec2.internal}   pulling image "openshift3/ose-deployer:v3.2.1.7"
15m         15m       1         python-app-94-deploy   Pod                     spec.containers{deployment}   Normal    Pulled              {kubelet ip-172-31-54-168.ec2.internal}   Successfully pulled image "openshift3/ose-deployer:v3.2.1.7"
15m         15m       1         python-app-94-deploy   Pod                     spec.containers{deployment}   Normal    Created             {kubelet ip-172-31-54-168.ec2.internal}   Created container with docker id 0f1645b84a7e
15m         15m       1         python-app-94-deploy   Pod                     spec.containers{deployment}   Normal    Started             {kubelet ip-172-31-54-168.ec2.internal}   Started container with docker id 0f1645b84a7e
15m         15m       1         python-app             DeploymentConfig                                      Normal    DeploymentCreated   {deploymentconfig-controller }            Created new deployment "python-app-94" for version 94
15m         15m       1         python-app             DeploymentConfig                                      Warning   FailedUpdate        {deployment-controller }                  Cannot update deployment my-app/python-app-94 status to Pending: replicationcontrollers "python-app-94" cannot be updated: the object has been modified; please apply your changes to the latest version and try again

Comment 9 bugreport398 2016-09-01 12:19:35 UTC
I just changed:

Browse > Deployments > python-app > Set Resource Limits:

Memory:  1GB

to:

Browse > Deployments > python-app > Set Resource Limits:

Memory:  525MB

And the python pod deployed and scaled up quickly.

I then tried to scale up MongoDB pod and it got to the light blue "not ready" stage - events tab shows:

10:16:36 PM 	
Warning 
Unhealthy  
Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device sh: no job control in this shell sh: mongostat: command not found
9 times in the last minute

10:15:05 PM 
Normal 
Created  
Created container with docker id 7d1071cb67ce

10:15:05 PM 
Normal 
Started  
Started container with docker id 7d1071cb67ce

10:15:04 PM 
Normal 	
Pulled  
Successfully pulled image "registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:888c0b99e71bf21382e7471f5f6a48d4e52cf7b43b10ce57df05e7b03843c964"

10:15:02 PM 
Normal 
Pulling  
pulling image "registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:888c0b99e71bf21382e7471f5f6a48d4e52cf7b43b10ce57df05e7b03843c964"

10:14:56 PM 
Normal 
Scheduled  
Successfully assigned mongodb-12-chn9u to ip-172-31-54-168.ec2.internal

Comment 10 bugreport398 2016-09-01 12:22:39 UTC
If I scale the Python pod down to 0 (hoping to give the MongoDB pod more available juice) and try to scale up the MongoDB pod, the same events-tab message as above is displayed.

Comment 11 bugreport398 2016-09-01 12:28:34 UTC
I rolled back to mongodb deployment #11 (where #12 was the automatically triggered deployment that I didn't trigger) and the mongodb pod scaled up.  

The python pod could then be scaled up.

So two issues:

- The new automatic deployment 'doesn't work'.

- The system seems to prefer when MongoDB memory is 1GB and Python memory is 525MB - even though I think I have 4GB memory available :/

Comment 12 Michail Kargakis 2016-09-01 12:46:53 UTC
> Is there a way to minimise output of `oc get all -o yaml` so that only required information is displayed - it currently is very lengthy and includes `tokens` and `secrets` fields - not sure if ok to post here?  

Run `oc get dc,rc -o yaml` instead, then put the output in a pastebin.

Comment 13 bugreport398 2016-09-03 09:54:41 UTC
Output of oc get dc,rc -o yaml:

http://pastebin.com/U069VnWR

Can anyone give a definitive answer as to why the 4GB memory allowance for the project does not allow two pods (set to the Recreate strategy) to comfortably build, deploy and scale when both the mongodb and python pods are allocated 1GB of memory? I recently had to take the python pod down to 525MB in order to be able to scale it up without errors, as explained above.

Thank you.

Comment 14 bugreport398 2016-09-03 12:52:22 UTC
PS - It just occurred to me that the reason all the pods went down and were so difficult to get back up again may be that python and mongodb were on 1GB of memory each, and then the automatic mongodb deploy was triggered (while both pods were scaled up), and it was this that caused the lack of resources which resulted in the unable-to-mount and readiness-probe errors etc. Just a theory, but if a similar thing happens to someone else, perhaps it can be tested. It would still be good to know why a 4GB memory allowance is not adequate to support 2 x 1GB deployments plus an automatically triggered deploy (all with the Recreate strategy).

Comment 15 Abhishek Gupta 2016-11-01 20:51:32 UTC
To your question about why 4GB of "non-terminating" quota is not enough...

Your "terminating" quota is perhaps coming into play. This quota is used to run the builder/deployer pods as well as the hook pods. The resources these pods request are the same as those specified in the corresponding DeploymentConfig and BuildConfig. Can you check whether this is the issue, i.e. whether your deployment is failing on account of a lack of resources in your "terminating" quota?

