| Summary: | MongoDB pod re-deployed without any user action | | |
|---|---|---|---|
| Product: | OpenShift Online | Reporter: | bugreport398 |
| Component: | Deployments | Assignee: | Abhishek Gupta <abhgupta> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | zhou ying <yinzhou> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.x | CC: | abhgupta, aos-bugs, bugreport398, mkargaki, pweil |
| Target Milestone: | --- | Flags: | abhgupta: needinfo? (bugreport398) |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-06-28 08:18:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
bugreport398
2016-09-01 01:55:41 UTC
After a few tries at re-deploying the app, it finally worked, but when I scaled the pod up it just got stuck in the "Pulling" state (grey circle indicator). The last comment is in regard to the Python app.

I deleted the failed Mongo deployment (as the previous one was up and running), and again a re-deployment was triggered without me doing anything. :/ Both pods failed and were unable to get back up.

This seems so strange: I scale down the Python and MongoDB pods, then just watch the console for about 5 minutes, and MongoDB deployment #11 scales down while MongoDB deployment #12 (a new deployment) gets deployed without me doing anything. (This has happened a couple of times, and each time I delete MongoDB deployment #12.) I can't scale up either the MongoDB or Python pod from 0 to 1.

(In reply to bugreport398 from comment #5)
> This seems so strange, I scale down Python and MongoDB pods, and then just
> look at the console for about 5 minutes, and MongoDB pod deployment #11 just
> scales down and MongoDB pod deployment #12 (a new deployment) gets deployed
> without me doing anything. (This has happened a couple of times, and each
> time I delete MongoDB pod deployment #12). I can't scale up either the
> MongoDB or Python pod from 0 to 1.

OK, there are a couple of things grouped in this bug:

1) You see "Could not attach EBS Disk" because your Python app deployment config uses the "rolling" strategy but has a ReadWriteOnce (RWO) persistent volume attached. The rolling strategy starts a new pod when a new deployment occurs before scaling down the old pod, so at that point two pods are running with the same volume defined. To fix this, you can switch to the "recreate" strategy.

2) "gets deployed without me doing anything": if you removed the replication controller that the deployment config created, the deployment config will notice that its latest version is no longer running where it should be and will re-create it for you. That is expected.

3) "however I haven't made any changes to the pod that would trigger a re-deploy": this is probably caused by a new version of the MongoDB image being pushed into the Docker registry. Because you have an ImageChangeTrigger defined, the deployment config is automatically re-deployed when a new image is available. You can disable this behavior by removing "automatic: true" from the trigger (edit the YAML in the web console or run `oc edit dc/foo`).

The Mongo deployment probably started because an updated image was pushed to the Mongo ImageStream. If you don't want automatic deployments to happen on image updates, edit your Mongo DeploymentConfig and remove "automatic: true" from its ImageChangeTrigger: https://docs.openshift.org/latest/dev_guide/deployments.html#image-change-trigger. You will then need to manually re-enable the trigger when you wish to update to the latest Mongo image. From the CLI that should be `oc set triggers dc/mongodb --auto`; from the console, you will need to add "automatic: true" back to the DeploymentConfig. In 1.3.1 we will enable manual deployments without the need to enable/disable triggers.

> I deleted the failed mongo deployment (as the previous one was up and running), and again a re-deployment has been triggered without me doing anything. :/

You are not supposed to manipulate ReplicationControllers owned by DeploymentConfigs. Our docs lag behind here, but the notice is being added in https://github.com/openshift/openshift-docs/pull/2694.
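For reference, the ImageChange trigger being described lives in the DeploymentConfig YAML and looks roughly like the sketch below (the container name and image stream tag here are illustrative assumptions, not taken from this project). Removing "automatic: true", or setting it to false, is what stops image pushes from starting new deployments:

triggers:
- type: ImageChange
  imageChangeParams:
    automatic: false        # remove "automatic: true" (or set it to false) to disable auto-redeploy on image pushes
    containerNames:
    - mongodb               # assumed container name
    from:
      kind: ImageStreamTag
      name: mongodb:3.2     # assumed image stream tag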
Long story short, if you delete your latest RC, the DC controller will notice and recreate a new one. Can you post the output of the following commands from your project?

oc get events
oc get all -o yaml

@Michal Fojtik, both MongoDB and Python were already set to the Recreate strategy (not Rolling):
Browse > Deployments > mongodb > Edit YAML:

spec:
  strategy:
    type: Recreate
    recreateParams:
      timeoutSeconds: 600
    resources: {}

Browse > Deployments > python-app > Edit YAML:

spec:
  strategy:
    type: Recreate
    recreateParams:
      timeoutSeconds: 600
    resources: {}
@Michail Kargakis
This seems likely:
"The Mongo deployment started probably because an updated image was pushed to the Mongo ImageStream."
Based on these two bits of feedback (thanks), I am thinking that the errors are because resources weren't available to build, deploy or scale both pods up at once. (I still cannot scale either pod up from 0 - python behaviour is reminiscent of that described in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1369644).
I am scratching my head though, because I think it should be able to handle the current configuration, i.e.:
Browse > Deployments > mongodb > Set Resource Limits:
Memory: 1GB
Browse > Deployments > python-app > Set Resource Limits:
Memory: 1GB
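For context on how those console settings relate to the quota output further below: "Set Resource Limits" writes a limits block into the container spec of the DeploymentConfig, and each running pod's limit counts against the project's limits.memory quota. A rough sketch of the resulting YAML (the container name here is an assumption):

spec:
  template:
    spec:
      containers:
      - name: mongodb            # assumed container name
        resources:
          limits:
            memory: 1Gi          # the value set via "Set Resource Limits"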
oc get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES AGE
mongodb Bound pv-aws-1dj3b 4Gi RWO 17d
pvc-nf0kl Bound pv-aws-e1agr 4Gi RWO 16d
oc volume dc --all
deploymentconfigs/mongodb
pvc/mongodb (allocated 4GiB) as mongodb-data
mounted at /var/lib/mongodb/data
deploymentconfigs/python-app
pvc/pvc-nf0kl (allocated 4GiB) as mypythonvolume
mounted at /opt/app-root/src/static/media_files
oc describe quota compute-resources -n my-app
Name: compute-resources
Namespace: my-app
Scopes: NotTerminating
* Matches all pods that do not have an active deadline.
Resource Used Hard
-------- ---- ----
limits.cpu 0 8
limits.memory 0 4Gi
Below is the requested output for `oc get events`.
Is there a way to minimise output of `oc get all -o yaml` so that only required information is displayed - it currently is very lengthy and includes `tokens` and `secrets` fields - not sure if ok to post here?
oc get events
FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
18m 18m 1 python-app-57-build Pod Normal Scheduled {default-scheduler } Successfully assigned python-app-57-build to ip-172-31-54-158.ec2.internal
18m 18m 1 python-app-57-build Pod spec.containers{sti-build} Normal Pulling {kubelet ip-172-31-54-158.ec2.internal} pulling image "openshift3/ose-sti-builder:v3.2.1.7"
18m 18m 1 python-app-57-build Pod spec.containers{sti-build} Normal Pulled {kubelet ip-172-31-54-158.ec2.internal} Successfully pulled image "openshift3/ose-sti-builder:v3.2.1.7"
18m 18m 1 python-app-57-build Pod spec.containers{sti-build} Normal Created {kubelet ip-172-31-54-158.ec2.internal} Created container with docker id 23c106a2a94e
18m 18m 1 python-app-57-build Pod spec.containers{sti-build} Normal Started {kubelet ip-172-31-54-158.ec2.internal} Started container with docker id 23c106a2a94e
2h 48m 82 python-app-93-4gnk7 Pod Warning FailedMount {kubelet ip-172-31-54-168.ec2.internal} Unable to mount volumes for pod "python-app-93-4gnk7_my-app(77f4b738-7027-11e6-8f30-12d79454368d)": Could not attach EBS Disk "aws://us-east-1c/vol-abfa5979": Error attaching EBS volume: VolumeInUse: vol-abfa5979 is already attached to an instance
status code: 400, request id:
2h 48m 82 python-app-93-4gnk7 Pod Warning FailedSync {kubelet ip-172-31-54-168.ec2.internal} Error syncing pod, skipping: Could not attach EBS Disk "aws://us-east-1c/vol-abfa5979": Error attaching EBS volume: VolumeInUse: vol-abfa5979 is already attached to an instance
status code: 400, request id:
49m 49m 1 python-app-93 ReplicationController Normal SuccessfulDelete {replication-controller } Deleted pod: python-app-93-4gnk7
15m 15m 1 python-app-94-deploy Pod Normal Scheduled {default-scheduler } Successfully assigned python-app-94-deploy to ip-172-31-54-168.ec2.internal
15m 15m 1 python-app-94-deploy Pod spec.containers{deployment} Normal Pulling {kubelet ip-172-31-54-168.ec2.internal} pulling image "openshift3/ose-deployer:v3.2.1.7"
15m 15m 1 python-app-94-deploy Pod spec.containers{deployment} Normal Pulled {kubelet ip-172-31-54-168.ec2.internal} Successfully pulled image "openshift3/ose-deployer:v3.2.1.7"
15m 15m 1 python-app-94-deploy Pod spec.containers{deployment} Normal Created {kubelet ip-172-31-54-168.ec2.internal} Created container with docker id 0f1645b84a7e
15m 15m 1 python-app-94-deploy Pod spec.containers{deployment} Normal Started {kubelet ip-172-31-54-168.ec2.internal} Started container with docker id 0f1645b84a7e
15m 15m 1 python-app DeploymentConfig Normal DeploymentCreated {deploymentconfig-controller } Created new deployment "python-app-94" for version 94
15m 15m 1 python-app DeploymentConfig Warning FailedUpdate {deployment-controller } Cannot update deployment my-app/python-app-94 status to Pending: replicationcontrollers "python-app-94" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
I just changed Browse > Deployments > python-app > Set Resource Limits: Memory: 1GB to Memory: 525MB, and the Python pod deployed and scaled up quickly.

I then tried to scale up the MongoDB pod and it got to the light blue "not ready" stage - the Events tab shows:

10:16:36 PM Warning Unhealthy Readiness probe failed: sh: cannot set terminal process group (-1): Inappropriate ioctl for device sh: no job control in this shell sh: mongostat: command not found 9 times in the last minute
10:15:05 PM Normal Created Created container with docker id 7d1071cb67ce
10:15:05 PM Normal Started Started container with docker id 7d1071cb67ce
10:15:04 PM Normal Pulled Successfully pulled image "registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:888c0b99e71bf21382e7471f5f6a48d4e52cf7b43b10ce57df05e7b03843c964"
10:15:02 PM Normal Pulling pulling image "registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:888c0b99e71bf21382e7471f5f6a48d4e52cf7b43b10ce57df05e7b03843c964"
10:14:56 PM Normal Scheduled Successfully assigned mongodb-12-chn9u to ip-172-31-54-168.ec2.internal

If I scale the Python pod down to 0 (hoping to give the MongoDB pod more available juice) and try to scale up the MongoDB pod, the same events message above is displayed.

I rolled back to MongoDB deployment #11 (where #12 was the automatically triggered deployment that I didn't trigger) and the MongoDB pod scaled up. The Python pod could then be scaled up. So two issues:

- The new automatic deployment 'doesn't work'.
- The system seems to prefer MongoDB memory at 1GB and Python memory at 525MB - even though I think I have 4GB of memory available :/

> Is there a way to minimise output of `oc get all -o yaml` so that only required information is displayed - it currently is very lengthy and includes `tokens` and `secrets` fields - not sure if ok to post here?
Run `oc get dc,rc -o yaml` instead, then put it in a pastebin.
Output of `oc get dc,rc -o yaml`: http://pastebin.com/U069VnWR

Can anyone give a definitive answer as to why the 4GB memory allowance for the project does not allow two pods (both set to the Recreate strategy) to comfortably build, deploy, and scale when the mongodb and python pods are each allocated 1GB of memory? I recently had to take the Python pod down to 525MB in order to scale it up without errors, as explained above. Thank you.

PS - It just occurred to me that the reason all pods went down and were so difficult to get back up again may be that python and mongodb were on 1GB of memory each, and then the automatic mongodb deploy was triggered (while both pods were scaled up), and it was this that caused the lack of resources resulting in the unable-to-mount and readiness-probe errors etc. Just a theory, but if a similar thing happens to someone else, perhaps it can be tested. It would still be good to know why a 4GB memory allowance is not adequate for supporting 2 x 1GB deployments plus an automatically triggered deploy (all with the Recreate strategy).

To your question about why 4GB of "non-terminating" quota is not enough: your "terminating" quota may be coming into play. This quota is used to run the builder/deployer pods as well as the hook pods. The resources that these pods specify are the same as those specified in the corresponding DeploymentConfig and BuildConfig. Can you check whether this is the issue, i.e. whether your deployment is failing on account of a lack of resources in your "terminating" quota?
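For reference, a Terminating-scoped quota (the one the builder and deployer pods would consume) typically looks like the sketch below; the name and values here are illustrative assumptions, since that quota is not shown in this bug. Running `oc describe quota -n my-app` with no quota name lists all quotas in the project, including any terminating one:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources-terminating   # assumed name
spec:
  scopes:
  - Terminating                         # matches pods with an active deadline, e.g. builder/deployer pods
  hard:
    limits.cpu: "2"                     # illustrative values only
    limits.memory: 2Gi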