Bug 783218
| Summary: | Failure to launch (start) a multiple assembly deployable to rhev-m | | |
|---|---|---|---|
| Product: | [Retired] CloudForms Cloud Engine | Reporter: | Dave Wilson <dwilson> |
| Component: | aeolus-conductor | Assignee: | Tomas Hrcka <thrcka> |
| Status: | CLOSED ERRATA | QA Contact: | Dave Johnson <dajohnso> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 1.0.0 | CC: | akarol, asettle, cpelland, deltacloud-maint, dmacpher, jprovazn, juwu, morazi, rwsu, ssachdev, thrcka |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | When launching multiple instances in a RHEV environment without enough hardware resources, some instances started and others remained stopped. This bug fix adds rollback in Conductor, so that if there are not enough hardware resources to launch multiple instances, all instances remain in a stopped state. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-12-04 14:55:53 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 556336 [details]
log files
I should clarify that "stays in stopped state" refers to the state in rhev-m. In the Conductor UI, the state of the instance is "new".

Created attachment 556535 [details]
requested logs covering instance launch
Attempted to update a stale object: Quota
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/locking/optimistic.rb:97:in `update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/attribute_methods/dirty.rb:68:in `update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/timestamp.rb:60:in `update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/callbacks.rb:281:in `update'
/usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:414:in `_run_update_callbacks'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/callbacks.rb:281:in `update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/persistence.rb:257:in `create_or_update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/callbacks.rb:273:in `create_or_update'
/usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:414:in `_run_save_callbacks'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/callbacks.rb:273:in `create_or_update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/persistence.rb:60:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/validations.rb:49:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/attribute_methods/dirty.rb:30:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:245:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:292:in `with_transaction_returning_status'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract/database_statements.rb:139:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:207:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:290:in `with_transaction_returning_status'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:245:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/associations/association_proxy.rb:222:in `send'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/associations/association_proxy.rb:222:in `method_missing'
/usr/share/aeolus-conductor/app/models/instance_observer.rb:74:in `update_quota'
/usr/share/aeolus-conductor/app/models/instance_observer.rb:57:in `each'
/usr/share/aeolus-conductor/app/models/instance_observer.rb:57:in `update_quota'
/usr/share/aeolus-conductor/app/models/instance_observer.rb:28:in `before_save'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/observer.rb:118:in `send'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/observer.rb:118:in `_notify_instance_observer_for_before_save'
/usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:455:in `_run_save_callbacks'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/callbacks.rb:273:in `create_or_update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/persistence.rb:60:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/validations.rb:49:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/attribute_methods/dirty.rb:30:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:245:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:292:in `with_transaction_returning_status'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract/database_statements.rb:139:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:207:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:290:in `with_transaction_returning_status'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:245:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/associations/association_proxy.rb:222:in `send'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/associations/association_proxy.rb:222:in `method_missing'
/usr/share/aeolus-conductor/app/util/taskomatic.rb:36:in `create_instance'
/usr/share/aeolus-conductor/app/models/deployment.rb:268:in `launch'
/usr/share/aeolus-conductor/app/models/deployment.rb:249:in `each'
/usr/share/aeolus-conductor/app/models/deployment.rb:249:in `launch'
/usr/share/aeolus-conductor/app/controllers/deployments_controller.rb:126:in `create'
/usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/mime_responds.rb:264:in `call'
/usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/mime_responds.rb:264:in `retrieve_response_from_mimes'
/usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/mime_responds.rb:191:in `respond_to'
/usr/share/aeolus-conductor/app/controllers/deployments_controller.rb:124:in `create'

Dave, are you still seeing this bug in the latest rpms? I haven't been able to reproduce it on aeolus-conductor-0.8.0-15.el6.noarch or on the version you used. Tomas has posted a patch to resolve it if it is still happening.

Created attachment 559335 [details]
Sceenshot-Conductor-0.8.0-19
In short, yes, I'm still seeing an error. The behavior has changed, though. When starting a 50-assembly deployable, an error is returned to the UI (see attachment "Sceenshot-Conductor-0.8.0-19"). However, the state of all the instances in the Conductor UI is now "running" and the status in rhev-m is "up", as opposed to "pending" and "stopped" respectively in the previously tested release. So the desired result is achieved, but the error returned to the UI needs attention.
Current version: aeolus-conductor-0.8.0-19

Verified that Tomáš Hrčka's patch, which adds pessimistic database locking for quota to prevent updates with old values, fixes the issue described in the bz. Verified by launching deployables of 50 and 100 assemblies to rhev-m.

Pushed Tomáš's patch.
aeolus-conductor commit d03def71904642a4f22c5a95128430f0faae93fa

Fix is in aeolus-conductor-0.8.0-22

I am still seeing this, although it is not displaying errors in the UI now... I can't speak for Dave Wilson's experience; perhaps he can do a round of testing and comment with his feedback. This all depends on the rhevm environment and the amount of resources it has. Basically, I attempt to launch 20 instances to a rhevm hypervisor with only 8GB of memory. Maybe 14 launch, but the rest stay in pending.
I looked in dbomatic.log and found the stale object exception.
I looked in deltacloud/mock.log and found rhevm backend exceptions on out of memory.
So in my opinion, work is needed so that exceptions can communicate status on a per-assembly basis and give the end user feedback as to what is going on.

Created attachment 565644 [details]
screenshot
Not much to see here, just that multiple assembly launch feedback has been cleaned up to a single line which I believe is better.
Also shows images as stopped; they never started because of the out-of-memory limitations of the rhevm hypervisor host.
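
For context on the stale object exception and the pessimistic locking patch mentioned above, here is a minimal sketch of the idea in plain Ruby. It is not Conductor's code: `LockedQuota`, `increment!`, and `launch_concurrently` are hypothetical names, and the `Mutex` only stands in for a database row lock. The point is that the lock is held across the whole read-modify-write, so no caller updates the quota from an old value.

```ruby
require 'thread'

# Hypothetical quota counter; the Mutex stands in for a database row lock.
class LockedQuota
  attr_reader :running_instances

  def initialize
    @running_instances = 0
    @row_lock = Mutex.new
  end

  # Hold the lock for the whole read-modify-write cycle.
  def increment!
    @row_lock.synchronize do
      current = @running_instances
      Thread.pass # invite a context switch between the read and the write
      @running_instances = current + 1
    end
  end
end

# Simulate `count` instance launches updating the same quota at once.
def launch_concurrently(count)
  quota = LockedQuota.new
  threads = Array.new(count) { Thread.new { quota.increment! } }
  threads.each(&:join)
  quota.running_instances
end
```

Calling `launch_concurrently(50)` counts every launch because each update waits for the previous one to finish, at the cost of serializing the quota updates.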
Created attachment 565648 [details]
dbomatic.log
Created attachment 565650 [details]
deltacloud-core/mock.log
I should have mentioned that this is all seen using...
aeolus-all-0.8.0-35.el6.noarch
aeolus-conductor-0.8.0-35.el6.noarch
aeolus-conductor-daemons-0.8.0-35.el6.noarch
aeolus-conductor-doc-0.8.0-35.el6.noarch
aeolus-configure-2.5.0-15.el6.noarch
deltacloud-core-0.5.0-5.el6.noarch
deltacloud-core-ec2-0.5.0-5.el6.noarch
deltacloud-core-rhevm-0.5.0-5.el6.noarch
deltacloud-core-vsphere-0.5.0-5.el6.noarch
rubygem-aeolus-cli-0.3.0-10.el6.noarch
rubygem-aeolus-image-0.3.0-9.el6.noarch
rubygem-deltacloud-client-0.5.0-2.el6.noarch

Tomas Hrcka says that this BZ is resolved by the following commits:
41a068fc24725015f19c86575f505830a98294a1
e3f37f5dab9fda7142d329490060205054d543c4
7451414c6c89f17b80768451aeb2164b400464c2
b15e9f9d4ee484c35d2d3131e3a991e41f570d2e

As Jan mentioned, those patches roll back correctly for rhevm if there are not enough HW resources. So if you do not have enough resources while launching a multi-instance deployment, the instances already launched will be stopped and the deployment will end in a consistent state, with all instances stopped.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2012-1516.html
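
The rollback behavior described above can be sketched roughly as follows. This is an illustration in plain Ruby, not the code from the listed commits: `FakeProvider`, `InsufficientResources`, and `launch_all_or_nothing` are hypothetical names, and the per-instance memory figure is an assumption made only so the example has something to run out of.

```ruby
class InsufficientResources < StandardError; end

# Hypothetical stand-in for a provider with a fixed memory budget.
class FakeProvider
  attr_reader :running

  def initialize(free_memory_gb)
    @free_memory_gb = free_memory_gb
    @running = []
  end

  def start(name, memory_gb)
    if memory_gb > @free_memory_gb
      raise InsufficientResources, "not enough memory for #{name}"
    end
    @free_memory_gb -= memory_gb
    @running << name
  end

  def stop(name, memory_gb)
    @free_memory_gb += memory_gb if @running.delete(name)
  end
end

# Start every instance, or stop the ones already started and report failure.
def launch_all_or_nothing(provider, names, memory_gb)
  started = []
  names.each do |name|
    provider.start(name, memory_gb)
    started << name
  end
  true
rescue InsufficientResources
  started.each { |name| provider.stop(name, memory_gb) }
  false
end
```

With a provider created as `FakeProvider.new(8)` and 20 instances of 1 GB each (assumed sizes), the ninth start fails, the eight already started are stopped again, and the deployment ends with nothing running.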
Created attachment 556335 [details]
conductor.png

Description of problem:
When launching a deployable of 25 assemblies of the same image to a rhev-m environment, only ~20 start in rhev-m. The other five stay in a stopped state. The conductor UI returns "Attempted to update a stale object: Quota" on the assemblies that failed to start.

Version-Release number of selected component (if applicable):
aeolus-conductor-0.8.0-5

How reproducible:

Steps to Reproduce:
1. Deploy a 25 assembly deployable with the same image to rhev-m

Actual results:
Only 20 of 25 instances were started

Expected results:
All 25 instances will be created and started

Additional info:
/var/logs included (event transpired on 1/19/12 ~11:15AM). Screenshot of conductor error also attached.
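
The "Attempted to update a stale object: Quota" message comes from ActiveRecord's optimistic locking (the trace passes through locking/optimistic.rb). A minimal sketch of how two overlapping updates can trigger it, in plain Ruby rather than ActiveRecord: `QuotaStore`, `StaleObjectError`, and `simulate_conflicting_saves` are hypothetical names, and the in-memory object only imitates a row with a version column.

```ruby
class StaleObjectError < StandardError; end

# Hypothetical in-memory stand-in for a row guarded by a version column.
class QuotaStore
  attr_reader :running_instances, :lock_version

  def initialize
    @running_instances = 0
    @lock_version = 0
  end

  # Return a snapshot, as a separate request would see after loading the row.
  def load
    { :running_instances => @running_instances, :lock_version => @lock_version }
  end

  # The write succeeds only if nobody has saved since the snapshot was taken.
  def save(snapshot)
    if snapshot[:lock_version] != @lock_version
      raise StaleObjectError, "Attempted to update a stale object: Quota"
    end
    @running_instances = snapshot[:running_instances]
    @lock_version += 1
  end
end

# Two launches load the same quota, then both try to save their own update.
def simulate_conflicting_saves
  store = QuotaStore.new
  first = store.load
  second = store.load

  first[:running_instances] += 1
  store.save(first) # succeeds and bumps the version

  second[:running_instances] += 1
  begin
    store.save(second) # still carries the old version
    [store, nil]
  rescue StaleObjectError => e
    [store, e.message]
  end
end
```

The second save is rejected instead of silently overwriting the first, which is the safe outcome for the data. If nothing retries or rolls back after that rejection, the instance tied to the failed update may be left unstarted.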