Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 783218

Summary: Failure to launch (start) a multiple assembly deployable to rhev-m
Product: [Retired] CloudForms Cloud Engine
Reporter: Dave Wilson <dwilson>
Component: aeolus-conductor
Assignee: Tomas Hrcka <thrcka>
Status: CLOSED ERRATA
QA Contact: Dave Johnson <dajohnso>
Severity: low
Docs Contact:
Priority: unspecified
Version: 1.0.0
CC: akarol, asettle, cpelland, deltacloud-maint, dmacpher, jprovazn, juwu, morazi, rwsu, ssachdev, thrcka
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When launching multiple instances in a RHEV environment with insufficient hardware resources, some instances started and some remained stopped. This bug fix adds rollback in Conductor, so that if there are not enough hardware resources to launch multiple instances, all instances remain in a stopped state.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-04 14:55:53 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
conductor.png none
log files none
requested logs covering instance launch none
Sceenshot-Conductor-0.8.0-19 none
screenshot none
dbomatic.log none
deltacloud-core/mock.log none

Description Dave Wilson 2012-01-19 17:36:14 UTC
Created attachment 556335
conductor.png

Description of problem: When launching a deployable of 25 assemblies of the same image to a rhev-m environment, only ~20 start in rhev-m. The other five stay in a stopped state.
The conductor UI returns "Attempted to update a stale object: Quota" on the assemblies that failed to start.


Version-Release number of selected component (if applicable): aeolus-conductor-0.8.0-5


How reproducible:


Steps to Reproduce:
1. Deploy a 25-assembly deployable with the same image to rhev-m.

Actual results: Only 20 of 25 instances were started


Expected results: All 25 instances will be created and started


Additional info: /var/logs included (event transpired on 1/19/12 ~11:15AM). Screenshot of conductor error also attached.

Comment 1 Dave Wilson 2012-01-19 17:42:31 UTC
Created attachment 556336
log files

Comment 2 Dave Wilson 2012-01-19 17:59:08 UTC
I should clarify that "stays in stopped state" refers to the state in rhev-m. In the conductor UI, the state of the instance is "new".

Comment 3 Dave Wilson 2012-01-20 15:49:51 UTC
Created attachment 556535
requested logs covering instance launch

Comment 4 wes hayutin 2012-01-20 16:17:30 UTC
Attempted to update a stale object: Quota
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/locking/optimistic.rb:97:in `update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/attribute_methods/dirty.rb:68:in `update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/timestamp.rb:60:in `update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/callbacks.rb:281:in `update'
/usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:414:in `_run_update_callbacks'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/callbacks.rb:281:in `update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/persistence.rb:257:in `create_or_update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/callbacks.rb:273:in `create_or_update'
/usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:414:in `_run_save_callbacks'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/callbacks.rb:273:in `create_or_update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/persistence.rb:60:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/validations.rb:49:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/attribute_methods/dirty.rb:30:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:245:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:292:in `with_transaction_returning_status'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract/database_statements.rb:139:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:207:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:290:in `with_transaction_returning_status'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:245:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/associations/association_proxy.rb:222:in `send'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/associations/association_proxy.rb:222:in `method_missing'
/usr/share/aeolus-conductor/app/models/instance_observer.rb:74:in `update_quota'
/usr/share/aeolus-conductor/app/models/instance_observer.rb:57:in `each'
/usr/share/aeolus-conductor/app/models/instance_observer.rb:57:in `update_quota'
/usr/share/aeolus-conductor/app/models/instance_observer.rb:28:in `before_save'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/observer.rb:118:in `send'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/observer.rb:118:in `_notify_instance_observer_for_before_save'
/usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:455:in `_run_save_callbacks'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/callbacks.rb:273:in `create_or_update'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/persistence.rb:60:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/validations.rb:49:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/attribute_methods/dirty.rb:30:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:245:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:292:in `with_transaction_returning_status'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract/database_statements.rb:139:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:207:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:290:in `with_transaction_returning_status'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/transactions.rb:245:in `save!'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/associations/association_proxy.rb:222:in `send'
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/associations/association_proxy.rb:222:in `method_missing'
/usr/share/aeolus-conductor/app/util/taskomatic.rb:36:in `create_instance'
/usr/share/aeolus-conductor/app/models/deployment.rb:268:in `launch'
/usr/share/aeolus-conductor/app/models/deployment.rb:249:in `each'
/usr/share/aeolus-conductor/app/models/deployment.rb:249:in `launch'
/usr/share/aeolus-conductor/app/controllers/deployments_controller.rb:126:in `create'
/usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/mime_responds.rb:264:in `call'
/usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/mime_responds.rb:264:in `retrieve_response_from_mimes'
/usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/mime_responds.rb:191:in `respond_to'
/usr/share/aeolus-conductor/app/controllers/deployments_controller.rb:124:in `create'
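
The trace above ends in ActiveRecord's optimistic locking check (optimistic.rb), reached from update_quota. As a rough illustration only, the plain-Ruby sketch below shows how two writers that loaded the same quota row can hit this kind of error. QuotaStore, Row, and StaleObjectError are hypothetical stand-ins written for this example; they are not Conductor or ActiveRecord classes, and the real check is done by ActiveRecord against a version column in the database.

```ruby
# Hypothetical stand-ins; not Conductor or ActiveRecord classes.
class StaleObjectError < StandardError; end

# In-memory "table" holding one quota row with a version counter.
class QuotaStore
  Row = Struct.new(:running_instances, :lock_version)

  def initialize
    @row = Row.new(0, 0)
  end

  # Return a private copy, as a SELECT would.
  def load
    @row.dup
  end

  # Write back only if nobody else saved since `copy` was loaded.
  def save(copy)
    if copy.lock_version != @row.lock_version
      raise StaleObjectError, "Attempted to update a stale object: Quota"
    end
    @row = Row.new(copy.running_instances, copy.lock_version + 1)
  end
end

store  = QuotaStore.new
first  = store.load
second = store.load   # both copies carry lock_version 0

first.running_instances += 1
store.save(first)     # succeeds; the stored lock_version becomes 1

second.running_instances += 1
begin
  store.save(second)  # still carries lock_version 0, so it is rejected
rescue StaleObjectError => e
  puts e.message
end
```

With many instances saved in quick succession, each save touching the same quota row, the second writer in any such pair is rejected, which would be consistent with only some of the assemblies failing.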

Comment 5 Richard Su 2012-02-02 04:27:59 UTC
Dave,

Are you still seeing this bug in the latest rpms? I haven't been able to reproduce it on aeolus-conductor-0.8.0-15.el6.noarch or on the version you used.

Tomas has posted a patch to resolve it if it is still happening.

Comment 6 Dave Wilson 2012-02-03 18:04:25 UTC
Created attachment 559335
Sceenshot-Conductor-0.8.0-19

Comment 7 Dave Wilson 2012-02-03 18:12:17 UTC
In short, yes, I'm still seeing an error. The behavior has changed, though.

When starting a 50-assembly deployable, an error is returned to the UI (see attachment "Sceenshot-Conductor-0.8.0-19"). However, the state of all the instances in the Conductor UI is now "running" and the rhev-m status is "up", as opposed to "pending" and "stopped" respectively in the previously tested release.
So the desired result is achieved, but the error returned to the UI needs attention.

current version: aeolus-conductor-0.8.0-19

Comment 8 Dave Wilson 2012-02-06 18:57:15 UTC
Verified that Tomáš Hrčka's patch, which adds pessimistic database locking for quota to prevent updates with old values, fixes the issue described in this BZ.

Verified by launching deployables of 50 and 100 assemblies to rhev-m.
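
For readers unfamiliar with the approach, here is a minimal plain-Ruby sketch of the idea behind pessimistic locking: the read, modify, and write steps happen while a lock is held, so no writer can act on an old value. LockedQuota and increment_running are hypothetical names, and the Mutex only plays the role that a database row lock would play; this is not the code from the patch.

```ruby
# Hypothetical stand-in for a quota row guarded by a row lock.
class LockedQuota
  attr_reader :running_instances

  def initialize
    @running_instances = 0
    @row_lock = Mutex.new   # plays the role of a database row lock
  end

  # Read, modify, and write while holding the lock, so no other
  # writer can act on an outdated value.
  def increment_running
    @row_lock.synchronize do
      current = @running_instances
      sleep(0.001)          # widen the window a race would need
      @running_instances = current + 1
    end
  end
end

quota = LockedQuota.new
threads = Array.new(50) { Thread.new { quota.increment_running } }
threads.each(&:join)
puts quota.running_instances   # 50: every increment is applied
```

The trade-off is that writers wait for each other instead of failing fast, which suits a counter that every instance launch has to update.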

Comment 9 Richard Su 2012-02-06 19:54:03 UTC
Pushed Tomáš's patch.

aeolus-conductor commit d03def71904642a4f22c5a95128430f0faae93fa

Comment 10 Richard Su 2012-02-07 16:58:49 UTC
Fix is in aeolus-conductor-0.8.0-22

Comment 11 Dave Johnson 2012-02-24 17:08:03 UTC
I am still seeing this although it is not displaying errors in the UI now...

I can't speak for Dave Wilson's experience (perhaps he can do a round of testing and comment with his feedback), but this all depends on the rhevm environment and the amount of resources it has.

Basically, I attempt to launch 20 instances to a rhevm hypervisor with only 8GB of memory. About 14 launched, but the rest stayed in pending.

I looked in dbomatic.log and found the stale object exception.

I looked in deltacloud/mock.log and found rhevm backend exceptions on out of memory.

So in my opinion, work is needed so that exceptions communicate status on a per-assembly basis and give the end user feedback as to what is going on.
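
One possible shape for such per-assembly feedback is sketched below in plain Ruby. LaunchResult and launch_with_report are hypothetical names invented for this illustration, and the memory figures in the usage lines are made up; nothing here is taken from Conductor.

```ruby
# Hypothetical names; a sketch of per-assembly result reporting.
LaunchResult = Struct.new(:assembly, :status, :error)

# Try each assembly independently and record what happened to it,
# instead of letting the first exception hide the rest.
def launch_with_report(assemblies)
  assemblies.map do |assembly|
    begin
      yield assembly
      LaunchResult.new(assembly, :running, nil)
    rescue StandardError => e
      LaunchResult.new(assembly, :failed, e.message)
    end
  end
end

# Usage with a made-up launcher that runs out of memory.
free_memory_gb = 8
report = launch_with_report(%w[web db cache]) do |_name|
  raise "out of memory on hypervisor" if free_memory_gb < 4
  free_memory_gb -= 4
end
report.each { |r| puts "#{r.assembly}: #{r.status} #{r.error}" }
```

Each entry carries the backend's error message alongside the assembly name, so a UI could show which assemblies failed and why.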

Comment 12 Dave Johnson 2012-02-24 17:15:39 UTC
Created attachment 565644
screenshot

Not much to see here, just that the multiple assembly launch feedback has been cleaned up to a single line, which I believe is better.

It also shows images as stopped; they never started because of out-of-memory limitations of the rhevm hypervisor host.

Comment 13 Dave Johnson 2012-02-24 17:21:03 UTC
Created attachment 565648
dbomatic.log

Comment 14 Dave Johnson 2012-02-24 17:23:11 UTC
Created attachment 565650
deltacloud-core/mock.log

Comment 15 Dave Johnson 2012-02-24 17:25:25 UTC
I should have mentioned that this is all seen using...

aeolus-all-0.8.0-35.el6.noarch
aeolus-conductor-0.8.0-35.el6.noarch
aeolus-conductor-daemons-0.8.0-35.el6.noarch
aeolus-conductor-doc-0.8.0-35.el6.noarch
aeolus-configure-2.5.0-15.el6.noarch
deltacloud-core-0.5.0-5.el6.noarch
deltacloud-core-ec2-0.5.0-5.el6.noarch
deltacloud-core-rhevm-0.5.0-5.el6.noarch
deltacloud-core-vsphere-0.5.0-5.el6.noarch
rubygem-aeolus-cli-0.3.0-10.el6.noarch
rubygem-aeolus-image-0.3.0-9.el6.noarch
rubygem-deltacloud-client-0.5.0-2.el6.noarch

Comment 16 Jan Provaznik 2012-09-06 17:25:36 UTC
Tomas Hrcka says that this BZ is resolved by the following commits:
41a068fc24725015f19c86575f505830a98294a1
e3f37f5dab9fda7142d329490060205054d543c4
7451414c6c89f17b80768451aeb2164b400464c2
b15e9f9d4ee484c35d2d3131e3a991e41f570d2e

Comment 17 Tomas Hrcka 2012-09-10 08:10:40 UTC
As Jan mentioned, those patches perform the rollback correctly for rhevm when there are not enough HW resources. So if resources run out while a multi-instance deployment is being launched, the instances that were already launched are stopped, and the deployment ends in a consistent state with all instances stopped.
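
The rollback behavior described here can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the code from the commits: FakeProvider, InsufficientResources, and launch_all are hypothetical names, and the capacity of 14 out of 20 instances simply echoes the numbers reported in comment 11.

```ruby
# Hypothetical names throughout; not the Conductor implementation.
class InsufficientResources < StandardError; end

# Stand-in for a provider that can run only `capacity` instances.
class FakeProvider
  attr_reader :running

  def initialize(capacity)
    @capacity = capacity
    @running = []
  end

  def start(name)
    if @running.size >= @capacity
      raise InsufficientResources, "no capacity for #{name}"
    end
    @running << name
  end

  def stop(name)
    @running.delete(name)
  end
end

# Start every instance; if any start fails, stop those already started
# so the deployment ends with all instances stopped.
def launch_all(provider, names)
  started = []
  names.each do |name|
    provider.start(name)
    started << name
  end
  true
rescue InsufficientResources
  started.each { |name| provider.stop(name) }
  false
end

provider = FakeProvider.new(14)
names = (1..20).map { |i| "instance-#{i}" }
ok = launch_all(provider, names)
puts "launched: #{ok}, running: #{provider.running.size}"
# prints "launched: false, running: 0"
```

The point of the design is that a partial launch never persists: either every instance runs or none does.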

Comment 20 errata-xmlrpc 2012-12-04 14:55:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-1516.html