Bug 786535

Summary: RHEVMBackendException:Cannot run VM .... not enough memory
Product: [Retired] CloudForms Cloud Engine
Reporter: Dave Johnson <dajohnso>
Component: aeolus-conductor
Assignee: Imre Farkas <ifarkas>
Status: CLOSED ERRATA
QA Contact: wes hayutin <whayutin>
Severity: medium
Docs Contact:
Priority: low
Version: 1.0.0
CC: akarol, asettle, athomas, cpelland, deltacloud-maint, dmacpher, fvollero, hbrock, juwu, jzigmund, mfojtik, rananda, rlandy, rwsu, ssachdev, whayutin
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When the number of virtual machines created in the Red Hat Enterprise Virtualization environment exceeded the resource capacity to run them, a STOPPED state was returned for some of the virtual machines when Conductor attempted to start them. This bug fix updates deployments_controller.rb and provides an error message to users indicating the nature of the problem.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-04 14:56:17 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                    Flags
flash msg only after reload    none
Alert shown w/o refresh        none

Description Dave Johnson 2012-02-01 17:42:30 UTC
Description of problem:
============================================
Through Conductor I deployed 10 images, but only 9 VMs actually started. Looking through the logs, I found this backend exception in deltacloud-core. This type of error needs to be communicated to the user.

Version-Release number of selected component (if applicable):
=================================================================
deltacloud-core-0.5.0-4.rc1.el6.noarch
deltacloud-core-ec2-0.5.0-4.rc1.el6.noarch
deltacloud-core-rhevm-0.5.0-4.rc1.el6.noarch
deltacloud-core-vsphere-0.5.0-4.rc1.el6.noarch
rubygem-deltacloud-client-0.4.0-3.el6.noarch


Additional info:
==================================
thin server (localhost:3002) [deltacloud-mock][18292]: RHEVM::RHEVMBackendException:Cannot run VM. There are no available running Hosts with sufficient memory in VM's Cluster .
/usr/share/deltacloud-core/bin/../lib/deltacloud/drivers/rhevm/rhevm_client.rb:89:in `vm_action'
/usr/share/deltacloud-core/bin/../lib/deltacloud/drivers/rhevm/rhevm_driver.rb:153:in `start_instance'
/usr/share/deltacloud-core/bin/../lib/deltacloud/base_driver/exceptions.rb:151:in `call'
/usr/share/deltacloud-core/bin/../lib/deltacloud/base_driver/exceptions.rb:151:in `safely'
/usr/share/deltacloud-core/bin/../lib/deltacloud/drivers/rhevm/rhevm_driver.rb:152:in `start_instance'
/usr/share/deltacloud-core/bin/../lib/deltacloud/helpers/application_helper.rb:128:in `send'
/usr/share/deltacloud-core/bin/../lib/deltacloud/helpers/application_helper.rb:128:in `instance_action'
/usr/share/deltacloud-core/bin/../lib/deltacloud/server.rb:503
/usr/share/deltacloud-core/bin/../lib/sinatra/rabbit.rb:125:in `instance_eval'
/usr/share/deltacloud-core/bin/../lib/sinatra/rabbit.rb:125:in `POST /api/instances/:id/start'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:1151:in `call'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:1151:in `compile!'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:724:in `instance_eval'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:724:in `route_eval'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:708:in `route!'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:758:in `process_route'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:755:in `catch'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:755:in `process_route'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:707:in `route!'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:706:in `each'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:706:in `route!'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:843:in `dispatch!'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:644:in `call!'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:808:in `instance_eval'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:808:in `invoke'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:808:in `catch'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:808:in `invoke'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:644:in `call!'
/usr/lib/ruby/gems/1.8/gems/sinatra-1.2.6/lib/sinatra/base.rb:629:in `call'
/usr/share/deltacloud-core/bin/../lib/sinatra/rack_syslog.rb:48:in `call'
/usr/share/deltacloud-core/bin/../lib/sinatra/rack_date.rb:31:in `call'
/usr/share/deltacloud-core/bin/../lib/sinatra/rack_accept.rb:149:in `call'
/usr/lib/ruby/gems/1.8/gems/rack-1.3.0/lib/rack/head.rb:9:in `call'
/usr/share/deltacloud-core/bin/../lib/sinatra/rack_driver_select.rb:45:in `call'
/usr/share/deltacloud-core/bin/../lib/sinatra/rack_matrix_params.rb:106:in `call'
/usr/share/deltacloud-core/bin/../lib/sinatra/rack_runtime.rb:36:in `call'
/usr/share/deltacloud-core/bin/../lib/sinatra/rack_etag.rb:41:in `call'
/usr/lib/ruby/gems/1.8/gems/rack-accept-0.4.3/lib/rack/accept/context.rb:22:in `call'

Comment 1 Dave Johnson 2012-02-01 17:43:40 UTC
So maybe this isn't Deltacloud's issue and Conductor needs to catch this?

Please advise...

Comment 2 Dave Johnson 2012-02-02 17:32:12 UTC
Cleaning up ON_QA bugs I came across this related issue:

bug 744289 pointing to --> https://issues.apache.org/jira/browse/DTACLOUD-88

Comment 3 Francesco Vollero 2012-02-02 22:06:55 UTC
This is a Conductor problem, not a Deltacloud problem.

Comment 4 Ronelle Landy 2012-02-03 15:49:07 UTC
https://issues.apache.org/jira/browse/DTACLOUD-88 is still open, mostly because I had not gotten to that record in the JIRA cleanup process (reverifying and closing old reports). According to BZ-744289, DTACLOUD-88 should actually be resolved.

This report shows that Deltacloud is actually returning the backend error message as requested in DTACLOUD-88: "Cannot run VM. There are no available running Hosts with sufficient memory in VM's Cluster", not just the generic "Operation start failed."

Since the problem reported is that "this type of error need[s] to be communicated to the user" and the user was interfacing with Conductor, I'm reassigning this BZ to that component. (Also see development's comments in Comment 3 above.)

Comment 5 Michal Fojtik 2012-02-07 11:35:59 UTC
Dave, is this error properly reported through the API? I mean, do you get a '50x' error? If so, then this is not a bug. DC just properly reports what it got from the RHEV-M server.
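
To illustrate the check in question, here is a minimal Ruby sketch, assuming the client sees an HTTP status and a message body from the start request. `BackendResponse` and `instance_failure_message` are hypothetical names used only for illustration; they are not part of Deltacloud or Conductor. The fallback text is the generic message quoted in Comment 4.

```ruby
# Hypothetical stand-in for the HTTP response to POST /api/instances/:id/start.
BackendResponse = Struct.new(:status, :body)

# Returns nil when the request succeeded; otherwise returns a short
# "<status> : <message>" string that a UI could show to the user.
def instance_failure_message(response)
  return nil if response.status < 400

  message = response.body.to_s.strip
  # Fall back to a generic message when the backend sent no detail.
  message = "Operation start failed." if message.empty?
  "#{response.status} : #{message}"
end

failed = BackendResponse.new(
  500,
  "Cannot run VM. There are no available running Hosts with sufficient memory in VM's Cluster"
)
puts instance_failure_message(failed)
```

If the API does answer with a 50x status and the backend text, the remaining work is on the client side: it has to keep that message and show it instead of silently leaving the instance stopped.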

Comment 6 Angus Thomas 2012-02-07 14:16:36 UTC
Imre,

This looks like an instance where we're not reacting correctly to an error. Can you please investigate?


Angus

Comment 7 Dave Johnson 2012-02-09 14:33:18 UTC
*** Bug 788819 has been marked as a duplicate of this bug. ***

Comment 8 Imre Farkas 2012-02-14 16:33:14 UTC
Patch has been posted: https://fedorahosted.org/pipermail/aeolus-devel/2012-February/008839.html

Comment 9 Imre Farkas 2012-03-06 15:55:38 UTC
Based on the review, pushing back to ON_DEV.

Comment 11 Jozef Zigmund 2012-03-15 18:57:27 UTC
I tried to reproduce it several times.
Issues:
- wrong flash message (the exception from DC appeared in the UI when the user stopped an instance)
- the state of an instance during startup flipped back and forth between pending and create_failed
- the state of a failed instance (RHEV-M out of memory) flipped back and forth between create_failed and stopped

Comment 12 Jozef Zigmund 2012-03-15 19:20:53 UTC
Created attachment 570415 [details]
flash msg only after reload

Comment 13 Imre Farkas 2012-03-20 16:58:17 UTC
As I wrote on the list, could you provide the steps necessary to reproduce these issues? I wasn't able to reproduce them.

Comment 14 Jozef Zigmund 2012-03-22 17:36:18 UTC
After more attempts I could no longer hit the issues with states. I think they appear only in special cases. Previously I had 4 of the 5 instances in the deployment running; in today's attempts only 2-3 of 5 reached the running state. It was also strange to me that the number of running instances varied.

In the previous reproduction I observed a case where, after I stopped the running instances, the create_failed instance started; that did not happen today. I'm not sure whether it was caused by RHEV-M or dbomatic.

I still think you should fix the flash message so that, when Conductor gets the memory error, it is shown immediately rather than only after the page is refreshed.

Comment 16 Imre Farkas 2012-05-23 12:20:23 UTC
Rebased revision sent: https://fedorahosted.org/pipermail/aeolus-devel/2012-May/010523.html

Comment 17 Imre Farkas 2012-06-12 13:30:37 UTC
Pushed to master:
commit f9d0e42701c5bf22e06363cfa9427fd16b206965
Author: Imre Farkas <ifarkas>
Date:   Tue Feb 14 16:51:00 2012 +0100

    BZ786535: display failures for instances (rev. 4)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=786535
    
    Rebased and autoupdate moved to mustache

Comment 19 Ronelle Landy 2012-09-21 19:36:31 UTC
Tested rpms:

>> rpm -qa |grep aeolus
aeolus-configure-2.8.6-1.el6cf.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
rubygem-aeolus-cli-0.7.1-1.el6cf.noarch
aeolus-conductor-0.13.8-1.el6cf.noarch
aeolus-conductor-daemons-0.13.8-1.el6cf.noarch
aeolus-conductor-doc-0.13.8-1.el6cf.noarch
aeolus-all-0.13.8-1.el6cf.noarch

I launched a rhevm instance to a realm w/o available hosts. Conductor returned the following alert - no page refresh required:


***********

 Alerts 1

    ce-gqiig/testRHEVM
        Instance Failure
        500 : Cannot run VM. There are no available running Hosts in the Host Cluster.

***********

See the attached screenshot for full page view.

Marking this BZ as 'verified'

Comment 20 Ronelle Landy 2012-09-21 19:37:20 UTC
Created attachment 615589 [details]
Alert shown w/o refresh

Comment 22 errata-xmlrpc 2012-12-04 14:56:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-1516.html