Bug 791123

Summary: dbomatic - more robust instance checking
Product: [Retired] CloudForms Cloud Engine
Reporter: Jan Provaznik <jprovazn>
Component: aeolus-conductor
Assignee: Mo Morsi <mmorsi>
Status: CLOSED ERRATA
QA Contact: pushpesh sharma <psharma>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 1.0.0
CC: akarol, athomas, bbandari, dajohnso, deltacloud-maint, hbrock, jeckersb, jturner, mfojtik, morazi, ssachdev
Target Milestone: beta5
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-05-15 22:37:08 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Jan Provaznik 2012-02-16 09:20:02 UTC
When dbomatic checks instance status for an account (method check_one_account), it iterates through all of the account's instances and issues a dc-api instance GET request for each one (api_instance = connection.instance(instance.external_key)).

This request can apparently raise an exception for some instances. This is the core log when the exception is raised:

I, [2012-02-16T04:17:20.858803 #10873]  INFO -- : New Aws::Ec2 using per_thread-connection mode
D, [2012-02-16T04:17:20.869549 #10873] DEBUG -- : Rightscale::HttpConnection : server ec2.us-east-1.amazonaws.com closed connection
I, [2012-02-16T04:17:21.370164 #10873]  INFO -- : Opening new HTTPS connection to ec2.us-east-1.amazonaws.com:443
thin server (localhost:3002) [deltacloud-mock][10873]: NoMethodError:undefined method `[]' for nil:NilClass
/usr/share/deltacloud-core/lib/deltacloud/drivers/ec2/ec2_driver.rb:797:in `convert_instance'
/usr/share/deltacloud-core/lib/deltacloud/drivers/ec2/ec2_driver.rb:186:in `instance'
/usr/share/deltacloud-core/lib/deltacloud/base_driver/exceptions.rb:151:in `call'
/usr/share/deltacloud-core/lib/deltacloud/base_driver/exceptions.rb:151:in `safely'
/usr/share/deltacloud-core/lib/deltacloud/drivers/ec2/ec2_driver.rb:184:in `instance'
/usr/share/deltacloud-core/lib/deltacloud/helpers/application_helper.rb:93:in `send'
/usr/share/deltacloud-core/lib/deltacloud/helpers/application_helper.rb:93:in `show'
/usr/lib/ruby/1.8/benchmark.rb:293:in `measure'
/usr/share/deltacloud-core/lib/deltacloud/helpers/application_helper.rb:92:in `show'
/usr/share/deltacloud-core/lib/deltacloud/server.rb:470
/usr/share/deltacloud-core/lib/sinatra/rabbit.rb:125:in `instance_eval'


If an exception occurs, the rest of the instances are not checked -> we should handle the exception and go on to check the remaining instances.
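
A minimal sketch of the per-instance handling proposed above, not the actual dbomatic code. FakeConnection and check_instances are hypothetical names introduced only for illustration; the real loop lives in check_one_account and uses the deltacloud client connection.

```ruby
# Hypothetical stand-in for the deltacloud client connection (assumption,
# not the real client API). It raises for keys the provider does not know.
class FakeConnection
  def initialize(known)
    @known = known # external_key => state reported by the provider
  end

  def instance(external_key)
    raise "Instance #{external_key} NotFound" unless @known.key?(external_key)
    @known[external_key]
  end
end

# Check every instance. The rescue sits inside the loop, so a failure on
# one instance is recorded and the remaining instances are still checked.
def check_instances(connection, external_keys)
  results = {}
  external_keys.each do |key|
    begin
      results[key] = connection.instance(key)
    rescue StandardError => e
      results[key] = :error
      warn "skipping #{key}: #{e.message}"
    end
  end
  results
end

connection = FakeConnection.new('i-2' => 'RUNNING', 'i-3' => 'STOPPED')
check_instances(connection, %w[i-1 i-2 i-3])
```

The point of the design is only where the rescue is placed: wrapping the whole loop would still abort on the first failing instance, while wrapping each iteration lets the later instances be checked.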

Comment 1 Jan Provaznik 2012-02-16 09:24:14 UTC
michal: can you please check whether the exception above is common for ec2?

Comment 2 Michal Fojtik 2012-02-16 12:03:57 UTC
Jan, what version of DC are you running? This exception is definitely DC related.

Comment 3 Michal Fojtik 2012-02-16 12:32:55 UTC
commit f147e9f360ae276544081d2d1419188e64423d89
Author: Michal Fojtik <mfojtik>
Date:   Thu Feb 16 13:31:51 2012 +0100

    EC2: Raise an 404 exception when trying to access non-existing instance

Comment 4 Michal Fojtik 2012-02-16 12:33:58 UTC
diff --git a/server/lib/deltacloud/drivers/ec2/ec2_driver.rb b/server/lib/deltacloud/drivers/ec2/ec2_driver.rb
index 4569056..74e110f 100644
--- a/server/lib/deltacloud/drivers/ec2/ec2_driver.rb
+++ b/server/lib/deltacloud/drivers/ec2/ec2_driver.rb
@@ -184,6 +184,7 @@ module Deltacloud
           inst_arr = []
           safely do
             ec2_inst = ec2.describe_instances([opts[:id]]).first
+            raise "Instance #{opts[:id]} NotFound" if ec2_inst.nil?
             instance = convert_instance(ec2_inst)
             return nil unless instance
             if ec2_inst[:aws_platform] == 'windows'

Comment 5 Jan Provaznik 2012-02-16 15:40:32 UTC
Even with Michal's fix on dc-core side, an exception is raised -> we need to handle this exception in dbomatic.

Comment 6 John Eckersberg 2012-02-22 19:33:07 UTC
Moving this back to ON_DEV from MODIFIED since it seems a dbomatic patch is required to proceed with building.

Comment 7 Jan Provaznik 2012-02-23 12:32:23 UTC
Assigning to angus for reassignment - a patch is needed in dbomatic (conductor side) too. I would suggest assigning a higher priority - this breaks instance status checking if a user deletes instances on the provider side (ec2).

Comment 8 Mo Morsi 2012-02-29 20:11:54 UTC
How do you reproduce this issue? E.g. under what circumstances is the exception raised for the instance?

Under a fresh Aeolus install w/ the components from the F16 repo, I get the following in the deltacloud log if I try to manually query for an instance which does not exist:


Deltacloud::ExceptionHandler::ObjectNotFound - InvalidInstanceID.NotFound: The instance ID 'i-01234567' does not exist
REQUEST=ec2.us-east-1.amazonaws.com:443/<snip>
 /usr/lib/ruby/gems/1.8/gems/aws-2.5.5/lib/ses/../awsbase/awsbase.rb:572:in `request_info_impl'
 /usr/lib/ruby/gems/1.8/gems/aws-2.5.5/lib/ec2/ec2.rb:177:in `request_info'
 /usr/lib/ruby/gems/1.8/gems/aws-2.5.5/lib/ses/../awsbase/awsbase.rb:586:in `request_cache_or_info'
 /usr/lib/ruby/gems/1.8/gems/aws-2.5.5/lib/ec2/ec2.rb:432:in `describe_instances'
 /usr/share/deltacloud-core/lib/deltacloud/drivers/ec2/ec2_driver.rb:185:in `instance'
 /usr/share/deltacloud-core/lib/deltacloud/base_driver/exceptions.rb:151:in `call'
 /usr/share/deltacloud-core/lib/deltacloud/base_driver/exceptions.rb:151:in `safely'
 /usr/share/deltacloud-core/lib/deltacloud/drivers/ec2/ec2_driver.rb:184:in `instance'
 /usr/share/deltacloud-core/lib/deltacloud/helpers/application_helper.rb:93:in `send'
 /usr/share/deltacloud-core/lib/deltacloud/helpers/application_helper.rb:93:in `show'
 /usr/lib/ruby/1.8/benchmark.rb:293:in `measure'
<snip>


Note the aws client itself is throwing an exception even before we get to the one added in the deltacloud patch. How do I trigger the aforementioned error / problem in dbomatic / deltacloud?

Comment 9 Mo Morsi 2012-03-05 21:11:41 UTC
Patch sent to list

https://fedorahosted.org/pipermail/aeolus-devel/2012-March/009410.html

Comment 10 Mo Morsi 2012-03-06 14:40:09 UTC
Patch pushed to repo

Comment 12 Dave Johnson 2012-03-16 19:34:44 UTC
Marking needinfo for verification steps...

Comment 13 Jan Provaznik 2012-03-19 08:01:40 UTC
The problem is that dc core can raise an exception when a non-existing instance is requested; this happens only for some providers, e.g. ec2. Example:


irb(main):005:0> ec2.connect.instance('i-1234435')
DeltaCloud::API::BackendError: 502 : InvalidInstanceID.NotFound: The instance ID 'i-01234435' does not exist
  REQUEST=ec2.us-east-1.amazonaws.com:443/?AWSAccessKeyId=AKIAJDNGKF2QAIF5HCSQ&Action=DescribeInstances&InstanceId.1=i-1234435&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2012-03-19T07%3A49%3A13.000Z&Version=2010-08-31&Signature=FLPHYh1GN20JRRENN338%2BGBumTUjLba1uKXVO20OPVQ%3D 
  REQUEST ID=ba03b46a-8dc8-4b59-9379-b16aba4e3489
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.5.0/lib/deltacloud.rb:432:in `handle_backend_error'
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.5.0/lib/deltacloud.rb:390:in `request'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:218:in `call'


So reproduction steps would be:
1. start 3 ec2 instances
2. when all 3 instances are running, stop dbomatic and terminate the 1st instance (the one created first in conductor) through the aws console
3. when the terminated instance has completely disappeared from ec2, start dbomatic again
4. w/o this patch, you should see an error message + backtrace in the dbomatic log similar to the error I pasted above
5. still w/o this patch, if you stop the 2nd and 3rd instances from conductor, their status should not be updated in conductor even though they are really stopped
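
One possible way to tell the "instance is gone" case apart from other backend failures, sketched under assumptions and not taken from the pushed patch. BackendError below is a hypothetical stand-in for DeltaCloud::API::BackendError, and state_for, instance_missing? and the :vanished marker are illustrative names; the only detail taken from the log above is the InvalidInstanceID.NotFound error code.

```ruby
# Hypothetical stand-in for DeltaCloud::API::BackendError (assumption).
class BackendError < StandardError; end

# True when the message carries the EC2 "not found" code seen in the log.
def instance_missing?(error)
  error.message.include?('InvalidInstanceID.NotFound')
end

# Runs the lookup given as a block. A "not found" backend error becomes
# the :vanished marker; any other backend error is re-raised unchanged.
def state_for(external_key)
  yield external_key
rescue BackendError => e
  raise unless instance_missing?(e)
  :vanished
end

state_for('i-1234435') do |key|
  raise BackendError,
        "502 : InvalidInstanceID.NotFound: The instance ID '#{key}' does not exist"
end
```

Matching on the message text is fragile and only covers ec2; it is shown here because the message is the one piece of information visible in the pasted error.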

Comment 14 wes hayutin 2012-03-30 19:02:26 UTC
assigning to Pushpesh

Comment 15 pushpesh sharma 2012-04-03 05:49:32 UTC
Tried the steps specified by Jan Provaznik; no error logs seen in the dbomatic logs. The dbomatic service was stopped and started in between, and instances were stopped from both sides, i.e. the ec2 console and conductor.

Comment 16 pushpesh sharma 2012-04-03 05:54:51 UTC
verified on 

[root@qe-blade-01 ~]# rpm -qa|grep aeolus
aeolus-conductor-doc-0.8.7-1.el6.noarch
aeolus-configure-2.5.2-1.el6.noarch
aeolus-conductor-0.8.7-1.el6.noarch
rubygem-aeolus-cli-0.3.1-1.el6.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
aeolus-conductor-daemons-0.8.7-1.el6.noarch
aeolus-all-0.8.7-1.el6.noarch

Comment 17 errata-xmlrpc 2012-05-15 22:37:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0583.html