Bug 698716 - dbomatic periodically crashes, no clear recreate atm
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: aeolus-conductor
Version: 0.3.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Mo Morsi
QA Contact: wes hayutin
URL: na
Whiteboard:
Depends On:
Blocks: ce-beta ce-ami
 
Reported: 2011-04-21 15:22 UTC by wes hayutin
Modified: 2012-01-26 12:28 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Embargoed:


Attachments
dbomatic log (842.30 KB, text/plain)
2011-04-21 15:22 UTC, wes hayutin
logs.tgz (149.39 KB, application/octet-stream)
2011-05-06 18:25 UTC, Dave Johnson
conductor & condor logs (253.50 KB, application/x-gzip)
2011-05-19 22:28 UTC, Dave Johnson

Description wes hayutin 2011-04-21 15:22:11 UTC
Created attachment 493890 [details]
dbomatic log

Description of problem:

Just documenting the issue: after three days of trying to find a clear recreate for why dbomatic periodically crashes, I am giving up.

To summarize, from what I can tell:
After moderate use of cloud engine (bringing up 10-20 instances, shutting them down, etc.), dbomatic *sometimes* crashes without any clear trace in the dbomatic logs.  The longer dbomatic runs, the more likely (IMHO) it is to crash.  I was also able to get it to crash by bringing up 10 instances and running condor_rm / condor_release on the jobs as they started and stopped.


I'll attach the only trace I was able to get out of dbomatic.

Comment 1 wes hayutin 2011-04-27 21:03:44 UTC
I think the recreate is:

confirm dbomatic is up
start instances until you have one with an error
confirm dbomatic is up
in the web UI, under Instances, select the instance in error and choose "remove failed"

at this point I think dbomatic will be down

Comment 2 Dave Johnson 2011-05-06 15:42:41 UTC
So I have been attempting to chase this down and here is what I am seeing...

dbomatic dies when an unhandled EC2 exception occurs in deltacloud.  The dbomatic logs give no indication as to why.  So a couple of things need to happen here, possibly split out into separate bugs: harden dbomatic so it does not crash when this occurs, and get that exception handled properly.  Additionally, some improvements to dbomatic logging would be nice to have.
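
For illustration only (not the actual fix), the hardening could look like retrying rate-limited calls with backoff instead of letting the exception propagate.  In this sketch, with_backoff and deltacloud_call are hypothetical names, not conductor or deltacloud API:

    # Sketch: survive EC2 RequestLimitExceeded by backing off and retrying,
    # instead of letting the error take the daemon down.
    def with_backoff(retries = 3)
      attempt = 0
      begin
        yield
      rescue => e                       # e.g. Deltacloud::BackendError
        attempt += 1
        raise unless attempt <= retries && e.message.include?("RequestLimitExceeded")
        sleep(2 ** attempt)             # 2s, 4s, 8s between attempts
        retry
      end
    end

    with_backoff { deltacloud_call }    # deltacloud_call: hypothetical stand-in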

Seen with RPMS versions of the released AMI...
[root@ip-10-82-250-97 log]# rpm -qa | grep deltacloud
rubygem-deltacloud-client-0.1.0-2.el6_0.noarch
libdeltacloud-0.7-2.el6_0.x86_64
rubygem-deltacloud-core-0.2.0-5.el6.noarch
[root@ip-10-82-250-97 log]# rpm -qa | grep aeolus
aeolus-conductor-0.2.0-4.el6.x86_64
aeolus-configure-2.0.0-10.el6.noarch
aeolus-conductor-doc-0.2.0-4.el6.x86_64
aeolus-conductor-daemons-0.2.0-4.el6.x86_64


==> deltacloud-ec2-us-east-1/ec2.log <==
Opening new HTTPS connection to ec2.us-east-1.amazonaws.com:443
Opening new HTTPS connection to ec2.us-east-1.amazonaws.com:443
Opening new HTTPS connection to ec2.us-east-1.amazonaws.com:443
##### Aws::Ec2 returned an error: 503 Service Unavailable
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>RequestLimitExceeded</Code><Message>Request limit exceeded.</Message></Error></Errors><RequestID>5b1e3fbd-ecb8-4e9d-be74-55b43adeb57f</RequestID></Response> #####
##### Aws::Ec2 request: ec2.us-east-1.amazonaws.com:443/?AWSAccessKeyId=AKIAJI5RB627E2VSWRYQ&Action=DeleteTags&ResourceId.1=i-74c9cb1b&SignatureMethod=HmacSHA256&SignatureVersion=2&Tag.1.Key=name&Timestamp=2011-05-06T15%3A06%3A35.000Z&Version=2010-08-31&Signature=DAuIeGBGABBLe5xYTity0lALDt7DlYnIBel9ZhM5ZEI%3D ####
Opening new HTTPS connection to ec2.us-east-1.amazonaws.com:443
##### Aws::Ec2 returned an error: 503 Service Unavailable
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>RequestLimitExceeded</Code><Message>Request limit exceeded.</Message></Error></Errors><RequestID>05af87bf-e247-433c-9806-df64e70d6073</RequestID></Response> #####
======= UNHANDLED EXCEPTION ============
#<Deltacloud::BackendError: RequestLimitExceeded: Request limit exceeded.
REQUEST=ec2.us-east-1.amazonaws.com:443/?AWSAccessKeyId=AKIAJI5RB627E2VSWRYQ&Action=DeleteTags&ResourceId.1=i-74c9cb1b&SignatureMethod=HmacSHA256&SignatureVersion=2&Tag.1.Key=name&Timestamp=2011-05-06T15%3A06%3A35.000Z&Version=2010-08-31&Signature=DAuIeGBGABBLe5xYTity0lALDt7DlYnIBel9ZhM5ZEI%3D
REQUEST ID=5b1e3fbd-ecb8-4e9d-be74-55b43adeb57f >
========================================
##### Aws::Ec2 request: ec2.us-east-1.amazonaws.com:443/?AWSAccessKeyId=AKIAJI5RB627E2VSWRYQ&Action=DeleteTags&ResourceId.1=i-12dfdd7d&SignatureMethod=HmacSHA256&SignatureVersion=2&Tag.1.Key=name&Timestamp=2011-05-06T15%3A06%3A35.000Z&Version=2010-08-31&Signature=3I81LmRtftJPIzXNxYFEEguXlDblGsEmdW%2BWAvEvbrQ%3D ####
======= UNHANDLED EXCEPTION ============
#<Deltacloud::BackendError: RequestLimitExceeded: Request limit exceeded.
REQUEST=ec2.us-east-1.amazonaws.com:443/?AWSAccessKeyId=AKIAJI5RB627E2VSWRYQ&Action=DeleteTags&ResourceId.1=i-12dfdd7d&SignatureMethod=HmacSHA256&SignatureVersion=2&Tag.1.Key=name&Timestamp=2011-05-06T15%3A06%3A35.000Z&Version=2010-08-31&Signature=3I81LmRtftJPIzXNxYFEEguXlDblGsEmdW%2BWAvEvbrQ%3D
REQUEST ID=05af87bf-e247-433c-9806-df64e70d6073 >
========================================
Opening new HTTPS connection to ec2.us-east-1.amazonaws.com:443
Opening new HTTPS connection to ec2.us-east-1.amazonaws.com:443
warning: peer certificate won't be verified in this SSL session
warning: peer certificate won't be verified in this SSL session
warning: peer certificate won't be verified in this SSL session
warning: peer certificate won't be verified in this SSL session
Deltacloud::BackendError - RequestLimitExceeded: Request limit exceeded.
REQUEST=ec2.us-east-1.amazonaws.com:443/?AWSAccessKeyId=AKIAJI5RB627E2VSWRYQ&Action=DeleteTags&ResourceId.1=i-84cac8eb&SignatureMethod=HmacSHA256&SignatureVersion=2&Tag.1.Key=name&Timestamp=2011-05-06T15%3A06%3A36.000Z&Version=2010-08-31&Signature=K2aaWkVYRBK1hNAkcUjd7dIMAptVw4Frfw74i%2FhDWaU%3D
REQUEST ID=5cd99415-5a2e-4d81-a07f-804fb84e266d :
 ././lib/deltacloud/base_driver/base_driver.rb:241:in `safely'
 ././lib/deltacloud/drivers/ec2/ec2_driver.rb:539:in `delete_unused_tags'
 ././lib/deltacloud/drivers/ec2/ec2_driver.rb:173:in `instances'
 ././lib/deltacloud/base_driver/base_driver.rb:239:in `call'
 ././lib/deltacloud/base_driver/base_driver.rb:239:in `safely'
[root@ip-10-82-250-97 log]# cat deltacloud-ec2-us-east-1/ec2.log | tail -100 -
 /usr/lib/ruby/gems/1.8/gems/rack-1.1.0/lib/rack/methodoverride.rb:24:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-1.1.0/lib/rack/commonlogger.rb:18:in `call'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:979:in `call'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:1005:in `synchronize'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:979:in `call'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/stats.rb:24:in `call'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/stats.rb:33:in `log'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/stats.rb:24:in `call'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:76:in `pre_process'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:74:in `catch'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:74:in `pre_process'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1060:in `call'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1060:in `spawn_threadpool'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1057:in `initialize'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1057:in `new'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1057:in `spawn_threadpool'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1049:in `defer'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:54:in `process'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:42:in `receive_data'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run_machine'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/backends/base.rb:57:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/server.rb:156:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/controllers/controller.rb:80:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/runner.rb:177:in `send'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/runner.rb:177:in `run_command'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/runner.rb:143:in `run!'
 /usr/lib/ruby/gems/1.8/gems/deltacloud-core-0.2.0/bin/deltacloudd:123
 /usr/bin/deltacloudd:19:in `load'
 /usr/bin/deltacloudd:19
127.0.0.1 - - [06/May/2011 11:06:36] "DELETE /api/instances/i-b0e8eadf HTTP/1.1" 500 5889 1.6293
Deltacloud::BackendError - RequestLimitExceeded: Request limit exceeded.
REQUEST=ec2.us-east-1.amazonaws.com:443/?AWSAccessKeyId=AKIAJI5RB627E2VSWRYQ&Action=DeleteTags&ResourceId.1=i-b4d1d3db&SignatureMethod=HmacSHA256&SignatureVersion=2&Tag.1.Key=name&Timestamp=2011-05-06T15%3A06%3A36.000Z&Version=2010-08-31&Signature=XseoXEB8bUdDbtccDUYswi4howqmqWZOSX%2FZS3mSrJg%3D 
REQUEST ID=dce4bd65-26f9-4a00-a1bb-209ef9a87533 :
 ././lib/deltacloud/base_driver/base_driver.rb:241:in `safely'
 ././lib/deltacloud/drivers/ec2/ec2_driver.rb:539:in `delete_unused_tags'
 ././lib/deltacloud/drivers/ec2/ec2_driver.rb:173:in `instances'
 ././lib/deltacloud/base_driver/base_driver.rb:239:in `call'
 ././lib/deltacloud/base_driver/base_driver.rb:239:in `safely'
 ././lib/deltacloud/drivers/ec2/ec2_driver.rb:160:in `instances'
 ././lib/deltacloud/base_driver/base_driver.rb:178:in `instance'
 ././lib/deltacloud/helpers/application_helper.rb:95:in `send'
 ././lib/deltacloud/helpers/application_helper.rb:95:in `show'
 /usr/lib/ruby/1.8/benchmark.rb:293:in `measure'
 ././lib/deltacloud/helpers/application_helper.rb:94:in `show'
 ././server.rb:314
 ././lib/sinatra/rabbit.rb:67:in `instance_eval'
 ././lib/sinatra/rabbit.rb:67:in `GET /api/instances/:id'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:863:in `call'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:863:in `route'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:521:in `instance_eval'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:521:in `route_eval'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:500:in `route!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:497:in `catch'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:497:in `route!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:476:in `each'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:476:in `route!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:601:in `dispatch!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:411:in `call!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:566:in `instance_eval'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:566:in `invoke'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:566:in `catch'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:566:in `invoke'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:411:in `call!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:399:in `call'
 ././lib/sinatra/rack_runtime.rb:17:in `call'
 ././lib/sinatra/rack_etag.rb:23:in `call'
 ././lib/sinatra/rack_driver_select.rb:21:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-accept-0.4.3/lib/rack/accept/context.rb:22:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-1.1.0/lib/rack/methodoverride.rb:24:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-1.1.0/lib/rack/commonlogger.rb:18:in `call'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:979:in `call'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:1005:in `synchronize'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:979:in `call'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/stats.rb:24:in `call'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/stats.rb:33:in `log'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/stats.rb:24:in `call'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:76:in `pre_process'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:74:in `catch'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:74:in `pre_process'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1060:in `call'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1060:in `spawn_threadpool'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1057:in `initialize'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1057:in `new'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1057:in `spawn_threadpool'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1049:in `defer'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:54:in `process'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:42:in `receive_data'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run_machine'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/backends/base.rb:57:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/server.rb:156:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/controllers/controller.rb:80:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/runner.rb:177:in `send'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/runner.rb:177:in `run_command'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/runner.rb:143:in `run!'
 /usr/lib/ruby/gems/1.8/gems/deltacloud-core-0.2.0/bin/deltacloudd:123
 /usr/bin/deltacloudd:19:in `load'
 /usr/bin/deltacloudd:19
127.0.0.1 - - [06/May/2011 11:06:36] "GET /api/instances/i%2D14e8ea7b HTTP/1.1" 500 6060 3.1192

Comment 3 Chris Lalancette 2011-05-06 16:38:06 UTC
(In reply to comment #2)
> So I have been attempting to chase this down and here is what I am seeing...
> 
> dbomatic dies when an unhandled EC2 exception occurs in deltacloud.  The
> dbomatic logs give no indication as to why.  So a couple of things need to
> happen here, possibly split out into separate bugs: harden dbomatic so it
> does not crash when this occurs, and get that exception handled properly.
> Additionally, some improvements to dbomatic logging would be nice to have.

Good work so far.  So the thing is that dbomatic is only indirectly connected to what happens in deltacloud.  That is, dbomatic is merely looking for changes in /var/log/condor/EventLog.  That file gets updated when "something" happens to a job in condor (start, stop, crash, etc).  So what must be happening is that somehow one of these transitions is causing something to happen in the EventLog, which in turn causes dbomatic to crash.
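
(For reference, a minimal sketch of the watch loop described above, using the same rb-inotify calls that appear in the dbomatic excerpts later in this bug; the path and parse_log_file are stand-ins:)

    require 'rb-inotify'

    def parse_log_file
      # stand-in for dbomatic's parser invocation
    end

    notifier = INotify::Notifier.new
    # dbomatic watches the whole condor log directory but only reacts to
    # modifications of EventLog, re-parsing it for new job transitions.
    notifier.watch("/var/log/condor", :all_events) do |event|
      if event.name == "EventLog" && event.flags.include?(:modify)
        parse_log_file
      end
    end
    notifier.run  # blocks, dispatching filesystem events to the handler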

Dave, next time you see this, can you please also collect /var/log/condor/EventLog* and /var/log/condor/GridmanagerLog*?  That should help us to identify what is causing the dbomatic crash.

Chris Lalancette

Comment 4 Dave Johnson 2011-05-06 18:25:16 UTC
Created attachment 497426 [details]
logs.tgz

Attached are the requested log files (condor/EventLog, condor/Gridmanager*, ec2.log, dbomatic.log), zipped up.

Because some of the logs don't have timestamps, I attempted to tail the logs while printing out the dbomatic process from ps; it may or may not help.  In this case it was a different unhandled exception from ec2 that seemed to trigger the failure...

#  When the dbomatic process isn't found, it just prints out a date stamp like so: 'Fri May  6 14:00:50 EDT 2011'
#  You will see it dies at 'Fri May  6 14:01:09 EDT 2011'

Fri May  6 14:00:44 EDT 2011	aeolus   20230     1  1 13:58 ?        00:00:01 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:00:46 EDT 2011	aeolus   20230     1  1 13:58 ?        00:00:01 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:00:48 EDT 2011	aeolus   20230     1  1 13:58 ?        00:00:01 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic

==> condor/GridmanagerLog.aeolus <==
05/06/11 14:00:50 [21636] GAHP[21797] -> 'R'
05/06/11 14:00:50 [21636] GAHP[21797] -> 'S' '1'
05/06/11 14:00:50 [21636] GAHP[21797] -> '2' 'NULL' 'i-fa350995' 'STOPPED' 'i-f8350997' 'STOPPED' 'i-f6350999' 'STOPPED' 'i-cc585ea3' 'RUNNING' 'i-a00b09cf' 'RUNNING' 'i-840f0deb' 'RUNNING' 'i-d2dfddbd' 'RUNNING' 'i-76606d19' 'RUNNING' 'i-dc606db3' 'STOPPED' 'i-e27b768d' 'STOPPED' 'i-d00b06bf' 'STOPPED' 'i-ae0b06c1' 'RUNNING'
05/06/11 14:00:50 [21636] resource http://localhost:3003/api is now up
05/06/11 14:00:50 [21636] (32.0) doEvaluateState called: gmState GM_CREATE_VM, condorState 1
05/06/11 14:00:50 [21636] GAHP[21797] <- 'DELTACLOUD_VM_SUBMIT 4 http://localhost:3003/api AKIAJI5RB627E2VSWRYQ yUp2VHJQZ3xYkz5LRxxg4OzPHjgnAJxvZrvKxBFC ami-a0d926c9 Condor_localhost_ip-10-82-250-97.ec2.internal#32.0#1304704825 NULL c1.xlarge 7168.0 20.0 1690.0 ec2-east-dj_1304690302_key_70222408432860 NULL'
05/06/11 14:00:50 [21636] GAHP[21797] -> 'S'

==> deltacloud-ec2-us-east-1/ec2.log <==
127.0.0.1 - - [06/May/2011 14:00:50] "GET /api HTTP/1.1" 200 1143 0.0741
Fri May  6 14:00:50 EDT 2011	aeolus   20230     1  1 13:58 ?        00:00:01 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Deltacloud::BackendError - InvalidInstanceID.NotFound: The instance ID 'i-001d106f' does not exist
REQUEST=ec2.us-east-1.amazonaws.com:443/?AWSAccessKeyId=AKIAJI5RB627E2VSWRYQ&Action=CreateTags&ResourceId.1=i-001d106f&SignatureMethod=HmacSHA256&SignatureVersion=2&Tag.1.Key=name&Tag.1.Value=Condor_localhost_ip-10-82-250-97.ec2.internal%2332.0%231304704825&Timestamp=2011-05-06T18%3A00%3A51.000Z&Version=2010-08-31&Signature=qrQUWa6utrhbEUjHEa11QcQu9GVDmhiOOAfML17pRcc%3D 
REQUEST ID=1150d87f-0afb-436e-b542-dd5c65c59c0f :
 ././lib/deltacloud/base_driver/base_driver.rb:241:in `safely'
 ././lib/deltacloud/drivers/ec2/ec2_driver.rb:524:in `tag_instance'
 ././lib/deltacloud/drivers/ec2/ec2_driver.rb:192:in `create_instance'
 ././lib/deltacloud/base_driver/base_driver.rb:239:in `call'
 ././lib/deltacloud/base_driver/base_driver.rb:239:in `safely'
 ././lib/deltacloud/drivers/ec2/ec2_driver.rb:187:in `create_instance'
 ././server.rb:325
 ././lib/sinatra/rabbit.rb:67:in `instance_eval'
 ././lib/sinatra/rabbit.rb:67:in `POST /api/instances'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:863:in `call'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:863:in `route'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:521:in `instance_eval'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:521:in `route_eval'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:500:in `route!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:497:in `catch'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:497:in `route!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:476:in `each'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:476:in `route!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:601:in `dispatch!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:411:in `call!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:566:in `instance_eval'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:566:in `invoke'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:566:in `catch'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:566:in `invoke'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:411:in `call!'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:399:in `call'
 ././lib/sinatra/rack_runtime.rb:17:in `call'
 ././lib/sinatra/rack_etag.rb:23:in `call'
 ././lib/sinatra/rack_driver_select.rb:21:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-accept-0.4.3/lib/rack/accept/context.rb:22:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-1.1.0/lib/rack/methodoverride.rb:24:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-1.1.0/lib/rack/commonlogger.rb:18:in `call'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:979:in `call'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:1005:in `synchronize'
 /usr/lib/ruby/gems/1.8/gems/sinatra-1.0/lib/sinatra/base.rb:979:in `call'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/stats.rb:24:in `call'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/stats.rb:33:in `log'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/stats.rb:24:in `call'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:76:in `pre_process'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:74:in `catch'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:74:in `pre_process'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1060:in `call'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1060:in `spawn_threadpool'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1057:in `initialize'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1057:in `new'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1057:in `spawn_threadpool'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:1049:in `defer'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:54:in `process'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/connection.rb:42:in `receive_data'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run_machine'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/backends/base.rb:57:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/server.rb:156:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/controllers/controller.rb:80:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/runner.rb:177:in `send'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/runner.rb:177:in `run_command'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.5/lib/thin/runner.rb:143:in `run!'
 /usr/lib/ruby/gems/1.8/gems/deltacloud-core-0.2.0/bin/deltacloudd:123
 /usr/bin/deltacloudd:19:in `load'
 /usr/bin/deltacloudd:19
127.0.0.1 - - [06/May/2011 14:00:51] "POST /api/instances HTTP/1.1" 500 5688 0.8233

==> condor/GridmanagerLog.aeolus <==
05/06/11 14:00:51 [21636] GAHP[21797] -> 'R'
05/06/11 14:00:51 [21636] GAHP[21797] -> 'S' '1'
05/06/11 14:00:51 [21636] GAHP[21797] -> '4' 'Create_Instance_Failure: InvalidInstanceID.NotFound: The instance ID 'i-001d106f' does not exist'
05/06/11 14:00:51 [21636] GAHP[21797] -> '  REQUEST=ec2.us-east-1.amazonaws.com:443/?AWSAccessKeyId=AKIAJI5RB627E2VSWRYQ&Action=CreateTags&ResourceId.1=i-001d106f&SignatureMethod=HmacSHA256&SignatureVersion=2&Tag.1.Key=name&Tag.1.Value=Condor_localhost_ip-10-82-250-97.ec2.internal%2332.0%231304704825&Timestamp=2011-05-06T18%3A00%3A51.000Z&Version=2010-08-31&Signature=qrQUWa6utrhbEUjHEa11QcQu9GVDmhiOOAfML17pRcc%3D '
05/06/11 14:00:51 [21636] (32.0) doEvaluateState called: gmState GM_CREATE_VM, condorState 1
05/06/11 14:00:51 [21636] (32.0) job submit failed: (null): Create_Instance_Failure: InvalidInstanceID.NotFound: The instance ID 'i-001d106f' does not exist
05/06/11 14:00:51 [21636] (32.0) gm state change: GM_CREATE_VM -> GM_HOLD
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp 
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks 
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13 
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13/56 
05/06/11 14:00:51 [21636] FileLock object is updating timestamp on: /tmp/condorLocks/13/56/9182961121613650.lockc
05/06/11 14:00:51 [21636] FileLock object is updating timestamp on: /tmp/condorLocks/13/56/9182961121613650.lockc
05/06/11 14:00:51 [21636] FileLock::obtain(1) - @1304704851.236664 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:00:51 [21636] FileLock::obtain(2) - @1304704851.236769 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:00:51 [21636] (32.0) Writing hold record to user logfile
05/06/11 14:00:51 [21636] FileLock::obtain(1) - @1304704851.236961 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:00:51 [21636] directory_util::rec_clean_up: file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:00:51 [21636] Lock file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:00:51 [21636] FileLock::obtain(2) - @1304704851.237173 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp 
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks 
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13 
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13/56 
05/06/11 14:00:51 [21636] FileLock object is updating timestamp on: /tmp/condorLocks/13/56/9182961121613650.lockc
05/06/11 14:00:51 [21636] FileLock::obtain(1) - @1304704851.237793 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:00:51 [21636] FileLock::obtain(2) - @1304704851.237861 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:00:51 [21636] FileLock::obtain(1) - @1304704851.237919 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:00:51 [21636] FileLock::obtain(2) - @1304704851.238157 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:00:51 [21636] FileLock::obtain(1) - @1304704851.238398 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:00:51 [21636] FileLock::obtain(2) - @1304704851.238607 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:00:51 [21636] FileLock::obtain(1) - @1304704851.238729 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:00:51 [21636] directory_util::rec_clean_up: file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:00:51 [21636] Lock file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:00:51 [21636] FileLock::obtain(2) - @1304704851.238985 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:00:51 [21636] FileLock::obtain(2) - @1304704851.239150 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp 
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks 
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13 
05/06/11 14:00:51 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13/56 
05/06/11 14:00:51 [21636] FileLock::obtain(1) - @1304704851.239740 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:00:51 [21636] directory_util::rec_clean_up: file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:00:51 [21636] Lock file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:00:51 [21636] FileLock::obtain(2) - @1304704851.240004 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:00:51 [21636] (32.0) gm state change: GM_HOLD -> GM_DELETE

==> condor/EventLog <==
<c>
    <a n="MyType"><s>JobHeldEvent</s></a>
    <a n="Cluster"><i>32</i></a>
    <a n="EventTypeNumber"><i>12</i></a>
    <a n="HoldReasonSubCode"><i>0</i></a>
    <a n="EventTime"><s>2011-05-06T14:00:51</s></a>
    <a n="HoldReason"><s>Create_Instance_Failure: InvalidInstanceID.NotFound: The instance ID 'i-001d106f' does not exist</s></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="HoldReasonCode"><i>0</i></a>
    <a n="CurrentTime"><e>time()</e></a>
</c>
<c>
    <a n="MyType"><s>JobAdInformationEvent</s></a>
    <a n="TriggerEventTypeNumber"><i>12</i></a>
    <a n="DeltacloudUsername"><s>AKIAJI5RB627E2VSWRYQ</s></a>
    <a n="Cmd"><s>job_31_32</s></a>
    <a n="Cluster"><i>32</i></a>
    <a n="EventTypeNumber"><i>28</i></a>
    <a n="JobStatus"><i>5</i></a>
    <a n="GlobalJobId"><s>ip-10-82-250-97.ec2.internal#32.0#1304704825</s></a>
    <a n="HoldReasonSubCode"><i>0</i></a>
    <a n="EventTime"><s>2011-05-06T14:00:51</s></a>
    <a n="TriggerEventTypeName"><s>ULOG_JOB_HELD</s></a>
    <a n="HoldReason"><s>Create_Instance_Failure: InvalidInstanceID.NotFound: The instance ID 'i-001d106f' does not exist</s></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="HoldReasonCode"><i>0</i></a>
    <a n="CurrentTime"><e>time()</e></a>
    <a n="Owner"><s>aeolus</s></a>
</c>

==> condor/GridmanagerLog.aeolus <==

==> condor/EventLog <==

==> condor/GridmanagerLog.aeolus <==
05/06/11 14:00:51 [21636] GAHP[21797] -> '  REQUEST ID=1150d87f-0afb-436e-b542-dd5c65c59c0f'
Fri May  6 14:00:52 EDT 2011	aeolus   20230     1  1 13:58 ?        00:00:02 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
05/06/11 14:00:52 [21636] Received ADD_JOBS signal
Fri May  6 14:00:53 EDT 2011	aeolus   20230     1  1 13:58 ?        00:00:03 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
05/06/11 14:00:54 [21636] in doContactSchedd()
05/06/11 14:00:54 [21636] querying for new jobs
05/06/11 14:00:54 [21636] Using constraint ((Owner=?="aeolus"&&JobUniverse==9)) && (Managed =!= "ScheddDone") && (Matched =!= FALSE) && (JobStatus != 5) && (Managed =!= "External")
05/06/11 14:00:54 [21636] Using job type Deltacloud for job 33.0
05/06/11 14:00:54 [21636] (33.0) SetJobLeaseTimers()
05/06/11 14:00:54 [21636] (33.0) deltacloud job: hwp_id c1.xlarge, hwp_cpu 20.0, hwp_mem 7168.0, hwp_storage 1690.0
Fri May  6 14:00:54 EDT 2011	aeolus   20230     1  2 13:58 ?        00:00:03 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:00:55 EDT 2011	aeolus   20230     1  2 13:58 ?        00:00:04 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:00:56 EDT 2011	aeolus   20230     1  2 13:58 ?        00:00:04 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:00:57 EDT 2011	aeolus   20230     1  2 13:58 ?        00:00:04 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:00:58 EDT 2011	aeolus   20230     1  2 13:58 ?        00:00:04 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:00:59 EDT 2011	aeolus   20230     1  2 13:58 ?        00:00:04 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:01:00 EDT 2011	aeolus   20230     1  2 13:58 ?        00:00:04 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:01:01 EDT 2011	aeolus   20230     1  2 13:58 ?        00:00:04 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:01:02 EDT 2011	aeolus   20230     1  2 13:58 ?        00:00:04 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:01:03 EDT 2011	aeolus   20230     1  2 13:58 ?        00:00:04 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
05/06/11 14:01:04 [21636] Found job 33.0 --- inserting
05/06/11 14:01:04 [21636] Fetched 1 new job ads from schedd
05/06/11 14:01:04 [21636] querying for removed/held jobs
05/06/11 14:01:04 [21636] Using constraint ((Owner=?="aeolus"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
05/06/11 14:01:04 [21636] Fetched 0 job ads from schedd
05/06/11 14:01:04 [21636] Updating classad values for 32.0:
05/06/11 14:01:04 [21636]    EnteredCurrentStatus = 1304704851
05/06/11 14:01:04 [21636]    HoldReason = "Create_Instance_Failure: InvalidInstanceID.NotFound: The instance ID 'i-001d106f' does not exist"
05/06/11 14:01:04 [21636]    HoldReasonCode = 0
05/06/11 14:01:04 [21636]    HoldReasonSubCode = 0
05/06/11 14:01:04 [21636]    JobStatus = 5
05/06/11 14:01:04 [21636]    Managed = "Schedd"
05/06/11 14:01:04 [21636]    NumSystemHolds = 1
05/06/11 14:01:04 [21636]    ReleaseReason = undefined
05/06/11 14:01:04 [21636] leaving doContactSchedd()
05/06/11 14:01:04 [21636] (33.0) doEvaluateState called: gmState GM_INIT, condorState 1
05/06/11 14:01:04 [21636] (33.0) gm state change: GM_INIT -> GM_START
05/06/11 14:01:04 [21636] (33.0) gm state change: GM_START -> GM_CLEAR_REQUEST
05/06/11 14:01:04 [21636] (33.0) gm state change: GM_CLEAR_REQUEST -> GM_UNSUBMITTED
05/06/11 14:01:04 [21636] (33.0) gm state change: GM_UNSUBMITTED -> GM_SAVE_INSTANCE_NAME

==> condor/EventLog <==
<c>
    <a n="MyType"><s>SubmitEvent</s></a>
    <a n="SubmitHost"><s>&lt;10.82.250.97:54702&gt;</s></a>
    <a n="Cluster"><i>34</i></a>
    <a n="EventTypeNumber"><i>0</i></a>
    <a n="EventTime"><s>2011-05-06T14:01:04</s></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="CurrentTime"><e>time()</e></a>
</c>
<c>
    <a n="MyType"><s>JobAdInformationEvent</s></a>
    <a n="TriggerEventTypeNumber"><i>0</i></a>
    <a n="DeltacloudUsername"><s>$$(username)</s></a>
    <a n="Cmd"><s>job_33_34</s></a>
    <a n="SubmitHost"><s>&lt;10.82.250.97:54702&gt;</s></a>
    <a n="Cluster"><i>34</i></a>
    <a n="EventTypeNumber"><i>28</i></a>
    <a n="JobStatus"><i>1</i></a>
    <a n="GlobalJobId"><s>ip-10-82-250-97.ec2.internal#34.0#1304704864</s></a>
    <a n="EventTime"><s>2011-05-06T14:01:04</s></a>
    <a n="TriggerEventTypeName"><s>ULOG_SUBMIT</s></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="CurrentTime"><e>time()</e></a>
    <a n="Owner"><s>aeolus</s></a>
</c>

==> condor/GridmanagerLog.aeolus <==
Fri May  6 14:01:04 EDT 2011	aeolus   20230     1  2 13:58 ?        00:00:04 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:01:05 EDT 2011	aeolus   20230     1  3 13:58 ?        00:00:05 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:01:06 EDT 2011	aeolus   20230     1  3 13:58 ?        00:00:06 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
Fri May  6 14:01:08 EDT 2011	aeolus   20230     1  3 13:58 ?        00:00:07 /usr/bin/ruby /usr/share/aeolus-conductor/dbomatic/dbomatic
05/06/11 14:01:09 [21636] in doContactSchedd()
05/06/11 14:01:09 [21636] querying for removed/held jobs
05/06/11 14:01:09 [21636] Using constraint ((Owner=?="aeolus"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
05/06/11 14:01:09 [21636] Fetched 0 job ads from schedd
05/06/11 14:01:09 [21636] Updating classad values for 33.0:
05/06/11 14:01:09 [21636]    GridJobId = "deltacloud Condor_localhost_ip-10-82-250-97.ec2.internal#33.0#1304704844"
05/06/11 14:01:09 [21636]    LastRemoteStatusUpdate = 1304704864
05/06/11 14:01:09 [21636] leaving doContactSchedd()
05/06/11 14:01:09 [21636] (33.0) doEvaluateState called: gmState GM_SAVE_INSTANCE_NAME, condorState 1
05/06/11 14:01:09 [21636] (33.0) gm state change: GM_SAVE_INSTANCE_NAME -> GM_CREATE_VM
05/06/11 14:01:09 [21636] GAHP[21797] <- 'DELTACLOUD_VM_SUBMIT 5 http://localhost:3003/api AKIAJI5RB627E2VSWRYQ yUp2VHJQZ3xYkz5LRxxg4OzPHjgnAJxvZrvKxBFC ami-a0d926c9 Condor_localhost_ip-10-82-250-97.ec2.internal#33.0#1304704844 NULL c1.xlarge 7168.0 20.0 1690.0 ec2-east-dj_1304690302_key_70222408432860 NULL'
05/06/11 14:01:09 [21636] GAHP[21797] -> 'S'

==> deltacloud-ec2-us-east-1/ec2.log <==
127.0.0.1 - - [06/May/2011 14:01:09] "GET /api HTTP/1.1" 200 1143 0.0059
Fri May  6 14:01:09 EDT 2011
New Aws::Ec2 using per_request-connection mode
New Aws::Ec2 using per_request-connection mode
New Aws::Ec2 using per_request-connection mode
New Aws::Ec2 using per_request-connection mode
New Aws::Ec2 using per_request-connection mode
New Aws::Ec2 using per_request-connection mode
New Aws::Ec2 using per_request-connection mode
New Aws::Ec2 using per_request-connection mode
Launching instance of image ami-a0d926c9 for AKIAJI5RB627E2VSWRYQ, key: ec2-east-dj_1304690302_key_70222408432860, groups: 
New Aws::Ec2 using per_request-connection mode
##### Aws::Ec2 returned an error: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidInstanceID.NotFound</Code><Message>The instance ID 'i-001d106f' does not exist</Message></Error></Errors><RequestID>1150d87f-0afb-436e-b542-dd5c65c59c0f</RequestID></Response> #####
##### Aws::Ec2 request: ec2.us-east-1.amazonaws.com:443/?AWSAccessKeyId=AKIAJI5RB627E2VSWRYQ&Action=CreateTags&ResourceId.1=i-001d106f&SignatureMethod=HmacSHA256&SignatureVersion=2&Tag.1.Key=name&Tag.1.Value=Condor_localhost_ip-10-82-250-97.ec2.internal%2332.0%231304704825&Timestamp=2011-05-06T18%3A00%3A51.000Z&Version=2010-08-31&Signature=qrQUWa6utrhbEUjHEa11QcQu9GVDmhiOOAfML17pRcc%3D ####
======= UNHANDLED EXCEPTION ============
#<Deltacloud::BackendError: InvalidInstanceID.NotFound: The instance ID 'i-001d106f' does not exist
REQUEST=ec2.us-east-1.amazonaws.com:443/?AWSAccessKeyId=AKIAJI5RB627E2VSWRYQ&Action=CreateTags&ResourceId.1=i-001d106f&SignatureMethod=HmacSHA256&SignatureVersion=2&Tag.1.Key=name&Tag.1.Value=Condor_localhost_ip-10-82-250-97.ec2.internal%2332.0%231304704825&Timestamp=2011-05-06T18%3A00%3A51.000Z&Version=2010-08-31&Signature=qrQUWa6utrhbEUjHEa11QcQu9GVDmhiOOAfML17pRcc%3D 
REQUEST ID=1150d87f-0afb-436e-b542-dd5c65c59c0f >
========================================
New Aws::Ec2 using per_request-connection mode
New Aws::Ec2 using per_request-connection mode
Launching instance of image ami-a0d926c9 for AKIAJI5RB627E2VSWRYQ, key: ec2-east-dj_1304690302_key_70222408432860, groups: 
Fri May  6 14:01:10 EDT 2011	
127.0.0.1 - - [06/May/2011 14:01:10] "POST /api/instances HTTP/1.1" 201 874 1.2510

==> condor/GridmanagerLog.aeolus <==
05/06/11 14:01:10 [21636] GAHP[21797] -> 'R'
05/06/11 14:01:10 [21636] GAHP[21797] -> 'S' '1'
05/06/11 14:01:10 [21636] GAHP[21797] -> '5' 'NULL' 'id=i-ac1d10c3' 'state=PENDING' 'actions=stop' 'public_addresses=' 'private_addresses='
05/06/11 14:01:10 [21636] (33.0) doEvaluateState called: gmState GM_CREATE_VM, condorState 1
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp 
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks 
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13 
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13/56 
05/06/11 14:01:10 [21636] FileLock object is updating timestamp on: /tmp/condorLocks/13/56/9182961121613650.lockc
05/06/11 14:01:10 [21636] FileLock object is updating timestamp on: /tmp/condorLocks/13/56/9182961121613650.lockc
05/06/11 14:01:10 [21636] FileLock::obtain(1) - @1304704870.278845 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:01:10 [21636] FileLock::obtain(2) - @1304704870.278918 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:01:10 [21636] (33.0) Writing grid submit record to user logfile
05/06/11 14:01:10 [21636] FileLock::obtain(1) - @1304704870.279086 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:01:10 [21636] directory_util::rec_clean_up: file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:01:10 [21636] Lock file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:01:10 [21636] FileLock::obtain(2) - @1304704870.279300 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp 
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks 
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13 
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13/56 
05/06/11 14:01:10 [21636] FileLock object is updating timestamp on: /tmp/condorLocks/13/56/9182961121613650.lockc
05/06/11 14:01:10 [21636] FileLock::obtain(1) - @1304704870.279875 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:01:10 [21636] FileLock::obtain(2) - @1304704870.279942 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:01:10 [21636] FileLock::obtain(1) - @1304704870.280003 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE

==> condor/EventLog <==
<c>
    <a n="MyType"><s>GridSubmitEvent</s></a>
    <a n="Cluster"><i>33</i></a>
    <a n="EventTypeNumber"><i>27</i></a>
    <a n="GridResource"><s>deltacloud http://localhost:3003/api</s></a>
    <a n="EventTime"><s>2011-05-06T14:01:10</s></a>
    <a n="GridJobId"><s>deltacloud Condor_localhost_ip-10-82-250-97.ec2.internal#33.0#1304704844 i-ac1d10c3</s></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="CurrentTime"><e>time()</e></a>
</c>

==> condor/GridmanagerLog.aeolus <==
05/06/11 14:01:10 [21636] FileLock::obtain(2) - @1304704870.280222 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:01:10 [21636] FileLock::obtain(1) - @1304704870.280469 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE

==> condor/EventLog <==
<c>
    <a n="MyType"><s>JobAdInformationEvent</s></a>
    <a n="DeltacloudPublicNetworkAddresses"><s></s></a>
    <a n="TriggerEventTypeNumber"><i>27</i></a>
    <a n="DeltacloudUsername"><s>AKIAJI5RB627E2VSWRYQ</s></a>
    <a n="Cmd"><s>job_32_33</s></a>
    <a n="Cluster"><i>33</i></a>
    <a n="EventTypeNumber"><i>28</i></a>
    <a n="JobStatus"><i>1</i></a>
    <a n="GridResource"><s>deltacloud http://localhost:3003/api</s></a>
    <a n="DeltacloudProviderId"><s>i-ac1d10c3</s></a>
    <a n="GlobalJobId"><s>ip-10-82-250-97.ec2.internal#33.0#1304704844</s></a>
    <a n="EventTime"><s>2011-05-06T14:01:10</s></a>
    <a n="DeltacloudPrivateNetworkAddresses"><s></s></a>
    <a n="TriggerEventTypeName"><s>ULOG_GRID_SUBMIT</s></a>
    <a n="GridJobId"><s>deltacloud Condor_localhost_ip-10-82-250-97.ec2.internal#33.0#1304704844 i-ac1d10c3</s></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="DeltacloudAvailableActions"><s>stop</s></a>
    <a n="CurrentTime"><e>time()</e></a>
    <a n="Owner"><s>aeolus</s></a>
</c>

==> condor/GridmanagerLog.aeolus <==
05/06/11 14:01:10 [21636] FileLock::obtain(2) - @1304704870.280741 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:01:10 [21636] FileLock::obtain(1) - @1304704870.280846 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:01:10 [21636] directory_util::rec_clean_up: file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:01:10 [21636] Lock file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:01:10 [21636] FileLock::obtain(2) - @1304704870.281201 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:01:10 [21636] FileLock::obtain(2) - @1304704870.281397 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp 
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks 
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13 
05/06/11 14:01:10 [21636] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13/56 
05/06/11 14:01:10 [21636] FileLock::obtain(1) - @1304704870.282158 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now WRITE
05/06/11 14:01:10 [21636] directory_util::rec_clean_up: file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:01:10 [21636] Lock file /tmp/condorLocks/13/56/9182961121613650.lockc has been deleted. 
05/06/11 14:01:10 [21636] FileLock::obtain(2) - @1304704870.282513 lock on /tmp/condorLocks/13/56/9182961121613650.lockc now UNLOCKED
05/06/11 14:01:10 [21636] (33.0) gm state change: GM_CREATE_VM -> GM_SAVE_INSTANCE_ID
05/06/11 14:01:10 [21636] (33.0) doEvaluateState called: gmState GM_SAVE_INSTANCE_ID, condorState 1
Fri May  6 14:01:11 EDT 2011	
05/06/11 14:01:12 [21636] Received ADD_JOBS signal
Fri May  6 14:01:12 EDT 2011

Comment 5 Michal Fojtik 2011-05-09 15:43:58 UTC
Created a JIRA ticket for Deltacloud API. (DTACLOUD-39)

Comment 6 Dave Johnson 2011-05-09 16:07:46 UTC
So as far as reproduction steps go, it's pretty inconsistent, but this is what I did this morning.

1. Launch the cloudEngine AMI, RHEL-6.1-CloudEngine-x86_64-8-Access (ami-a0d926c9)
2. add a provider account
3. import an AMI into cloud engine
4. launch 5-10 instances
5. if no dbomatic crash occurs, stop all instances
6. if no dbomatic crash occurs, launch 5-10 instances
7. repeat steps 5 and 6 as needed.

Please note that the dbomatic crash can occur both while creating instances and while stopping them; it is purely random.  Both cases are logged in the previous comments: comment 2 (delete) and comment 4 (create).

Comment 7 Dave Johnson 2011-05-19 22:28:58 UTC
Created attachment 499962 [details]
conductor & condor logs

So I added the following to the dbomatic script to print out the events that it is parsing...

def characters(string)
    unless string.strip == ""
      @logger.info "character string - #{string}"     #  <---------------- added logging
      if @tag == "MyType"
        @event_type = string
        # ... rest of the handler unchanged


At this point I restarted dbomatic and started 10 images.  If dbomatic crashed, I marked it in dbomatic.log with a banner like the following and restarted.  Then I would stop the images; if dbomatic crashed, I would update the log again and do it all over again.

******************************************************************************************
*******   dbomatic died here on stopping multiple images (seemed to crash on the first), manually restarting to try again...  *********
******************************************************************************************

This was the last message I added to the log.  I restarted dbomatic and it finished printing out a bunch of events about the request limit being exceeded; all of the instances went to error on the restart, but dbomatic was still running at the end.

Comment 8 Dave Johnson 2011-05-20 02:53:43 UTC
So I did more debugging around the dbomatic crash and here is what I got... First I had to add a ton of log messages, and I isolated it down to this code block from the dbomatic script.

notifier.watch(condor_event_log_dir, :all_events){ |event|
  logger.info("inside notifier.watch, checking #{event.name}...")
  if event.name == "EventLog" && event.flags.include?(:modify)
     parse_log_file parser
  end
  logger.info("notifier.watch, finished checking #{event.name}...")
}

I surrounded this code with a begin/rescue block, but that revealed nothing; the dbomatic log file still showed the process silently ending with a bunch of these messages:

    inside notifier.watch, checking GridmanagerLog.aeolus...
    notifier.watch, finished checking GridmanagerLog.aeolus...

So it seems that this code block runs out of events and dies?  It never reaches the next lines in the script, which are:

  while true
    begin
      logger.info("starting notifier.run...")     # dajo added
      notifier.run
      logger.info("finishing notifier.run...")    # dajo added
    rescue => e

So I am at a loss as to what the next step is, but I believe it is related to notifier.watch.  Looking at its rdoc, it is supposed to raise a SystemCallError if something bad happens, but my begin/rescue didn't catch anything.

Comment 9 Mo Morsi 2011-05-20 03:44:46 UTC
notifier.watch registers a callback that gets invoked whenever there is a change to the condor log directory.

A begin/rescue around the call to notifier.watch won't necessarily catch the error, because you would be wrapping the registration of the notifier handler itself and not the invocation of that handler.

Rather, if you surround the code inside the notifier.watch handler with a begin/rescue-all, you might have more luck (be sure to print out the backtrace).
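
(Illustrative sketch of the distinction; the path and handle are placeholders:)

    require 'rb-inotify'
    require 'logger'

    dir      = "/var/log/condor"        # placeholder path
    logger   = Logger.new($stderr)
    notifier = INotify::Notifier.new
    def handle(event); end              # placeholder handler

    # This rescue wraps only the *registration* call, which returns
    # immediately; it catches nothing raised later inside the block.
    begin
      notifier.watch(dir, :all_events) { |event| handle(event) }
    rescue => e
      # never reached for errors raised inside handle(event)
    end

    # Errors have to be rescued *inside* the handler, where it actually runs:
    notifier.watch(dir, :all_events) do |event|
      begin
        handle(event)
      rescue => e
        logger.error(([e.message] + e.backtrace).join("\n\tfrom "))
      end
    end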

Comment 10 Dave Johnson 2011-05-20 14:51:17 UTC
Comment 9 suggests surrounding the lines within the notifier.watch handler with a begin/rescue, but that didn't reveal anything, which is consistent with the previous log output:

    inside notifier.watch, checking GridmanagerLog.aeolus...
    notifier.watch, finished checking GridmanagerLog.aeolus...


It starts checking an event and comes back to print the finished log message.  It's as if the notifier.watch stream dries up (under heavy use, perhaps?).

Here is what I tried this time:
    notifier.watch(condor_event_log_dir, :all_events){ |event|
        begin
          logger.info("inside notifier.watch, checking #{event.name}...")
          if event.name == "EventLog" && event.flags.include?(:modify)
            logger.info("starting parse_log_file-parser inside notifier.watch...")
            parse_log_file parser
            logger.info("finished parse_log_file-parser inside notifier.watch...")
          end
          logger.info("notifier.watch, finished checking #{event.name}...")
        rescue => e
          logger.error "#{e.backtrace.shift}: #{e.message}"
          e.backtrace.each do |step|
            logger.error "\tfrom #{step}"
          end
        end
    }

Comment 11 Mo Morsi 2011-05-20 16:08:24 UTC
What is the result of this last round of testing with this output?

Just sent a patch for extended dbomatic logging to the list.  It would be worth building a conductor rpm with that and running the tests.  Hopefully this will give us a more detailed look at what is failing where.


https://fedorahosted.org/pipermail/aeolus-devel/2011-May/001635.html

Comment 12 Dave Johnson 2011-05-20 17:03:19 UTC
The same as comment 8; the log ended with a bunch of the following:

    inside notifier.watch, checking GridmanagerLog.aeolus...
    notifier.watch, finished checking GridmanagerLog.aeolus...

Comment 13 Dave Johnson 2011-05-20 19:28:42 UTC
I pulled the patch down from the mailing list, moved it into place, and gave it a whirl.  There are several lines that throw exceptions on a bad logger object; I commented out several and gave up on it.  Below is a snippet of the end of the log after the process disappears.

Feel free to use this server and attempt to bring the issue to a resolution.  10.16.120.131


EventLog modification event triggered, parsing log
EventLog modification event trigger completed, parsing finished - current position 301180
EventLog modification event triggered, parsing log
Unexpected nil instance, skipping...
EventLog modification event trigger completed, parsing finished - current position 301860
EventLog modification event triggered, parsing log
EventLog modification event trigger completed, parsing finished - current position 302350
EventLog modification event triggered, parsing log
update_instance_state_event for #<Instance:0x7fd9a448b6f8>
Unexpected nil GridResource/ExecuteHost field, skipping cloud id update
EventLog modification event trigger completed, parsing finished - current position 303183

Comment 14 Dave Johnson 2011-05-25 20:18:17 UTC
Mo passed me some new RPMs to test which include a bunch of additional logging, and with them these two similar stack traces are printed multiple times in dbomatic-parser.log:

Feel free to log into 10.16.120.131 to take a peek...

Instance #<Instance:0x7f4b0de78330> found, running update events
update_instance_state_event for #<Instance:0x7f4b0de78330>
update_instance_state_event saving instance #<Instance:0x7f4b0de78330>
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.8/lib/active_record/attribute_methods.rb:264:in `method_missing': undefined method `stat' for #<Instance:0x7f4b0de78330>
        from /usr/share/aeolus-conductor/dbomatic/dbomatic:173:in `update_instance_state_event'
        from /usr/share/aeolus-conductor/dbomatic/dbomatic:253:in `end_element'
        from /usr/lib/ruby/gems/1.8/gems/nokogiri-1.4.3.1/lib/nokogiri/xml/sax/document.rb:125:in `end_element_namespace'
        from /usr/lib/ruby/gems/1.8/gems/nokogiri-1.4.3.1/lib/nokogiri/xml/sax/push_parser.rb:47:in `native_write'
        from /usr/lib/ruby/gems/1.8/gems/nokogiri-1.4.3.1/lib/nokogiri/xml/sax/push_parser.rb:47:in `<<'
        from /usr/share/aeolus-conductor/dbomatic/dbomatic:313:in `parse_log_file'
        from /usr/share/aeolus-conductor/dbomatic/dbomatic:337
update_instance_state_event completed fo #<Instance:0x7f4b0de78330>
update_instance_cloud_id for #<Instance:0x7f4b0de78330>


Instance #<Instance:0x7f4b0de5c108> found, running update events
update_instance_state_event for #<Instance:0x7f4b0de5c108>
update_instance_state_event saving instance #<Instance:0x7f4b0de5c108>
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.8/lib/active_record/attribute_methods.rb:264:in `method_missing': undefined method `stat' for #<Instance:0x7f4b0de5c108>
        from /usr/share/aeolus-conductor/dbomatic/dbomatic:173:in `update_instance_state_event'
        from /usr/share/aeolus-conductor/dbomatic/dbomatic:253:in `end_element'
        from /usr/lib/ruby/gems/1.8/gems/nokogiri-1.4.3.1/lib/nokogiri/xml/sax/document.rb:125:in `end_element_namespace'
        from /usr/lib/ruby/gems/1.8/gems/nokogiri-1.4.3.1/lib/nokogiri/xml/sax/push_parser.rb:47:in `native_write'
        from /usr/lib/ruby/gems/1.8/gems/nokogiri-1.4.3.1/lib/nokogiri/xml/sax/push_parser.rb:47:in `<<'
        from /usr/share/aeolus-conductor/dbomatic/dbomatic:313:in `parse_log_file'
        from /usr/share/aeolus-conductor/dbomatic/dbomatic:351
        from /usr/lib/ruby/gems/1.8/gems/rb-inotify-0.8.1/lib/rb-inotify/watcher.rb:41:in `[]'
        from /usr/lib/ruby/gems/1.8/gems/rb-inotify-0.8.1/lib/rb-inotify/watcher.rb:41:in `callback!'
        from /usr/lib/ruby/gems/1.8/gems/rb-inotify-0.8.1/lib/rb-inotify/event.rb:128:in `callback!'
        from /usr/lib/ruby/gems/1.8/gems/rb-inotify-0.8.1/lib/rb-inotify/notifier.rb:215:in `process'
        from /usr/lib/ruby/gems/1.8/gems/rb-inotify-0.8.1/lib/rb-inotify/notifier.rb:215:in `each'
        from /usr/lib/ruby/gems/1.8/gems/rb-inotify-0.8.1/lib/rb-inotify/notifier.rb:215:in `process'
        from /usr/lib/ruby/gems/1.8/gems/rb-inotify-0.8.1/lib/rb-inotify/notifier.rb:198:in `run'
        from /usr/share/aeolus-conductor/dbomatic/dbomatic:357
update_instance_state_event completed fo #<Instance:0x7f4b0de5c108>
update_instance_cloud_id for #<Instance:0x7f4b0de5c108>
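
(One plausible reading of the traces above, for illustration: ActiveRecord generates reader methods only for real columns, so a misspelled attribute, e.g. stat where state was intended, falls through to method_missing and raises at runtime, which inside the inotify handler would abort the parse.  Schematic only, no database wired up:)

    class Instance < ActiveRecord::Base; end  # hypothetical minimal model

    instance = Instance.first
    instance.state  # real column  => returns the stored value
    instance.stat   # misspelling  => NoMethodError raised via method_missing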

Comment 15 wes hayutin 2011-05-31 15:18:38 UTC
Instances were in various states..
ran condor_rm on all jobs in condor..
then tried to remove the failed jobs in the conductor UI


-- Submitter: dhcp-2-205.brq.redhat.com : <10.34.2.205:41258> : dhcp-2-205.brq.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   1.0   aeolus          5/27 19:11   0+00:00:00 X  0   0.0  job_ce12_3        
   2.0   aeolus          5/27 14:18   0+00:00:00 X  0   0.0  job_ce20_4        
   3.0   aeolus          5/27 15:04   0+00:00:00 X  0   0.0  job_ce30_5        
 150.0   aeolus          5/27 23:50   3+16:10:37 X  0   0.0  job_instancep861l_
 151.0   aeolus          5/28 00:03   0+00:00:00 X  0   0.0  job_instancemanyp_
 152.0   aeolus          5/28 00:04   0+00:00:00 X  0   0.0  job_instancerfqjo_
 153.0   aeolus          5/28 00:10   0+00:00:00 X  0   0.0  job_instance9vd87_
 154.0   aeolus          5/28 00:21   0+00:00:00 X  0   0.0  job_instancelud5s_
 155.0   aeolus          5/28 00:21   0+00:00:00 X  0   0.0  job_instancejk78f_
 156.0   aeolus          5/28 00:22   0+00:00:00 X  0   0.0  job_instancedj05a_
 157.0   aeolus          5/28 00:23   0+00:00:00 X  0   0.0  job_instanceggjej_
 158.0   aeolus          5/28 00:23   0+00:00:00 X  0   0.0  job_instancewpr83_
 159.0   aeolus          5/28 00:24   0+00:00:00 X  0   0.0  job_instances1llc_
 160.0   aeolus          5/28 00:24   0+00:00:00 X  0   0.0  job_instancejjj60_
 161.0   aeolus          5/28 00:25   0+00:00:00 X  0   0.0  job_instanceu53jc_
 162.0   aeolus          5/28 00:25   0+00:00:00 X  0   0.0  job_instancee68i1_
 163.0   aeolus          5/28 00:26   0+00:00:00 X  0   0.0  job_instancef84kr_
 164.0   aeolus          5/28 00:26   0+00:00:00 X  0   0.0  job_instancexulkq_
 165.0   aeolus          5/28 00:27   0+00:00:00 X  0   0.0  job_instancejydh5_
 166.0   aeolus          5/28 00:27   0+00:00:00 X  0   0.0  job_instance9ixbh_
 167.0   aeolus          5/28 00:28   0+00:00:00 X  0   0.0  job_instanceg855o_
 168.0   aeolus          5/28 00:28   0+00:00:00 X  0   0.0  job_instancefmn7u_
 169.0   aeolus          5/31 16:30   0+00:00:00 X  0   0.0  job_instance4wmyl_
 170.0   aeolus          5/31 16:43   0+00:00:00 X  0   0.0  job_instancevcstw_

0 jobs; 0 idle, 0 running, 0 held
[root@dhcp-2-205 tmp]# tail -f /var/log/aeolus-conductor/dbomatic-parser.log
2011-05-31 17:05:26 INFO Instance instancee68i1 update events completed
2011-05-31 17:05:26 INFO Instance instancee68i1 found, running update events
2011-05-31 17:05:26 INFO update_instance_state_event for instancee68i1
2011-05-31 17:05:26 WARN Unexpected trigger type ULOG_GRID_RESOURCE_DOWN, not updating instance state
2011-05-31 17:05:26 INFO update_instance_cloud_id for instancee68i1
2011-05-31 17:05:26 INFO update_instance_cloud_id updating instance instancee68i1 to cloud provider ec2-us-east-1
2011-05-31 17:05:26 INFO update_instance_cloud_id completed for instancee68i1
2011-05-31 17:05:26 INFO update_instance_addresses for instancee68i1,                   setting public addresses:                    --- and private addresses 
2011-05-31 17:05:26 INFO update_instance_addresses completed for instancee68i1
2011-05-31 17:05:26 INFO Instance instancee68i1 update events completed
2011-05-31 17:13:34 INFO Instance instancedj05a found, running update events
2011-05-31 17:13:34 INFO update_instance_state_event for instancedj05a
2011-05-31 17:13:34 INFO update_instance_state_event saving instance instancedj05a
2011-05-31 17:13:34 DEBUG updated_instance_state_event saved instance instancedj05a, creating event for state stopped@2011-05-31T17:13:34
2011-05-31 17:13:34 INFO update_instance_state_event completed fo instancedj05a
2011-05-31 17:13:34 INFO update_instance_cloud_id for instancedj05a
2011-05-31 17:13:34 WARN Unexpected nil GridResource/ExecuteHost field, skipping cloud id update
2011-05-31 17:13:34 INFO update_instance_addresses for instancedj05a,                   setting public addresses:                    --- and private addresses 
2011-05-31 17:13:34 INFO update_instance_addresses completed for instancedj05a
2011-05-31 17:13:34 INFO Instance instancedj05a update events completed
2011-05-31 17:13:34 INFO Instance instancee68i1 found, running update events
2011-05-31 17:13:34 INFO update_instance_state_event for instancee68i1
2011-05-31 17:13:34 INFO update_instance_state_event saving instance instancee68i1
2011-05-31 17:13:34 DEBUG updated_instance_state_event saved instance instancee68i1, creating event for state stopped@2011-05-31T17:13:34
2011-05-31 17:13:34 INFO update_instance_state_event completed fo instancee68i1
2011-05-31 17:13:34 INFO update_instance_cloud_id for instancee68i1
2011-05-31 17:13:34 WARN Unexpected nil GridResource/ExecuteHost field, skipping cloud id update
2011-05-31 17:13:34 INFO update_instance_addresses for instancee68i1,                   setting public addresses:                    --- and private addresses 
2011-05-31 17:13:34 INFO update_instance_addresses completed for instancee68i1
2011-05-31 17:13:34 INFO Instance instancee68i1 update events completed

Comment 16 wes hayutin 2011-05-31 18:46:02 UTC
Resolved: the issue in comment 15 turned out to be caused by a deltacloud EC2 driver that had crashed but was still running.

Comment 17 wes hayutin 2011-06-16 19:42:36 UTC
No commit messages. :(  Moving to ON_QA because we know this is fixed.

Comment 18 wes hayutin 2011-06-16 19:43:08 UTC
Should be retested, because dbomatic currently does not start correctly after install.

Comment 19 wes hayutin 2011-06-23 21:40:49 UTC
verified 

[root@ibm-x3950m2-01 noarch]# rpm -qa | grep aeolus
aeolus-conductor-0.3.0-0.el6.20110623205403git551632a.noarch
aeolus-configure-2.0.1-0.el6.20110622123902gitdf4ae05.noarch
aeolus-conductor-doc-0.3.0-0.el6.20110623205403git551632a.noarch
aeolus-all-0.3.0-0.el6.20110623205403git551632a.noarch
rubygem-aeolus-cli-0.0.1-1.el6.20110623205403git551632a.noarch
aeolus-conductor-daemons-0.3.0-0.el6.20110623205403git551632a.noarch
aeolus-conductor-devel-0.3.0-0.el6.20110623205403git551632a.noarch

Comment 20 wes hayutin 2011-08-01 20:02:30 UTC
release pending...

Comment 21 wes hayutin 2011-08-01 20:02:49 UTC
release pending 3

Comment 22 wes hayutin 2011-08-01 20:02:52 UTC
release pending...

Comment 23 wes hayutin 2011-08-01 20:02:57 UTC
release pending.. 2

Comment 25 wes hayutin 2011-12-08 14:02:29 UTC
perm close

Comment 26 wes hayutin 2011-12-08 14:05:08 UTC
closing out old bugs

