Created attachment 495025 [details]
ami listed in ec2 us-west

Note: I'll be trying to recreate this, as I am not sure how repeatable it is.

-- Submitter: hp-xw8600-01.rhts.eng.bos.redhat.com : <10.16.65.43:47121> : hp-xw8600-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 3.0   aeolus          4/26 15:24   0+00:16:58 R  0   0.0  job_test2_2
 4.0   aeolus          4/26 15:50   0+00:00:00 H  0   0.0  job_test03_3

2 jobs; 0 idle, 1 running, 1 held
[root@hp-xw8600-01 ~]# condor_q -better

-- Submitter: hp-xw8600-01.rhts.eng.bos.redhat.com : <10.16.65.43:47121> : hp-xw8600-01.rhts.eng.bos.redhat.com
---
003.000:  Request is being serviced

---
004.000:  Request is held.
Hold reason: Create_Instance_Failure: InvalidAMIID.NotFound: The AMI ID 'ami-51693a14' does not exist
[root@hp-xw8600-01 ~]#
[root@hp-xw8600-01 ~]# cat /var/log/imagefactory.log | grep ami-51693a14
2011-04-26 13:58:20,016 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(21666) Message: Register output: IMAGE ami-51693a14
2011-04-26 13:58:20,016 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(21666) Message: Extracted AMI ID: ami-51693a14
2011-04-26 13:58:20,020 DEBUG imagefactory.ImageWarehouse.ImageWarehouse pid(21666) Message: Setting metadata ({'provider': 'ec2-us-west-1', 'uuid': '259bb81e-5f8e-4b30-8f06-55a7339bbc15', 'icicle': 'none', 'target_identifier': 'ami-51693a14', 'object_type': 'provider_image', 'image': '1970c4ed-1fb5-48f0-8676-5343a82fbf21'}) for http://localhost:9090/provider_images/259bb81e-5f8e-4b30-8f06-55a7339bbc15
2011-04-26 13:58:21,022 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(21666) Message: FedoraBuilder instance 41418960 pushed image with uuid 1970c4ed-1fb5-48f0-8676-5343a82fbf21 to provider_image UUID (259bb81e-5f8e-4b30-8f06-55a7339bbc15) and set metadata: {'target_identifier': 'ami-51693a14', 'icicle': 'none', 'image': '1970c4ed-1fb5-48f0-8676-5343a82fbf21', 'provider': 'ec2-us-west-1'}
[root@hp-xw8600-01 ~]#

Recreate:
1. create a provider account for us-west and us-east
2. create a template
3. build and push the template
4. create a realm for us-east and us-west
5. start the instance in the us-west realm

Result: error, AMI not found.
---
004.000:  Request is held.
Hold reason: Create_Instance_Failure: InvalidAMIID.NotFound: The AMI ID 'ami-51693a14' does not exist

---
005.000:  Request is held.
Hold reason: Create_Instance_Failure: InvalidAMIID.NotFound: The AMI ID 'ami-2d693a68' does not exist

[root@hp-xw8600-01 ~]# cat /var/log/imagefactory.log | grep ami-2d693a68
2011-04-26 16:14:17,473 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(21666) Message: Register output: IMAGE ami-2d693a68
2011-04-26 16:14:17,473 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(21666) Message: Extracted AMI ID: ami-2d693a68
2011-04-26 16:14:17,477 DEBUG imagefactory.ImageWarehouse.ImageWarehouse pid(21666) Message: Setting metadata ({'provider': 'ec2-us-west-1', 'uuid': '28f60499-0228-493b-b653-89cd4e04b338', 'icicle': 'none', 'target_identifier': 'ami-2d693a68', 'object_type': 'provider_image', 'image': 'b8041cd6-367a-4c96-bd83-bb30ca1de74c'}) for http://localhost:9090/provider_images/28f60499-0228-493b-b653-89cd4e04b338
2011-04-26 16:14:20,439 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(21666) Message: FedoraBuilder instance 41419856 pushed image with uuid b8041cd6-367a-4c96-bd83-bb30ca1de74c to provider_image UUID (28f60499-0228-493b-b653-89cd4e04b338) and set metadata: {'target_identifier': 'ami-2d693a68', 'icicle': 'none', 'image': 'b8041cd6-367a-4c96-bd83-bb30ca1de74c', 'provider': 'ec2-us-west-1'}
[root@hp-xw8600-01 ~]#
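The pattern above (imagefactory registers the AMI in us-west-1, then the instance launch immediately reports it as nonexistent) is consistent with a region mismatch: EC2 AMI IDs are scoped to a single region, so an ID registered in us-west-1 cannot be resolved through an endpoint pointed at another region. A minimal, self-contained illustration of that lookup rule — the registry dict, exception class, and `describe_image` function are purely hypothetical stand-ins, not part of any AWS API:

```python
# Purely illustrative: AMI IDs only resolve in the region where they
# were registered.
class InvalidAMIIDNotFound(Exception):
    pass

# Toy per-region registries (AMI IDs taken from the logs above).
REGISTRY = {
    "us-east-1": set(),
    "us-west-1": {"ami-51693a14", "ami-2d693a68"},
}

def describe_image(region, ami_id):
    """Resolve ami_id against one region's registry, as EC2 does."""
    if ami_id not in REGISTRY[region]:
        raise InvalidAMIIDNotFound(
            "The AMI ID '%s' does not exist" % ami_id)
    return ami_id

# Lookup succeeds in the region where the image was registered...
describe_image("us-west-1", "ami-51693a14")

# ...but the same ID queried against the wrong region fails exactly
# like the hold reason above.
try:
    describe_image("us-east-1", "ami-51693a14")
except InvalidAMIIDNotFound as e:
    print(e)  # The AMI ID 'ami-51693a14' does not exist
```

This is why the image shows up fine in the us-west AWS console yet the launch request is held with InvalidAMIID.NotFound when the launching side talks to a different region.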
=======================================================
Wes asked me to recreate this, and it appears I did. I basically followed his recreation steps in comment 0, at which point I got:
=======================================================
[root@hp-ml370g6-01 deltacloud-ec2-us-west-1]# condor_q

-- Submitter: hp-ml370g6-01.rhts.eng.bos.redhat.com : <10.16.66.124:50937> : hp-ml370g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0   aeolus          4/26 17:09   0+00:00:00 I  0   0.0  job_dave_1

1 jobs; 1 idle, 0 running, 0 held
[root@hp-ml370g6-01 deltacloud-ec2-us-west-1]# condor_q -better

-- Submitter: hp-ml370g6-01.rhts.eng.bos.redhat.com : <10.16.66.124:50937> : hp-ml370g6-01.rhts.eng.bos.redhat.com
error: bad form
error: problem with ExprToProfile
---
001.000:  Run analysis summary.  Of 1 machines,
      1 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
No successful match recorded.
Last failed match: Tue Apr 26 17:10:47 2011
Reason for last match failure: no match found

WARNING:  Be advised:   No resources matched request's constraints

The Requirements expression for your job is:

( target.front_end_hardware_profile_id == "14" && target.image == "1" &&
  target.realm == "2" && conductor_quota_check(1,other.provider_account_id) )

[root@hp-ml370g6-01 deltacloud-ec2-us-west-1]#
====================================================================
I wasn't sure about this (it seems like a HWP matching issue), so I spoke to Wes, who directed me to restart condor:
====================================================================
[root@hp-ml370g6-01 deltacloud-ec2-us-west-1]# /etc/init.d/condor restart
Stopping Condor daemons:                                   [  OK  ]
Starting Condor daemons:                                   [  OK  ]
[root@hp-ml370g6-01 deltacloud-ec2-us-west-1]# condor_q -better
Warning:  Found no submitters

-- Submitter: hp-ml370g6-01.rhts.eng.bos.redhat.com : <10.16.66.124:44533> : hp-ml370g6-01.rhts.eng.bos.redhat.com
---
001.000:  Run analysis summary.  Of 0 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
No successful match recorded.
Last failed match: Tue Apr 26 17:14:48 2011
Reason for last match failure: no match found

WARNING:  Be advised:   No resources matched request's constraints
WARNING:  Be advised:   Request 1.0 did not match any resource's constraints
=========================================================================
At this point I considered the instance orphaned by the condor restart, so I created a second instance:
=========================================================================
[root@hp-ml370g6-01 deltacloud-ec2-us-west-1]# condor_q

-- Submitter: hp-ml370g6-01.rhts.eng.bos.redhat.com : <10.16.66.124:44533> : hp-ml370g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0   aeolus          4/26 17:09   0+00:00:00 H  0   0.0  job_dave_1
 2.0   aeolus          4/26 17:18   0+00:00:00 I  0   0.0  job_dave2_2

2 jobs; 1 idle, 0 running, 1 held
[root@hp-ml370g6-01 deltacloud-ec2-us-west-1]# condor_q -better

-- Submitter: hp-ml370g6-01.rhts.eng.bos.redhat.com : <10.16.66.124:44533> : hp-ml370g6-01.rhts.eng.bos.redhat.com
---
001.000:  Request is held.
Hold reason: Create_Instance_Failure: InvalidAMIID.NotFound: The AMI ID 'ami-ef693aaa' does not exist

---
002.000:  Request has been matched.
=========================================================================
Shortly thereafter, they both showed Not Found:
=========================================================================
[root@hp-ml370g6-01 deltacloud-ec2-us-west-1]# condor_q

-- Submitter: hp-ml370g6-01.rhts.eng.bos.redhat.com : <10.16.66.124:44533> : hp-ml370g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0   aeolus          4/26 17:09   0+00:00:00 H  0   0.0  job_dave_1
 2.0   aeolus          4/26 17:18   0+00:00:00 H  0   0.0  job_dave2_2

2 jobs; 0 idle, 0 running, 2 held
You have mail in /var/spool/mail/root
[root@hp-ml370g6-01 deltacloud-ec2-us-west-1]# condor_q -better

-- Submitter: hp-ml370g6-01.rhts.eng.bos.redhat.com : <10.16.66.124:44533> : hp-ml370g6-01.rhts.eng.bos.redhat.com
---
001.000:  Request is held.
Hold reason: Create_Instance_Failure: InvalidAMIID.NotFound: The AMI ID 'ami-ef693aaa' does not exist

---
002.000:  Request is held.
Hold reason: Create_Instance_Failure: InvalidAMIID.NotFound: The AMI ID 'ami-ef693aaa' does not exist
[root@hp-ml370g6-01 deltacloud-ec2-us-west-1]#
For me, the template build succeeded in us-east, whereas it failed for us-west:

Thu, 28 Apr 2011 06:45:21 GMT /
2011-04-28 02:45:21,950 DEBUG boto pid(13116) Message: Method: GET
2011-04-28 02:45:21,951 DEBUG boto pid(13116) Message: Path: /?AWSAccessKeyId=AKIAI2KPFDYVZKSRTJMQ&Action=TerminateInstances&InstanceId.1=i-427e5006&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2011-04-28T06%3A45%3A21&Version=2009-11-30&Signature=FH3FrJ48SauxlYytCU0QQcZJq6nY3auZS8AI6csG/Gk%3D
2011-04-28 02:45:21,951 DEBUG boto pid(13116) Message: Data:
2011-04-28 02:45:21,951 DEBUG boto pid(13116) Message: Headers: {'Date': 'Thu, 28 Apr 2011 06:45:21 GMT', 'Content-Length': '0', 'Authorization': 'AWS AKIAI2KPFDYVZKSRTJMQ:cI6VvOo0G/dTpPsi+5hY9c+KhTc=', 'User-Agent': 'Boto/1.9b (linux2)'}
2011-04-28 02:45:21,951 DEBUG boto pid(13116) Message: Host: None
2011-04-28 02:45:22,603 DEBUG boto pid(13116) Message: <?xml version="1.0" encoding="UTF-8"?>
<TerminateInstancesResponse xmlns="http://ec2.amazonaws.com/doc/2009-11-30/">
    <requestId>079cf389-4808-4586-80aa-d5d037fb5366</requestId>
    <instancesSet>
        <item>
            <instanceId>i-427e5006</instanceId>
            <currentState>
                <code>32</code>
                <name>shutting-down</name>
            </currentState>
            <previousState>
                <code>16</code>
                <name>running</name>
            </previousState>
        </item>
    </instancesSet>
</TerminateInstancesResponse>
2011-04-28 02:45:22,604 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(13116) Message: Exception during push_image
2011-04-28 02:45:22,604 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(13116) Message: Unexpected error: (<class 'imagefactory.ImageFactoryException.ImageFactoryException'>)
2011-04-28 02:45:22,604 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(13116) Message: value: (Unable to gain ssh access after 300 seconds - aborting)
2011-04-28 02:45:22,605 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(13116) Message: traceback: ['  File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 409, in push_image\n    self.push_image_snapshot(image_id, provider, credentials)\n', '  File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 562, in push_image_snapshot\n    raise ImageFactoryException("Unable to gain ssh access after 300 seconds - aborting")\n']
2011-04-28 02:45:22,605 DEBUG imagefactory.qmfagent.BuildAdaptor.BuildAdaptor pid(13116) Message: Raising event with agent handler (<ImageFactoryAgent(Thread-1, initial)>), changed status from PUSHING to FAILED
==========================================================================
rpm -qa | grep aeolus
aeolus-conductor-daemons-0.2.0-2.el6.x86_64
aeolus-configure-2.0.0-9.el6.noarch
aeolus-conductor-0.2.0-2.el6.x86_64
aeolus-conductor-doc-0.2.0-2.el6.x86_64
=============================================================================
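The "Unable to gain ssh access after 300 seconds - aborting" failure raised in push_image_snapshot is a poll-until-deadline pattern: retry a connection probe at a fixed interval until it succeeds or the deadline passes. A hedged sketch of that pattern only — the helper name and signature are mine, not imagefactory's actual implementation:

```python
import time

def wait_for(probe, timeout=300, interval=10,
             clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True or `timeout` seconds elapse.

    probe is any zero-argument callable (e.g. one ssh connection
    attempt); clock and sleep are injectable so the loop can be
    exercised in tests without real delays.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if probe():
            return True
        sleep(interval)
    # Mirrors the ImageFactoryException message seen in the traceback.
    raise RuntimeError(
        "Unable to gain ssh access after %d seconds - aborting" % timeout)
```

With a probe that never succeeds (e.g. the instance boots without a reachable sshd, as appears to have happened in us-west here), the loop exhausts the deadline and the push fails with exactly this message.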
The problem is most probably in deltacloud-api: with my account I can find random public images from us-east, but I can't see any image in us-west. However, in the AWS console I can switch between regions and see public images from both. I will discuss this with Michal Fojtik.
So it seems we should set the API_PROVIDER environment variable when starting the driver; otherwise the default (us-east-1) is used.
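The lookup rule described here is small but worth pinning down, since an unset variable silently selects us-east-1. The actual deltacloud-core driver is Ruby; this is only a Python sketch of the resolution behavior under the stated assumption (read API_PROVIDER from the environment, fall back to us-east-1), with an illustrative function name:

```python
import os

DEFAULT_PROVIDER = "us-east-1"

def resolve_provider(environ=os.environ):
    # When the init script does not export API_PROVIDER, the driver
    # silently talks to us-east-1, which is why AMIs registered in
    # us-west-1 came back as InvalidAMIID.NotFound.
    return environ.get("API_PROVIDER", DEFAULT_PROVIDER)

print(resolve_provider({}))                            # us-east-1
print(resolve_provider({"API_PROVIDER": "us-west-1"})) # us-west-1
```

Operationally, the fix is to export API_PROVIDER in the init scripts that start each per-provider deltacloud-core daemon before launching the driver.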
With API_PROVIDER set, I can see images from us-west without problems. The deltacloud-core init daemons on hp-ml370g6-01.rhts.eng.bos.redhat.com are now fixed, but I can't reproduce the bug because I'm hitting the same problem as Shveta: imagefactory can't connect to us-west with my account. Wes, could you please test it with your account, which worked (I think)? I believe the problem should be fixed now.
[root@ip-10-118-63-61 ~]# condor_q

-- Submitter: ip-10-118-63-61.ec2.internal : <10.118.63.61:45189> : ip-10-118-63-61.ec2.internal
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0   aeolus          5/2  05:22   0+00:33:53 R  0   0.0  job_ease_insta_1
 2.0   aeolus          5/2  05:24   0+00:29:48 R  0   0.0  job_realm_insta_2
 3.0   aeolus          5/2  05:41   0+00:00:00 I  0   0.0  job_west_insta_3
 4.0   aeolus          5/2  05:45   0+00:00:00 I  0   0.0  job_west_insta_2_4

4 jobs; 2 idle, 2 running, 0 held

condor_q shows the job running for us-west as well.
This problem should be fixed in the current version of aeolus-configure.
Moving to ON_QA for review.
Woot.. finally working :)

http://hp-xw6600-02.rhts.eng.bos.redhat.com:3006

10.16.65.48 - - [08/Jul/2011 17:39:25] "GET / HTTP/1.1" 301 - 0.0011
10.16.65.48 - - [08/Jul/2011 17:39:25] "GET /api HTTP/1.1" 200 926 0.0094
10.16.65.48 - - [08/Jul/2011 17:39:25] "GET / HTTP/1.1" 301 - 0.0008
10.16.65.48 - - [08/Jul/2011 17:39:25] "GET /api HTTP/1.1" 200 926 0.0150
10.16.65.48 - - [08/Jul/2011 17:39:26] "GET / HTTP/1.1" 301 - 0.0011
10.16.65.48 - - [08/Jul/2011 17:39:26] "GET /api HTTP/1.1" 200 926 0.0091
10.16.65.48 - - [08/Jul/2011 17:39:26] "GET /api/hardware_profiles HTTP/1.1" 200 1813 0.0123

[root@hp-xw6600-02 ~]# rpm -qa | grep aeolus
rubygem-aeolus-cli-0.0.1-1.el6.20110708135911gitdb1097c.noarch
aeolus-all-0.3.0-0.el6.20110708135911gitdb1097c.noarch
aeolus-configure-2.0.1-0.el6.20110707131907gitfaa220b.noarch
aeolus-conductor-0.3.0-0.el6.20110708135911gitdb1097c.noarch
aeolus-conductor-daemons-0.3.0-0.el6.20110708135911gitdb1097c.noarch
aeolus-conductor-doc-0.3.0-0.el6.20110708135911gitdb1097c.noarch
Ignore comment 11; I added that text to the wrong bug. This bug is blocked by bug 719382.
Making sure all the bugs are at the right version for future queries.
Condor is gone:

[root@unused bin]# rpm -qa | grep aeolus
aeolus-conductor-doc-0.4.0-0.20110929145941git7594098.fc15.noarch
aeolus-conductor-0.4.0-0.20110929145941git7594098.fc15.noarch
aeolus-conductor-daemons-0.4.0-0.20110929145941git7594098.fc15.noarch
aeolus-conductor-devel-0.4.0-0.20110929145941git7594098.fc15.noarch
aeolus-all-0.4.0-0.20110929145941git7594098.fc15.noarch
rubygem-aeolus-image-0.1.0-3.20110919115936gitd1d24b4.fc15.noarch
aeolus-configure-2.0.2-4.20110926142838git5044e56.fc15.noarch
[root@unused bin]# rpm -qa | grep condor
[root@unused bin]#