Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 795403

Summary:

"resource not found" under component outlines in uploading image

Product:

[Retired] CloudForms Cloud Engine

Reporter:

Shveta <ssachdev>

Component:

iwhd

Assignee:

jrd <jrd>

Status:

CLOSED ERRATA

QA Contact:

wes hayutin <whayutin>

Severity:

medium

Docs Contact:

Priority:

unspecified

Version:

1.0.0

CC:

akarol, cpelland, dajohnso, deltacloud-maint, hbrock, jprovazn, jturner, meyering, mkoci, morazi, ssachdev, whayutin

Target Milestone:

beta5

Keywords:

Rebase, Triaged

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Rebase: Bug Fixes and Enhancements

Doc Text:

Story Points:

---

Clone Of:

Clones:

803792 809504 (view as bug list)

Environment:

Last Closed:

2012-05-15 20:05:51 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

803792, 809504

Attachments:

Description	Flags
resource_not_found	none
ImageFactoryLogs_Pushpesh	none

Description Shveta 2012-02-20 12:29:45 UTC

Created attachment 564424 [details]
resource_not_found

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. install aeolus-all
2. add ec2 provider account
3. aeolus-image build ...
4. aeolus-image push 
5. cloud --component outline -- image name -- delete 
6. Repeated steps 3, 4 and 5 several times and suddenly i got "resource not found"

I was uploading the image several times from UI and clicking on "view xml" button also in between and suddenly this came up .

Wil try to recreate and put down the exact steps ..
  
Actual results:


Expected results:


Additional info:

rpm -qa|grep aeolus
aeolus-conductor-0.8.0-28.el6.noarch
rubygem-aeolus-cli-0.3.0-8.el6.noarch
aeolus-conductor-daemons-0.8.0-28.el6.noarch
rubygem-aeolus-image-0.3.0-7.el6.noarch
aeolus-configure-2.5.0-14.el6.noarch
aeolus-all-0.8.0-28.el6.noarch
aeolus-conductor-doc-0.8.0-28.el6.noarch

Comment 1 Shveta 2012-02-20 12:31:56 UTC

Resource Not Found: 
/usr/lib/ruby/gems/1.8/gems/aeolus-image-0.3.0/lib/aeolus_image/model/warehouse/warehouse_client.rb:154:in `do_request'
 /usr/lib/ruby/gems/1.8/gems/aeolus-image-0.3.0/lib/aeolus_image/model/warehouse/warehouse_client.rb:43:in `body'
 /usr/lib/ruby/gems/1.8/gems/aeolus-image-0.3.0/lib/aeolus_image/model/warehouse/image.rb:25:in `initialize'
 /usr/lib/ruby/gems/1.8/gems/aeolus-image-0.3.0/lib/aeolus_image/model/warehouse/warehouse_model.rb:103:in `new'
 /usr/lib/ruby/gems/1.8/gems/aeolus-image-0.3.0/lib/aeolus_image/model/warehouse/warehouse_model.rb:103:in `all'
 /usr/lib/ruby/gems/1.8/gems/aeolus-image-0.3.0/lib/aeolus_image/model/warehouse/warehouse_model.rb:102:in `map'
 /usr/lib/ruby/gems/1.8/gems/aeolus-image-0.3.0/lib/aeolus_image/model/warehouse/warehouse_model.rb:102:in `all'
 /usr/share/aeolus-conductor/app/controllers/images_controller.rb:31:in `index'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/implicit_render.rb:4:in `send_action'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/implicit_render.rb:4:in `send_action'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/abstract_controller/base.rb:150:in `process_action'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/rendering.rb:11:in `process_action'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/abstract_controller/callbacks.rb:18:in `process_action'
 /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:446:in `_run__121526048__process_action__565139051__callbacks'
 /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:410:in `send'
 /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:410:in `_run_process_action_callbacks'
 /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:94:in `send'
 /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:94:in `run_callbacks'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/abstract_controller/callbacks.rb:17:in `process_action'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/rescue.rb:17:in `process_action'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/instrumentation.rb:30:in `process_action'
 /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/notifications.rb:52:in `instrument'
 /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/notifications/instrumenter.rb:21:in `instrument'
 /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/notifications.rb:52:in `instrument'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/instrumentation.rb:29:in `process_action'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/abstract_controller/base.rb:119:in `process'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/abstract_controller/rendering.rb:41:in `process'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal.rb:138:in `dispatch'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal/rack_delegation.rb:14:in `dispatch'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_controller/metal.rb:178:in `action'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/routing/route_set.rb:62:in `call'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/routing/route_set.rb:62:in `dispatch'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/routing/route_set.rb:27:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-mount-0.7.1/lib/rack/mount/route_set.rb:150:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-mount-0.7.1/lib/rack/mount/code_generation.rb:93:in `recognize'
 /usr/lib/ruby/gems/1.8/gems/rack-mount-0.7.1/lib/rack/mount/code_generation.rb:131:in `optimized_each'
 /usr/lib/ruby/gems/1.8/gems/rack-mount-0.7.1/lib/rack/mount/code_generation.rb:92:in `recognize'
 /usr/lib/ruby/gems/1.8/gems/rack-mount-0.7.1/lib/rack/mount/route_set.rb:141:in `call'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/routing/route_set.rb:493:in `call'
 /usr/lib/ruby/gems/1.8/gems/warden-1.0.5/lib/warden/manager.rb:35:in `call'
 /usr/lib/ruby/gems/1.8/gems/warden-1.0.5/lib/warden/manager.rb:34:in `catch'
 /usr/lib/ruby/gems/1.8/gems/warden-1.0.5/lib/warden/manager.rb:34:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-restful_submit-1.2.1/lib/rack/rack-restful_submit.rb:29:in `call'
 /usr/lib/ruby/gems/1.8/gems/sass-3.1.4/lib/sass/../sass/plugin/rack.rb:54:in `call'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/middleware/best_standards_support.rb:17:in `call'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/middleware/head.rb:14:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-1.3.0/lib/rack/methodoverride.rb:24:in `call'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/middleware/params_parser.rb:21:in `call'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/middleware/flash.rb:182:in `call'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/middleware/session/abstract_store.rb:149:in `call'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/middleware/cookies.rb:302:in `call'
 /usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/query_cache.rb:32:in `call'
 /usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract/query_cache.rb:28:in `cache'
 /usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/query_cache.rb:12:in `cache'
 /usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/query_cache.rb:31:in `call'
 /usr/lib/ruby/gems/1.8/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract/connection_pool.rb:354:in `call'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/middleware/callbacks.rb:46:in `call'
 /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/callbacks.rb:416:in `_run_call_callbacks'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/middleware/callbacks.rb:44:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-1.3.0/lib/rack/sendfile.rb:102:in `call'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/middleware/remote_ip.rb:48:in `call'
 /usr/lib/ruby/gems/1.8/gems/actionpack-3.0.10/lib/action_dispatch/middleware/show_exceptions.rb:47:in `call'
 /usr/lib/ruby/gems/1.8/gems/railties-3.0.10/lib/rails/rack/logger.rb:13:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-1.3.0/lib/rack/runtime.rb:17:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-1.3.0/lib/rack/lock.rb:34:in `call'
 /usr/lib/ruby/gems/1.8/gems/railties-3.0.10/lib/rails/application.rb:168:in `call'
 /usr/lib/ruby/gems/1.8/gems/railties-3.0.10/lib/rails/application.rb:77:in `send'
 /usr/lib/ruby/gems/1.8/gems/railties-3.0.10/lib/rails/application.rb:77:in `method_missing'
 /usr/lib/ruby/gems/1.8/gems/rack-1.3.0/lib/rack/urlmap.rb:52:in `call'
 /usr/lib/ruby/gems/1.8/gems/rack-1.3.0/lib/rack/urlmap.rb:46:in `each'
 /usr/lib/ruby/gems/1.8/gems/rack-1.3.0/lib/rack/urlmap.rb:46:in `call'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/lib/thin/connection.rb:84:in `pre_process'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/lib/thin/connection.rb:82:in `catch'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/lib/thin/connection.rb:82:in `pre_process'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/lib/thin/connection.rb:57:in `process'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/lib/thin/connection.rb:42:in `receive_data'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run_machine'
 /usr/lib/ruby/gems/1.8/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/lib/thin/backends/base.rb:61:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/lib/thin/server.rb:159:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/lib/thin/controllers/controller.rb:86:in `start'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/lib/thin/runner.rb:185:in `send'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/lib/thin/runner.rb:185:in `run_command'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/lib/thin/runner.rb:151:in `run!'
 /usr/lib/ruby/gems/1.8/gems/thin-1.2.11/bin/thin:6
 /usr/bin/thin:19:in `load'
 /usr/bin/thin:19
Rendered layouts/_masthead.html.haml (1.7ms)
Rendered layouts/_new_notification.html.haml (0.5ms)
Rendered layouts/error.rhtml within layouts/application (157.0ms)
Completed 200 OK in 180ms (Views: 158.7ms | ActiveRecord: 7.3ms)

Comment 2 Shveta 2012-02-20 13:20:07 UTC

May be this can help
============================== rails.log=== provider.id=null===================


  Permission Load (0.7ms)  SELECT * FROM (SELECT DISTINCT ON ("permissions".id) "permissions".id, permissions.id AS alias_0 FROM "permissions" LEFT OUTER JOIN "roles" ON "roles"."id" = "permissions"."role_id" LEFT OUTER JOIN "privileges" ON "privileges"."role_id" = "roles"."id" WHERE ("permissions".permission_object_id = 1 AND "permissions".permission_object_type = 'BasePermissionObject') AND (permissions.user_id=1 and
 privileges.target_type='Catalog' and
 privileges.action='modify')) AS id_list ORDER BY id_list.alias_0 LIMIT 1
  Permission Load (0.5ms)  SELECT "permissions"."id" AS t0_r0, "permissions"."role_id" AS t0_r1, "permissions"."user_id" AS t0_r2, "permissions"."permission_object_id" AS t0_r3, "permissions"."permission_object_type" AS t0_r4, "permissions"."lock_version" AS t0_r5, "permissions"."created_at" AS t0_r6, "permissions"."updated_at" AS t0_r7, "roles"."id" AS t1_r0, "roles"."name" AS t1_r1, "roles"."scope" AS t1_r2, "roles"."lock_version" AS t1_r3, "roles"."created_at" AS t1_r4, "roles"."updated_at" AS t1_r5, "roles"."assign_to_owner" AS t1_r6, "privileges"."id" AS t2_r0, "privileges"."role_id" AS t2_r1, "privileges"."target_type" AS t2_r2, "privileges"."action" AS t2_r3, "privileges"."lock_version" AS t2_r4, "privileges"."created_at" AS t2_r5, "privileges"."updated_at" AS t2_r6 FROM "permissions" LEFT OUTER JOIN "roles" ON "roles"."id" = "permissions"."role_id" LEFT OUTER JOIN "privileges" ON "privileges"."role_id" = "roles"."id" WHERE "permissions"."id" IN (6) AND ("permissions".permission_object_id = 1 AND "permissions".permission_object_type = 'BasePermissionObject') AND (permissions.user_id=1 and
 privileges.target_type='Catalog' and
 privileges.action='modify') ORDER BY permissions.id ASC
  SQL (0.1ms)  SELECT COUNT(*) FROM "catalogs"
  SQL (0.1ms)  SELECT COUNT(*) FROM "hardware_profiles" WHERE "hardware_profiles"."provider_id" IS NULL
Rendered layouts/_admin_nav.html.haml (5.0ms)
  HardwareProfile Load (0.4ms)  SELECT "hardware_profiles".* FROM "hardware_profiles" WHERE "hardware_profiles"."provider_id" IS NULL
  CACHE (0.0ms)  SELECT COUNT(*) FROM "catalogs"
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs"
Rendered deployables/_new_from_image.html.haml (145.0ms)
Rendered layouts/_masthead.html.haml (1.5ms)
Rendered deployables/new.html.haml within layouts/application (165.2ms)
Completed 200 OK in 237ms (Views: 167.7ms | ActiveRecord: 21.1ms)


Started GET "/conductor/images" for 10.11.11.103 at Mon Feb 20 08:04:52 -0500 2012
  Processing by ImagesController#index as HTML

Comment 3 Shveta 2012-02-20 16:13:15 UTC

issue can be seen at 
https://qe-blade-13.idm.lab.bos.redhat.com/conductor/

Comment 4 Hugh Brock 2012-02-24 16:19:28 UTC

Scott, could this also be a result of the problems with image-environment scoping from the CLI?

Comment 5 Scott Seago 2012-02-29 17:27:31 UTC

Hugh -- hmm I don't think so if it's working several times in a row and then failing.

Is the latest aeolus-image installed? Any more info on how reliably it's reproduced? With exact steps?

Comment 6 Jan Provaznik 2012-03-06 12:41:53 UTC

This now occurs on ibm-ls22-03.rhts.eng.brq.redhat.com

This is in iwhd.log when listing images:
Mar 06 06:29:01 iwhd[25829]: 127.0.0.1.55349 "DELETE /images/3d8a303a-6760-11e1-a4e5-00215e860d70" -
...
Mar 06 06:50:51 iwhd[25829]: 127.0.0.1.55409 "PUT /images/3d8a303a-6760-11e1-a4e5-00215e860d70/latest_build" 36
Mar 06 06:50:51 iwhd[25829]: no size, mongo may be dead
Mar 06 06:50:51 iwhd[25829]: 127.0.0.1.55409 201 36
Mar 06 06:50:51 iwhd[25829]: 127.0.0.1.55409 "PUT /images/3d8a303a-6760-11e1-a4e5-00215e860d70/latest_unpushed" 4
Mar 06 06:50:51 iwhd[25829]: no size, mongo may be dead
...
Mar 06 07:12:32 iwhd[25829]: 127.0.0.1.56195 "GET /images/3d8a303a-6760-11e1-a4e5-00215e860d70" -
Mar 06 07:12:32 iwhd[25829]: images/3d8a303a-6760-11e1-a4e5-00215e860d70 not found at primary

So maybe we don't delete objects properly, or there is a bug in iwhd and an object stays in db after delete operation even if associated file was deleted?

Comment 7 Jim Meyering 2012-03-06 18:13:31 UTC

(In reply to comment #6)
> This is in iwhd.log when listing images:
> Mar 06 06:29:01 iwhd[25829]: 127.0.0.1.55349 "DELETE
> /images/3d8a303a-6760-11e1-a4e5-00215e860d70" -
> ...
> Mar 06 06:50:51 iwhd[25829]: 127.0.0.1.55409 "PUT
> /images/3d8a303a-6760-11e1-a4e5-00215e860d70/latest_build" 36
> Mar 06 06:50:51 iwhd[25829]: no size, mongo may be dead
> Mar 06 06:50:51 iwhd[25829]: 127.0.0.1.55409 201 36

Hi Jan,

The diagnostic above arises any time you tell iwhd to add an
attribute to a nonexistent object:

    $ printf junk | curl --write-out '%{http_code}\n' \
	-T - $http://localhost:9090/b/no-such/new_attr
    201

The DELETE deletes an object, and the following PUT
tries to give it the "latest_build" attribute.

---
PS. Yes, iwhd should give a different HTTP status code,
if possible, to indicate that the request is invalid.

Comment 8 Hugh Brock 2012-03-07 18:37:43 UTC

This seems like a race condition, I think we need to fix it. Given it's intermittent though I don't think it should block beta 4. Moving blocker to beta 5.

Comment 10 wes hayutin 2012-03-09 15:37:37 UTC

k. folks.. Scott has requested that QE provide a more accurate recreate.
We can do that..  Dev please hold off until QE has provided more information.

Thanks

Comment 11 Dave Johnson 2012-03-09 17:08:30 UTC

Wes asked that I take a peek at this and I stumbled across it banging on the UI. Now the box constantly shows the error when trying to access component outlines page. 

Is this consistent with comment 0 and comment 6?  One it started was it there permanently?

The same behaviour exists on the command line when using trying to list out the images with aeolus-image (which is ultimately querying conductor/api/images


[root@hp-sl390s-01 ~]# aeolus-image list --images

ERROR:  Service Temporarily Unavailable => Please check that Conductor is running.
[root@hp-sl390s-01 ~]# aeolus-check-services 

Checking mongod ...
 Success: mongod (pid 11427) is running...

Checking iwhd ...
 Success: iwhd (pid  11479) is running...

Checking postgresql ...
 Success: postmaster (pid  11665) is running...

Checking httpd ...
 Success: httpd (pid  11339) is running...

Checking qpidd ...
 Success: qpidd (pid  11844) is running...

Checking deltacloud-core ...
 Success: deltacloud-core (pid  11286) is running...

Checking libvirtd ...
 Success: libvirtd (pid  11913) is running...

Checking aeolus-conductor ...
 Success: thin (pid  21851) is running...

Checking conductor-dbomatic ...
 Success: dbomatic (pid  11820) is running...

Checking imagefactory ...
 Success: imagefactory (pid  21991) is running...

Checking ntpd ...
 Success: ntpd (pid  32676) is running...
[root@hp-sl390s-01 ~]# 



[root@hp-sl390s-01 ~]# curl -k https://127.0.0.1/conductor/api/images -u admin:password
<!DOCTYPE html>
<html lang='en' xml:lang='en'>
<head>
<meta charset='utf-8'>
<title>CloudForms Cloud Engine</title>

<<<cut some clutter out>>>

<div id='flash-hud'>
<a href="#" class="control">Close</a>
<div class='error flash-group'>
<div class='flash-subset'>
<div class='heading'>
<img alt="Errors" src="/conductor/images/flash_error_icon.png?1331073154" />
</div>
Resource Not Found
</div>
</div>
</div>

<<<cut some clutter out>>>

</html>
[root@hp-sl390s-01 ~]#

Comment 12 Dave Johnson 2012-03-09 18:10:21 UTC

Okay, started another round of testing and I think I have a solid recreate.  Just as comment 6 & 7 talk about, if you delete a image while it is being built or pushed, once that action is finished, imagefactory attempts to update warehouse, fails to find the object and then the box is in a bad state.

1.  Build a image
2.  Push a image 
3.  While pushing is still in progress, delete the image
4.  Tail imagefactory.log and wait for push to finish
5.  Resource not found starts showing up on the top level images page 
https://hp-dl380g6-01.rhts.eng.bos.redhat.com/conductor/pool_families
6.  Same with aeolus-image list --images

[root@intel-s3ea2-02 ~]# aeolus-image list --images

ERROR:  Service Temporarily Unavailable => Please check that Conductor is running.
[root@intel-s3ea2-02 ~]# hostname
intel-s3ea2-02.rhts.eng.bos.redhat.com


** I believe the same can be accomplished if you build a image (on a slower provider like rhevm/vsphere), allow it to finish, rebuild it again, while rebuilding, delete it, wait for rebuild to finish, resource not found

Comment 13 Jim Meyering 2012-03-09 23:42:32 UTC

patch posted here:

http://thread.gmane.org/gmane.comp.lib.iwhd.devel/1107

Comment 14 Jim Meyering 2012-03-12 18:30:19 UTC

built and ready for F16.  karma required:

  https://admin.fedoraproject.org/updates/iwhd-1.5-1.fc16

Also for F17 and F18.

Comment 15 Scott Seago 2012-03-13 14:00:11 UTC

It looks like, for now, the IWHD fix should be sufficient to resolve this. Setting to MODIFIED, as the fix is upstream.

Comment 16 Jim Meyering 2012-03-13 14:07:25 UTC

While fixing the iwhd bug solves part of the problem, note that there
is another one in the race that makes conductor remove an object and
then try to set attributes on the nonexistent object.

If someone expects to let the iwhd fix close this bug, it'd be good
to ensure that there's a separate BZ for the conductor race.

Comment 17 Scott Seago 2012-03-13 14:51:06 UTC

Yeah, the other component needed eventually is that factory (which is the component making the iwhd writes here at the end of a build/push) needs to check for the existence of the image object before 1) writing the latest_build attribute (or whatever attibute it's setting), and 2) creating the other associated objects that reference now-deleted objects (either the provider image or the target image).

So to clarify what I think will happen here if the image has been deleted:
1) factory will create the provider or target image objects without error (but will include a metadata key pointing to a now-invalid image object)
2) when factory attempts to set metadata attributes on the nonexistent object, iwhd will now throw an error response.

So what I don't know is how factory will recover from this error response. I'm guessing it will log that the build or push failed but leave the dangling target or provider image.

So first, have I understood the consequences of having the IWHD fix only? And second, is the resulting semi-fix acceptable for now, or do we really need some factory work as well (in the short term/1.0 -- longer-term we definitely need it).

Comment 18 Jim Meyering 2012-03-13 15:56:32 UTC

(In reply to comment #17)
...
> So to clarify what I think will happen here if the image has been deleted:
...
> 2) when factory attempts to set metadata attributes on the nonexistent object,
> iwhd will now throw an error response.

The above is correct:
The iwhd fix adds a test case that demonstrates a 404 return
code when attempting to set an attribute on a nonexistent object.

Comment 21 Mike Orazi 2012-03-15 17:50:53 UTC

Resetting status to MODIFIED so it can be associated with a different advisory.

Comment 22 jrd 2012-03-19 18:16:04 UTC

Jim's on PTO, so I'm dispatching this for him.  Per discussion, we believe the rightest course is to get the patched iwhd into service and make sure that causes the desired behaviour.

Marking back to on-qa for retesting.

Comment 23 wes hayutin 2012-03-30 19:02:05 UTC

assigning to Pushpesh

Comment 24 pushpesh sharma 2012-04-03 09:41:01 UTC

Followed the steps provided in comment-12.

I am not able to see the "resource not found" from web-UI as well as "ERROR:  Service Temporarily Unavailable => Please check that Conductor is
running." from the CLI.

But the AMI in the ec2 account has been created and not marked for removal.
Deleting a image while pushing it to provider,means I am not sure about  the image and i dont want to upload it to provider.So it shouldn't be uploaded. 





From imagefactory.log it is visible that it tries to  push image and then communicate with imagefactory with imageId Image factory responded back with "Message: Image Warehouse returned status (404) with message: 404 no such object"
So a possible fix would be to initiate a clean-up call for this kind of AMI,with AMI number returned to imagefactory. 


I am marking this bug to "assigned" for a better fix.

Comment 25 pushpesh sharma 2012-04-03 09:43:06 UTC

Created attachment 574800 [details]
ImageFactoryLogs_Pushpesh

PFA for corresponding log ImageFactoryLogs_Pushpesh

Comment 26 pushpesh sharma 2012-04-03 09:46:14 UTC

ec2-console has the corresponding ami while conductor don't have the image.

Comment 27 Mike Orazi 2012-04-03 17:49:23 UTC

I'm going to copy over the most recent handful of comments to 803792 which I think is lined up against the underlying image factory issue.

Comment 28 wes hayutin 2012-04-04 01:19:27 UTC

OK OK.. lots of confusion here..
To boil down the critical issue here..

The current behavior should be the following..
1. build/push an image
2. if that image is some how deleted by the user or other acts of god occur, when factory has finished its push task, and factory goes to write the data it then queries warehouse which returns a 404 (1 part of the fix) because the image has been deleted.
3. Note that warehouse should *not* crash

The second part of this bug.. which morazi, sseago, and jim are addressing in a future release. Factory needs to be aware of the fact that the image was deleted.
That will be addressed in 803792

While building/pushing and deleting ec2 and mock images, I noticed several 404's in the warehouse log (a good thing)

Apr 03 21:02:54 iwhd[12107]: 127.0.0.1.40490 200 36
Apr 03 21:02:54 iwhd[12107]: 127.0.0.1.40491 "POST /target_images/_query" 50
Apr 03 21:02:54 iwhd[12107]: 127.0.0.1.40491 404 0
Apr 03 21:03:23 iwhd[12107]: 127.0.0.1.40498 "GET /images" -
Apr 03 21:03:23 iwhd[12107]: 127.0.0.1.40498 200 118
Apr 03 21:03:23 iwhd[12107]: 127.0.0.1.40499 "POST /images/e09895b6-7ded-11e1-b835-5cf3fc1c85b0" 8
Apr 03 21:03:23 iwhd[12107]: 127.0.0.1.40499 200 919
Apr 03 21:03:23 iwhd[12107]: 127.0.0.1.40516 404 0
Apr 03 21:03:53 iwhd[12107]: 127.0.0.1.40540 404 0
Apr 03 21:04:23 iwhd[12107]: 127.0.0.1.40559 404 0

I also noticed that all the services remained running.. through out this malicious test.

In response to Scott's theory that the push will fail, I found the opposite to be true.. The push seems to continue even though the image has been deleted.  While this is not the expected or correct behavior, the customer is not impacted .
2012-04-03 20:40:06,646 DEBUG imgfac.BuildJob.BuildJob thread(968d042e) Message: Builder (968d042e-a979-4d88-83dd-d42590a925f0) changed status from PUSHING to COMPLETED
2012-04-03 20:40:48,979 DEBUG imgfac.BuildJob.BuildJob thread(c7f81e08) Message: Builder (c7f81e08-96ae-48dc-af9d-92c8a2141884) changed status from PUSHING to COMPLETED
2012-04-03 20:40:48,981 DEBUG imgfac.BuildJob.BuildJob thread(c7617a9f) Message: Builder (c7617a9f-95ef-4729-9f8c-8080597b8894) changed status from NEW to PUSHING
2012-04-03 20:42:59,397 DEBUG imgfac.BuildJob.BuildJob thread(c7617a9f) Message: Builder (c7617a9f-95ef-4729-9f8c-8080597b8894) changed status from PUSHING to FAILED
2012-04-03 20:42:59,398 DEBUG imgfac.BuildJob.BuildJob thread(e69e5c29) Message: Builder (e69e5c29-ba39-4e4b-8f06-ad30683c6055) changed status from NEW to PUSHING
2012-04-03 20:43:37,233 DEBUG imgfac.BuildJob.BuildJob thread(573c6001) Message: Builder (573c6001-6541-4d53-b2cb-df6986548ca3) changed status from PUSHING to COMPLETED
2012-04-03 20:43:37,236 DEBUG imgfac.BuildJob.BuildJob thread(7ada1305) Message: Builder (7ada1305-5942-43b9-b2bf-8cbd09087d03) changed status from NEW to PUSHING
2012-04-03 20:43:53,897 DEBUG imgfac.BuildJob.BuildJob thread(7a17194f) Message: Builder (7a17194f-2a3d-436b-a5cf-a34356327981) changed status from PUSHING to COMPLETED
2012-04-03 20:43:55,333 DEBUG imgfac.BuildJob.BuildJob thread(6464df59) Message: Builder (6464df59-b70b-457c-bed9-8fa2b4f9b224) changed status from PUSHING to COMPLETED
2012-04-03 20:47:11,598 DEBUG imgfac.BuildJob.BuildJob thread(e69e5c29) Message: Builder (e69e5c29-ba39-4e4b-8f06-ad30683c6055) changed status from PUSHING to COMPLETED
2012-04-03 20:47:54,521 DEBUG imgfac.BuildJob.BuildJob thread(7ada1305) Message: Builder (7ada1305-5942-43b9-b2bf-8cbd09087d03) changed status from PUSHING to COMPLETED
2012-04-03 20:58:15,170 DEBUG imgfac.BuildJob.BuildJob thread(b0aec824) Message: Builder (b0aec824-c6b7-4862-b049-8d6b72f1449c) changed status from NEW to PUSHING
2012-04-03 21:01:40,501 DEBUG imgfac.BuildJob.BuildJob thread(a63d3a6e) Message: Builder (a63d3a6e-d0a1-4c29-ad84-4a15f7cbbaf9) changed status from NEW to PUSHING
2012-04-03 21:02:50,467 DEBUG imgfac.BuildJob.BuildJob thread(b0aec824) Message: Builder (b0aec824-c6b7-4862-b049-8d6b72f1449c) changed status from PUSHING to COMPLETED
2012-04-03 21:05:39,386 DEBUG imgfac.BuildJob.BuildJob thread(a63d3a6e) Message: Builder (a63d3a6e-d0a1-4c29-ad84-4a15f7cbbaf9) changed status from PUSHING to COMPLETED


marking this bug as verified..
[root@qeblade30 ~]# rpm -qa | grep aeolus
aeolus-conductor-daemons-0.8.7-1.el6.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
aeolus-configure-2.5.2-1.el6.noarch
aeolus-all-0.8.7-1.el6.noarch
aeolus-conductor-doc-0.8.7-1.el6.noarch
aeolus-conductor-0.8.7-1.el6.noarch
rubygem-aeolus-cli-0.3.1-1.el6.noarch
[root@qeblade30 ~]# rpm -qa | grep factory
imagefactory-jeosconf-ec2-fedora-1.0.0rc11-1.el6.noarch
imagefactory-jeosconf-ec2-rhel-1.0.0rc11-1.el6.noarch
imagefactory-1.0.0rc11-1.el6.noarch
[root@qeblade30 ~]# rpm -qa | grep iwhd
iwhd-1.5-2.el6.x86_64

Comment 29 errata-xmlrpc 2012-05-15 20:05:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0589.html