Bug 1040113 - deconfigure operation fails if downloadable cart has already been removed from a gear
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Abhishek Gupta
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-12-10 18:39 UTC by Andy Grimm
Modified: 2016-11-08 03:47 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-03-12 03:05:53 UTC
Target Upstream Version:



Description Andy Grimm 2013-12-10 18:39:34 UTC
Description of problem:

If the broker and node get out of sync, such that the node has already deconfigured a downloadable cartridge in a gear but the broker believes the operation is still pending, the broker will retry indefinitely to clear the pending op, and the retry will fail on the node each time.
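
A minimal sketch of this failure mode, for illustration only. `StrictNode` and `clear_pending_ops` are hypothetical names, not the broker or node code, and the retry loop is capped at `max_attempts` so the example terminates (the behavior described above has no such cap).

```ruby
# Hypothetical model: a node whose deconfigure raises when the cartridge
# is already gone, and a caller that keeps the op queued on every failure.
class StrictNode
  def initialize
    @carts = {} # the cartridge has already been removed from the gear
  end

  def deconfigure(cart)
    raise KeyError, "key not found: #{cart}" unless @carts.key?(cart)
    @carts.delete(cart)
  end
end

# Retries the first pending op until the queue is empty or the cap is hit.
# Returns the number of attempts made.
def clear_pending_ops(node, pending, max_attempts)
  attempts = 0
  until pending.empty? || attempts >= max_attempts
    attempts += 1
    begin
      node.deconfigure(pending.first)
      pending.shift # only reached when the node reports success
    rescue KeyError
      # The op stays queued, so the next pass retries the same call.
    end
  end
  attempts
end

pending = ["tungsten-2.0"]
attempts = clear_pending_ops(StrictNode.new, pending, 5)
puts "attempts=#{attempts} still_pending=#{pending.inspect}"
```

Because the node can never succeed at removing something that is no longer there, every attempt is spent and the op is still pending at the end.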

Version-Release number of selected component (if applicable):

rubygem-openshift-origin-node-1.17.8-1.el6oso.noarch

How reproducible:

Always

Steps to Reproduce (I think this will produce the result; I have not tested yet):
1. Create an app.
2. Add a downloadable cartridge that fails in some way (timeout, bad command in setup, etc.).
3. As soon as the broker issues "deconfigure" after the failure, kill the broker so that it never learns the result of the deconfigure.
4. Attempt to run oo-admin-clear-pending-ops on the gear.

Actual results (in platform log):

Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:92:in `cartridge_do_action' cartridge_do_action call / action: cartridge_do, agent=openshift, data={:cartridge=>"continuent-tungsten-2.0",
 :action=>"deconfigure",
 :args=>
  {"--with-app-uuid"=>"52a17718e0b8cdeaa00000a2",
   "--with-app-name"=>"tdf",
   "--with-container-uuid"=>"52a17a675973cab1da00001f",
   "--with-container-name"=>"52a17a675973cab1da00001f",
   "--with-namespace"=>"narmitag",
   "--with-uid"=>1956,
   "--with-request-id"=>nil,
   "--cart-name"=>"tungsten-2.0",
   "--component-name"=>"continuent-tungsten-2.0",
   "--with-software-version"=>"2.0",
   "--cartridge-vendor"=>"continuent"},
 :process_results=>true}
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:93:in `cartridge_do_action' cartridge_do_action validation = continuent-tungsten-2.0 deconfigure {"--with-app-uuid"=>"52a17718e0b8cdeaa00000a2", "--with-app-name"=>"tdf", "--with-container-uuid"=>"52a17a675973cab1da00001f", "--with-container-name"=>"52a17a675973cab1da00001f", "--with-namespace"=>"narmitag", "--with-uid"=>1956, "--with-request-id"=>nil, "--cart-name"=>"tungsten-2.0", "--component-name"=>"continuent-tungsten-2.0", "--with-software-version"=>"2.0", "--cartridge-vendor"=>"continuent"}
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:134:in `execute_action' Executing action [deconfigure] using method oo_deconfigure with args [{"--with-app-uuid"=>"52a17718e0b8cdeaa00000a2", "--with-app-name"=>"tdf", "--with-container-uuid"=>"52a17a675973cab1da00001f", "--with-container-name"=>"52a17a675973cab1da00001f", "--with-namespace"=>"narmitag", "--with-uid"=>1956, "--with-request-id"=>nil, "--cart-name"=>"tungsten-2.0", "--component-name"=>"continuent-tungsten-2.0", "--with-software-version"=>"2.0", "--cartridge-vendor"=>"continuent"}]
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:310:in `rescue in with_container_from_args' key not found: (tungsten, 2.0, _)
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:311:in `rescue in with_container_from_args' /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/cartridge_repository.rb:155:in `select'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:379:in `rescue in rescue in deconfigure'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:374:in `rescue in deconfigure'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:367:in `deconfigure'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/application_container_ext/cartridge_actions.rb:120:in `deconfigure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:880:in `block in oo_deconfigure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:299:in `with_container_from_args'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:879:in `oo_deconfigure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:139:in `execute_action'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:104:in `cartridge_do_action'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:86:in `handlemsg'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:126:in `block (2 levels) in dispatch'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:125:in `block in dispatch'
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:150:in `execute_action' Finished executing action [deconfigure] (1)
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:114:in `cartridge_do_action' cartridge_do_action failed (1)
------
key not found: (tungsten, 2.0, _)
------)
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:1241:in `rescue in has_app_cartridge_action' Failed to get cartridge 'continuent-tungsten-2.0' from  in gear 52a17a675973cab1da00001f: Cartridge directory not found for continuent-tungsten-2.0
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:1242:in `rescue in has_app_cartridge_action' /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:149:in `rescue in get_cartridge'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:142:in `get_cartridge'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/application_container.rb:604:in `get_cartridge'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:1236:in `has_app_cartridge_action'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:86:in `handlemsg'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:126:in `block (2 levels) in dispatch'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:125:in `block in dispatch'

Expected results:

deconfigure should succeed, even if the cartridge directory no longer exists in the gear.

Additional info:

It looks like the code in deconfigure in openshift-origin-node/model/v2_cart_model.rb might just need one more begin/rescue that simply returns if the cartridge repository lookup fails and all other methods to locate the cartridge have failed already.  It's possible that this will leave some cruft, but it's the best we can do.
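
A rough sketch of that idea, not the actual change to v2_cart_model.rb. `FakeCartridgeRepository`, `FakeGear`, `locate`, and the `:already_removed` return value are all hypothetical stand-ins. The only point is the shape of the nested rescue: if the gear-local lookup and the repository lookup both fail, deconfigure returns quietly instead of raising.

```ruby
# Hypothetical stand-in for the node's cartridge repository.
class FakeCartridgeRepository
  def initialize(carts)
    @carts = carts
  end

  # Raises KeyError for an unknown cartridge, similar in spirit to the
  # "key not found" error in the platform log.
  def select(name, version)
    @carts.fetch([name, version])
  end
end

# Hypothetical stand-in for a gear that can deconfigure cartridges.
class FakeGear
  def initialize(local_carts, repository)
    @local_carts = local_carts # cartridges still present in the gear
    @repository  = repository
  end

  def deconfigure(name, version)
    cart = locate(name, version)
    # Nothing left to tear down: report success so the caller can
    # clear its pending op. Some cruft may remain in the gear.
    return :already_removed if cart.nil?

    @local_carts.delete([name, version])
    :deconfigured
  end

  private

  # Tries the gear first, then the repository; returns nil if both fail.
  def locate(name, version)
    @local_carts.fetch([name, version])
  rescue KeyError
    begin
      @repository.select(name, version)
    rescue KeyError
      nil
    end
  end
end

repo = FakeCartridgeRepository.new({})
gear = FakeGear.new({ ["tungsten", "2.0"] => "continuent-tungsten-2.0" }, repo)
p gear.deconfigure("tungsten", "2.0") # prints :deconfigured
p gear.deconfigure("tungsten", "2.0") # prints :already_removed
```

The second call is the case from this report: the cartridge is already gone, yet the operation still completes without an error.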

Comment 1 Abhishek Gupta 2014-01-16 19:41:48 UTC
Andy Grimm will be submitting a patch for this bug.

Comment 2 Mrunal Patel 2014-01-16 22:33:44 UTC
https://github.com/openshift/origin-server/pull/4500

Node fix. Expecting PR from broker as well.

Comment 3 Abhishek Gupta 2014-01-21 00:56:03 UTC
Fixed with --> https://github.com/openshift/origin-server/pull/4531

Comment 4 openshift-github-bot 2014-01-21 03:41:34 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/8a16cd487f4c5a8a7a29c9f5f7491de35cb3e2a5
Bug 1040113: Handling edge cases in cleaning up downloaded cart map
Also, fixing a couple of minor issues

Comment 5 Jianwei Hou 2014-01-23 02:38:55 UTC
@Andy I was trying to verify this bug against the edge case where a downloadable cart fails installation and deconfigure has to handle the exception raised because the cart has already been removed from the gear. However, I was unable to reach that scenario. I added some bad commands to the setup hooks of my downloadable cart, and my app was deleted immediately after setup failed. Can you give me detailed instructions on how to reach this scenario?
Many thanks!

Comment 6 Jianwei Hou 2014-01-24 03:36:41 UTC
Verified on devenv_4267

steps:
1. Add a non-zero return code to the setup hook in the download cart
2. Add some sleep time to the deconfigure method in v2_cart_model.rb
3. Clear cache and restart mcollective
4. Create an app using this cart
5. As soon as deconfigure action is underway, stop broker
6. Delete the gears from node
7. Revert above changes, start broker and clear pending ops

Comment 7 Jianwei Hou 2014-02-26 03:13:07 UTC


