Bug 1040113

Summary: deconfigure operation fails if downloadable cart has already been removed from a gear
Product: OpenShift Online
Reporter: Andy Grimm <agrimm>
Component: Pod
Assignee: Abhishek Gupta <abhgupta>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.x
CC: agrimm, jgoulding, jhou, mpatel, xtian
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-03-12 03:05:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Andy Grimm 2013-12-10 18:39:34 UTC
Description of problem:

If the broker and node get out of sync, such that the node has already deconfigured a downloadable cartridge in a gear but the broker believes the operation is still pending, the broker will retry indefinitely to clear the pending op, and the retry will fail on the node each time.

Version-Release number of selected component (if applicable):

rubygem-openshift-origin-node-1.17.8-1.el6oso.noarch

How reproducible:

Always

Steps to Reproduce (I think this will produce the result; I have not tested yet):
1. Create an app
2. Add a downloadable cartridge which fails in some way (timeout, bad command in setup, etc.)
3. As soon as the broker issues "deconfigure" after the failure, kill the broker, so that it is not aware of the result of the deconfigure
4. Attempt to run oo-admin-clear-pending-ops on the gear

Actual results (in platform log):

Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:92:in `cartridge_do_action' cartridge_do_action call / action: cartridge_do, agent=openshift, data={:cartridge=>"continuent-tungsten-2.0",
 :action=>"deconfigure",
 :args=>
  {"--with-app-uuid"=>"52a17718e0b8cdeaa00000a2",
   "--with-app-name"=>"tdf",
   "--with-container-uuid"=>"52a17a675973cab1da00001f",
   "--with-container-name"=>"52a17a675973cab1da00001f",
   "--with-namespace"=>"narmitag",
   "--with-uid"=>1956,
   "--with-request-id"=>nil,
   "--cart-name"=>"tungsten-2.0",
   "--component-name"=>"continuent-tungsten-2.0",
   "--with-software-version"=>"2.0",
   "--cartridge-vendor"=>"continuent"},
 :process_results=>true}
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:93:in `cartridge_do_action' cartridge_do_action validation = continuent-tungsten-2.0 deconfigure {"--with-app-uuid"=>"52a17718e0b8cdeaa00000a2", "--with-app-name"=>"tdf", "--with-container-uuid"=>"52a17a675973cab1da00001f", "--with-container-name"=>"52a17a675973cab1da00001f", "--with-namespace"=>"narmitag", "--with-uid"=>1956, "--with-request-id"=>nil, "--cart-name"=>"tungsten-2.0", "--component-name"=>"continuent-tungsten-2.0", "--with-software-version"=>"2.0", "--cartridge-vendor"=>"continuent"}
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:134:in `execute_action' Executing action [deconfigure] using method oo_deconfigure with args [{"--with-app-uuid"=>"52a17718e0b8cdeaa00000a2", "--with-app-name"=>"tdf", "--with-container-uuid"=>"52a17a675973cab1da00001f", "--with-container-name"=>"52a17a675973cab1da00001f", "--with-namespace"=>"narmitag", "--with-uid"=>1956, "--with-request-id"=>nil, "--cart-name"=>"tungsten-2.0", "--component-name"=>"continuent-tungsten-2.0", "--with-software-version"=>"2.0", "--cartridge-vendor"=>"continuent"}]
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:310:in `rescue in with_container_from_args' key not found: (tungsten, 2.0, _)
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:311:in `rescue in with_container_from_args' /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/cartridge_repository.rb:155:in `select'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:379:in `rescue in rescue in deconfigure'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:374:in `rescue in deconfigure'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:367:in `deconfigure'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/application_container_ext/cartridge_actions.rb:120:in `deconfigure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:880:in `block in oo_deconfigure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:299:in `with_container_from_args'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:879:in `oo_deconfigure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:139:in `execute_action'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:104:in `cartridge_do_action'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:86:in `handlemsg'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:126:in `block (2 levels) in dispatch'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:125:in `block in dispatch'
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:150:in `execute_action' Finished executing action [deconfigure] (1)
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:114:in `cartridge_do_action' cartridge_do_action failed (1)
------
key not found: (tungsten, 2.0, _)
------)
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:1241:in `rescue in has_app_cartridge_action' Failed to get cartridge 'continuent-tungsten-2.0' from  in gear 52a17a675973cab1da00001f: Cartridge directory not found for continuent-tungsten-2.0
Dec  6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:1242:in `rescue in has_app_cartridge_action' /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:149:in `rescue in get_cartridge'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:142:in `get_cartridge'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/application_container.rb:604:in `get_cartridge'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:1236:in `has_app_cartridge_action'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:86:in `handlemsg'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:126:in `block (2 levels) in dispatch'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:125:in `block in dispatch'

Expected results:

deconfigure should succeed, even if the cartridge directory no longer exists in the gear.

Additional info:

It looks like the code in deconfigure in openshift-origin-node/model/v2_cart_model.rb might just need one more begin/rescue that simply returns if the cartridge repository lookup fails and all other methods to locate the cartridge have failed already.  It's possible that this will leave some cruft, but it's the best we can do.
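
A rough sketch of the extra begin/rescue suggested above, for illustration only. The classes FakeCartridgeRepository and FakeCartModel are hypothetical stand-ins, not the real openshift-origin-node code, and the return strings are invented; the only thing taken from the log is that the repository lookup raises a "key not found" error when the cartridge is gone.

```ruby
# Hypothetical stand-in for the node's cartridge repository (not the real class).
class FakeCartridgeRepository
  def initialize(index = {})
    @index = index
  end

  # Raises KeyError for an unknown cartridge, mirroring the logged
  # "key not found: (tungsten, 2.0, _)" failure.
  def select(name, version)
    @index.fetch([name, version]) do
      raise KeyError, "key not found: (#{name}, #{version}, _)"
    end
  end
end

# Hypothetical stand-in for the cartridge model's deconfigure path.
class FakeCartModel
  def initialize(gear_carts, repository)
    @gear_carts = gear_carts # cartridges still present in the gear
    @repository = repository
  end

  def deconfigure(name, version)
    key = "#{name}-#{version}"
    cartridge = @gear_carts[key]
    if cartridge.nil?
      begin
        # Last resort: look the cartridge up in the repository.
        cartridge = @repository.select(name, version)
      rescue KeyError
        # Nothing left to locate: treat the cartridge as already removed
        # and return instead of failing the whole operation.
        return "cartridge #{key} not found; nothing to deconfigure\n"
      end
    end
    @gear_carts.delete(key)
    "deconfigured #{cartridge}\n"
  end
end

model = FakeCartModel.new({ 'tungsten-2.0' => 'tungsten-2.0' },
                          FakeCartridgeRepository.new)
puts model.deconfigure('tungsten', '2.0') # normal path
puts model.deconfigure('tungsten', '2.0') # retry after removal no longer raises
```

With this shape a repeated deconfigure becomes a no-op on the node, so a broker retry of the pending op can complete. As noted above, anything the cartridge left behind elsewhere in the gear would not be cleaned up by this path.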

Comment 1 Abhishek Gupta 2014-01-16 19:41:48 UTC
Andy Grimm will be submitting a patch for this bug.

Comment 2 Mrunal Patel 2014-01-16 22:33:44 UTC
https://github.com/openshift/origin-server/pull/4500

Node fix. Expecting PR from broker as well.

Comment 3 Abhishek Gupta 2014-01-21 00:56:03 UTC
Fixed with --> https://github.com/openshift/origin-server/pull/4531

Comment 4 openshift-github-bot 2014-01-21 03:41:34 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/8a16cd487f4c5a8a7a29c9f5f7491de35cb3e2a5
Bug 1040113: Handling edge cases in cleaning up downloaded cart map
Also, fixing a couple of minor issues

Comment 5 Jianwei Hou 2014-01-23 02:38:55 UTC
@Andy I was trying to verify this bug for the edge case where a downloadable cart fails installation and deconfigure has to handle the exception raised when the cart has already been removed from the gear. However, I was unable to reach that scenario. I added some bad commands to the setup hook of my downloadable cart, and my app was deleted immediately after setup failed. Can you give me detailed instructions on how to reach this scenario?
Many thanks!

Comment 6 Jianwei Hou 2014-01-24 03:36:41 UTC
Verified on devenv_4267

steps:
1. Add a non-zero return code to the setup hook in the downloadable cart
2. Add some sleep time to the deconfigure method in v2_cart_model.rb
3. Clear the cache and restart mcollective
4. Create an app using this cart
5. As soon as the deconfigure action is underway, stop the broker
6. Delete the gears from the node
7. Revert the above changes, start the broker, and clear pending ops

Comment 7 Jianwei Hou 2014-02-26 03:13:07 UTC