Description of problem:
If the broker and node get out of sync where a node has deconfigured a downloadable cartridge in a gear, but the broker believes the operation is still pending, the broker will retry indefinitely to clear the pending op, and it will fail on the node each time.

Version-Release number of selected component (if applicable):
rubygem-openshift-origin-node-1.17.8-1.el6oso.noarch

How reproducible:
Always

Steps to Reproduce (I think this will produce the result; I have not tested yet):
1. Create an app
2. Add a downloadable cartridge which fails in some way (timeout, bad command in setup, etc.)
3. As soon as the broker issues "deconfigure" after the failure, kill the broker, so that it is not aware of the result of the deconfigure
4. Attempt to run oo-admin-clear-pending-ops on the gear

Actual results (in platform log):
Dec 6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:92:in `cartridge_do_action' cartridge_do_action call / action: cartridge_do, agent=openshift, data={:cartridge=>"continuent-tungsten-2.0", :action=>"deconfigure", :args=> {"--with-app-uuid"=>"52a17718e0b8cdeaa00000a2", "--with-app-name"=>"tdf", "--with-container-uuid"=>"52a17a675973cab1da00001f", "--with-container-name"=>"52a17a675973cab1da00001f", "--with-namespace"=>"narmitag", "--with-uid"=>1956, "--with-request-id"=>nil, "--cart-name"=>"tungsten-2.0", "--component-name"=>"continuent-tungsten-2.0", "--with-software-version"=>"2.0", "--cartridge-vendor"=>"continuent"}, :process_results=>true}
Dec 6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:93:in `cartridge_do_action' cartridge_do_action validation = continuent-tungsten-2.0 deconfigure {"--with-app-uuid"=>"52a17718e0b8cdeaa00000a2", "--with-app-name"=>"tdf", "--with-container-uuid"=>"52a17a675973cab1da00001f", "--with-container-name"=>"52a17a675973cab1da00001f", "--with-namespace"=>"narmitag", "--with-uid"=>1956, "--with-request-id"=>nil, "--cart-name"=>"tungsten-2.0", "--component-name"=>"continuent-tungsten-2.0", "--with-software-version"=>"2.0", "--cartridge-vendor"=>"continuent"}
Dec 6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:134:in `execute_action' Executing action [deconfigure] using method oo_deconfigure with args [{"--with-app-uuid"=>"52a17718e0b8cdeaa00000a2", "--with-app-name"=>"tdf", "--with-container-uuid"=>"52a17a675973cab1da00001f", "--with-container-name"=>"52a17a675973cab1da00001f", "--with-namespace"=>"narmitag", "--with-uid"=>1956, "--with-request-id"=>nil, "--cart-name"=>"tungsten-2.0", "--component-name"=>"continuent-tungsten-2.0", "--with-software-version"=>"2.0", "--cartridge-vendor"=>"continuent"}]
Dec 6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:310:in `rescue in with_container_from_args' key not found: (tungsten, 2.0, _)
Dec 6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:311:in `rescue in with_container_from_args'
  /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/cartridge_repository.rb:155:in `select'
  /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:379:in `rescue in rescue in deconfigure'
  /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:374:in `rescue in deconfigure'
  /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:367:in `deconfigure'
  /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/application_container_ext/cartridge_actions.rb:120:in `deconfigure'
  /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:880:in `block in oo_deconfigure'
  /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:299:in `with_container_from_args'
  /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:879:in `oo_deconfigure'
  /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:139:in `execute_action'
  /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:104:in `cartridge_do_action'
  /opt/rh/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:86:in `handlemsg'
  /opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:126:in `block (2 levels) in dispatch'
  /opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
  /opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:125:in `block in dispatch'
Dec 6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:150:in `execute_action' Finished executing action [deconfigure] (1)
Dec 6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:114:in `cartridge_do_action' cartridge_do_action failed (1) ------ key not found: (tungsten, 2.0, _) ------)
Dec 6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:1241:in `rescue in has_app_cartridge_action' Failed to get cartridge 'continuent-tungsten-2.0' from in gear 52a17a675973cab1da00001f: Cartridge directory not found for continuent-tungsten-2.0
Dec 6 12:00:35 ex-med-node15 mcollectived[278959]: openshift.rb:1242:in `rescue in has_app_cartridge_action'
  /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:149:in `rescue in get_cartridge'
  /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/v2_cart_model.rb:142:in `get_cartridge'
  /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.17.8/lib/openshift-origin-node/model/application_container.rb:604:in `get_cartridge'
  /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:1236:in `has_app_cartridge_action'
  /opt/rh/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:86:in `handlemsg'
  /opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:126:in `block (2 levels) in dispatch'
  /opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
  /opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:125:in `block in dispatch'

Expected results:
deconfigure should succeed, even if the cartridge directory no longer exists in the gear.

Additional info:
It looks like the code in deconfigure in openshift-origin-node/model/v2_cart_model.rb might just need one more begin/rescue that simply returns if the cartridge repository lookup fails and all other methods to locate the cartridge have failed already. It's possible that this will leave some cruft, but it's the best we can do.
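For illustration only, the extra begin/rescue suggested under Additional info might look roughly like the sketch below. This is not the actual v2_cart_model.rb code: FakeCartridgeRepository, the deconfigure signature, and the :already_removed / :deconfigured return values are all hypothetical stand-ins, and the only thing taken from the log is that the repository lookup fails with a "key not found" error.

```ruby
# Hypothetical stand-in for the node's cartridge repository. The real class
# and its lookup signature are not shown in this report.
class FakeCartridgeRepository
  def initialize(index = {})
    @index = index
  end

  # Raises KeyError for an unknown cartridge, loosely mirroring the
  # "key not found: (tungsten, 2.0, _)" message in the platform log.
  def select(name, version)
    @index.fetch([name, version]) do
      raise KeyError, "key not found: (#{name}, #{version}, _)"
    end
  end
end

# Hypothetical deconfigure: look in the gear first, then fall back to the
# repository. If both fail, treat the cartridge as already removed and
# return normally so the caller can clear its pending op.
def deconfigure(gear_cartridges, repository, name, version)
  cartridge = gear_cartridges[name]
  begin
    cartridge ||= repository.select(name, version)
  rescue KeyError
    # Nothing left to locate; some cruft may remain, but there is no
    # cartridge to run teardown against.
    return :already_removed
  end
  gear_cartridges.delete(name) # assumed teardown step for this sketch
  :deconfigured
end

repo = FakeCartridgeRepository.new
puts deconfigure({}, repo, "tungsten", "2.0")
```

The point of the design is that a missing cartridge becomes a successful no-op rather than an error, which is what lets a retried deconfigure converge instead of failing on every attempt.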
Andy Grimm will be submitting a patch for this bug.
https://github.com/openshift/origin-server/pull/4500 Node fix. Expecting PR from broker as well.
Fixed with --> https://github.com/openshift/origin-server/pull/4531
Commit pushed to master at https://github.com/openshift/origin-server https://github.com/openshift/origin-server/commit/8a16cd487f4c5a8a7a29c9f5f7491de35cb3e2a5 Bug 1040113: Handling edge cases in cleaning up downloaded cart map Also, fixing a couple of minor issues
@Andy I was trying to verify this bug for the edge case where a downloadable cartridge fails installation and deconfigure has to handle the cartridge already having been removed from the gear. However, I was unable to reach that scenario. I added some bad commands to the setup hook of my downloadable cartridge, and my app was deleted immediately after setup failed. Can you give me detailed instructions on how to reach this scenario? Many thanks!
Verified on devenv_4267.

Steps:
1. Make the setup hook in the downloadable cart return a non-zero exit code
2. Add some sleep time to the deconfigure method in v2_cart_model.rb
3. Clear the cache and restart mcollective
4. Create an app using this cart
5. As soon as the deconfigure action is underway, stop the broker
6. Delete the gears from the node
7. Revert the above changes, start the broker, and clear pending ops