Bug 872024 - Cartridge list doesn't update after installing new cartridge on node
Summary: Cartridge list doesn't update after installing new cartridge on node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 1.0.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Alex Dellapenta
QA Contact: ecs-bugs
URL:
Whiteboard:
Depends On: 867609
Blocks:
 
Reported: 2012-11-01 03:04 UTC by Johnny Liu
Modified: 2015-07-20 00:22 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 867609
Environment:
Last Closed: 2013-03-15 17:58:57 UTC
Target Upstream Version:
Embargoed:
anross: needinfo+



Description Johnny Liu 2012-11-01 03:04:37 UTC
+++ This bug was initially created as a clone of Bug #867609 +++

Description of problem:

The embedded cartridge list from "rhc app cartridge list" doesn't show a new cartridge after adding a cartridge rpm on a node.


Version-Release number of selected component (if applicable):

node:

stickshift-abstract-0.16.1-1.el6_3.noarch
rubygem-stickshift-common-0.15.2-1.el6_3.noarch
stickshift-mcollective-agent-0.3.1-1.el6_3.noarch
rubygem-stickshift-node-0.16.2-1.el6_3.noarch

broker:

rubygem-stickshift-controller-0.16.1-1.el6_3.noarch
stickshift-broker-0.6.8-1.el6_3a.noarch

How reproducible:


Steps to Reproduce:
1. Install a node with a minimal set of cartridges, e.g. php-5.3 and cron-1.4.
2. Run "rhc app cartridge list" on a client, which should return only cron-1.4.
3. Yum-install another cartridge on the node, e.g. cartridge-mysql-5.1.
4. Run "rhc app cartridge list" on a client.

I ran into this in a single broker, single node installation.
  
Actual results:

The cartridge list in #4 only shows cron-1.4.  ss-cartridge-list on the node shows php-5.3, cron-1.4, and mysql-5.1.

Expected results:

List of supported embedded cartridges:

Obtaining list of cartridges (please excuse the delay)...
cron-1.4, mysql-5.1

Additional info:

In my case, this turned out to be a caching problem on the broker.  I manually cleared the cache via a Rails dev console on the broker, and the rhc cartridge list immediately included mysql-5.1.  Is it possible to expire the broker's cache when a new cartridge is registered?
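For reference, clearing the cache from a Rails console probably looks something like the following. This is only a sketch, assuming the broker directory shown later in this bug (/var/www/openshift/broker) and the standard Rails cache API; it is not the exact set of commands that was run.

[root@broker ~]# cd /var/www/openshift/broker
[root@broker broker]# bundle exec rails console production
irb> Rails.cache.clear   # drops every cached entry, including the cartridge list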

--- Additional comment from dmcphers on 2012-10-17 17:09:46 EDT ---

This is currently as designed.  You have to clear the cache to get new data picked up immediately.  We could provide a command that clears the cache, but that doesn't seem much different from manually clearing it.

Is your complaint really that it wasn't obvious you needed to clear the cache?

--- Additional comment from john on 2012-10-17 18:10:15 EDT ---

Near term, a note in the docs about the need to flush cache on the broker would be great.  Long term, it seems like the cache should expire or something should trigger a flush when a new cartridge is available.  There are probably other state change conditions that should trigger a flush, as well.

After installing the mysql cartridge, I restarted httpd on the broker, restarted services on the node, removed and reinstalled the mysql cartridge a few times, rebooted the node, rebooted the broker, rebooted my client, and ended up leaving the entire thing overnight.  It still didn't show up the next morning.  The command "rhc app cartridge add -a testphpapp -c mysql-5.1" failed with an invalid type error.

I guess requiring a manual cache flush isn't any more deus ex machina than most apps require.  It would just be nice for the software to self manage.

--- Additional comment from xjia on 2012-10-22 03:26:08 EDT ---

I have one node and one broker, and don't install any node packages on broker.

Firstly, I install all the cartridge packages except jbosseap and jbossews. Then I create one php application. Thirdly I install jbosseap and jbossews cartridge on node. Modify "config.action_controller.perform_caching = false" in the file "/var/www/openshift/broker/config/environments/development.rb", then restart openshift-broker service. Execute "rhc setup", it show that "Connection to server timed out. It is possible the operation finished without being able to report success. Use 'rhc domain show' or 'rhc app status' to check the status of your applications"

Version-Release number of selected component (if applicable):
http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/Beta/2012-10-19.4/

How reproducible:
Always

Steps to Reproduce:
1. Install all of the cartridge packages except the jbosseap and jbossews cartridges.
2. Create one PHP application successfully.
3. Install the jbosseap and jbossews cartridges on the node.
4. Set "config.action_controller.perform_caching = false" in "/var/www/openshift/broker/config/environments/development.rb" (the relevant fragment is sketched after these steps).
5. Restart the openshift-broker service.
6. Execute "rhc setup".
7. Set "config.action_controller.perform_caching = true" in the same file, then restart the openshift-broker service.
8. Execute "rhc setup".
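For context, the setting toggled in steps 4 and 7 is a standard Rails environment option; inside development.rb it looks roughly like the fragment below (shown for illustration only; the enclosing Rails configure block is omitted).

# Fragment of /var/www/openshift/broker/config/environments/development.rb.
# When false, Rails skips controller-level (action/fragment) caching; whether
# the broker's cartridge-list cache honours this setting is what steps 6-8 probe.
config.action_controller.perform_caching = false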
  
Actual results:
6. Shows the message "Connection to server timed out. It is possible the operation finished without being able to report success. Use 'rhc domain show' or 'rhc app status' to check the status of your applications."

8. The jbosseap and jbossews cartridges do not appear in the list.

Expected results:
6. Should succeed, and the cartridge list should contain jbossews and jbosseap.
8. The cartridge list should contain jbossews and jbosseap

Additional info:
Maybe I am misunderstanding something here; I hope you can help me resolve this problem.

The log in development.log:
<--snip-->
Started GET "/broker/rest/cartridges" for 10.66.65.143 at Mon Oct 22 02:48:17 -0400 2012
  Processing by CartridgesController#index as JSON
CURRENT SERVER: node0.example.com
DEBUG: find_one_impl: current_server: node0.example.com
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7fee4bf45ba0>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"cartridge-list", :cartridge=>"openshift-origin-node", :args=>{"--with-descriptors"=>true, "--porcelain"=>true}}, @id, {'identity' => @id})
CURRENT SERVER: node0.example.com
DEBUG: find_one_impl: current_server: node0.example.com
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7fee4bf64ed8>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"cartridge-list", :cartridge=>"openshift-origin-node", :args=>{"--with-descriptors"=>true, "--porcelain"=>true}}, @id, {'identity' => @id})
CURRENT SERVER: node0.example.com
DEBUG: find_one_impl: current_server: node0.example.com
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7fee4bf893a0>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"cartridge-list", :cartridge=>"openshift-origin-node", :args=>{"--with-descriptors"=>true, "--porcelain"=>true}}, @id, {'identity' => @id})
CURRENT SERVER: node0.example.com
DEBUG: find_one_impl: current_server: node0.example.com
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7fee4bfa5758>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"cartridge-list", :cartridge=>"openshift-origin-node", :args=>{"--with-descriptors"=>true, "--porcelain"=>true}}, @id, {'identity' => @id})
CURRENT SERVER: node0.example.com
DEBUG: find_one_impl: current_server: node0.example.com
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7fee4bfc3460>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"cartridge-list", :cartridge=>"openshift-origin-node", :args=>{"--with-descriptors"=>true, "--porcelain"=>true}}, @id, {'identity' => @id})
CURRENT SERVER: node0.example.com
DEBUG: find_one_impl: current_server: node0.example.com
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7fee4bfdc848>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"cartridge-list", :cartridge=>"openshift-origin-node", :args=>{"--with-descriptors"=>true, "--porcelain"=>true}}, @id, {'identity' => @id})
CURRENT SERVER: node0.example.com
DEBUG: find_one_impl: current_server: node0.example.com
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7fee4bffb608>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"cartridge-list", :cartridge=>"openshift-origin-node", :args=>{"--with-descriptors"=>true, "--porcelain"=>true}}, @id, {'identity' => @id})
CURRENT SERVER: node0.example.com
DEBUG: find_one_impl: current_server: node0.example.com
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7fee4c0200c0>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"cartridge-list", :cartridge=>"openshift-origin-node", :args=>{"--with-descriptors"=>true, "--porcelain"=>true}}, @id, {'identity' => @id})
CURRENT SERVER: node0.example.com
DEBUG: find_one_impl: current_server: node0.example.com
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7fee4c0363c0>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"cartridge-list", :cartridge=>"openshift-origin-node", :args=>{"--with-descriptors"=>true, "--porcelain"=>true}}, @id, {'identity' => @id})
CURRENT SERVER: node0.example.com
DEBUG: find_one_impl: current_server: node0.example.com
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7fee4c052a48>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"cartridge-list", :cartridge=>"openshift-origin-node", :args=>{"--with-descriptors"=>true, "--porcelain"=>true}}, @id, {'identity' => @id})
CURRENT SERVER: node0.example.com
DEBUG: find_one_impl: current_server: node0.example.com
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7fee4c069360>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"cartridge-list", :cartridge=>"openshift-origin-node", :args=>{"--with-descriptors"=>true, "--porcelain"=>true}}, @id, {'identity' => @id})
CURRENT SERVER: node0.example.com
<--snip-->

--- Additional comment from dmcphers on 2012-10-25 13:18:46 EDT ---

xjia,

  It sounds like your rhc tools aren't pointing to the right server. Either way, I don't want to track anything in this bug other than the request for auto-clearing the cache. Regarding your problem, can you run the command with -d to see which server it's hitting, and check the Apache access/error logs to see whether the request is even getting through? If you are pointing to the right location and the logs don't tell you what's wrong, please open a new bug for your issue.

Thanks,

Dan

--- Additional comment from jialiu on 2012-10-31 03:15:57 EDT ---

[root@broker broker]# pwd
/var/www/openshift/broker
[root@broker broker]# bundle exec rake tmp:clear

The cartridge list cache is then cleared. This functionality is also necessary in OSE. I think this should be added to the openshift-broker service so the Rails cache is cleared automatically.
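A rough sketch of what that could look like, assuming it means wiring the existing rake task into the init script's start path (the file location and placement are illustrative, not taken from the actual openshift-broker init script):

# Hypothetical addition to the start() path of /etc/init.d/openshift-broker:
# clear the Rails file cache before the broker comes up, so the cartridge
# list is rebuilt from MCollective on the next request.
cd /var/www/openshift/broker && bundle exec rake tmp:clear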

Comment 1 Johnny Liu 2012-11-01 03:08:23 UTC
This issue also exists in the OSE environment after retesting against the 2012-10-30.7 puddle.

I discussed this issue with bleanhar over IRC; this bug should be fixed in OSE 1.0, so I cloned it here and set the priority to high.

Comment 2 Brenton Leanhardt 2012-11-01 13:40:34 UTC
Unfortunately I don't think we're going to get to this before 1.0.  We'll likely ship it shortly after in an Errata.  We'll be sure to make the change upstream as well.  I'm going to lower the priority for now.

Comment 3 Brenton Leanhardt 2012-11-20 16:33:04 UTC
We're still working out how we're going to solve this for our next release.

Comment 5 Brenton Leanhardt 2012-12-04 19:47:49 UTC
We're creating an R&D story for this.

Comment 6 Luke Meyer 2012-12-05 14:40:09 UTC
Clearing the cache in response to a stale cartridge list is mentioned in the Administration Guide at https://access.redhat.com/knowledge/docs/en-US/OpenShift_Enterprise/1.0/html-single/Administration_Guide/index.html#sect-OpenShift_Enterprise-Administration_Guide-Clearing_Broker_Application_Cache

... which is admittedly rather obscure.

Some possible ways to partially address this problem soon:
1. Add something to the deployment and troubleshooting guides about this.
2. Have "service openshift-broker" clear the cache whenever started.
3. Have oo-accept-systems and/or oo-diagnostics check for the cartridge list differing between the cache and MCollective, and notify the admin to clear the cache (a rough manual version of this check is sketched after this list).
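As a rough manual version of the check in item 3 (illustrative only, not what oo-accept-systems or oo-diagnostics actually run; the broker hostname is a placeholder):

# What the broker currently advertises, possibly served from its cache.
# This is the same REST endpoint seen in the development.log snippet above.
curl -k https://broker.example.com/broker/rest/cartridges

# What is actually installed on a node, as in the original report.
ss-cartridge-list

# A cartridge present in the second list but missing from the first points to a stale broker cache.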

In the long term, the problem is that this is a cache, and by design it can go stale. It's mainly a usability issue at install time, which makes it particularly painful during evaluations, so we would like to address it. The R&D story is to explore ways to address it more permanently, e.g. lower the cache timeout, don't cache this at all, have the node notify the broker at cartridge installation time, use memcached instead of files for caching, etc.
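For the memcached and timeout ideas specifically, the kind of change being contemplated would be roughly the following in the broker's Rails environment config (hypothetical values, not a shipped or recommended configuration):

# Cache in memcached instead of on-disk files, and give entries a TTL so a
# stale cartridge list eventually self-corrects without a manual clear.
config.cache_store = :mem_cache_store, "localhost:11211", { :expires_in => 6.hours }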

Creating US3201 for the long-term story.

Comment 7 Luke Meyer 2013-02-05 17:02:05 UTC
oo-diagnostics and oo-accept-systems now test for this and recommend a fix.

Technical solutions may develop upstream or per US3201. 

The Deployment Guide already covers this at https://access.redhat.com/knowledge/docs/en-US/OpenShift_Enterprise/1/html-single/Deployment_Guide/index.html#sect-OpenShift_Enterprise-Deployment_Guide-Continuing_Node_Installation_for_Enterprise-Cartridge_Availability

Converting this to a docs bug, as all that is left is to mention this in the Troubleshooting Guide.

Comment 10 Alex Dellapenta 2013-03-15 17:58:57 UTC
The recommended fix is live in the OSE Troubleshooting Guide as of Revision 1.1-1:

https://access.redhat.com/knowledge/docs/en-US/OpenShift_Enterprise/1/html/Troubleshooting_Guide/sect-OpenShift_Enterprise-Troubleshooting_Guide-Outdated_Cartridge_List.html

Per Luke in comment #7, "Technical solutions may develop upstream or per US3201," so closing this out as CURRENTRELEASE since the docs update was the last remaining item.

