Description of problem:
When all nodes of a given profile are offline (meaning MCollective can't communicate with them), commands such as "rhc apps" and trying to list a user's applications in the web console will fail.

Version-Release number of selected component (if applicable):
rubygem-openshift-origin-controller-1.10.1-1.git.97.f50a498.el6op.noarch

How reproducible:
100%

Steps to Reproduce:
1. Configure a node to use whatever profile name you'd like (e.g. small is fine, or pick another one)
2. Create an application using that profile
3. Run "rhc apps" and verify you can see your app in the list
4. Turn off mcollective on the node
5. Run "rhc apps" again

Actual results:
an error

Expected results:
the list of applications

Additional info:
The broker attempts to get the base disk quota for each application's gear profile. If all nodes with that profile are offline, the operation will fail. See the below stack trace from the broker:

[Mon Jun 10 08:38:40 2013] [error] [client 127.0.0.1] Premature end of script headers: rest
[ pid=29297 thr=139794656430048 file=ext/apache2/Hooks.cpp:834 time=2013-06-10 08:38:40.542 ]: No data received from the backend application (process 27756) within 5000 msec. Either the backend application is frozen, or your TimeOut value of 5 seconds is too low. Please check whether your application is frozen, or increase the value of the TimeOut configuration directive.
[ pid=27756 thr=7225480 file=utils.rb:176 time=2013-06-10 08:38:40.543 ]: *** Exception OpenShift::NodeException in application (No nodes found.) (process 27756, thread #<Thread:0x00000000dc8110>):
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-msg-broker-mcollective-1.10.1/lib/openshift/mcollective_application_container_proxy.rb:127:in `find_one_impl'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/lib/openshift/application_container_proxy.rb:26:in `find_one'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/models/gear.rb:42:in `block in base_filesystem_gb'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/cache_helper.rb:24:in `get_cached'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/models/gear.rb:41:in `base_filesystem_gb'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/rest_models/rest_embedded_cartridge.rb:134:in `initialize'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:50:in `new'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:50:in `get_rest_cartridge'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:26:in `block (2 levels) in get_application_rest_cartridges'
    from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/memory.rb:121:in `block in each'
    from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/memory.rb:120:in `each'
    from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/memory.rb:120:in `each'
    from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual.rb:18:in `each'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:22:in `block in get_application_rest_cartridges'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:20:in `each'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:20:in `get_application_rest_cartridges'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:11:in `get_rest_application'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/controllers/applications_controller.rb:19:in `block in index'
    from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/relations/targets/enumerable.rb:442:in `map!'
    from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/relations/targets/enumerable.rb:442:in `method_missing'
    from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/relations/referenced/many.rb:395:in `method_missing'
    from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/controllers/applications_controller.rb:19:in `index'
The broker log file has:

Started GET "/broker/rest/domains/funzo/applications?include=cartridges" for 127.0.0.1 at 2013-06-12 12:24:50 -0700
Processing by ApplicationsController#index as JSON
Parameters: {"include"=>"cartridges", "domain_id"=>"testdomain"}
In /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.12/app/helpers/cartridge_cache.rb cartridges method:
Error while querying cartridge list. This may be because no node hosts responded. Please ensure you have installed node hosts and they are responding to "mco ping". Exception was: #<OpenShift::NodeException: No nodes found.>
This seems to be fixed. I can list applications with rhc apps when the mcollective service is turned off. Are you still seeing the issue?
[root@broker ~]# rhc app create test ruby-1.9
Application Options
-------------------
  Namespace: funzo
  Cartridges: ruby-1.9
  Gear Size: default
  Scaling: no

Creating application 'test' ... No nodes available.

[root@broker ~]# rhc apps
nj1 @ http://nj1-funzo.example.com/ (uuid: 51d429976892dfc286000031)
--------------------------------------------------------------------
  Created: Jul 03 6:39 AM
  Gears: 1 (defaults to small)
  Git URL: ssh://51d429976892dfc286000031.com/~/git/nj1.git/
  SSH: 51d429976892dfc286000031.com

  paypal-nodejs-nginx-1.0 (Node.js + Nginx)
  -----------------------------------------
    Gears: 1 small

You have 1 applications

[root@broker ~]# service mcollective status
mcollectived is stopped
If the gear size quota_blocks cache is populated, "rhc apps" will return the cached application data. If it's not in the cache, then we'll see the trace noted in the original report.
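The behaviour described above can be illustrated with a minimal sketch. It is not the broker's code: SketchCache, query_node_quota_gb, the cache keys, and the 1.0 GB value are all hypothetical stand-ins, and only the get_cached name and the "No nodes found." message are taken from the stack trace. It assumes a get_cached-style helper that runs its block only on a cache miss.

```ruby
# Hypothetical stand-in for OpenShift::NodeException.
class SketchNodeException < StandardError; end

# Hypothetical in-memory cache imitating a get_cached-style helper:
# return the stored value if present, otherwise run the block and store the result.
class SketchCache
  def initialize
    @store = {}
  end

  def get_cached(key)
    return @store[key] if @store.key?(key)
    # If the block raises, nothing is stored and the exception propagates.
    @store[key] = yield
  end
end

# Hypothetical node query: raises when no node of the profile is reachable.
# The 1.0 return value is an arbitrary placeholder quota.
def query_node_quota_gb(profile, online_profiles)
  raise SketchNodeException, "No nodes found." unless online_profiles.include?(profile)
  1.0
end

cache = SketchCache.new

# Node online: the lookup succeeds and populates the cache.
warm = cache.get_cached("gear_size_small") { query_node_quota_gb("small", ["small"]) }

# Node offline, but the key is cached: the block never runs, so no error.
still_ok = cache.get_cached("gear_size_small") { query_node_quota_gb("small", []) }

# Node offline and nothing cached for this profile: the exception escapes.
cold_error = begin
  cache.get_cached("gear_size_medium") { query_node_quota_gb("medium", []) }
rescue SketchNodeException => e
  e.message
end
```

Under these assumptions, the second lookup succeeds with mcollective down because the block is skipped, while the third fails the same way the original report did.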
Rather than commit a fix for this specific error condition, we are going to wait for https://github.com/openshift/origin-server/pull/3078 to land and retest.
We are going to pull in the general error handling changes with our 2.0 rebase from origin, since there is a lot of other related refactoring in the controller package. Since the broker cache handles this situation and you only see the error if all nodes are offline and the cache isn't populated, we don't think it's necessary to add the localized fix just for the "rhc apps" invocation.
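For reference, a localized fix of the kind being deferred might look roughly like the sketch below. This is only an illustration under assumed names (SketchNodeError, lookup_base_quota_gb, quota_or_nil, and the base_quota_gb key are all hypothetical); it does not reflect the broker's actual code or what the upstream error handling changes do.

```ruby
# Hypothetical stand-in for OpenShift::NodeException.
class SketchNodeError < StandardError; end

# Hypothetical lookup that fails when no node of the profile is reachable.
# The 1.0 return value is an arbitrary placeholder quota.
def lookup_base_quota_gb(profile, reachable_profiles)
  raise SketchNodeError, "No nodes found." unless reachable_profiles.include?(profile)
  1.0
end

# Localized handling: report an unknown quota (nil) instead of
# failing the whole application listing.
def quota_or_nil(profile, reachable_profiles)
  lookup_base_quota_gb(profile, reachable_profiles)
rescue SketchNodeError
  nil
end

apps = [{ name: "nj1", profile: "small" }]

# No profiles reachable, as when mcollective is stopped on every node.
listing = apps.map do |app|
  app.merge(base_quota_gb: quota_or_nil(app[:profile], []))
end
```

The trade-off is that a caller sees a missing quota rather than an error, which is one reason a general error handling change may be preferable to patching a single call site.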
I think it's reasonable to wait until 2.0.
We'll pick this up with the next rebase.