Bug 972856 - Unable to list applications when all nodes of a given profile are offline
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Assigned To: chris alfonso
libra bugs
Reported: 2013-06-10 12:17 EDT by Andy Goldstein
Modified: 2017-03-08 12 EST
Doc Type: Bug Fix
Last Closed: 2013-08-15 11:05:02 EDT
Type: Bug
Description Andy Goldstein 2013-06-10 12:17:30 EDT
Description of problem: When all nodes of a given profile are offline (meaning MCollective can't communicate with them), commands such as "rhc apps" fail, as does listing a user's applications in the web console.

Version-Release number of selected component (if applicable): rubygem-openshift-origin-controller-1.10.1-1.git.97.f50a498.el6op.noarch

How reproducible: 100%

Steps to Reproduce:
1. Configure a node to use whatever profile name you'd like (e.g. small is fine, or pick another one)
2. Create an application using that profile
3. Run "rhc apps" and verify you can see your app in the list
4. Turn off mcollective on the node
5. Run "rhc apps" again

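Actual results: "rhc apps" fails with an error instead of listing the applications (the broker stack trace is under "Additional info").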

Expected results: the list of applications

Additional info:
The broker attempts to get the base disk quota for each application's gear profile. If all nodes with that profile are offline, the lookup fails and takes the whole request down with it. See the stack trace below from the broker:

[Mon Jun 10 08:38:40 2013] [error] [client] Premature end of script headers: rest
[ pid=29297 thr=139794656430048 file=ext/apache2/Hooks.cpp:834 time=2013-06-10 08:38:40.542 ]: No data received from the backend application (process 27756) within 5000 msec. Either the backend application is frozen, or your TimeOut value of 5 seconds is too low. Please check whether your application is frozen, or increase the value of the TimeOut configuration directive.
[ pid=27756 thr=7225480 file=utils.rb:176 time=2013-06-10 08:38:40.543 ]: *** Exception OpenShift::NodeException in application (No nodes found.) (process 27756, thread #<Thread:0x00000000dc8110>):
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-msg-broker-mcollective-1.10.1/lib/openshift/mcollective_application_container_proxy.rb:127:in `find_one_impl'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/lib/openshift/application_container_proxy.rb:26:in `find_one'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/models/gear.rb:42:in `block in base_filesystem_gb'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/cache_helper.rb:24:in `get_cached'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/models/gear.rb:41:in `base_filesystem_gb'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/rest_models/rest_embedded_cartridge.rb:134:in `initialize'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:50:in `new'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:50:in `get_rest_cartridge'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:26:in `block (2 levels) in get_application_rest_cartridges'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/memory.rb:121:in `block in each'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/memory.rb:120:in `each'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/memory.rb:120:in `each'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual.rb:18:in `each'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:22:in `block in get_application_rest_cartridges'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:20:in `each'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:20:in `get_application_rest_cartridges'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:11:in `get_rest_application'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/controllers/applications_controller.rb:19:in `block in index'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/relations/targets/enumerable.rb:442:in `map!'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/relations/targets/enumerable.rb:442:in `method_missing'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/relations/referenced/many.rb:395:in `method_missing'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/controllers/applications_controller.rb:19:in `index'
Comment 3 chris alfonso 2013-06-12 15:27:16 EDT
The broker log file has:
Started GET "/broker/rest/domains/funzo/applications?include=cartridges" for at 2013-06-12 12:24:50 -0700
Processing by ApplicationsController#index as JSON
  Parameters: {"include"=>"cartridges", "domain_id"=>"testdomain"}
    In /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.12/app/helpers/cartridge_cache.rb cartridges method:
      Error while querying cartridge list. This may be because no node hosts responded.
      Please ensure you have installed node hosts and they are responding to "mco ping".
      Exception was: #<OpenShift::NodeException: No nodes found.>
Comment 4 chris alfonso 2013-07-16 16:43:03 EDT
This seems to be fixed. I can list applications with rhc apps when the mcollective service is turned off. Are you still seeing the issue?
Comment 5 chris alfonso 2013-07-16 16:48:56 EDT
[root@broker ~]# rhc app create test ruby-1.9
Application Options
  Namespace:  funzo
  Cartridges: ruby-1.9
  Gear Size:  default
  Scaling:    no

Creating application 'test' ... No nodes available.
[root@broker ~]# rhc apps
nj1 @ http://nj1-funzo.example.com/ (uuid: 51d429976892dfc286000031)
  Created: Jul 03  6:39 AM
  Gears:   1 (defaults to small)
  Git URL: ssh://51d429976892dfc286000031@nj1-funzo.example.com/~/git/nj1.git/
  SSH:     51d429976892dfc286000031@nj1-funzo.example.com

  paypal-nodejs-nginx-1.0 (Node.js + Nginx)
    Gears: 1 small

You have 1 applications

[root@broker ~]# service mcollective status
mcollectived is stopped
Comment 6 chris alfonso 2013-07-16 17:00:31 EDT
If the gear size quota_blocks cache is populated, rhc apps will return the cached application data. If it's not in the cache, then we'll see the trace noted in the original report.
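
The cache behaviour described in this comment can be sketched in a few lines of Ruby. This is a simplified illustration and not the broker's code: `SketchCache`, `find_quota_from_node`, the cache key format, and the 1.0 GB value are assumptions made for the example. Only the names `get_cached`, `base_filesystem_gb`, and `OpenShift::NodeException` are taken from the stack trace in the original report.

```ruby
# Stand-in for the exception class seen in the broker trace.
module OpenShift
  class NodeException < StandardError; end
end

# Hypothetical in-memory cache; the real broker goes through cache_helper.rb.
class SketchCache
  def initialize
    @store = {}
  end

  # Return the cached value, or run the block and cache its result.
  # If the block raises, nothing is stored and the exception propagates.
  def get_cached(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

# Hypothetical node lookup: raises when no node of the profile responds.
def find_quota_from_node(profile, nodes_online)
  raise OpenShift::NodeException, "No nodes found." unless nodes_online
  1.0 # assumed base quota in GB, for illustration only
end

# Cache first; only ask a node when the value is not cached yet.
def base_filesystem_gb(cache, profile, nodes_online)
  cache.get_cached("gear_size_#{profile}_quota") do
    find_quota_from_node(profile, nodes_online)
  end
end

cache = SketchCache.new

# Cold cache with all nodes offline: the lookup raises.
begin
  base_filesystem_gb(cache, "small", false)
rescue OpenShift::NodeException => e
  puts "cold cache: #{e.message}"
end

# Populate the cache while a node is online, then query with nodes offline.
base_filesystem_gb(cache, "small", true)
puts "warm cache: #{base_filesystem_gb(cache, 'small', false)}"
```

In this sketch the second query succeeds only because an earlier call cached the value, which would explain why the problem may appear fixed on a broker that has already served the same profile.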
Comment 7 chris alfonso 2013-07-17 12:37:07 EDT
Rather than commit a fix for this specific error condition, we are going to wait for https://github.com/openshift/origin-server/pull/3078 to land and retest.
Comment 8 chris alfonso 2013-07-25 13:40:45 EDT
We are going to pull in the general error handling changes with our 2.0 rebase from origin, since there is a lot of other related refactoring in the controller package. Since the broker cache handles this situation and you only see the error if all nodes are offline and the cache isn't populated, we don't think it's necessary to add the localized fix just for the rhc apps invocation.
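
For reference, a localized guard of the kind being deferred here might look roughly like the Ruby sketch below. It is an assumption about the shape of such a fix, not the change in the referenced pull request and not code from the controller package: `ONLINE_PROFILE_QUOTAS`, `lookup_base_quota_gb`, `safe_base_quota_gb`, and the `:base_quota_gb` key are all hypothetical names.

```ruby
# Stand-in for the exception class seen in the broker trace.
module OpenShift
  class NodeException < StandardError; end
end

# Hypothetical data: profiles that currently have a responding node.
ONLINE_PROFILE_QUOTAS = { "medium" => 1.0 }.freeze

# Hypothetical lookup that fails the way the broker trace shows.
def lookup_base_quota_gb(profile)
  ONLINE_PROFILE_QUOTAS.fetch(profile) do
    raise OpenShift::NodeException, "No nodes found."
  end
end

# Hypothetical guard: report the quota as unknown (nil) rather than
# letting the exception abort the whole application listing.
def safe_base_quota_gb(profile)
  lookup_base_quota_gb(profile)
rescue OpenShift::NodeException => e
  warn "quota lookup for '#{profile}' failed: #{e.message}"
  nil
end

apps = [{ name: "nj1", profile: "small" }]
listing = apps.map do |app|
  { name: app[:name], base_quota_gb: safe_base_quota_gb(app[:profile]) }
end
p listing
```

The trade-off in this sketch is that clients receive a listing with an unknown quota instead of an error, which is one reason a general error handling change may be preferable to a one-off rescue.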
Comment 9 Andy Goldstein 2013-07-25 15:34:52 EDT
I think it's reasonable to wait until 2.0.
Comment 10 Brenton Leanhardt 2013-08-15 11:05:02 EDT
We'll pick this up with the next rebase.
