Bug 972856 - Unable to list applications when all nodes of a given profile are offline
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: chris alfonso
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-06-10 16:17 UTC by Andy Goldstein
Modified: 2017-03-08 17:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-08-15 15:05:02 UTC
Target Upstream Version:
Embargoed:



Description Andy Goldstein 2013-06-10 16:17:30 UTC
Description of problem: When all nodes of a given profile are offline (meaning MCollective can't communicate with them), listing a user's applications fails, both with commands such as "rhc apps" and in the web console.


Version-Release number of selected component (if applicable): rubygem-openshift-origin-controller-1.10.1-1.git.97.f50a498.el6op.noarch


How reproducible: 100%


Steps to Reproduce:
1. Configure a node to use whatever profile name you'd like (e.g. small is fine, or pick another one)
2. Create an application using that profile
3. Run "rhc apps" and verify you can see your app in the list
4. Turn off mcollective on the node
5. Run "rhc apps" again

Actual results: an error


Expected results: the list of applications


Additional info:
The broker attempts to get the base disk quota for each application's gear profile. If all nodes with that profile are offline, the lookup fails and the whole request fails with it. See the stack trace below from the broker:

[Mon Jun 10 08:38:40 2013] [error] [client 127.0.0.1] Premature end of script headers: rest
[ pid=29297 thr=139794656430048 file=ext/apache2/Hooks.cpp:834 time=2013-06-10 08:38:40.542 ]: No data received from the backend application (process 27756) within 5000 msec. Either the backend application is frozen, or your TimeOut value of 5 seconds is too low. Please check whether your application is frozen, or increase the value of the TimeOut configuration directive.
[ pid=27756 thr=7225480 file=utils.rb:176 time=2013-06-10 08:38:40.543 ]: *** Exception OpenShift::NodeException in application (No nodes found.) (process 27756, thread #<Thread:0x00000000dc8110>):
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-msg-broker-mcollective-1.10.1/lib/openshift/mcollective_application_container_proxy.rb:127:in `find_one_impl'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/lib/openshift/application_container_proxy.rb:26:in `find_one'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/models/gear.rb:42:in `block in base_filesystem_gb'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/cache_helper.rb:24:in `get_cached'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/models/gear.rb:41:in `base_filesystem_gb'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/rest_models/rest_embedded_cartridge.rb:134:in `initialize'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:50:in `new'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:50:in `get_rest_cartridge'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:26:in `block (2 levels) in get_application_rest_cartridges'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/memory.rb:121:in `block in each'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/memory.rb:120:in `each'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/memory.rb:120:in `each'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual.rb:18:in `each'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:22:in `block in get_application_rest_cartridges'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:20:in `each'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:20:in `get_application_rest_cartridges'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/helpers/rest_model_helper.rb:11:in `get_rest_application'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/controllers/applications_controller.rb:19:in `block in index'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/relations/targets/enumerable.rb:442:in `map!'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/relations/targets/enumerable.rb:442:in `method_missing'
        from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/relations/referenced/many.rb:395:in `method_missing'
        from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.10.1/app/controllers/applications_controller.rb:19:in `index'
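
The failure path in the trace can be sketched in a few lines of Ruby. This is a simplified model, not the broker's code: `find_one`, `base_filesystem_gb` and `list_applications` are stand-ins named after frames in the trace, and the node registry, host name and quota value are made up for illustration.

```ruby
# Simplified model of the failure path; not the actual broker code.
class NodeException < StandardError; end

# Hypothetical registry: gear profile => nodes currently answering MCollective.
ONLINE_NODES = { "small" => ["node1.example.com"], "medium" => [] }

# Stand-in for find_one: pick any responding node for the profile.
def find_one(profile)
  node = ONLINE_NODES.fetch(profile, []).first
  raise NodeException, "No nodes found." if node.nil?
  node
end

# Stand-in for base_filesystem_gb: the quota has to be asked of a live node.
def base_filesystem_gb(profile)
  find_one(profile)
  1.0 # made-up value a node might report
end

# Building the listing touches every application, so one failed lookup
# aborts the whole response.
def list_applications(apps)
  apps.map do |app|
    { name: app[:name], storage_gb: base_filesystem_gb(app[:profile]) }
  end
end

apps = [{ name: "nj1", profile: "small" }, { name: "test", profile: "medium" }]

begin
  list_applications(apps)
rescue NodeException => e
  puts "listing failed: #{e.message}"
end
```

In this model the application on the "small" profile could be listed on its own, but the single unreachable "medium" profile is enough to fail the entire request.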

Comment 3 chris alfonso 2013-06-12 19:27:16 UTC
The broker log file has:
Started GET "/broker/rest/domains/funzo/applications?include=cartridges" for 127.0.0.1 at 2013-06-12 12:24:50 -0700
Processing by ApplicationsController#index as JSON
  Parameters: {"include"=>"cartridges", "domain_id"=>"testdomain"}
    In /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.12/app/helpers/cartridge_cache.rb cartridges method:
      Error while querying cartridge list. This may be because no node hosts responded.
      Please ensure you have installed node hosts and they are responding to "mco ping".
      Exception was: #<OpenShift::NodeException: No nodes found.>

Comment 4 chris alfonso 2013-07-16 20:43:03 UTC
This seems to be fixed. I can list applications with rhc apps when the mcollective service is turned off. Are you still seeing the issue?

Comment 5 chris alfonso 2013-07-16 20:48:56 UTC
[root@broker ~]# rhc app create test ruby-1.9
Application Options
-------------------
  Namespace:  funzo
  Cartridges: ruby-1.9
  Gear Size:  default
  Scaling:    no

Creating application 'test' ... No nodes available.
[root@broker ~]# rhc apps
nj1 @ http://nj1-funzo.example.com/ (uuid: 51d429976892dfc286000031)
--------------------------------------------------------------------
  Created: Jul 03  6:39 AM
  Gears:   1 (defaults to small)
  Git URL: ssh://51d429976892dfc286000031.com/~/git/nj1.git/
  SSH:     51d429976892dfc286000031.com

  paypal-nodejs-nginx-1.0 (Node.js + Nginx)
  -----------------------------------------
    Gears: 1 small

You have 1 applications

[root@broker ~]# service mcollective status
mcollectived is stopped

Comment 6 chris alfonso 2013-07-16 21:00:31 UTC
If the gear size quota_blocks cache is populated, rhc apps will return the cached application data even with the nodes offline. If it's not in the cache, then we'll see the trace noted in the original report.
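
A minimal sketch of that cache behavior, assuming a block-based helper along the lines of the `get_cached` frame in the trace. The in-memory hash, the cache keys, `query_quota` and the quota value are all hypothetical; the real helper may be implemented quite differently.

```ruby
# Minimal sketch of a block-based cache; names are illustrative only.
class NodeException < StandardError; end

CACHE = {}

# Return the cached value for key, or run the block and store its result.
# If the block raises, nothing is stored and the error propagates.
def get_cached(key)
  return CACHE[key] if CACHE.key?(key)
  CACHE[key] = yield
end

# Hypothetical node query; fails when no node of the profile responds.
def query_quota(nodes_online)
  raise NodeException, "No nodes found." unless nodes_online
  1_048_576 # made-up quota_blocks value
end

# Cache populated while nodes were up: later lookups succeed offline.
get_cached("quota_blocks_small") { query_quota(true) }
puts get_cached("quota_blocks_small") { query_quota(false) }

# Cache never populated for this profile: the lookup raises.
begin
  get_cached("quota_blocks_medium") { query_quota(false) }
rescue NodeException => e
  puts "cache miss with nodes offline: #{e.message}"
end
```

A failed lookup leaves no entry behind, so in this sketch the error repeats on every request until a node of that profile comes back or the cache is filled some other way.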

Comment 7 chris alfonso 2013-07-17 16:37:07 UTC
Rather than commit a fix for this specific error condition, we are going to wait for https://github.com/openshift/origin-server/pull/3078 to land and retest.

Comment 8 chris alfonso 2013-07-25 17:40:45 UTC
We are going to pull in the general error handling changes with our 2.0 rebase from origin, since there is a lot of other related refactoring in the controller package. Since the broker cache handles this situation and you only see the error if all nodes are offline and the cache isn't populated, we don't think it's necessary to add a localized fix just for the rhc apps invocation.

Comment 9 Andy Goldstein 2013-07-25 19:34:52 UTC
I think it's reasonable to wait until 2.0

Comment 10 Brenton Leanhardt 2013-08-15 15:05:02 UTC
We'll pick this up with the next rebase.

