Bug 993440

Summary: REST API gives an undescriptive error and logs a backtrace if an application depends on a feature that no cartridge provides
Product: OKD Reporter: Miciah Dashiel Butler Masters <mmasters>
Component: Master Assignee: Miciah Dashiel Butler Masters <mmasters>
Status: CLOSED CURRENTRELEASE QA Contact: libra bugs <libra-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 2.x CC: wjiang, xtian
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-30 00:46:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Miciah Dashiel Butler Masters 2013-08-05 23:52:24 UTC
Description of problem:

If an application requires a feature that is not provided by any cartridge, then the REST API returns a generic "The server did not respond correctly" error message and the broker logs a backtrace.

This situation can arise when a cartridge on which an application relies is deleted without first deleting the application.  If the user later tries to show, delete, or perform other operations on the application, the user gets the aforementioned unhelpful error message, and the administrator gets the aforementioned unhelpful backtrace.

This may also happen if the OpenShift administrator changes the nodes from v1 cartridges to v2 cartridges while applications are installed.


Version-Release number of selected component (if applicable):

origin-server master.


How reproducible:

Readily.


Steps to Reproduce:

1. Install an OpenShift PaaS with one or more cartridges installed on the node host or hosts—e.g., openshift-origin-cartridge-python.

2. Create an application that uses one of the installed cartridges—e.g., `rhc app create testapp python`.

3. Delete the cartridge from all nodes (e.g., `oo-admin-cartridge -a erase -n python -v 2.6 -c 0.0.1`), then flush stale caches by running /etc/cron.minutely/openshift-facter on the node host or hosts and `oo-admin-broker-cache --clear` on the broker host or hosts.

4. Try to do something with the application—e.g., `rhc app delete testapp`.


Actual results:

At Step 4, the user sees the following:

    $ rhc app delete testapp --confirm
    The server did not respond correctly. This may be an issue with the server configuration or with your connection to the server (such as a Web proxy or firewall). Please verify that you can access the OpenShift server https://broker.example.com/broker/rest/domains/ose/applications/testapp

The administrator will see the following in /var/log/openshift/broker/httpd/error_log:

    [Mon Aug 05 17:31:14 2013] [error] [client 127.0.0.1] Premature end of script headers: rest
    [ pid=1840 thr=140635072976864 file=ext/apache2/Hooks.cpp:841 time=2013-08-05 17:31:14.438 ]: The backend application (process 2144) did not send a valid HTTP response; instead, it sent nothing at all. It is possible that it has crashed; please check whether there are crashing bugs in this application.
    [ pid=2144 thr=16812160 file=utils.rb:176 time=2013-08-05 17:31:14.438 ]: *** Exception NoMethodError in application (undefined method `categories' for nil:NilClass) (process 2144, thread #<Thread:0x00000002011100>):
            from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/rest_models/rest_application.rb:82:in `block in initialize'
            from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/rest_models/rest_application.rb:80:in `each'
            from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/rest_models/rest_application.rb:80:in `initialize'
            from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/helpers/rest_model_helper.rb:8:in `new'
            from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/helpers/rest_model_helper.rb:8:in `get_rest_application'
            from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/controllers/applications_controller.rb:37:in `show'
            from /opt/rh/ruby193/root/usr/share/gems/gems/actionpack-3.2.8/lib/action_controller/metal/implicit_render.rb:4:in `send_action'
(etc.)
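
The failure mode in the backtrace can be sketched roughly as follows. This is a minimal illustration, not the actual rest_application.rb code: `FakeCartridgeCache`, `Cartridge`, and `failure_for` are hypothetical stand-ins, and the only detail taken from the log is that `categories` is called on a nil lookup result.

```ruby
# Hypothetical stand-ins; these are not the real origin-server classes.
Cartridge = Struct.new(:name, :categories)

class FakeCartridgeCache
  def initialize(carts)
    @carts = carts
  end

  # Returns the cartridge providing the feature, or nil if none does.
  def find_cartridge(feature)
    @carts.find { |c| c.name == feature }
  end
end

# Walks the application's features the way a REST model might,
# and returns the exception (if any) instead of crashing.
def failure_for(cache, features)
  features.each do |feature|
    cart = cache.find_cartridge(feature)
    cart.categories # raises NoMethodError when cart is nil
  end
  nil
rescue NoMethodError => e
  e
end

# Only a ruby cartridge remains installed; python-2.6 was erased.
cache = FakeCartridgeCache.new([Cartridge.new("ruby-1.9", ["web_framework"])])

# The application still lists the erased cartridge as a requirement.
error = failure_for(cache, ["python-2.6"])
puts error.class
```

Because nothing checks the lookup result, the nil propagates until the first method call on it, which is why the log names `categories` rather than the missing cartridge.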


Expected results:

At Step 4, the user should see a more helpful error message:

    $ rhc app delete testapp --confirm
    Unable to retrieve application testapp because it uses python-2.6, which is not provided by any installed cartridge

The administrator should likewise see a more informative error message in /var/log/openshift/broker/httpd/error_log, one that names the missing feature.

Comment 1 Miciah Dashiel Butler Masters 2013-08-05 23:54:02 UTC
Suggested fix:   https://github.com/openshift/origin-server/pull/3298

Comment 2 Lili Nader 2013-08-16 17:44:02 UTC
The fix looks good.

Comment 3 Miciah Dashiel Butler Masters 2013-08-16 17:47:09 UTC
Actually, I'm rewriting it because with recent changes, the steps to reproduce hit the same problem in RestGearGroup or RestGearGroup15 instead of RestApplication, and because Clayton advised me to rescue the UnfulfilledRequirementException in OpenShift::Controller::ApiResponse#render_exception instead of in ApplicationsController#show.
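
As a rough sketch of the idea of rescuing in one shared place rather than in each controller action: the class and module names ending in `Sketch`, the exception's constructor, and the status symbols below are all assumptions made for illustration and do not reflect the real OpenShift::Controller::ApiResponse code.

```ruby
# Stand-in exception; the real UnfulfilledRequirementException is defined
# in origin-server and may have a different constructor.
class UnfulfilledRequirementException < StandardError
  attr_reader :feature

  def initialize(feature)
    @feature = feature
    super("#{feature} is not provided by any installed cartridge")
  end
end

# Hypothetical mixin playing the role of a shared API response helper.
module ApiResponseSketch
  # Maps an exception to a [status, message] pair in one place.
  def render_exception(ex)
    case ex
    when UnfulfilledRequirementException
      [:unprocessable_entity, "Unable to complete the request: #{ex.message}"]
    else
      [:internal_server_error, "Unexpected error: #{ex.class}"]
    end
  end
end

class ApplicationsControllerSketch
  include ApiResponseSketch

  # The action does not interpret the failure itself; it hands any
  # exception to the shared render_exception helper.
  def show(feature_available)
    raise UnfulfilledRequirementException.new("python-2.6") unless feature_available
    [:ok, "application details"]
  rescue StandardError => e
    render_exception(e)
  end
end

status, message = ApplicationsControllerSketch.new.show(false)
puts "#{status}: #{message}"
```

With this shape, any other model that raises the same exception gets the same message without a per-action rescue.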

Comment 4 Miciah Dashiel Butler Masters 2013-08-16 17:56:49 UTC
Unfortunately, there's a lot of code duplication surrounding this code.  Would it be best to check whether CartridgeCache.find_cartridge returns nil in RestApplication, RestApplication10, RestApplication13, RestApplication15, RestGearGroup, and RestGearGroup15? It feels stupid to have to make this same change in many places, but I don't think we want to modify find_cartridge to raise the exception, and the alternative would be to create yet another wrapper around find_cartridge.

Comment 5 Lili Nader 2013-08-16 18:50:37 UTC
Probably the best solution is to create a new method in CartridgeCache that throws an exception if cart is nil, and then call it instead of CartridgeCache.find_cartridge(feature, app) in all the places where it is not acceptable for cart to be nil.
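
A minimal sketch of such a wrapper, under stated assumptions: the `Sketch` module, the in-memory `CARTS` list, the error wording, and the local `OOException` class are hypothetical stand-ins, and only the `find_cartridge(feature, app)` call shape and the method name `find_cartridge_or_raise_exception` come from this bug.

```ruby
module Sketch
  # Stand-in for OpenShift::OOException; the real class lives in origin-server.
  class OOException < StandardError; end

  Cartridge = Struct.new(:name, :categories)

  class CartridgeCache
    # Hypothetical in-memory catalogue standing in for the installed cartridges.
    CARTS = [Cartridge.new("ruby-1.9", ["web_framework"])].freeze

    # Mirrors the existing behaviour: nil when nothing provides the feature.
    def self.find_cartridge(feature, app = nil)
      CARTS.find { |c| c.name == feature }
    end

    # Wrapper: same lookup, but a missing cartridge becomes a descriptive error.
    def self.find_cartridge_or_raise_exception(feature, app = nil)
      cart = find_cartridge(feature, app)
      return cart unless cart.nil?

      app_name = app ? app.to_s : "(unknown)"
      raise OOException,
            "The application #{app_name} requires #{feature}, " \
            "which is not provided by any installed cartridge"
    end
  end
end

begin
  Sketch::CartridgeCache.find_cartridge_or_raise_exception("python-2.6", "testapp")
rescue Sketch::OOException => e
  puts e.message
end
```

Callers that can tolerate a missing cartridge keep using `find_cartridge`; callers that cannot switch to the wrapper and drop their own nil checks.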

Comment 6 Miciah Dashiel Butler Masters 2013-08-16 18:51:50 UTC
I implemented the fix described in Comment 4 (checking the return value of find_cartridge in all the relevant models), but I cannot test it because of bug 998026.

Comment 7 Miciah Dashiel Butler Masters 2013-08-23 18:34:33 UTC
Pull request updated per comment 5.

Comment 8 Lili Nader 2013-10-01 05:09:45 UTC
Assigning to mmasters since he has been working on it.

Comment 9 openshift-github-bot 2013-10-03 01:42:22 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/f25fd33696d84debc0811eba94e1d46c00a99d3a
Add and use find_cartridge_or_raise_exception

Add OpenShift::CartridgeCache::find_cartridge_or_raise_exception, which
wraps find_cartridge and raises OpenShift::OOException if no cartridge
provides the requested feature.

In RestApplication#initialize, RestApplication10#initialize,
RestApplication13#initialize, RestApplication15#initialize,
RestGearGroup#initialize, and RestGearGroup15#initialize, use
find_cartridge_or_raise_exception instead of find_cartridge in place of
repetitive error checking and to perform the error checking in more places.

This commit is related to bug 993440.

Comment 10 Miciah Dashiel Butler Masters 2013-10-03 05:10:34 UTC
I updated and merged the pull request.

In fact, this bug was fixed by agupta's commit https://github.com/openshift/origin-server/commit/28e5454ab7477c36360fe0e68d26ca7724ff503a to fix bug 1005007 and bug 1006526, which was merged while my pull request was waiting for review.  I subsequently got a review from agupta for my pull request and have merged the changes in my pull request that weren't superseded by his changes.  In any case, this bug is fixed now.

Comment 11 weiwei jiang 2013-12-31 05:20:44 UTC
Tried on devenv_4073 and cannot reproduce this issue, so marking it as verified.

Comment 12 weiwei jiang 2013-12-31 05:24:19 UTC
(In reply to weiwei jiang from comment #11)
> Tried on devenv_4073 and cannot reproduce this issue, so marking it as verified.
Tried on devenv_4173 and cannot reproduce this issue, so marking it as verified.

Comment 13 Xiaoli Tian 2013-12-31 08:42:04 UTC
Moving back to ON_QA; please re-test or provide the verification steps.

Comment 14 weiwei jiang 2014-01-02 11:02:26 UTC
Tried on the latest Origin and cannot reproduce this issue, so marking it as verified. Steps used:

1. Set up an Origin devenv.
2. Create an app, for example with python-2.7.
3. Run oo-admin-cartridge -l.
4. Erase the cartridge with oo-admin-cartridge -c erase -n python -v 0.0.8
5. Check whether the app can be controlled without error.