Description of problem:
If an application requires a feature that is not provided by any cartridge, then the REST API returns a generic "The server did not respond correctly" error message and the broker logs a backtrace. This situation can arise when a cartridge on which an application relies is deleted without first deleting the application. If the user later tries to show, delete, or perform other operations on the application, the user gets the aforementioned unhelpful error message, and the administrator gets the aforementioned unhelpful backtrace. This may also happen if the OpenShift administrator changes the nodes from v1 cartridges to v2 cartridges while applications are installed.

Version-Release number of selected component (if applicable):
origin-server master.

How reproducible:
Readily.

Steps to Reproduce:
1. Install an OpenShift PaaS with one or more cartridges installed on the node host or hosts, e.g., openshift-origin-cartridge-python.
2. Create an application that uses one of the installed cartridges, e.g., `rhc app create testapp python`.
3. Delete the cartridge from all nodes (e.g., `oo-admin-cartridge -a erase -n python -v 2.6 -c 0.0.1`) and ensure that stale caches are flushed using /etc/cron.minutely/openshift-facter on the node host or hosts and `oo-admin-cartridge -a erase -n python -v 2.6 -c 0.0.1` on the broker host or hosts.
4. Try to do something with the application, e.g., `rhc app delete testapp`.

Actual results:
At Step 4, the user sees the following:

$ rhc app delete testapp --confirm
The server did not respond correctly. This may be an issue with the server configuration or with your connection to the server (such as a Web proxy or firewall).
Please verify that you can access the OpenShift server https://broker.example.com/broker/rest/domains/ose/applications/testapp

The administrator will see the following in /var/log/openshift/broker/httpd/error_log:

[Mon Aug 05 17:31:14 2013] [error] [client 127.0.0.1] Premature end of script headers: rest
[ pid=1840 thr=140635072976864 file=ext/apache2/Hooks.cpp:841 time=2013-08-05 17:31:14.438 ]: The backend application (process 2144) did not send a valid HTTP response; instead, it sent nothing at all. It is possible that it has crashed; please check whether there are crashing bugs in this application.
[ pid=2144 thr=16812160 file=utils.rb:176 time=2013-08-05 17:31:14.438 ]: *** Exception NoMethodError in application (undefined method `categories' for nil:NilClass) (process 2144, thread #<Thread:0x00000002011100>):
from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/rest_models/rest_application.rb:82:in `block in initialize'
from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/rest_models/rest_application.rb:80:in `each'
from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/rest_models/rest_application.rb:80:in `initialize'
from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/helpers/rest_model_helper.rb:8:in `new'
from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/helpers/rest_model_helper.rb:8:in `get_rest_application'
from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.9.14/app/controllers/applications_controller.rb:37:in `show'
from /opt/rh/ruby193/root/usr/share/gems/gems/actionpack-3.2.8/lib/action_controller/metal/implicit_render.rb:4:in `send_action'
(etc.)
Expected results:
At Step 4, the user should see a more helpful error message:

$ rhc app delete testapp --confirm
Unable to retrieve application testapp because it uses python-2.6, which is not provided by any installed cartridge

The administrator should likewise see a moderately more indicative error message in /var/log/openshift/broker/httpd/error_log.
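
For readers unfamiliar with the failure mode in the backtrace, the following is a minimal sketch of how a lookup that returns nil leads to a NoMethodError on `categories`. It is not the code from rest_application.rb: `FakeCartridgeCache`, `Cartridge`, `framework_for`, and the "php-5.3" entry are hypothetical stand-ins invented for illustration, and the only behavior assumed from the report is that the cartridge lookup yields nil when no cartridge provides the feature.

```ruby
# Hypothetical stand-ins; these are NOT the real origin-server classes.
Cartridge = Struct.new(:name, :categories)

module FakeCartridgeCache
  # Pretend only one cartridge is installed (an assumption for the example).
  CARTRIDGES = {
    "php-5.3" => Cartridge.new("php-5.3", ["web_framework"])
  }.freeze

  # Returns nil when no installed cartridge provides the feature.
  def self.find_cartridge(feature)
    CARTRIDGES[feature]
  end
end

# Mimics a loop over an application's features that assumes every
# lookup succeeds, which is the assumption that breaks in this bug.
def framework_for(features)
  features.each do |feature|
    cart = FakeCartridgeCache.find_cartridge(feature)
    # If cart is nil, this call raises NoMethodError.
    return cart.name if cart.categories.include?("web_framework")
  end
  nil
end

begin
  framework_for(["python-2.6"]) # the cartridge was erased
rescue NoMethodError => e
  puts "#{e.class}: #{e.message}"
end
```

Because nothing between the model and the web server handles the NoMethodError, the request ends without a valid HTTP response, which would be consistent with the generic client-side message shown above.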
Suggested fix: https://github.com/openshift/origin-server/pull/3298
The fix looks good.
Actually, I'm rewriting it because with recent changes, the steps to reproduce hit the same problem in RestGearGroup or RestGearGroup15 instead of RestApplication, and because Clayton advised me to rescue the UnfulfilledRequirementException in OpenShift::Controller::ApiResponse#render_exception instead of in ApplicationsController#show.
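
To illustrate the difference between rescuing in one action and rescuing in a shared exception renderer, here is a rough sketch in plain Ruby with no Rails dependency. Everything in it is hypothetical: `UnfulfilledRequirementError` stands in for UnfulfilledRequirementException (whose real constructor is not shown in this report), `ExceptionRendering` stands in for OpenShift::Controller::ApiResponse, and the status symbols and messages are assumptions chosen only for the example.

```ruby
# Hypothetical stand-in for UnfulfilledRequirementException.
class UnfulfilledRequirementError < StandardError
  attr_reader :feature

  def initialize(feature)
    @feature = feature
    super("No installed cartridge provides #{feature}")
  end
end

# Hypothetical stand-in for a shared response helper.
module ExceptionRendering
  # One place that maps exceptions to responses for every action.
  def render_exception(ex)
    case ex
    when UnfulfilledRequirementError
      # The status is an assumption, not the broker's actual choice.
      { status: :unprocessable_entity, message: ex.message }
    else
      { status: :internal_server_error, message: "Unexpected error" }
    end
  end

  # Actions run inside this wrapper, so none needs its own rescue.
  def with_error_handling
    yield
  rescue StandardError => ex
    render_exception(ex)
  end
end

class FakeApplicationsController
  include ExceptionRendering

  def show(name)
    with_error_handling do
      # Simulates a REST model failing to resolve a cartridge.
      raise UnfulfilledRequirementError.new("python-2.6") if name == "testapp"
      { status: :ok, message: name }
    end
  end
end

puts FakeApplicationsController.new.show("testapp").inspect
```

The appeal of the centralized approach is that show, delete, and any other action that builds a REST model get the same handling without each repeating a rescue clause.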
Unfortunately, there's a lot of code duplication surrounding this code. Would it be best to check whether CartridgeCache.find_cartridge returns nil in RestApplication, RestApplication10, RestApplication13, RestApplication15, RestGearGroup, and RestGearGroup15? It feels stupid to have to make this same change in many places, but I don't think we want to modify find_cartridge to raise the exception, and the alternative would be to create yet another wrapper around find_cartridge.
Probably the best solution is to create a new method in CartridgeCache that raises an exception if cart is nil, and then to call it instead of CartridgeCache.find_cartridge(feature, app) in all the places where it is not acceptable for cart to be nil.
I implemented the fix described in Comment 4 (checking the return value of find_cartridge in all the relevant models), but I cannot test it because of bug 998026.
Pull request updated per comment 5.
Assigning to mmasters since he has been working on it.
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/f25fd33696d84debc0811eba94e1d46c00a99d3a
Add and use find_cartridge_or_raise_exception

Add OpenShift::CartridgeCache::find_cartridge_or_raise_exception, which wraps find_cartridge and raises OpenShift::OOException if no cartridge provides the requested feature.

In RestApplication#initialize, RestApplication10#initialize, RestApplication13#initialize, RestApplication15#initialize, RestGearGroup#initialize, and RestGearGroup15#initialize, use find_cartridge_or_raise_exception instead of find_cartridge in place of repetitive error checking and to perform the error checking in more places.

This commit is related to bug 993440.
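
The wrapper described in the commit message might look roughly like the sketch below. This is not the committed code: the `Sketch` namespace, the in-memory `INSTALLED` table, the `Cartridge` struct, and the wording of the exception message are all assumptions made so that the example runs on its own, and the real OpenShift::OOException may take different constructor arguments.

```ruby
module Sketch
  # Stand-in for OpenShift::OOException (real constructor may differ).
  class OOException < StandardError; end

  Cartridge = Struct.new(:name, :categories)

  class CartridgeCache
    # Hypothetical table of installed cartridges for the example.
    INSTALLED = {
      "php-5.3" => Cartridge.new("php-5.3", ["web_framework"])
    }.freeze

    # Existing behavior is left alone: nil when nothing matches.
    def self.find_cartridge(feature, app = nil)
      INSTALLED[feature]
    end

    # Wrapper for callers that cannot tolerate a nil cartridge.
    def self.find_cartridge_or_raise_exception(feature, app = nil)
      cart = find_cartridge(feature, app)
      if cart.nil?
        # Message wording is illustrative only.
        raise OOException,
              "The application '#{app}' requires '#{feature}' " \
              "but no installed cartridge provides it"
      end
      cart
    end
  end
end

begin
  Sketch::CartridgeCache.find_cartridge_or_raise_exception("python-2.6", "testapp")
rescue Sketch::OOException => e
  puts e.message
end
```

Keeping find_cartridge unchanged means callers that already handle nil are unaffected, while the REST models can opt in to the stricter lookup with a one-line change each.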
I updated and merged the pull request. In fact, this bug was fixed by agupta's commit https://github.com/openshift/origin-server/commit/28e5454ab7477c36360fe0e68d26ca7724ff503a to fix bug 1005007 and bug 1006526, which was merged while my pull request was waiting for review. I subsequently got a review from agupta for my pull request and have merged the changes in my pull request that weren't superseded by his changes. In any case, this bug is fixed now.
Tested on devenv_4073 and could not reproduce this issue, so marking it verified.
(In reply to weiwei jiang from comment #11)
> Tested on devenv_4073 and could not reproduce this issue, so marking it verified.

Correction: tested on devenv_4173 and could not reproduce this issue, so marking it verified.
Moving back to ON_QA; please re-test or provide the verification steps.
Tested on the latest origin and could not reproduce this issue, so marking it verified. Steps:
1. Set up an origin devenv.
2. Create an app, for example with python-2.7.
3. Run: oo-admin-cartridge -l
4. Erase the cartridge with: oo-admin-cartridge -c erase -n python -v 0.0.8
5. Check whether the app can be controlled without error.