+++ This bug was initially created as a clone of Bug #1151494 +++

Description of problem:
About a month ago my web service stopped working, and I traced the problem to the NLTK library on Python 2.7. As a test, I created a new web service that does nothing but import nltk, and the error still occurs, even though the same code worked a few months ago. The application log shows the following message:

"Premature end of script headers: wsgi.py"

Version-Release number of selected component (if applicable):
Python 2.7
NLTK (any version)

Steps to Reproduce:
1. Create an application with Python 2.7.
2. Import the NLTK module in the application.

Actual results:
Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, root@localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error. More information about this error may be available in the server error log.

Expected results:
The application works.

Additional info:

--- Additional comment from Michal Fojtik on 2014-10-13 08:32:52 EDT ---

Junior: Can you please provide sample code I can test this bug against?

I tried to reproduce this by only importing nltk at the top of the wsgi file and got:

It seems like importing nltk triggers something that blocks execution of the request. I'm also unable to stop Apache cleanly after the first request. I'm not a Python or nltk expert, but shouldn't you set something up first, like downloading some sample text or data?

--- Additional comment from Junior on 2014-10-13 09:47:45 EDT ---

Hello Michal, nltk works perfectly on Python 2.6 with just the import in the code. My web service had to add the contents of the nltk_data folder, but now the error happens as you described: if you open the site, the application freezes for no apparent reason.
I know another person who is having the same problem, so I suspect that an update to Python or to nltk caused it.

Here is an example of basic code. You also need to commit the nltk_data folder together with wsgi.py and run this command with rhc:

```
rhc set-env NLTK_DATA=remote_directory_of_nltk_data_folder -a nameapp --namespace name_of_namespace
```

```
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

virtenv = os.environ['APPDIR'] + '/virtenv/'
os.environ['PYTHON_EGG_CACHE'] = os.path.join(virtenv, 'lib/python2.6/site-packages')
virtualenv = os.path.join(virtenv, 'bin/activate_this.py')
try:
    execfile(virtualenv, dict(__file__=virtualenv))
except IOError:
    pass
#
# IMPORTANT: Put any additional includes below this line.  If placed above this
# line, it's possible required libraries won't be in your searchable path
#
import nltk
import web
from nltk.tokenize import word_tokenize
# WordPunctTokenizer must be imported before it is used below
from nltk.tokenize import WordPunctTokenizer

tokenizer = WordPunctTokenizer()

urls = (
    '/', 'index'
)

render = web.template.render('app-root/repo/wsgi/templates/')

class index:
    def GET(self):
        tokenizer.tokenize('Hello World')
        return 'Hello World'

application = web.application(urls, globals()).wsgifunc()

#
# Below for testing only
#
if __name__ == '__main__':
    from wsgiref.simple_server import make_server
    httpd = make_server('localhost', 8051, application)
    # Wait for a single request, serve it and quit.
    httpd.handle_request()
```

--- Additional comment from Michal Fojtik on 2014-10-16 09:24:04 EDT ---

Junior, sorry this took so long. I'm currently testing this fix:

https://github.com/openshift/origin-server/pull/5879

It seems to fix the problem with nltk; however, we are still evaluating whether it breaks anything else.
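Michal's observation that a bare `import nltk` blocks the request raises the question of whether the import hangs everywhere or only inside Apache. A minimal sketch for answering that on a development machine, assuming Python 3.5+ and the same packages installed (the helper name `import_blocks` is hypothetical, not part of nltk or OpenShift): it runs the import in a fresh child interpreter and reports whether it fails to finish within a timeout.

```python
import subprocess
import sys


def import_blocks(module_name, timeout=30.0):
    """Return True if `import module_name` in a fresh interpreter does not
    finish within `timeout` seconds (hypothetical diagnostic helper).

    Raises subprocess.CalledProcessError if the import fails outright,
    for example because the module is not installed.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", "import " + module_name],
            check=True,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return True
    return False


if __name__ == "__main__":
    # Replace "json" with "nltk" on a machine where nltk is installed.
    print(import_blocks("json"))
```

For example, `import_blocks("nltk")` returning False while the deployed app still freezes would point at the hosting environment rather than at nltk itself. A plain child interpreter runs the code in its main interpreter, so this does not reproduce mod_wsgi's sub-interpreters.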
--- Additional comment from Michal Fojtik on 2014-10-16 09:34:18 EDT ---

Quote from the mod_wsgi documentation:

```
Forcing a WSGI application to run within the first interpreter can be necessary when a third party C extension module for Python has used the simplified threading API for manipulation of the Python GIL and thus will not run correctly within any additional sub interpreters created by Python.
```

This explains the behaviour with nltk, which does have C extensions.
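The quoted passage comes from the description of mod_wsgi's `WSGIApplicationGroup` directive, where the value `%{GLOBAL}` forces the application into the first (main) interpreter. Assuming the fix works along those lines, one way to confirm which interpreter actually serves a request is to read the `mod_wsgi.application_group` key that mod_wsgi places in the WSGI environ; an empty string means the main interpreter. The sketch below is a hypothetical stand-alone diagnostic app, not part of the cartridge; under servers other than mod_wsgi the key is simply absent.

```python
def application(environ, start_response):
    """Report which mod_wsgi application group (interpreter) serves this
    request. Hypothetical diagnostic app, not part of the cartridge."""
    group = environ.get("mod_wsgi.application_group")
    if group is None:
        # Key missing: not served by mod_wsgi (e.g. wsgiref during testing).
        body = "not running under mod_wsgi"
    elif group == "":
        # An empty group name is the main interpreter, i.e. %{GLOBAL}.
        body = "main interpreter"
    else:
        body = "sub-interpreter: %s" % group
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(data)))])
    return [data]
```

Deployed temporarily as wsgi.py, a response of "main interpreter" would indicate the C-extension workaround is in effect.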
Verified this bug on puddle 2.2/2014-11-24.3 with package openshift-origin-cartridge-python-1.30.1.1-1.el6op.noarch.

1. Create a python-2.7 app:
   rhc app create py27 python-2.7
2. Create a directory named "nltk_data" in the app repo and set an environment variable named "NLTK_DATA":
   $ mkdir py27/nltk_data
   $ rhc set-env NLTK_DATA=remote_directory_of_nltk_data_folder -a py27
3. Add the 'nltk' and 'web.py' modules to the setup.py file so that they are installed, like below:

```
from setuptools import setup

setup(name='YourAppName',
      version='1.0',
      description='OpenShift App',
      author='Your Name',
      author_email='example',
      url='http://www.python.org/sigs/distutils-sig/',
      install_requires=['nltk', 'web.py'],
     )
```

4. Overwrite the wsgi.py file as below:

```
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

virtenv = os.environ['APPDIR'] + '/virtenv/'
os.environ['PYTHON_EGG_CACHE'] = os.path.join(virtenv, 'lib/python2.6/site-packages')
virtualenv = os.path.join(virtenv, 'bin/activate_this.py')
try:
    execfile(virtualenv, dict(__file__=virtualenv))
except IOError:
    pass
#
# IMPORTANT: Put any additional includes below this line.  If placed above this
# line, it's possible required libraries won't be in your searchable path
#
import nltk
import web
from nltk.tokenize import word_tokenize
from nltk.tokenize import WordPunctTokenizer

tokenizer = WordPunctTokenizer()

urls = (
    '/', 'index'
)

render = web.template.render('app-root/repo/wsgi/templates/')

class index:
    def GET(self):
        tokenizer.tokenize('Hello World')
        return 'Hello World'

application = web.application(urls, globals()).wsgifunc()

#
# Below for testing only
#
if __name__ == '__main__':
    from wsgiref.simple_server import make_server
    httpd = make_server('localhost', 8051, application)
    # Wait for a single request, serve it and quit.
    httpd.handle_request()
```

5. Commit and push the changes:
   git add . && git commit -amp && git push
6. Open the app home page in a browser.

After step 6, the app is accessible and returns the text "Hello World".
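Step 6 can also be checked from a script, which helps tell apart the two failure modes reported in this bug: a frozen app (the request never completes) and an Internal Server Error (HTTP 500). A minimal sketch for Python 3 using only the standard library; the function name `check_app`, its result strings, and the 10-second default timeout are illustrative choices, and the URL is whatever address `rhc app create` reported for the app.

```python
import socket
import urllib.error
import urllib.request


def check_app(url, expected="Hello World", timeout=10.0):
    """Fetch `url` and classify the outcome as 'ok', 'timeout',
    'http <code>' or 'unexpected body'. Hypothetical helper for step 6."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Non-2xx status, e.g. 'http 500' for the Internal Server Error page.
        return "http %d" % err.code
    except (socket.timeout, TimeoutError):
        # No response within the timeout: the frozen-app symptom.
        return "timeout"
    except urllib.error.URLError as err:
        # Timeouts while connecting arrive wrapped in URLError.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        raise
    return "ok" if expected in body else "unexpected body"
```

Checking HTTPError before URLError matters here, because HTTPError is a subclass of URLError.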
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2014-1979.html