Bug 1153666 - NLTK doesn't work with Python 2.7 cartridge
Summary: NLTK doesn't work with Python 2.7 cartridge
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: ImageStreams
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jason DeTiberus
QA Contact: libra bugs
URL:
Whiteboard:
Depends On: 1151494
Blocks:
Reported: 2014-10-16 13:42 UTC by Brenton Leanhardt
Modified: 2014-12-10 13:51 UTC
CC List: 8 users

Fixed In Version: openshift-origin-cartridge-python-1.30.1.1-1
Doc Type: Bug Fix
Doc Text:
An update to Python 2.7 dependencies caused some dependencies that use C extensions to not run properly with the Python 2.7 cartridge, so applications using the cartridge could return an Internal Server Error. This bug fix updates the Python cartridge to set the WSGIApplicationGroup directive to %{GLOBAL}, which forces a WSGI application to run within the first interpreter. As a result, applications using the cartridge are once again accessible. After applying this update, a cartridge upgrade is required.
Clone Of: 1151494
Environment:
Last Closed: 2014-12-10 13:24:24 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHBA-2014:1979
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Enterprise 2.2.2 bug fix and enhancement update
Last Updated: 2014-12-10 18:23:46 UTC

Description Brenton Leanhardt 2014-10-16 13:42:14 UTC
+++ This bug was initially created as a clone of Bug #1151494 +++
Description of problem:
About a month ago, my web service stopped working, and I discovered that the problem is the NLTK library on Python 2.7. I created a new web service that only imports the nltk library and the error still occurs, although the same thing worked a few months ago.

In the log of the application, I see the following message:

"Premature end of script headers: wsgi.py"

Version-Release number of selected component (if applicable):
Python 2.7
NLTK any version

How reproducible:
1 - Create an application with Python 2.7
2 - Import NLTK module in the application

Actual results:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, root@localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Expected results:

Working.


Additional info:

--- Additional comment from Michal Fojtik on 2014-10-13 08:32:52 EDT ---

Junior: Can you please provide sample code I can test this bug against?

I tried to reproduce this by only importing nltk at the top of the wsgi file and got:


It seems like importing nltk triggers something that blocks execution of the request. I'm also unable to stop Apache cleanly after the first request.

I'm not a Python or NLTK expert, but shouldn't you set something up first, such as downloading some sample text or data?

--- Additional comment from Junior on 2014-10-13 09:47:45 EDT ---

Hello Michal, nltk worked perfectly on Python 2.6 just by importing it in the code. My web service also needed the contents of the nltk_data folder, but now the error happens as you said: if you open the site, the application freezes for no apparent reason.

I know another person who is having the same problem, so I think an update to Python or to NLTK caused it.

Here is an example of basic code. You also need to commit the nltk_data folder together with wsgi.py and run this rhc command: rhc set-env NLTK_DATA=remote_directory_of_nltk_data_folder -a nameapp --namespace name_of_namespace

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

# Activate the application's virtualenv so that packages installed from
# setup.py (nltk, web.py) are importable.
virtenv = os.environ['APPDIR'] + '/virtenv/'
os.environ['PYTHON_EGG_CACHE'] = os.path.join(virtenv, 'lib/python2.6/site-packages')
virtualenv = os.path.join(virtenv, 'bin/activate_this.py')
try:
    execfile(virtualenv, dict(__file__=virtualenv))
except IOError:
    pass
#
# IMPORTANT: Put any additional includes below this line.  If placed above this
# line, it's possible required libraries won't be in your searchable path
#
import nltk
import web

from nltk.tokenize import word_tokenize
# Required: the original snippet used WordPunctTokenizer without importing it.
from nltk.tokenize import WordPunctTokenizer

tokenizer = WordPunctTokenizer()

urls = (
    '/', 'index'
)

render = web.template.render('app-root/repo/wsgi/templates/')

class index:

    def GET(self):
        # Tokenize a fixed string so that every request exercises NLTK.
        tokenizer.tokenize('Hello World')
        return 'Hello World'

application = web.application(urls, globals()).wsgifunc()

#
# Below for testing only
#
if __name__ == '__main__':
    from wsgiref.simple_server import make_server
    httpd = make_server('localhost', 8051, application)
    # Wait for a single request, serve it and quit.
    httpd.handle_request()

--- Additional comment from Michal Fojtik on 2014-10-16 09:24:04 EDT ---

Junior, sorry this took so long. I'm currently testing this fix:

https://github.com/openshift/origin-server/pull/5879

It seems to fix the problem with nltk; however, we are still evaluating whether it breaks anything else.

--- Additional comment from Michal Fojtik on 2014-10-16 09:34:18 EDT ---

Quote from documentation:

```
Forcing a WSGI application to run within the first interpreter can be necessary when a third party C extension module for Python has used the simplified threading API for manipulation of the Python GIL and thus will not run correctly within any additional sub interpreters created by Python. 
```

This explains the problem with 'nltk', which does have C extensions.
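The fix sets "WSGIApplicationGroup %{GLOBAL}" so the application runs in mod_wsgi's first (main) interpreter instead of a sub-interpreter. To check which interpreter a deployed app actually ends up in, a minimal diagnostic WSGI app can report it. This is a sketch under assumptions: it relies on mod_wsgi exposing the application group in the "mod_wsgi.application_group" environ key, with the empty string meaning %{GLOBAL}; the app itself is hypothetical and not part of the cartridge.

```python
def application(environ, start_response):
    """Report which mod_wsgi application group (interpreter) serves this request."""
    # Assumption: mod_wsgi sets 'mod_wsgi.application_group'; under
    # "WSGIApplicationGroup %{GLOBAL}" its value is '' (the main interpreter).
    group = environ.get('mod_wsgi.application_group')
    if group is None:
        body = 'not running under mod_wsgi'
    elif group == '':
        body = 'main interpreter (%{GLOBAL})'
    else:
        body = 'sub-interpreter: ' + group
    data = body.encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(data)))])
    return [data]
```

Deployed temporarily as wsgi.py, it should answer "main interpreter (%{GLOBAL})" once the updated cartridge is in place; a "sub-interpreter: ..." answer means C extensions that use the simplified GIL API may still misbehave.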

Comment 3 Gaoyun Pei 2014-11-25 06:36:40 UTC
Verified this bug on puddle 2.2/2014-11-24.3 with package openshift-origin-cartridge-python-1.30.1.1-1.el6op.noarch


1. Create a python-2.7 app
rhc app create py27 python-2.7

2. Create a directory named "nltk_data" in the app repo and set an env var named "NLTK_DATA"
$ mkdir py27/nltk_data
$ rhc set-env NLTK_DATA=remote_directory_of_nltk_data_folder -a py27

3. Add the 'nltk' and 'web.py' modules to install_requires in the setup.py file, as below:
from setuptools import setup

setup(name='YourAppName',
      version='1.0',
      description='OpenShift App',
      author='Your Name',
      author_email='example',
      url='http://www.python.org/sigs/distutils-sig/',
      install_requires=['nltk','web.py'],
     )

4. Overwrite the wsgi.py file as below:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

virtenv = os.environ['APPDIR'] + '/virtenv/'
os.environ['PYTHON_EGG_CACHE'] = os.path.join(virtenv, 'lib/python2.6/site-packages')
virtualenv = os.path.join(virtenv, 'bin/activate_this.py')
try:
    execfile(virtualenv, dict(__file__=virtualenv))
except IOError:
    pass
#
# IMPORTANT: Put any additional includes below this line.  If placed above this
# line, it's possible required libraries won't be in your searchable path
# 
import nltk
import web

from nltk.tokenize import word_tokenize
from nltk.tokenize import WordPunctTokenizer

tokenizer = WordPunctTokenizer()

urls = (
  '/', 'index'
)

render = web.template.render('app-root/repo/wsgi/templates/')

class index:

        def GET(self):
                tokenizer.tokenize('Hello World')
                return 'Hello World'

application = web.application(urls, globals()).wsgifunc()

#
# Below for testing only
#
if __name__ == '__main__':
        from wsgiref.simple_server import make_server
        httpd = make_server('localhost', 8051, application)
        # Wait for a single request, serve it and quit.
        httpd.handle_request()

5. Perform git push
git add . && git commit -amp && git push

6. Access the app home page via browser

After step 6, the app is accessible and returns the text "Hello World".
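Step 6 can also be checked without a browser by calling the WSGI callable in-process. A minimal sketch, assuming the wsgi.py from step 4 is importable as a module on the gear (where APPDIR is set); the helper name fetch_home_page is hypothetical. Because it runs outside Apache, it only exercises the application code path, not the mod_wsgi interpreter setting this bug is about.

```python
from wsgiref.util import setup_testing_defaults


def fetch_home_page(app):
    """Call a WSGI application in-process for GET / and return (status, body text)."""
    environ = {'REQUEST_METHOD': 'GET', 'SCRIPT_NAME': '', 'PATH_INFO': '/'}
    # Fill in the remaining WSGI keys (SERVER_NAME, wsgi.input, ...) with test values.
    setup_testing_defaults(environ)
    result = {}

    def start_response(status, headers, exc_info=None):
        result['status'] = status
        # Return a no-op write() callable, as the WSGI spec requires.
        return lambda data: None

    chunks = app(environ, start_response)
    try:
        # Iterating also triggers start_response for generator-based apps.
        body = b''.join(chunks)
    finally:
        # WSGI servers must call close() on the result if it has one.
        if hasattr(chunks, 'close'):
            chunks.close()
    return result['status'], body.decode('utf-8')
```

Usage on the gear, inside the virtualenv: "from wsgi import application" followed by "print(fetch_home_page(application))" should show a 200 status and "Hello World".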

Comment 5 errata-xmlrpc 2014-12-10 13:24:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2014-1979.html


Note You need to log in before you can comment on or make changes to this bug.