Bug 1151494 - NLTK doesn't work with Python 2.7 cartridge
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Image
Version: 1.x
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Michal Fojtik
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1153666
Reported: 2014-10-10 15:01 UTC by Junior
Modified: 2015-02-18 16:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1153666 (view as bug list)
Environment:
Last Closed: 2015-02-18 16:51:46 UTC
Target Upstream Version:
Embargoed:


Description Junior 2014-10-10 15:01:40 UTC

Comment 1 Junior 2014-10-10 16:20:34 UTC
Sorry, I filed this bug without a description. The description follows.

 
Description of problem:
My web service stopped working about a month ago, and I traced the problem to the NLTK library on Python 2.7. As a test, I created a new web service that does nothing but import the nltk library, and the error still occurs. It should not, because the same setup worked a few months ago.

In the application log, I see the following message:

"Premature end of script headers: wsgi.py"

Version-Release number of selected component (if applicable):
Python 2.7
NLTK any version

How reproducible:

Steps to Reproduce:
1. Create an application with Python 2.7
2. Import the NLTK module in the application

Actual results:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, root@localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Expected results:

Working.


Additional info:

Comment 2 Michal Fojtik 2014-10-13 12:32:52 UTC
Junior: Can you please provide some sample code I can test this bug against?

I tried to reproduce this by only importing nltk at the top of the wsgi file and got:


It seems that importing nltk triggers something that blocks execution of the request. I'm also unable to stop Apache cleanly after I make the first request.

I'm neither a Python nor an nltk expert, but shouldn't you set something up first, such as downloading some sample text or a library?
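
One way to narrow this down is to check whether the import also blocks outside Apache. The sketch below is not from this report: it runs the import in a child process and gives up after a timeout. The helper name import_finishes and the 30-second default are illustrative assumptions.

```python
import subprocess
import sys
import time


def import_finishes(module, timeout=30.0):
    """Return True if a child Python imports `module` and exits cleanly in time."""
    # Run the import in a separate interpreter so a hang cannot block us.
    proc = subprocess.Popen([sys.executable, '-c', 'import ' + module])
    deadline = time.time() + timeout
    while proc.poll() is None:
        if time.time() > deadline:
            # Still running after the deadline: treat it as a hang.
            proc.kill()
            proc.wait()
            return False
        time.sleep(0.1)
    # A non-zero exit code means the import raised (e.g. ImportError).
    return proc.returncode == 0
```

Calling import_finishes('nltk') from a shell on the gear would show whether the import itself completes. If it does, the blocking is more likely tied to how the web server hosts the interpreter than to nltk alone.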

Comment 3 Junior 2014-10-13 13:47:45 UTC
Hello Michal, nltk works perfectly on Python 2.6 with nothing more than an import in the code. My web service also had to add the contents of the nltk_data folder, but now the error happens as you said: when you open the site, the application freezes for no apparent reason.

I know another person who is having the same problem, so I think an update to Python or to nltk caused it.

Here is a basic code example. You also need to commit the nltk_data folder together with wsgi.py and run this rhc command: rhc set-env NLTK_DATA=remote_directory_of_nltk_data_folder -a nameapp --namespace name_of_namespace

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

virtenv = os.environ['APPDIR'] + '/virtenv/'
os.environ['PYTHON_EGG_CACHE'] = os.path.join(virtenv, 'lib/python2.6/site-packages')
virtualenv = os.path.join(virtenv, 'bin/activate_this.py')
try:
    execfile(virtualenv, dict(__file__=virtualenv))
except IOError:
    pass
#
# IMPORTANT: Put any additional includes below this line.  If placed above this
# line, it's possible required libraries won't be in your searchable path
# 
import nltk
import web

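from nltk.tokenize import word_tokenize
# WordPunctTokenizer is used below, so it must be imported as well.
from nltk.tokenize import WordPunctTokenizer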

tokenizer = WordPunctTokenizer()

urls = (
  '/', 'index'
)

render = web.template.render('app-root/repo/wsgi/templates/')

class index:
	
	def GET(self):
		tokenizer.tokenize('Hello World')
		return 'Hello World'

application = web.application(urls, globals()).wsgifunc()

#
# Below for testing only
#
if __name__ == '__main__':
	from wsgiref.simple_server import make_server
	httpd = make_server('localhost', 8051, application)
	# Wait for a single request, serve it and quit.
	httpd.handle_request()
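
Since the example depends on the NLTK_DATA variable pointing at the committed nltk_data folder, it can help to confirm that the variable is set and that the directory exists on the gear. The following is a minimal sketch, not part of the original report; the helper name nltk_data_dirs is hypothetical. It assumes NLTK_DATA may list several directories separated by os.pathsep, which is how nltk reads it.

```python
import os


def nltk_data_dirs(environ=None):
    """Return (path, exists) pairs for each directory listed in NLTK_DATA."""
    environ = os.environ if environ is None else environ
    raw = environ.get('NLTK_DATA', '')
    # Split on the platform path separator and drop empty entries.
    paths = [p for p in raw.split(os.pathsep) if p]
    return [(p, os.path.isdir(p)) for p in paths]


if __name__ == '__main__':
    entries = nltk_data_dirs()
    if not entries:
        print('NLTK_DATA is not set')
    for path, exists in entries:
        print('%s: %s' % (path, 'found' if exists else 'MISSING'))
```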

Comment 4 Michal Fojtik 2014-10-16 13:24:04 UTC
Junior, sorry this is taking so long. I'm currently testing this fix:

https://github.com/openshift/origin-server/pull/5879

It seems to fix the problem with nltk; however, we are still evaluating whether it breaks anything else.

Comment 5 Michal Fojtik 2014-10-16 13:34:18 UTC
Quote from documentation:

```
Forcing a WSGI application to run within the first interpreter can be necessary when a third party C extension module for Python has used the simplified threading API for manipulation of the Python GIL and thus will not run correctly within any additional sub interpreters created by Python. 
```

This explains the problem with 'nltk', which does have a C extension.
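
To see which interpreter a request actually runs in, a small diagnostic WSGI application can report the application group. This is a sketch under one assumption: that the app runs under mod_wsgi, which places the key 'mod_wsgi.application_group' in the WSGI environ, with an empty string meaning the first (main) interpreter. The function name describe_group is hypothetical.

```python
def describe_group(environ):
    """Describe the mod_wsgi application group found in a WSGI environ."""
    group = environ.get('mod_wsgi.application_group')
    if group is None:
        # Key is absent, e.g. when served by wsgiref instead of mod_wsgi.
        return 'not running under mod_wsgi'
    if group == '':
        # An empty group name corresponds to WSGIApplicationGroup %{GLOBAL}.
        return 'main interpreter (application group %{GLOBAL})'
    return 'sub interpreter, application group %r' % group


def application(environ, start_response):
    # Plain-text response so the result is readable in a browser or curl.
    body = describe_group(environ).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

If the response reports a sub interpreter, a C extension that uses the simplified GIL API may misbehave in the way the quoted documentation describes.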

Comment 6 openshift-github-bot 2014-10-17 10:38:27 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/5f408222ecbe2c57712474892422514fb52acd0b
Bug 1151494 - Add WSGIApplicationGroup directive to wsgi.conf

Comment 7 Junior 2014-10-17 12:33:12 UTC
Hello Michal, I tried re-running the web service and it still doesn't work when I import nltk. When I comment out the import it works, so I believe the problem persists.

I'm not sure whether OpenShift has been updated yet, but after the last post from openshift-github-bot I assumed so.

Comment 8 Michal Fojtik 2014-10-17 12:34:39 UTC
Junior, this change is not in production yet. It will be part of the upcoming release.

Comment 9 Junior 2014-10-17 12:37:34 UTC
Ok then. Thanks for the support.

Comment 10 chunchen 2014-10-20 05:55:20 UTC
It's fixed and verified on devenv_5247. Please refer to the following results:

1. Create a python-2.7 app
rhc app create py27 python-2.7

2. Create a directory named "nltk_data" in the app repo and set env var named "NLTK_DATA"
$ mkdir py27/nltk_data
$ rhc set-env NLTK_DATA=remote_directory_of_nltk_data_folder -a py27

3. Add 'nltk' and 'web.py' modules into setup.py file to install, like below:
from setuptools import setup

setup(name='YourAppName',
      version='1.0',
      description='OpenShift App',
      author='Your Name',
      author_email='example',
      url='http://www.python.org/sigs/distutils-sig/',
      install_requires=['nltk','web.py'],
     )

4. Overwrite the wsgi.py file as below:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

virtenv = os.environ['APPDIR'] + '/virtenv/'
os.environ['PYTHON_EGG_CACHE'] = os.path.join(virtenv, 'lib/python2.6/site-packages')
virtualenv = os.path.join(virtenv, 'bin/activate_this.py')
try:
    execfile(virtualenv, dict(__file__=virtualenv))
except IOError:
    pass
#
# IMPORTANT: Put any additional includes below this line.  If placed above this
# line, it's possible required libraries won't be in your searchable path
# 
import nltk
import web

from nltk.tokenize import word_tokenize
from nltk.tokenize import WordPunctTokenizer

tokenizer = WordPunctTokenizer()

urls = (
  '/', 'index'
)

render = web.template.render('app-root/repo/wsgi/templates/')

class index:

        def GET(self):
                tokenizer.tokenize('Hello World')
                return 'Hello World'

application = web.application(urls, globals()).wsgifunc()

#
# Below for testing only
#
if __name__ == '__main__':
        from wsgiref.simple_server import make_server
        httpd = make_server('localhost', 8051, application)
        # Wait for a single request, serve it and quit.
        httpd.handle_request()

5. Perform git push
git add . && git commit -amp && git push

6. Access the app home page via browser

After step 6, the app can be accessed and returns the "Hello World" text.
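
The check in step 6 can also be approximated without a browser by calling a WSGI application once in-process. The sketch below is an illustration, not part of the verification above; call_wsgi and hello_app are hypothetical names, and hello_app is only a stand-in for the real application object.

```python
from wsgiref.util import setup_testing_defaults


def call_wsgi(app, path='/'):
    """Call a WSGI app once in-process and return (status, body bytes)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal GET request environ
    environ['PATH_INFO'] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status

    chunks = app(environ, start_response)
    try:
        body = b''.join(chunks)
    finally:
        # WSGI iterables may define close(); call it if present.
        if hasattr(chunks, 'close'):
            chunks.close()
    return captured['status'], body


def hello_app(environ, start_response):
    # Stand-in application that mimics the expected "Hello World" response.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello World']
```

Passing the application object from wsgi.py instead of hello_app should exercise the nltk import and the tokenizer. Note that this runs outside Apache, so a passing result only isolates the application code; it does not show that the deployed app is free of the freeze described in this bug.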

Comment 11 Junior 2014-10-20 13:03:53 UTC
The problem persists here: when I import nltk in any application with Python 2.7, the web service freezes.
I followed chunchen's steps and the problem still happens.
My friend has a web service with Python 2.7 + nltk, and it is still not working either.

Comment 12 Michal Fojtik 2014-10-20 13:08:17 UTC
Junior, this change is not in production yet. It will be released soon. Our QA has verified the fix internally, so the problem is solved. You just have to wait for the production release.

Comment 13 Junior 2014-10-20 13:14:56 UTC
OK, thanks again, Michal. I'll monitor the application to see when the update reaches production.

