Bug 1109277 - Problem with RSLP Stemmer in application which uses nltk
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Image
Version: 1.x
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Jakub Hadvig
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-06-13 14:41 UTC by Junior
Modified: 2014-06-18 07:54 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-17 22:40:18 UTC
Target Upstream Version:
Embargoed:


Attachments
Picture of problem (95.64 KB, image/jpeg)
2014-06-13 14:41 UTC, Junior

Description Junior 2014-06-13 14:41:38 UTC
Created attachment 908598
Picture of problem

- Description of problem:

I hosted a Python web service application on OpenShift that uses the RSLP Stemmer module of nltk, but the service log reported:

"[...] Resource 'stemmers/rslp/step0.pt' not found. Please use the NLTK Downloader to obtain the resource: >>> nltk.download()

Searched in:
 - '/var/lib/openshift/539a61ab5973caa2410000bf/nltk_data'
 - '/usr/share/nltk_data'
 - '/usr/local/share/nltk_data'
 - '/usr/lib/nltk_data'
 - '/usr/local/lib/nltk_data'  [...]  "

I concluded that the module is not installed properly, so I am reporting this as a bug.
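
The directories listed in the error message are the places NLTK searched for the resource. A minimal sketch, using only the standard library, of checking those directories by hand follows. The helper name `find_resource` is hypothetical, and the real NLTK lookup also handles zipped packages, which this sketch ignores.

```python
import os

# Directories copied from the "Searched in:" part of the error message.
SEARCH_DIRS = [
    "/var/lib/openshift/539a61ab5973caa2410000bf/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]


def find_resource(resource, search_dirs):
    """Return the first directory containing `resource`, or None.

    `resource` uses forward slashes, e.g. "stemmers/rslp/step0.pt".
    Hypothetical helper: only plain files are checked, not zip archives.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, *resource.split("/"))
        if os.path.isfile(candidate):
            return directory
    return None
```

Calling `find_resource("stemmers/rslp/step0.pt", SEARCH_DIRS)` on a gear where the data was never downloaded would return `None`, which matches the error reported above.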

- How reproducible:
Use the following code snippet:

import nltk
from nltk.stem import RSLPStemmer
stemmer = RSLPStemmer()


- Actual results:
The application does not work.

- Expected results:
The application should be working.

Comment 1 Jakub Hadvig 2014-06-17 22:40:18 UTC
Junior, the problem is that the NLTK package by default expects its corpora in the user's home directory. Unfortunately, you cannot write to the home directory on the gear; you have to use $OPENSHIFT_DATA_DIR for storing data. To solve this problem, do the following:

1. Create an environment variable called NLTK_DATA with the value $OPENSHIFT_DATA_DIR. After creating the environment variable, restart the app using the rhc app-restart command.
2. SSH into your application gear using the rhc ssh command.
3. Activate the virtual environment and download the corpus using the commands shown below.

. $VIRTUAL_ENV/bin/activate
curl https://raw.githubusercontent.com/sloria/TextBlob/dev/textblob/download_corpora.py | python
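
If only the RSLP stemmer data is needed, the same idea can be sketched in Python from inside the activated virtual environment. This is a minimal sketch, not the method from the blog post: the helper names `nltk_data_dir` and `ensure_rslp` are hypothetical, and it assumes NLTK_DATA or OPENSHIFT_DATA_DIR is set and that the gear can reach the NLTK download server.

```python
import os


def nltk_data_dir(env=None):
    """Pick the writable directory where NLTK data should live.

    NLTK_DATA wins when set; otherwise fall back to OPENSHIFT_DATA_DIR.
    Returns None when neither variable is set.
    """
    env = os.environ if env is None else env
    return env.get("NLTK_DATA") or env.get("OPENSHIFT_DATA_DIR")


def ensure_rslp(data_dir):
    """Download the RSLP stemmer data into data_dir if NLTK cannot find it."""
    import nltk  # imported lazily so nltk_data_dir works without NLTK

    # Make sure NLTK searches the writable directory first.
    if data_dir not in nltk.data.path:
        nltk.data.path.insert(0, data_dir)
    try:
        nltk.data.find("stemmers/rslp/step0.pt")
    except LookupError:
        # "rslp" is the NLTK package id for the RSLP stemmer data.
        nltk.download("rslp", download_dir=data_dir)
```

On the gear this would be called as `ensure_rslp(nltk_data_dir())` before constructing `RSLPStemmer()`.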



There is also a blog post that addresses this problem:
https://www.openshift.com/blogs/day-9-textblob-finding-sentiments-in-text

Comment 2 Junior 2014-06-18 00:16:52 UTC
Thanks for the help, Shekhar. I followed your instructions, but the URL was broken. However, the ability to create environment variables was useful: I created a folder containing the nltk content I needed and set the NLTK_DATA environment variable to point to that folder. Again, thanks for the help.

Comment 3 Jakub Hadvig 2014-06-18 07:54:34 UTC
Junior, the correct URL is:
https://raw.githubusercontent.com/sloria/TextBlob/dev/textblob/download_corpora.py

This one is working and is the right one.

-Jakub

