Bug 1109277
| Summary: | Problem with RSLP Stemmer in application which uses nltk | | |
|---|---|---|---|
| Product: | OpenShift Online | Reporter: | Junior <juniorcaemj> |
| Component: | Image | Assignee: | Jakub Hadvig <jhadvig> |
| Status: | CLOSED NOTABUG | QA Contact: | libra bugs <libra-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 1.x | CC: | jokerman, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-06-17 22:40:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Junior, the problem is that the NLTK package by default expects its corpora in the user's home directory. Unfortunately, you cannot write to the user home directory; you have to use $OPENSHIFT_DATA_DIR for storing data. To solve this problem, do the following:

1. Create an environment variable called NLTK_DATA with the value $OPENSHIFT_DATA_DIR. After creating the environment variable, restart the app using the rhc app-restart command.
2. SSH into your application gear using the rhc ssh command.
3. Activate the virtual environment and download the corpus using the commands shown below.

    . $VIRTUAL_ENV/bin/activate
    curl https://raw.githubusercontent.com/sloria/TextBlob/dev/textblob/download_corpora.py | python

There is also a blog post that solves your problem: https://www.openshift.com/blogs/day-9-textblob-finding-sentiments-in-text

Thanks for the help, Shekhar. I followed your instructions, but the URL was broken. However, the ability to create environment variables was useful: I created a folder containing the nltk content I needed and set the NLTK_DATA environment variable to point to that folder. Again, thanks for the help.

Junior, the correct URL is: https://raw.githubusercontent.com/sloria/TextBlob/dev/textblob/download_corpora.py

This one is working and is the right one.

-Jakub
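For readers who only need the RSLP rules rather than the full TextBlob corpora, the same idea can be sketched in Python. This is a minimal sketch, not something posted in this thread: the helper names `resolve_nltk_data_dir` and `ensure_rslp` are hypothetical, and falling back to OPENSHIFT_DATA_DIR when NLTK_DATA is unset is an assumption. It relies on `nltk.data.path`, `nltk.data.find` and `nltk.download` with its `download_dir` argument.

```python
import os


def resolve_nltk_data_dir(environ=None):
    """Return the writable directory for NLTK data, or None if none is set."""
    env = os.environ if environ is None else environ
    # Prefer an explicit NLTK_DATA; otherwise fall back to the gear's data
    # directory (this fallback is an assumption of the sketch).
    base = env.get("NLTK_DATA") or env.get("OPENSHIFT_DATA_DIR")
    if not base:
        return None
    return os.path.normpath(base)


def ensure_rslp(environ=None):
    """Make sure the RSLP stemmer rules are available in the writable dir."""
    import nltk  # imported lazily so the helper above works without NLTK

    data_dir = resolve_nltk_data_dir(environ)
    if data_dir is None:
        raise RuntimeError("Set NLTK_DATA or OPENSHIFT_DATA_DIR first")
    # Put the writable directory at the front of NLTK's search path.
    if data_dir not in nltk.data.path:
        nltk.data.path.insert(0, data_dir)
    try:
        nltk.data.find("stemmers/rslp/step0.pt")
    except LookupError:
        # "rslp" is the package id for the RSLP stemmer rules.
        nltk.download("rslp", download_dir=data_dir)
    return data_dir
```

Calling `ensure_rslp()` once at application start-up, before `RSLPStemmer()` is constructed, should be enough if the gear has network access. Otherwise, copying the data into the directory by hand, as Junior did, serves the same purpose.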
Created attachment 908598
Picture of problem

- Description of problem:
I hosted a Python web service application on OpenShift that uses the RSLP Stemmer module of nltk, but the service log reported:

"[...] Resource 'stemmers/rslp/step0.pt' not found. Please use the NLTK Downloader to obtain the resource: >>> nltk.download()
Searched in:
- '/var/lib/openshift/539a61ab5973caa2410000bf/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
[...]"

I concluded that the module is not installed properly, so I'm reporting the bug.

- How reproducible:
Use the following code snippet:

    import nltk
    from nltk.stem import RSLPStemmer
    stemmer = RSLPStemmer()

- Actual results:
The application does not work.

- Expected results:
The application should work.
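To see which of the searched directories actually holds the missing file, a small diagnostic along these lines may help. It is only a sketch: `dirs_with_resource` is a hypothetical helper, and it checks plain files on disk only, so it ignores the zip archives that NLTK can also read.

```python
import os


def dirs_with_resource(search_dirs, resource="stemmers/rslp/step0.pt"):
    """Return the search directories that contain the given resource file."""
    parts = resource.split("/")
    return [d for d in search_dirs if os.path.isfile(os.path.join(d, *parts))]
```

In the application this could be called as `dirs_with_resource(nltk.data.path)`. An empty list matches the situation in the log above.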