My Code:
import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
ERROR Message:
[ec2-user@ip-172-31-31-31 sentiment]$ python mapper_local_v1.0.py
Traceback (most recent call last):
File "mapper_local_v1.0.py", line 16, in <module>
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
File "/usr/lib/python2.6/site-packages/nltk/data.py", line 774, in load
opened_resource = _open(resource_url)
File "/usr/lib/python2.6/site-packages/nltk/data.py", line 888, in _open
return find(path_, path + ['']).open()
File "/usr/lib/python2.6/site-packages/nltk/data.py", line 618, in find
raise LookupError(resource_not_found)
LookupError:
Resource u'tokenizers/punkt/english.pickle' not found. Please
use the NLTK Downloader to obtain the resource:
>>>nltk.download()
Searched in:
- '/home/ec2-user/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''
I'm trying to run this program in Unix machine:
As per the error message, I logged into python shell from my unix machine then I used the below commands:
import nltk
nltk.download()
and then I downloaded all the available things using d- down loader and l- list options but still the problem persists.
I tried my best to find the solution in internet but I got the same solution what I did as I mentioned in my above steps.
After adding this line of code, the issue will be fixed:
nltk.download('punkt')
For me nothing of the above worked, so I just downloaded all the files by hand from the web site http://www.nltk.org/nltk_data/ and I put them also by hand in a file "tokenizers" inside of "nltk_data" folder. Not a pretty solution but still a solution.
import nltk
nltk.download('punkt')
Open the Python prompt and run the above statements.
The sent_tokenize function uses an instance of PunktSentenceTokenizer from the nltk.tokenize.punkt module. This instance has already been trained and works well for many European languages. So it knows what punctuation and characters mark the end of a sentence and the beginning of a new sentence.
From the shell you can execute:
sudo python -m nltk.downloader punkt
If you want to install the popular NLTK corpora/models:
sudo python -m nltk.downloader popular
If you want to install all NLTK corpora/models:
sudo python -m nltk.downloader all
To list the resources you have downloaded:
python -c 'import os; import nltk; print os.listdir(nltk.data.find("corpora"))'
python -c 'import os; import nltk; print os.listdir(nltk.data.find("tokenizers"))'
For me it got solved by using "nltk:"
http://www.nltk.org/howto/data.html
Failed loading english.pickle with nltk.data.load
sent_tokenizer=nltk.data.load('nltk:tokenizers/punkt/english.pickle')
My issue was that I called nltk.download('all')
as the root user, but the process that eventually used nltk was another user who didn't have access to /root/nltk_data where the content was downloaded.
So I simply recursively copied everything from the download location to one of the paths where NLTK was looking to find it like this:
cp -R /root/nltk_data/ /home/ubuntu/nltk_data
If you're looking to only download the punkt
model:
import nltk
nltk.download('punkt')
If you're unsure which data/model you need, you can install the popular datasets, models and taggers from NLTK:
import nltk
nltk.download('popular')
With the above command, there is no need to use the GUI to download the datasets.
I was getting an error despite importing the following,
import nltk
nltk.download()
but for google colab this solved my issue.
!python3 -c "import nltk; nltk.download('all')"
Add the following lines into your script. This will automatically download the punkt data.
import nltk
nltk.download('punkt')
Go to python console by typing
$ python
in your terminal. Then, type the following 2 commands in your python shell to install the respective packages:
>> nltk.download('punkt') >> nltk.download('averaged_perceptron_tagger')
This solved the issue for me.
Just make sure you are using Jupyter
Notebook and in a notebook, do the following:
import nltk
nltk.download()
Then one popup window will appear (showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml) From that you have to download everything.
Then rerun your code.
Execute the following code:
import nltk
nltk.download()
After this, NLTK downloader will pop out.
Simple nltk.download() will not solve this issue. I tried the below and it worked for me:
in the nltk folder create a tokenizers folder and copy your punkt folder into tokenizers folder.
This will work.! the folder structure needs to be as shown in the picture
The same thing happened to me recently, you just need to download the "punkt" package and it should work.
When you execute "list" (l) after having "downloaded all the available things", is everything marked like the following line?:
[*] punkt............... Punkt Tokenizer Models
If you see this line with the star, it means you have it, and nltk should be able to load it.
You need to rearrange your folders
Move your tokenizers
folder into nltk_data
folder.
This doesn't work if you have nltk_data
folder containing corpora
folder containing tokenizers
folder
To add to alvas' answer, you can download only the punkt
corpus:
nltk.download('punkt')
Downloading all
sounds like overkill to me. Unless that's what you want.
I faced same issue. After downloading everything, still 'punkt' error was there. I searched package on my windows machine at C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers and I can see 'punkt.zip' present there. I realized that somehow the zip has not been extracted into C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers\punk. Once I extracted the zip, it worked like music.
Source: Stackoverflow.com