[python] Failed loading english.pickle with nltk.data.load

When trying to load the punkt tokenizer...

import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

...a LookupError was raised:

> LookupError: 
>     *********************************************************************   
> Resource 'tokenizers/punkt/english.pickle' not found.  Please use the NLTK Downloader to obtain the resource: nltk.download().   Searched in:
>         - 'C:\\Users\\Martinos/nltk_data'
>         - 'C:\\nltk_data'
>         - 'D:\\nltk_data'
>         - 'E:\\nltk_data'
>         - 'E:\\Python26\\nltk_data'
>         - 'E:\\Python26\\lib\\nltk_data'
>         - 'C:\\Users\\Martinos\\AppData\\Roaming\\nltk_data'
>     **********************************************************************

Tags: python, jenkins, nltk

Answers:


A simple nltk.download() will not solve this issue. I tried the following and it worked for me:

In the nltk_data folder, create a tokenizers folder and copy your punkt folder into it.

This will work; the folder structure needs to be nltk_data/tokenizers/punkt.
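
A minimal sketch of that fix, assuming you have already downloaded and unzipped the punkt folder somewhere locally (the source path below is a placeholder):

import os
import shutil

import nltk

# NLTK searches every directory in nltk.data.path; use the first one.
target = os.path.join(nltk.data.path[0], 'tokenizers')
os.makedirs(target, exist_ok=True)

# Placeholder: wherever you unzipped the downloaded punkt folder.
shutil.copytree('/path/to/punkt', os.path.join(target, 'punkt'))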


I had a similar issue when using an assigned folder for multiple downloads, and I had to append the data path manually.

A single download can be achieved as follows (this works):

import os as _os
from nltk.corpus import stopwords
from nltk import download as nltk_download

# get_project_root_path() is the author's own helper returning the project root
nltk_download('stopwords', download_dir=_os.path.join(get_project_root_path(), 'temp'), raise_on_error=True)

stop_words: list = stopwords.words('english')

This code works, meaning that nltk remembers the download path passed in the download function. On the other hand, if I download a subsequent package I get a similar error to the one described in the question:

Multiple downloads raise an error:

import os as _os

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

from nltk import download as nltk_download

nltk_download(['stopwords', 'punkt'], download_dir=_os.path.join(get_project_root_path(), 'temp'), raise_on_error=True)

print(stopwords.words('english'))
print(word_tokenize("I am trying to find the download path 99."))


Error:

Resource punkt not found. Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('punkt')

Now if I append the nltk data path with my download path, it works:

import os as _os

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

from nltk import download as nltk_download
from nltk.data import path as nltk_path


nltk_path.append(_os.path.join(get_project_root_path(), 'temp'))


nltk_download(['stopwords', 'punkt'], download_dir=_os.path.join(get_project_root_path(), 'temp'), raise_on_error=True)

print(stopwords.words('english'))
print(word_tokenize("I am trying to find the download path 99."))

This works... I am not sure why it works in one case but not the other, but the error message seems to imply that nltk doesn't check the download folder the second time. NB: using Windows 8.1 / Python 3.7 / nltk 3.5.
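
If in doubt, you can inspect the search path nltk actually consults; a quick sketch:

import nltk

# Directories nltk will search for resources, in order.
print(nltk.data.path)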


This is what worked for me just now:

# Do this in a separate python interpreter session, since you only have to do it once
import nltk
nltk.download('punkt')

# Do this in your ipython notebook or analysis script
from nltk.tokenize import word_tokenize

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]

sentences_tokenized = []
for s in sentences:
    sentences_tokenized.append(word_tokenize(s))

sentences_tokenized is a list of lists of tokens:

[['Mr.', 'Green', 'killed', 'Colonel', 'Mustard', 'in', 'the', 'study', 'with', 'the', 'candlestick', '.', 'Mr.', 'Green', 'is', 'not', 'a', 'very', 'nice', 'fellow', '.'],
['Professor', 'Plum', 'has', 'a', 'green', 'plant', 'in', 'his', 'study', '.'],
['Miss', 'Scarlett', 'watered', 'Professor', 'Plum', "'s", 'green', 'plant', 'while', 'he', 'was', 'away', 'from', 'his', 'office', 'last', 'week', '.']]

The sentences were taken from the example IPython notebook accompanying the book "Mining the Social Web, 2nd Edition".


This works for me:

>>> import nltk
>>> nltk.download()

On Windows this also opens the NLTK Downloader GUI.


nltk ships pre-trained tokenizer models. nltk.download('punkt') fetches the model from nltk's predefined web sources and stores it under your nltk_data directory; nltk.data.load('nltk:tokenizers/punkt/english.pickle') then loads it from disk:

E.g. 1: tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

E.g. 2: nltk.download('punkt')

If you call nltk.download in your code, make sure you have an internet connection and no firewall is blocking it.

I would like to share an alternative way to resolve the above issue, with a deeper understanding of what is going on.

Please follow these steps to tokenize English text with nltk.

Step 1: Download the "english.pickle" model from the web.

Go to "http://www.nltk.org/nltk_data/" and click "download" next to "107. Punkt Tokenizer Models".

Step 2: Extract the downloaded "punkt.zip" file, find the "english.pickle" file inside it, and place it on the C: drive.

Step 3: Copy and paste the following code and execute it.

from nltk.data import load
from nltk.tokenize.treebank import TreebankWordTokenizer

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]

tokenizer = load('file:C:/english.pickle')
treebank_word_tokenize = TreebankWordTokenizer().tokenize

wordToken = []
for sent in sentences:
    subSentToken = []
    # Split each input string into sentences, then word-tokenize each sentence.
    for subSent in tokenizer.tokenize(sent):
        subSentToken.extend(treebank_word_tokenize(subSent))
    wordToken.append(subSentToken)

for token in wordToken:
    print(token)

Let me know if you face any problems.


On Jenkins this can be fixed by adding the following line of code to the Virtualenv Builder under the Build tab:

python -m nltk.downloader punkt

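If the Jenkins user's home directory is not writable, you can point the downloader at another directory instead; a minimal sketch, where the target path is an assumption and must also be added to nltk's search path:

import nltk

# Download punkt into a Jenkins-writable directory (path is an assumption)...
nltk.download('punkt', download_dir='/var/lib/jenkins/nltk_data')
# ...and make sure nltk searches that directory at runtime.
nltk.data.path.append('/var/lib/jenkins/nltk_data')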


The main reason you see that error is that nltk could not find the punkt package. Due to the size of the nltk suite, not all available packages are downloaded by default when you install it.

You can download punkt package like this.

import nltk
nltk.download('punkt')

from nltk import word_tokenize,sent_tokenize

If you do not pass any argument to the download function, it downloads all packages, i.e. chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers, and tokenizers:

nltk.download()

The above function saves packages to a specific directory. You can find that directory location in the comments here: https://github.com/nltk/nltk/blob/67ad86524d42a3a86b1f5983868fd2990b59f1ba/nltk/downloader.py#L1051
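
To see where the downloader will write by default on your machine, one option is this small sketch:

import nltk

# The directory nltk.download() uses when no download_dir is given.
print(nltk.downloader.Downloader().default_download_dir())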


Check that you have all the required NLTK data packages installed.


I came across this problem when trying to do POS tagging in nltk. I fixed it by making a new directory named "taggers", alongside the corpora directory, and copying max_pos_tagger into it.
Hope it works for you too. Best of luck!
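
The downloader can usually fetch tagger models for you instead of copying folders by hand; a sketch, assuming the averaged perceptron tagger (the model nltk currently ships) is acceptable in place of the one named above:

import nltk

# Fetch the tagger model and punkt, then tag a tokenized sentence.
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')

from nltk import pos_tag, word_tokenize
print(pos_tag(word_tokenize("NLTK needs its tagger data downloaded first.")))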


You just need to go to the Python console and type:

import nltk

press Enter, and then type:

nltk.download()

An interface will then appear. Just find the download button and press it. It will install all the required items, which takes some time. Once it finishes, try again; your problem should be solved.


In Spyder, go to your active shell and download nltk using the two commands below:

import nltk
nltk.download()

You should then see the NLTK Downloader window open. Go to the 'Models' tab in this window, click on 'punkt', and download it.


I had this same problem. Go into a python shell and type:

>>> import nltk
>>> nltk.download()

Then an installation window appears. Go to the 'Models' tab and select 'punkt' under the 'Identifier' column. Click Download and it will install the necessary files; it should then work!


In Python 3.6 the suggestion appears right in the traceback, which is quite helpful. So pay attention to the error you get; most of the time the answer is within the problem itself ;).

And then, as suggested by other folks here, you can install the data on the fly, either from the Python terminal or with a command like python -c "import nltk; nltk.download('wordnet')". You only need to run that command once; it saves the data locally in your home directory.


From the bash command line, run:

$ python -c "import nltk; nltk.download('punkt')"

A plain nltk.download() will not solve this issue. I tried the following and it worked for me:

In the '...AppData\Roaming\nltk_data\tokenizers' folder, extract the downloaded punkt.zip in place.
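
A sketch of doing that extraction from Python rather than by hand, assuming punkt.zip has already been downloaded (both paths are placeholders):

import os
import zipfile

# Placeholders: adjust to your download location and user profile.
zip_path = os.path.expanduser('~/Downloads/punkt.zip')
target = os.path.expanduser('~/AppData/Roaming/nltk_data/tokenizers')

os.makedirs(target, exist_ok=True)
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(target)  # the archive contains a top-level punkt/ folder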


The punkt tokenizer data is quite large at over 35 MB; this can be a big deal if, like me, you are running nltk in an environment such as Lambda that has limited resources.

If you only need one, or perhaps a few, language tokenizers, you can drastically reduce the size of the data by including only those languages' .pickle files.

If you only need to support English, your nltk data size can be reduced to 407 KB (for the Python 3 version).

Steps

  1. Download the nltk punkt data: https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip
  2. Somewhere in your environment create the folders nltk_data/tokenizers/punkt; if using Python 3, add another folder PY3 so that your new directory structure looks like nltk_data/tokenizers/punkt/PY3. In my case I created these folders at the root of my project.
  3. Extract the zip and move the .pickle files for the languages you want to support into the punkt folder you just created. Note: Python 3 users should use the pickles from the PY3 folder.
  4. Now you just need to add your nltk_data folder to the search paths, assuming your data is not in one of the predefined search paths. You can do this with the environment variable NLTK_DATA='path/to/your/nltk_data', or add a custom path at runtime in Python:
from nltk import data
data.path += ['/path/to/your/nltk_data']

NOTE: If you don't need to load the data at runtime or bundle it with your code, it is best to create your nltk_data folders at one of the built-in locations that nltk searches.
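
A quick check that the trimmed-down data is actually found, assuming the nltk_data folder sits at the project root as described (the relative path is an assumption):

from nltk import data
from nltk.tokenize import sent_tokenize

# Point nltk at the bundled data before the first tokenizer call.
data.path += ['./nltk_data']
print(sent_tokenize("It works. Both sentences were found."))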

