The script scripts/download_third_party_data.py downloads third-party data. It currently downloads:

- The Stanford POS and NER taggers
- The Punkt tokenizer
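
For orientation, here is a minimal sketch of how the Punkt tokenizer step could work. It assumes the models are fetched through NLTK's `nltk.download`. The function name `download_punkt`, its `downloader` parameter, and the target directory are hypothetical and are not taken from the actual script, which may do this differently.

```python
from pathlib import Path
from typing import Callable, Optional


def download_punkt(target_dir: Path,
                   downloader: Optional[Callable[..., bool]] = None) -> bool:
    """Fetch the Punkt tokenizer models into target_dir (hypothetical helper).

    `downloader` defaults to nltk.download; passing a stand-in makes the
    function usable without network access, e.g. in tests.
    """
    if downloader is None:
        # Imported lazily so this module loads even if NLTK is not installed.
        import nltk
        downloader = nltk.download

    # Create the destination directory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)

    # nltk.download(info_or_id, download_dir=...) returns True on success.
    return bool(downloader("punkt", download_dir=str(target_dir)))
```

Calling `download_punkt(Path("third_party_data") / "nltk_data")` would place the models in that (assumed) directory. The Stanford taggers are distributed as zip archives, so a similar helper for them would need to download and unpack an archive instead.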