• Docs
  • GitHub

Extract n-grams from documents and forecast emergence.

This Python-based app extracts popular or emergent n-grams/terms (words or short phrases) from free text in a large corpus (more than 1,000 documents). Example corpora of granted patent abstracts are included for testing.
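To illustrate what extracting and ranking n-grams can look like, here is a minimal sketch in plain Python. It is not pyGrams' own code or API: the function names `ngrams` and `rank_terms`, the simple tokenizer, and the plain `log(N / df)` weighting are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Yield lowercase word n-grams of length 1..n_max (hypothetical helper)."""
    words = re.findall(r"[a-z]+", text.lower())
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def rank_terms(documents, n_max=2):
    """Return (term, score) pairs sorted by summed TF-IDF, highest first.

    Assumption: tf = count / terms in document, idf = log(N / document frequency).
    """
    counts = [Counter(ngrams(doc, n_max)) for doc in documents]
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for c in counts for term in c)
    n_docs = len(documents)
    scores = Counter()
    for c in counts:
        total = sum(c.values())
        for term, freq in c.items():
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += (freq / total) * idf
    return scores.most_common()


if __name__ == "__main__":
    abstracts = [
        "A solar cell with improved efficiency",
        "A solar cell mounted on a wind turbine",
        "A wind turbine blade with reduced noise",
    ]
    for term, score in rank_terms(abstracts)[:5]:
        print(f"{score:.3f}  {term}")
```

With this weighting, a term that occurs in every document scores zero, so common filler words drop to the bottom of the ranking.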

Install pyGrams
Getting started
Read the technical report

TFIDF caching.

Cache TFIDF calculations on large documents for faster analysis.
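The idea of caching an expensive calculation can be sketched as follows. This is an illustrative pattern only, not how pyGrams stores its cache: the `cached` helper, the `cache` directory name, and the choice of `pickle` keyed by a SHA-256 hash of the input are assumptions.

```python
import hashlib
import pickle
from pathlib import Path


def cached(compute, documents, cache_dir="cache"):
    """Return compute(documents), reusing a pickled result for identical input.

    Hypothetical helper: the cache key is a hash of the document texts, so
    any change to the corpus triggers a fresh calculation.
    """
    key = hashlib.sha256("\x00".join(documents).encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.pkl"
    if path.exists():
        # Cache hit: load the stored result instead of recomputing.
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute(documents)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

A second call with the same documents and cache directory reads the stored result from disk rather than calling `compute` again. Only load pickle files from a location you trust.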

Timeseries analysis.

Observe and nowcast terms using timeseries analysis to highlight emerging and declining terms.
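As a rough illustration of separating emerging from declining terms, the sketch below fits a straight line to each term's counts per period and labels the term by the sign of the slope. This is a deliberately simple stand-in, not the forecasting method pyGrams uses; the names `slope` and `classify` and the `threshold` value are assumptions.

```python
def slope(counts):
    """Least-squares slope of counts against their index (0, 1, 2, ...)."""
    n = len(counts)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def classify(series, threshold=0.5):
    """Label each term 'emerging', 'declining' or 'stable' by its trend.

    series maps a term to its counts per time period, oldest first.
    threshold is an arbitrary cut-off chosen for this example.
    """
    labels = {}
    for term, counts in series.items():
        s = slope(counts)
        if s > threshold:
            labels[term] = "emerging"
        elif s < -threshold:
            labels[term] = "declining"
        else:
            labels[term] = "stable"
    return labels


if __name__ == "__main__":
    quarterly_counts = {
        "solar cell": [1, 2, 3, 4],
        "floppy disk": [4, 3, 2, 1],
        "wind turbine": [2, 2, 2, 2],
    }
    print(classify(quarterly_counts))
```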

Handles large document collections.

Useful for large document collections such as patents or research papers.


Data Science Campus

GitHub Pages template adapted from Facebook
Contribute to this project on GitHub