pygrams

Extract n-grams from documents and forecast emergence.

This Python-based app extracts popular or emergent n-grams (terms made up of a single word or a short phrase) from free text in a large corpus of more than 1,000 documents. Example corpora of granted patent abstracts are included for testing.
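To make "n-grams" concrete, here is a minimal sketch of counting one- to three-word terms across a handful of abstracts. It is an illustration only, not the pygrams implementation: the function name `extract_ngrams`, the simple regex tokeniser, and the two sample abstracts are all assumptions made for this example.

```python
import re
from collections import Counter


def extract_ngrams(text, max_n=3):
    """Yield every 1..max_n word n-gram in text (hypothetical helper)."""
    # Lowercase and keep alphanumeric runs only; a real pipeline would
    # likely also remove stop words and lemmatise.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


# Made-up abstracts, for illustration only.
abstracts = [
    "A fuel cell stack with a cooling plate.",
    "Method for sealing a fuel cell membrane.",
]

# Count how often each n-gram occurs across the whole corpus.
counts = Counter(gram for text in abstracts for gram in extract_ngrams(text))
```

Here `counts.most_common()` would list the most frequent terms. Raw counts favour common words such as "a", which is one reason a weighting scheme like TFIDF is used when ranking terms.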

TFIDF caching.

Cache TFIDF calculations for large document collections so that repeated analyses run faster.
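The idea of caching can be sketched as follows: compute the TFIDF weights once, store them on disk under a key derived from the corpus, and reload them on later runs. This is a sketch under stated assumptions, not how pygrams stores its cache: `compute_tfidf`, `cached_tfidf`, the pickle file layout, and the plain `tf * log(N / df)` weighting on whitespace-split unigrams are all hypothetical choices made for this example.

```python
import hashlib
import json
import math
import pickle
import tempfile
from collections import Counter
from pathlib import Path


def compute_tfidf(docs):
    """Return one {term: weight} dict per document (unigrams only)."""
    tokenised = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n_docs = len(docs)
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def cached_tfidf(docs, cache_dir):
    """Return (weights, cache_hit), reusing a pickle for an identical corpus."""
    # Hash the corpus so any change to the documents invalidates the cache.
    key = hashlib.sha256(json.dumps(docs).encode("utf-8")).hexdigest()
    cache_file = Path(cache_dir) / f"tfidf-{key}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh), True
    weights = compute_tfidf(docs)
    with cache_file.open("wb") as fh:
        pickle.dump(weights, fh)
    return weights, False


# Made-up documents, for illustration only.
docs = [
    "battery charging circuit",
    "battery cell separator",
    "antenna array circuit",
]
with tempfile.TemporaryDirectory() as cache_dir:
    first, hit_first = cached_tfidf(docs, cache_dir)    # computed and stored
    second, hit_second = cached_tfidf(docs, cache_dir)  # loaded from disk
```

Keying the cache on a hash of the corpus means a changed document set is recomputed rather than served stale results. Pickle files should only be loaded from locations you trust.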

Timeseries analysis.

Observe and nowcast terms using timeseries analysis to highlight emerging and declining terms.
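As a rough illustration of separating emerging from declining terms, the sketch below fits a least-squares line to each term's counts per period and labels the term by the sign and size of the slope. A simple linear trend is used here as a stand-in; the source does not describe the model pygrams itself uses. The names `slope` and `classify`, the `threshold` value, and the quarterly counts are all hypothetical.

```python
def slope(values):
    """Least-squares slope of values against their index 0..n-1 (needs n >= 2)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def classify(series, threshold=0.5):
    """Label a term by its trend; threshold is an arbitrary illustrative cutoff."""
    trend = slope(series)
    if trend > threshold:
        return "emerging"
    if trend < -threshold:
        return "declining"
    return "stationary"


# Made-up counts per quarter, for illustration only.
quarterly_counts = {
    "solid state battery": [1, 2, 4, 7, 11, 16],
    "floppy disk": [14, 11, 9, 6, 4, 2],
    "printed circuit": [8, 9, 8, 9, 8, 9],
}

labels = {term: classify(counts) for term, counts in quarterly_counts.items()}
```

With these sample counts the first term has a clearly positive slope, the second a clearly negative one, and the third stays near zero. A linear fit ignores seasonality and recent acceleration, so a dedicated timeseries model would be a better basis for real decisions.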

Handles large document collections.

Useful for large document collections such as patents or research papers.

Data Science Campus

Contribute to this project on GitHub