ngram: Fast n-Gram 'Tokenization'

An n-gram is a sequence of n "words" taken, in order, from a body of text. This package is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The tokenization and babbling are handled by efficient C code, which can also be built as a standalone library. The babbler is a simple Markov chain. The package also includes a vignette with complete example workflows and information about the utilities it offers.
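To make the two ideas in the description concrete, here is a minimal sketch in Python of n-gram extraction and a Markov-chain "babbler". It is not the package's C implementation or its R interface. The function names (`ngrams`, `build_chain`, `babble`), the whitespace-based splitting, and the restart-on-dead-end behavior are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def ngrams(text, n):
    """Return the n-grams of `text` as tuples of words, in order of appearance.

    Assumption: "words" are whatever whitespace splitting produces.
    """
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_chain(text, n):
    """Map each n-gram to the list of words observed immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        chain[tuple(words[i:i + n])].append(words[i + n])
    return chain


def babble(text, n, length, seed=0):
    """Generate `length` words by walking a Markov chain over n-grams.

    The current state is the last n words emitted; the next word is drawn
    at random from the words that followed that n-gram in the input.
    Assumption: on a dead end (an n-gram with no observed follower),
    the walk restarts from a randomly chosen n-gram.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    grams = ngrams(text, n)
    if not grams:
        return ""
    chain = build_chain(text, n)
    state = rng.choice(grams)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if followers:
            out.append(rng.choice(followers))
            state = tuple(out[-n:])
        else:
            state = rng.choice(grams)
            out.extend(state)
    return " ".join(out[:length])


print(ngrams("a b c d", 2))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
print(babble("a b a c a b a d", n=2, length=10, seed=1))
```

Because followers are stored with repetition, a word that follows a given n-gram more often in the input is proportionally more likely to be drawn, which is the usual behavior of a simple Markov text generator.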

Version: 3.2.3
Depends: R (≥ 3.0.0)
Imports: methods
Published: 2023-12-10
DOI: 10.32614/CRAN.package.ngram
Author: Drew Schmidt [aut, cre], Christian Heckendorf [aut]
Maintainer: Drew Schmidt <wrathematics at>
License: BSD 2-clause License + file LICENSE
NeedsCompilation: yes
Citation: ngram citation info
Materials: README ChangeLog
CRAN checks: ngram results


Reference manual: ngram.pdf
Vignettes: Guide to the ngram Package


Package source: ngram_3.2.3.tar.gz
Windows binaries: r-devel:, r-release:, r-oldrel:
macOS binaries: r-release (arm64): ngram_3.2.3.tgz, r-oldrel (arm64): ngram_3.2.3.tgz, r-release (x86_64): ngram_3.2.3.tgz, r-oldrel (x86_64): ngram_3.2.3.tgz
Old sources: ngram archive

Reverse dependencies:

Reverse imports: discoverableresearch, MadanTextNetwork, revtools, ulex
Reverse suggests: daiR

