The *fastverse* is a suite of complementary high-performance packages for statistical computing and data manipulation in R. Developed independently by various people, *fastverse* packages jointly contribute to the objectives of:

- Speeding up R through heavy use of compiled code (C, C++, Fortran)
- Enabling more complex statistical and data manipulation operations in R
- Reducing the number of dependencies required for advanced computing in R

The `fastverse` package is a meta-package providing utilities for easy installation, loading and management of these packages. It is an extensible framework that allows users to (permanently) add or remove packages to create a ‘verse’ of packages suiting their general needs, or even create separate ‘verses’ of their own.

*fastverse* packages are jointly attached with `library(fastverse)`, and several functions starting with `fastverse_` help manage dependencies, detect namespace conflicts, add/remove packages from the *fastverse* and update packages. The **vignette** provides a concise overview of the package.
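As a minimal sketch of the management functions mentioned above, the following attaches the *fastverse* and inspects it. It assumes the default core packages are installed and that no permanent customization has been made.

```r
library(fastverse)  # by default attaches data.table, magrittr, kit and collapse

# Packages currently part of the fastverse (core plus any extensions)
pkgs <- fastverse_packages()
print(pkgs)

# Report name clashes between fastverse packages and other attached packages
fastverse_conflicts()
```

Other `fastverse_` functions such as `fastverse_deps()` and `fastverse_update()` query package repositories and therefore need an internet connection.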

The *fastverse* installs with 4 core packages^{1} (5 dependencies in total) which provide broad C/C++ based statistical and data manipulation functionality and have carefully managed APIs.

- **data.table**: Enhanced data frame class with concise data manipulation framework offering powerful aggregation, flexible split-apply-combine computing, reshaping, (rolling) joins, rolling statistics, set operations on tables, fast csv read/write, and various utilities such as transposition of data.
- **collapse**: Fast grouped and weighted statistical computations, time series and panel data transformations, list-processing, data manipulation functions, summary statistics and various utilities such as support for variable labels. Class-agnostic framework designed to work with vectors, matrices, data frames, lists and related classes including *xts*, *data.table*, *tibble*, *plm*, *sf*.
- **kit**: Parallel (row-wise) statistical functions, vectorized and nested switches, and some utilities such as efficient partial sorting.
- **magrittr**: Efficient pipe operators and aliases for enhanced R programming and code un-nesting.
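To give a flavor of the four core packages, here is a small illustrative sketch on a toy table. The data and variable names are made up for this example.

```r
library(data.table)
library(collapse)
library(kit)
library(magrittr)

# Toy data (hypothetical)
dt <- data.table(g = c("a", "a", "b", "b"),
                 x = c(1, 2, 3, 4),
                 y = c(10, 20, 30, 40))

# data.table: grouped aggregation
agg_dt <- dt[, .(mean_x = mean(x)), by = g]

# collapse: fast grouped mean, returns a vector named by group
agg_cl <- collapse::fmean(dt$x, g = dt$g)

# kit: parallel (row-wise) sum across vectors
row_sum <- kit::psum(dt$x, dt$y)

# magrittr: pipe to un-nest sum(sqrt(x))
res <- dt$x %>% sqrt() %>% sum()
```

The explicit `collapse::` and `kit::` prefixes are optional here; they just make clear which package each function comes from.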

```
# Install the CRAN version
install.packages("fastverse")
# Install (Windows/Mac binaries) from R-universe
install.packages("fastverse", repos = "https://fastverse.r-universe.dev")
# Install from GitHub (requires compilation)
remotes::install_github("fastverse/fastverse")
```

*Note* that the GitHub/r-universe version is not a development version; development takes place in the ‘development’ branch.

Users can, via the `fastverse_extend()` function, freely attach extension packages. Setting `permanent = TRUE` adds these packages to the core *fastverse*. Another option is adding a `.fastverse` config file with packages to the project directory. Separate verses can be created with `fastverse_child()`. See the **vignette** for details.
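A session-level extension could look roughly like the sketch below. The choice of *xts* and *roll* is only illustrative, and both are assumed to be installed already. The permanent variants are commented out because they change the stored configuration.

```r
library(fastverse)

# Attach extension packages for the current session only
fastverse_extend(xts, roll)

# Core packages plus the extensions just added
fastverse_packages()

# Detach them again; session = TRUE also drops them from the session's extension list
fastverse_detach(xts, roll, session = TRUE)

# Permanent variants (not run):
# fastverse_extend(xts, roll, permanent = TRUE)  # add to the core fastverse
# fastverse_reset()                              # restore the defaults
```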

High-performing packages for different data manipulation and statistical computing topics are suggested below. The total (recursive) dependency count is indicated for each package.

- **xts** and **zoo**: Fast and reliable matrix-based time series classes providing fully identified ordered observations and various utilities for plotting and computations (1 dependency).
- **roll**: Fast rolling and expanding window functions for vectors and matrices (3 dependencies).

*Notes*: *xts*/*zoo* objects are preserved by *roll* functions and by *collapse*’s time series and data transformation functions^{2}. As *xts*/*zoo* objects are matrices, all *matrixStats* functions apply to them as well. *xts* objects can also easily be converted to and from *data.table*, which also has some fast rolling functions like `frollmean` and `frollapply`.
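The conversion between *xts* and *data.table* and the use of `frollmean` can be sketched as follows. The series is made up, and the column name `ma3` is just an illustrative choice.

```r
library(data.table)
library(xts)

# A small daily series (hypothetical values)
dates <- as.Date("2024-01-01") + 0:5
x <- xts(cbind(price = c(1, 2, 3, 4, 5, 6)), order.by = dates)

# xts -> data.table: the time index becomes a column called "index"
dt <- as.data.table(x)

# 3-period rolling mean; the first two values are NA (incomplete window)
dt[, ma3 := frollmean(price, n = 3)]

# data.table -> xts: the first column is used as the time index
x2 <- as.xts(dt)
```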

- **anytime**: Anything to ‘POSIXct’ or ‘Date’ converter (2 dependencies).
- **fasttime**: Fast parsing of strings to ‘POSIXct’ (0 dependencies).
- **nanotime**: Provides a coherent set of temporal types and functions with nanosecond precision - based on the ‘integer64’ class (7 dependencies).
- **clock**: Comprehensive library for date-time manipulations using a new family of orthogonal date-time classes (durations, time points, zoned-times, and calendars) (6 dependencies).
- **timechange**: Efficient manipulation of date-times accounting for time zones and daylight saving times (1 dependency).

*Notes*: Date and time variables are preserved in many *data.table* and *collapse* operations. *data.table* additionally offers an efficient integer based date class ‘IDate’ with some supporting functionality. *xts* and *zoo* also provide various functions to transform dates, and *zoo* provides classes ‘yearmon’ and ‘yearqtr’ for convenient computation with monthly and quarterly data. Package *mondate* also provides a class ‘mondate’ for monthly data. Many users also find **lubridate** convenient for ‘POSIX-’ and ‘Date’ based computations.
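As a brief illustration of the ‘IDate’ class mentioned in the notes, the sketch below uses only *data.table*. The dates and values are invented for the example.

```r
library(data.table)

# IDate stores dates as integers rather than doubles
d <- as.IDate(c("2024-01-31", "2024-02-29"))
storage <- typeof(d)

# Component extractors and integer date arithmetic
yr <- year(d)
mo <- month(d)
next_day <- d + 1L

# Dates can be used directly for grouping
dt <- data.table(date = d, value = c(10, 20))
monthly <- dt[, .(total = sum(value)), by = .(year = year(date), month = month(date))]
```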

- **stringi**: Main R package for fast, correct, consistent, and convenient string/text manipulation (backend to *stringr* and *snakecase*) (0 dependencies).
- **stringfish**: Fast computation of common (base R) string operations using the ALTREP system (2 dependencies).
- **stringdist**: Fast computation of string distance metrics, matrices, and fuzzy matching (0 dependencies).

*Notes*: At least two packages offer convenient wrappers around the rather rich *stringi* API: **stringr** provides simple, consistent wrappers for common string operations, based on *stringi* (3 dependencies), and **snakecase** converts strings into any case, based on *stringi* and *stringr* (4 dependencies).
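A few typical calls from *stringi* and *stringdist* are sketched below on made-up input strings.

```r
library(stringi)
library(stringdist)

x <- c("fast verse", "data.table", "collapse")

# stringi: detection, case conversion and replacement with fixed patterns
has_ta   <- stri_detect_fixed(x, "ta")
upper    <- stri_trans_toupper(x)
replaced <- stri_replace_all_fixed(x, " ", "_")

# stringdist: Levenshtein distance between two strings
d_lv <- stringdist("kitten", "sitting", method = "lv")

# stringdist: approximate matching of a misspelled name against x
hit <- amatch("colapse", x, maxDist = 2)
```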

- **matrixStats**: Efficient row- and column-wise (weighted) statistics on matrices and vectors, including computations on subsets of rows and columns (0 dependencies).
- **Rfast** and **Rfast2**: Heterogeneous sets of fast functions for statistics, estimation and data manipulation operating on vectors and matrices (4-5 dependencies).
- **vctrs**: Computational backend of the *tidyverse* that provides many basic programming functions for R vectors (including lists and data frames) implemented in C (such as sorting, matching, replicating, unique values, concatenating, splitting etc. of vectors). These are often significantly faster than base R equivalents, but generally not as aggressively optimized as some equivalents found in *collapse* or *data.table* (4 dependencies).
- **parallelDist**: Multi-threaded distance matrix computation (3 dependencies).
- **coop**: Fast implementations of the covariance, correlation, and cosine similarity (0 dependencies).
- **rsparse**: Implements many algorithms for statistical learning on sparse matrices - matrix factorizations, matrix completion, elastic net regressions, factorization machines (8 dependencies). See also package **MatrixExtra**.
- **fastmatrix** provides a small set of functions written in C or Fortran providing fast computation of some matrices and operations useful in statistics (0 dependencies).
- **matrixTests** provides efficient execution of multiple statistical hypothesis tests on rows and columns of matrices (1 dependency).
- **rrapply**: The `rrapply()` function extends base `rapply()` by including a condition or predicate function for the application of functions and diverse options to prune or aggregate the result (0 dependencies).
- **dqrng**: Fast uniform, normal or exponential random numbers and random sampling (i.e. faster `runif`, `rnorm`, `rexp`, `sample` and `sample.int` functions) (3 dependencies).
- **fastmap**: Fast implementation of data structures based on C++, including a key-value store (`fastmap`), stack (`faststack`), and queue (`fastqueue`) (0 dependencies).
- **fastmatch**: A faster `match()` function (drop-in replacement for `base::match` and `base::%in%`) that keeps the hash table in memory for much faster repeated lookups (0 dependencies).
- **hutilscpp** provides C++ implementations of some frequently used utility functions in R (4 dependencies).

*Notes*: *Rfast* has a number of functions named like those in *matrixStats*. These are simpler but typically faster and support multi-threading. Some highly efficient statistical functions can also be found scattered across various other packages, notably *Hmisc* (60 dependencies) and *DescTools* (17 dependencies).
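The sketch below shows a few representative calls from *matrixStats*, *fastmatch* and *fastmap*. The inputs are toy values, and only a small part of each API is shown.

```r
library(matrixStats)
library(fastmatch)
library(fastmap)

# matrixStats: row-wise statistics on a 2 x 3 matrix
m <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)
rs   <- rowSums2(m)
rmed <- rowMedians(m)

# fastmatch: drop-in replacement for match(); unmatched values give NA
idx <- fmatch(c("b", "z"), c("a", "b", "c"))

# fastmap: key-value store
store <- fastmap()
store$set("alpha", 1)
val      <- store$get("alpha")
has_beta <- store$has("beta")

# fastmap: first-in-first-out queue
q <- fastqueue()
q$add("first")
q$add("second")
head_item <- q$remove()
```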

- **sf**: Leading framework for geospatial computing and manipulation in R, offering a simple and flexible spatial data frame and supporting functionality (12 dependencies).
- **s2**: Provides R bindings for Google’s s2 C++ library for high-performance geometric calculations on the sphere (3D, geographic/geodetic CRS). Used as a backend to *sf* for calculations on geometries with geographic/geodetic CRS, but using *s2* directly can provide substantial performance gains (2 dependencies).
- **geos**: Provides an R API to the Open Source Geometry Engine (GEOS) C-library, which can be used to very efficiently manipulate planar (2D/flat/projected CRS) geometries, and a vector format with which to efficiently store ‘GEOS’ geometries. Used as a backend to *sf* for calculations on geometries with projected CRS, but using *geos* directly can provide substantial performance gains (2 dependencies).
- **stars**: Spatiotemporal data (raster and vector) in the form of dense arrays, with space and time being array dimensions (16 dependencies).
- **terra**: Methods for spatial data analysis with raster and vector data. Processing of very large (out of memory) files is supported (1 dependency).
- **exactextractr**: Provides fast extraction from raster datasets using polygons. Notably, it is much faster than *terra* for computing summary statistics of raster layers within polygons (17 dependencies).
- **geodist**: Provides very fast calculation of geodesic distances (0 dependencies).
- **dggridR**: Provides discrete global grids for R, allowing accurate partitioning of the Earth’s surface into equally sized grid cells of different shapes and sizes (30 dependencies). The source project is not well maintained, and users are strongly encouraged to install this fork (version 3.1+) which fixes a major bug on Mac and introduces a *collapse* backend for faster grid materialization (14 dependencies).
- **cppRouting**: Algorithms for routing and solving the traffic assignment problem, including calculation of distances, shortest paths and isochrones on weighted graphs using several (optimized) variants of Dijkstra’s algorithm (4 dependencies).
- **igraph**: Provides an R port to the *igraph* C library for complex network analysis and graph theory (11 dependencies).

*Notes*: *collapse* can be used for efficient manipulation and computations on *sf* data frames. *sf* also offers tight integration with *dplyr*. Another efficient routing package is *dodgr* (45 dependencies). *sfnetworks* allows network analysis combining *sf* and *igraph* (42 dependencies) and functions for network cleaning (partly taken from tidygraph which also wraps *igraph*). *stplanr* facilitates sustainable transport planning with R, including very useful helpers such as `overline()` to turn a set of linestrings (routes) into a network (45 dependencies).

- **dygraphs**: Interface to ‘Dygraphs’ interactive time series charting library (12 dependencies).
- **lattice**: Trellis graphics for R (0 dependencies).
- **grid**: The grid graphics package (0 dependencies).
- **tinyplot** provides a lightweight extension of the base R graphics system, with support for automatic grouping, legends, facets, and various other enhancements (0 dependencies).
- **ggplot2**: Create elegant data visualizations using the Grammar of Graphics (27 dependencies).
- **scales**: Scale functions for visualizations (11 dependencies).

*Notes*: *latticeExtra* provides extra graphical utilities based on *lattice*. *gridExtra* provides miscellaneous functions for *grid* graphics (and consequently for *ggplot2* which is based on *grid*). *gridtext* provides improved text rendering support for *grid* graphics. Many packages offer *ggplot2* extensions (typically starting with ‘gg’), such as *ggExtra*, *ggalt*, *ggforce*, *ggh4x*, *ggmap*, *ggtext*, *ggthemes*, *ggrepel*, *ggridges*, *ggfortify*, *ggstatsplot*, *ggeffects*, *ggsignif*, *GGally*, *ggcorrplot*, *ggdendro*, etc. Users in desperate need of greater performance may also find the (unmaintained) lwplot package useful; it provides a faster and lighter version of *ggplot2* with a *data.table* backend.

- **r-polars** provides an R port to the impressively fast polars DataFrame library written in Rust (1 dependency).

*Notes*: Package **tidypolars** provides a *tidyverse*-style wrapper around *r-polars*.

- **fst**: A compressed data file format that is very fast to read and write. Full random access in both rows and columns allows reading subsets from a ‘.fst’ file (2 dependencies).
- **qs** provides a lightning-fast and complete replacement for the `saveRDS` and `readRDS` functions in R. It supports general R objects with attributes and references - at similar speeds to *fst* - but does not provide on-disk random access to data subsets like *fst* (4 dependencies).
- **arrow** provides a low-level interface to the Apache Arrow C++ library (a multi-language toolbox for accelerated data interchange and in-memory processing), including fast reading / writing of delimited files, efficient storage of data as `.parquet` or `.feather` files, efficient (lazy) queries and computations, and sharing data between R and Python (14 dependencies). It provides methods for several *dplyr* functions allowing highly efficient data manipulation on arrow datasets. Check out the useR2022 workshop on working with larger than memory data with Apache Arrow in R, and the Apache Arrow R cookbook as well as the awesome-arrow-r repository.
- **duckdb**: DuckDB is a high-performance analytical database system that can be used on in-memory or out-of-memory data (including csv, `.parquet` files, arrow datasets, and its own `.duckdb` format), and that provides a rich SQL dialect and optimized query execution for data analysis (1 dependency). It can also be used with the *dbplyr* package that translates *dplyr* code to SQL. This article by Christophe Nicault (October 2022) demonstrates the integration of *duckdb* with R and *arrow*. Also see the official docs.
- **vroom** provides fast reading of delimited files (23 dependencies).

*Notes*: *data.table* provides `fread` and `fwrite` for fast reading and writing of delimited files.
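A round trip with `fwrite` and `fread` can be sketched as follows, using a temporary file and a made-up table.

```r
library(data.table)

dt <- data.table(id = 1:3,
                 name = c("a", "b", "c"),
                 value = c(0.5, 1.5, 2.5))

# Write to a temporary csv file and read it back
path <- tempfile(fileext = ".csv")
fwrite(dt, path)
dt2 <- fread(path)

# Read only selected columns
sub <- fread(path, select = c("id", "value"))

unlink(path)  # clean up
```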

- **nCompiler**: Compiles R functions to C++, and covers basic math, distributions, vectorized math and linear algebra, as well as basic control flow. R and compiled C++ functions can also be jointly utilized in a class ‘nClass’ that inherits from R6. An in-progress user-manual provides an overview of the package.
- **ast2ast**: Also compiles R functions to C++, and is very straightforward to use (it has a single function `translate()` to compile R functions), but less flexible than nCompiler (e.g. it currently does not support linear algebra). Available on CRAN (6 dependencies).
- **odin**: Implements R to C translation and compilation, but specialized for differential equation solving problems. Available on CRAN (8 dependencies).
- **armacmp** translates linear algebra code written in R to C++ using the Armadillo Template Library. The package can also be used to write mathematical optimization routines that are translated and optimized in C++ using *RcppEnsmallen*.
- **r2c** provides compilation of R functions to be applied over many groups (e.g. grouped bivariate linear regression etc.).
- **FastR** is a high-performance implementation of the entire R programming language that can JIT compile R code to run on the Graal VM.
- **inline** allows users to write C, C++ or Fortran functions and compile them directly to an R function for use within the R session. Available on CRAN (0 dependencies).

*Notes*: Many of these projects are experimental and not available as CRAN packages.

- **tidypolars** is a Python library built on top of polars that gives access to methods and functions familiar to R tidyverse users.
- **Tidier.jl** provides a Julia implementation of the tidyverse mini-language, powered by the DataFrames.jl library.

- **R’s C API** is the most natural way to extend R and does not require additional packages. It is further documented in the Writing R Extensions Manual, the R Internals Manual, the **r-internals** repository and sometimes referred to in the R Blog (and some other Blogs on the web). Users willing to extend R in this way should familiarize themselves with R’s garbage collection and PROTECT Errors.
- **Rcpp** provides seamless R and C++ integration, and is widely used to extend R with C++. Compared to the C API compile time is slower and object files are larger, but users don’t need to worry about garbage collection and can use modern C++ as well as a rich set of R-flavored functions and classes (0 dependencies).
- **cpp11** provides a simpler, header-only R binding to C++ that allows faster compile times and several other enhancements (0 dependencies).
- **tidyCpp** provides a tidy C++ wrapping of the C API of R - to make the C API more amenable to C++ programmers (0 dependencies).
- **JuliaCall** provides an R interface to the Julia programming language (11 dependencies). Other interfaces are provided by XRJulia (2 dependencies) and JuliaConnectoR (0 dependencies).
- **rextendr** provides an R interface to the Rust programming language (29 dependencies).
- **rJava** provides an R interface to Java (0 dependencies).

*Notes*: There are many Rcpp extension packages binding R to powerful C++ libraries, such as linear algebra through *RcppArmadillo* and *RcppEigen*, thread-safe parallelism through *RcppParallel* etc.
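As a small illustration of extending R with C++ through *Rcpp*, the sketch below compiles an inline function. It assumes a working C++ toolchain is available; the function name `sum_squares` is made up for this example.

```r
library(Rcpp)

# Compile a C++ function and expose it to the R session
cppFunction('
double sum_squares(NumericVector x) {
  double total = 0.0;
  for (R_xlen_t i = 0; i < x.size(); ++i) {
    total += x[i] * x[i];   // accumulate squared elements
  }
  return total;
}')

res <- sum_squares(c(1, 2, 3))
```

Rcpp handles the conversion between R vectors and `NumericVector` as well as memory protection, which would otherwise have to be managed by hand in the C API.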

- **tidytable**: A tidy interface to *data.table* that is *rlang* compatible. Quite comprehensive implementation of *dplyr*, *tidyr* and *purrr* functions. Package uses a class *tidytable* that inherits from *data.table*. The `dt()` function makes *data.table* syntax pipeable (12 total dependencies).
- **dtplyr**: A tidy interface to *data.table* built around lazy evaluation i.e. users need to call `as.data.table()`, `as.data.frame()` or `as_tibble()` to access the results. Lazy evaluation holds the potential of generating more performant *data.table* code (20 dependencies).
- **tidyfst**: Tidy verbs for fast data manipulation. Covers *dplyr* and some *tidyr* functionality. Functions have `_dt` suffix and preserve *data.table* object. A cheatsheet is provided (7 dependencies).
- **tidyft**: Tidy verbs for fast data operations by reference. Best for big data manipulation on out of memory data using facilities provided by *fst* (7 dependencies).
- **tidyfast**: Fast tidying of data. Covers *tidyr* functionality, `dt_` prefix, preserves *data.table* object (2 dependencies).
- **maditr**: Fast data aggregation, modification, and filtering with pipes and *data.table*. Minimal implementation with functions `let()` and `take()` for most common data manipulation tasks. Also provides Excel-like lookup functions (2 dependencies).
- **table.express** also builds *data.table* expressions from *dplyr* verbs, without executing them eagerly. Similar to *dtplyr* but less mature (17 dependencies).

*Notes*: These packages are wrappers around *data.table* and do not introduce their own compiled code.
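The lazy evaluation used by *dtplyr* can be sketched as follows. The table is made up, and *dplyr* is assumed to be installed alongside *dtplyr*.

```r
library(data.table)
library(dtplyr)
library(dplyr)

dt <- data.table(g = c("a", "a", "b"), x = c(1, 2, 10))

# Build a lazy query: nothing is computed yet
lazy <- lazy_dt(dt) %>%
  group_by(g) %>%
  summarise(total = sum(x))

# Inspect the data.table code that dtplyr generated
show_query(lazy)

# Execute the query and collect the result as a data.table
res <- as.data.table(lazy)
```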

- See the High-Performance and Parallel Computing Task View and the futureverse.

Please notify me of any other packages you think should be included here. Such packages should be well designed, top-performing, low-dependency, and, with few exceptions, provide their own compiled code. Please note that the *fastverse* focuses on general purpose statistical computing and data manipulation, thus I won’t include fast packages to estimate specific kinds of models here (of which R also has a great many).