
Installing Rust

First, you need an up-to-date Rust installation. See the official Rust website (rust-lang.org) for instructions on installing Rust.
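rustytools compiles its Rust code with cargo when the package is installed, so R has to be able to find the Rust toolchain. The sketch below is one way to check this from an R session before installing. `has_rust_toolchain()` is a hypothetical helper written for this example and is not part of rustytools.

```r
# Hypothetical helper (not part of rustytools): check that the Rust toolchain
# is on the PATH that R sees. Sys.which() returns "" for programs it cannot find.
has_rust_toolchain <- function() {
  paths <- Sys.which(c("cargo", "rustc"))
  all(nzchar(paths))
}

if (has_rust_toolchain()) {
  # Print the installed compiler version to confirm it is reasonably recent
  cat(system2("rustc", "--version", stdout = TRUE), sep = "\n")
} else {
  message("cargo/rustc not found on PATH; install Rust first")
}
```

If Rust is installed but not found, the usual cause is that the directory holding cargo is missing from the PATH of the process that launched R (common with GUI front ends).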

Installing rustytools

You will need to have the devtools package installed. Then run:

devtools::install_github("furlan-lab/rustytools")
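If you are not sure whether devtools is available, a small wrapper can install it first. This is a sketch; `install_rustytools()` is a hypothetical name chosen for this example, and it only uses the standard `requireNamespace()`, `install.packages()` and `devtools::install_github()` calls.

```r
# Hypothetical wrapper: install devtools only if it is missing, then install
# rustytools from GitHub (this compiles Rust code, so Rust must be installed)
install_rustytools <- function() {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github("furlan-lab/rustytools")
}
```

Call `install_rustytools()` once from an R session. If the build fails with OpenBLAS errors, see the section on compilation errors with openblas.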

Compilation errors with openblas

When building R packages that depend on BLAS/LAPACK functionality (e.g., via openblas-src or RcppEigen), you may encounter linker errors such as:

ld: -lto_library library filename must be 'libLTO.dylib'
OpenBLAS build failed: Subprocess returns with non-zero status: 2

or

 thread 'main' panicked at /private/var/folders/63/z3dzfmg53g31179qxhwt5cq00000gn/T/RtmpIhMjfg/R.INSTALL17d5e6ffc8bf0/rustytools/src/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/openblas-src-0.10.11/build.rs:218:13:
  OpenBLAS build failed: Subprocess returns with non-zero status: 2

These errors typically arise because the openblas-src crate used by rustytools' Rust code defaults to building its own copy of OpenBLAS from source. That build invokes gcc/gfortran, and the GCC toolchain ultimately passes an -lto_library flag to ld that is incompatible with the Apple linker.

Because Apple’s ld expects -lto_library libLTO.dylib (not -lto_library /usr/local/...), the link step fails.

Alternatively, the build may fail because OpenBLAS is installed in a non-standard location, as is common on high-performance computing (HPC) clusters.

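At the moment, we have found only one foolproof method of compilation: build against an OpenBLAS that is already installed on your system, and point to it with the OPENBLAS_DIR environment variable.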

First, make sure OpenBLAS is installed on your system. Second, in a shell, clone rustytools into a directory where you would like to keep the source, e.g.:

cd yourdir/
git clone https://github.com/furlan-lab/rustytools.git

Then you have a few options:

Easiest: set the environment variable, then install the package
export OPENBLAS_DIR=/your_openblas_dir
# On Apple Silicon this is typically: /opt/homebrew/opt/openblas
# On Intel Macs this is typically: /usr/local/opt/openblas
# On an HPC this might be something like: /app/software/OpenBLAS/0.3.27-GCC-13.3.0

For example, with Homebrew on Apple Silicon:

export OPENBLAS_DIR=/opt/homebrew/opt/openblas

Then compile and install rustytools from the repository root:

cd rustytools
R CMD INSTALL .

Project-level .Renviron

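Edit (or create) the file named .Renviron in your package root. .Renviron files take plain NAME=value lines rather than shell export statements, and only one OPENBLAS_DIR line should be active:

# On Apple Silicon:
OPENBLAS_DIR=/opt/homebrew/opt/openblas
# On Intel Macs (use this line instead):
# OPENBLAS_DIR=/usr/local/opt/openblas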
User-level .Renviron

Create or edit ~/.Renviron so that the variable is always available, even in non-interactive sessions (e.g., R CMD INSTALL, RStudio Build pane):

# On Apple Silicon:
OPENBLAS_DIR=/opt/homebrew/opt/openblas
# On Intel Macs (use this line instead):
# OPENBLAS_DIR=/usr/local/opt/openblas
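After editing ~/.Renviron you can reload it in the running session with base R's `readRenviron()` and check that OPENBLAS_DIR points somewhere plausible. The helper below is a hypothetical sketch; it assumes the library sits in `<OPENBLAS_DIR>/lib` with a file name starting with `libopenblas`, which holds for Homebrew installs but may differ for HPC modules.

```r
# Hypothetical helper: TRUE if `dir` is set and contains lib/libopenblas*.
# Assumption: OpenBLAS libraries live in <dir>/lib (true for Homebrew kegs).
check_openblas_dir <- function(dir = Sys.getenv("OPENBLAS_DIR")) {
  if (!nzchar(dir) || !dir.exists(dir)) return(FALSE)
  libs <- list.files(file.path(dir, "lib"), pattern = "^libopenblas")
  length(libs) > 0
}

# Re-read ~/.Renviron in the current session instead of restarting R
renviron <- path.expand("~/.Renviron")
if (file.exists(renviron)) readRenviron(renviron)
check_openblas_dir()
```

A FALSE result usually means the variable is unset in this session, the path has a typo, or the libraries live under a different subdirectory (e.g. lib64 on some clusters).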
Changing error message

If the error message changes to something like the one below, delete the target directory in the rustytools repository (the Cargo build directory passed via --target-dir) and try again:

        cargo build --lib --release --manifest-path=./rust/Cargo.toml --target-dir ./rust/target
   Compiling openblas-src v0.10.11
   Compiling lax v0.15.0
error: could not find native static library `openblas`, perhaps an -L flag is missing?

error: could not compile `openblas-src` (lib) due to 1 previous error
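Deleting the build directory can also be done from R. This sketch assumes the Cargo target directory is `src/rust/target` inside the cloned repository, which is how `--target-dir ./rust/target` resolves if cargo runs from the package's `src/` directory (as the `rustytools/src/.cargo` path in the earlier panic message suggests). `clean_rust_target()` is a hypothetical helper, not a rustytools function.

```r
# Hypothetical helper: remove the Cargo build cache so the next build starts clean.
# Assumption: the target directory is <pkg_dir>/src/rust/target.
clean_rust_target <- function(pkg_dir = "rustytools") {
  target <- file.path(pkg_dir, "src", "rust", "target")
  if (dir.exists(target)) {
    unlink(target, recursive = TRUE)
  }
  !dir.exists(target)  # TRUE once the directory is gone
}
```

After `clean_rust_target("path/to/rustytools")` returns TRUE, rerun the installation with OPENBLAS_DIR set.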

Overview of Sequence Alignment

This vignette introduces the align() function from the rustytools package, which provides a Rust-backed, high-performance implementation of a generalized Smith-Waterman algorithm. It supports global, local, semiglobal, and fully custom alignment modes with affine gap penalties.

The underlying engine is a generalized variant of the Smith-Waterman algorithm provided by the bio-edit crate, which offers:

  • Match/mismatch scoring via custom or predefined functions
  • Affine gap penalties (gap open + gap extension)
  • Flexible boundary behavior via customizable clipping penalties
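To make these three ingredients concrete, here is a plain-R sketch of the kind of dynamic program such an engine solves: a Gotoh-style affine-gap recurrence whose start and end cells are charged clip penalties. It is written for illustration only and is not the bio-edit implementation. Assumptions: a gap of length k scores `gap_open + k * gap_extend`; each clip penalty is charged once if that end of x (query) or y (reference) is left unaligned; `-Inf` forbids clipping. `align_score()` is a hypothetical name.

```r
# Illustrative affine-gap alignment score with clip penalties (Gotoh-style DP).
align_score <- function(x, y, match = 1, mismatch = -1,
                        gap_open = -5, gap_extend = -1,
                        xclip_prefix = -Inf, xclip_suffix = -Inf,
                        yclip_prefix = -Inf, yclip_suffix = -Inf) {
  x <- strsplit(x, "")[[1]]
  y <- strsplit(y, "")[[1]]
  m <- length(x)
  n <- length(y)
  # Cell [i + 1, j + 1] describes the prefixes x[1..i] and y[1..j]
  H  <- matrix(-Inf, m + 1, n + 1)  # best score ending at (i, j) in any state
  Gx <- matrix(-Inf, m + 1, n + 1)  # ... ending with y[j] aligned to a gap
  Gy <- matrix(-Inf, m + 1, n + 1)  # ... ending with x[i] aligned to a gap
  best <- -Inf
  for (i in 0:m) {
    for (j in 0:n) {
      # Starting the alignment core at (i, j) pays the prefix clip penalties
      h <- (if (i > 0) xclip_prefix else 0) + (if (j > 0) yclip_prefix else 0)
      if (j > 0) {  # open or extend a gap that consumes y[j]
        Gx[i + 1, j + 1] <- max(Gx[i + 1, j] + gap_extend,
                                H[i + 1, j] + gap_open + gap_extend)
        h <- max(h, Gx[i + 1, j + 1])
      }
      if (i > 0) {  # open or extend a gap that consumes x[i]
        Gy[i + 1, j + 1] <- max(Gy[i, j + 1] + gap_extend,
                                H[i, j + 1] + gap_open + gap_extend)
        h <- max(h, Gy[i + 1, j + 1])
      }
      if (i > 0 && j > 0) {  # align x[i] with y[j]
        s <- if (x[i] == y[j]) match else mismatch
        h <- max(h, H[i, j] + s)
      }
      H[i + 1, j + 1] <- h
      # Ending the core at (i, j) pays the suffix clip penalties
      h_end <- h + (if (i < m) xclip_suffix else 0) + (if (j < n) yclip_suffix else 0)
      best <- max(best, h_end)
    }
  }
  best
}
```

With free reference clipping, `align_score("ACCGTGGAT", "AAAAACCGTTGAT", yclip_prefix = 0, yclip_suffix = 0)` gives 7, the same value reported for the semiglobal example below. The pure-R double loop is O(mn) and slow; it is only meant to show the recurrence.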

Alignment Modes

rustytools supports three classic modes and a flexible custom mode:

  • Global alignment: No clipping is allowed; aligns the full length of both sequences.
  • Local alignment: Finds the highest-scoring subsequences.
  • Semiglobal alignment: Global on one sequence, local on the other (e.g., full query to partial reference).
  • Custom alignment: Full control over boundary conditions via clip penalties.
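One way to see how the modes relate is to express the three classic ones as particular clip-penalty settings, which custom mode then exposes directly. The sketch below assumes x is the query and y the reference, and uses `-Inf` to mean "clipping forbidden" and 0 to mean "clipping free"; rustytools itself may use a large finite value such as -9999 instead. `mode_clip_penalties()` is a hypothetical helper.

```r
# Hypothetical helper: clip penalties that reproduce each classic alignment mode.
# -Inf forbids clipping that end; 0 makes clipping free (x = query, y = reference).
mode_clip_penalties <- function(atype = c("global", "local", "semi-global")) {
  atype <- match.arg(atype)
  forbid <- -Inf
  switch(atype,
    "global"      = c(xclip_prefix = forbid, xclip_suffix = forbid,
                      yclip_prefix = forbid, yclip_suffix = forbid),
    "local"       = c(xclip_prefix = 0, xclip_suffix = 0,
                      yclip_prefix = 0, yclip_suffix = 0),
    "semi-global" = c(xclip_prefix = forbid, xclip_suffix = forbid,
                      yclip_prefix = 0, yclip_suffix = 0)
  )
}

mode_clip_penalties("semi-global")
```

Seen this way, semiglobal alignment is simply "global on the query, local on the reference", and any other mix of the four values is a custom mode.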

Example: Semiglobal Alignment

Here is an example aligning a query sequence to a longer reference:

library(rustytools)

query <- "ACCGTGGAT"
reference <- "AAAAACCGTTGAT"

# Perform semiglobal alignment
score <- align(query, reference, atype = "semi-global", verbose = FALSE)
score
## [1] 7

This returns an alignment score computed with:

  • Matches = +1
  • Mismatches = -1
  • Gap open = -5
  • Gap extend = -1
  • No penalty for clipping the start/end of the reference
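The score of 7 can be checked by hand. Placing the full query at reference positions 5 to 13 (ACCGTTGAT) gives 8 matches and 1 mismatch (G against T), so 8 × (+1) + 1 × (-1) = 7. Assuming a gap of length k costs gap_open + k × gap_extend, any gap costs at least 6, so a gapped alignment of this 9-base query scores at most 9 - 6 = 3 and cannot win. The sketch below enumerates only ungapped placements of the full query; `ungapped_semiglobal()` is a hypothetical helper, not part of rustytools.

```r
query <- "ACCGTGGAT"
reference <- "AAAAACCGTTGAT"

# Hypothetical helper: score every ungapped placement of the whole query inside
# the reference (clipping the reference is free) and return the best one.
# Assumes the reference is at least as long as the query.
ungapped_semiglobal <- function(query, reference, match = 1, mismatch = -1) {
  q <- strsplit(query, "")[[1]]
  r <- strsplit(reference, "")[[1]]
  offsets <- 0:(length(r) - length(q))
  scores <- vapply(offsets, function(o) {
    window <- r[o + seq_along(q)]
    sum(ifelse(window == q, match, mismatch))
  }, numeric(1))
  c(best_offset = offsets[which.max(scores)], score = max(scores))
}

ungapped_semiglobal(query, reference)
```

The best placement starts after the first four reference bases (offset 4) and scores 7.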

Custom Alignment Configuration

Custom alignments allow full specification of prefix/suffix penalties. Example:

score <- align(
  query,
  reference,
  atype = "custom",
  match_score = 1,
  mismatch_score = -3,
  gap_open = -5,
  gap_extend = -1,
  xclip_prefix = -10,   # clipping leading query bases costs 10
  xclip_suffix = -9999, # effectively forbids clipping the query end
  yclip_prefix = 0,     # clipping the reference start is free
  yclip_suffix = 0      # clipping the reference end is free
)
score

This setting mimics semiglobal alignment where the query can skip leading bases but must align to the end.

Use Cases in Genomics

Common applications include:

  • Mapping short variable sequences (e.g. CDR3s) to consensus references
  • Comparing VDJ segments to known clone sequences
  • Imputing mutations into reference haplotypes

See the sequence alignment vignette for an example using single-cell RNA-seq data from B cell leukemia.

Summary

The alignment engine in rustytools is flexible and efficient, and is particularly well suited to bioinformatics tasks involving fuzzy or partial sequence matching. For large datasets or many pairwise alignments, the compiled Rust backend can be substantially faster than pure-R implementations.

Other Tools

rustytools includes additional fast backends for:

  • FASTA reading: random access to large genome files (see the Fasta vignette)
  • MAGIC: imputation of scRNA-seq data using diffusion maps (see the MAGIC vignette)
  • PCHA: archetypal decomposition of matrices (see the PCHA vignette)

Each tool is designed for high performance, especially on large-scale single-cell or genome-wide data.

Support and Citation

Developed by the Furlan Lab at Fred Hutchinson Cancer Center.

For questions or issues, please file a GitHub issue or contact the maintainers directly.