Getting Started with rustytools
introduction.Rmd
Installing Rust
First you need to have an updated Rust installation. Go to this site to learn how to install Rust.
Installing rustytools
You will need to have the devtools package installed…
devtools::install_github("furlan-lab/rustytools")
Compilation errors with openblas
When building R packages that depend on BLAS/LAPACK functionality (e.g., via openblas-src or RcppEigen), you may encounter linker errors such as:
ld: -lto_library library filename must be 'libLTO.dylib'
OpenBLAS build failed: Subprocess returns with non-zero status: 2
or
thread 'main' panicked at /private/var/folders/63/z3dzfmg53g31179qxhwt5cq00000gn/T/RtmpIhMjfg/R.INSTALL17d5e6ffc8bf0/rustytools/src/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/openblas-src-0.10.11/build.rs:218:13:
OpenBLAS build failed: Subprocess returns with non-zero status: 2
These errors typically arise because openblas-src is
attempting to compile OpenBLAS from source using a GCC-fortran toolchain
that injects a -lto_library
flag incompatible with the
Apple linker. However, the openblas-src crate in Rust
or package builds in R defaults to building its own copy of OpenBLAS,
invoking gcc
/gfortran
and ultimately passing
-lto_library
to ld
.
Because Apple’s ld
expects
-lto_library libLTO.dylib
(not
-lto_library /usr/local/...
), the link step fails.
Alternatively you may have non-standard installation sites of openblas, such as those on high-performance clusters.
At the moment, we have found only one foolproof method of compilation.
First, make sure you have openblas installed on your system. Second, clone rustytools in a shell in a directory you would like to install the source file i.e.:
cd yourdir/
git clone https://github.com/furlan-lab/rustytools.git
Then you have a few options:
Easiest - declare system variable, then install repo
export OPENBLAS_DIR=/your_openblas_dir
export OPENBLAS_DIR=/opt/homebrew/opt/openblas
# On Apple Silicon this is typically: /opt/homebrew/opt/openblas
# On Intel Macs this is typically: /usr/local/opt/openblas
# On an HPC this might be something like: /app/software/OpenBLAS/0.3.27-GCC-13.3.0
Then compile rustytools
cd rustytools
R CMD install .
Project-level .Renviron
Edit the file named .Renviron
in your package root:
# On Apple Silicon:
export OPENBLAS_DIR=/opt/homebrew/opt/openblas
# On Intel Macs:
export OPENBLAS_DIR=/usr/local/opt/openblas
User-level .Renviron
Create/edit ~/.Renviron
so that the variables are always
available, even in non-interactive sessions (e.g.,
R CMD INSTALL
, RStudio Build pane):
Changing error message
If the nature of the error message changes to something like that seen below, delete the target directory in rustytools repo and try again
cargo build --lib --release --manifest-path=./rust/Cargo.toml --target-dir ./rust/target
Compiling openblas-src v0.10.11
Compiling lax v0.15.0
error: could not find native static library `openblas`, perhaps an -L flag is missing?
error: could not compile `openblas-src` (lib) due to 1 previous error
Overview of Sequence Alignment
This vignette introduces the align()
function from the
rustytools
package, which provides a Rust-backed,
high-performance implementation of the Smith-Waterman algorithm. This
algorithm supports global, local, semiglobal, and fully custom alignment
modes using affine gap penalties.
The underlying engine is a generalized variant of the Smith-Waterman
algorithm provided by the bio-edit
crate, which offers:
- Match/mismatch scoring via custom or predefined functions
- Affine gap penalties (gap open + gap extension)
- Flexible boundary behavior via customizable clipping penalties
Alignment Modes
rustytools
supports three classic modes and a flexible
custom mode:
- Global alignment: No clipping is allowed; aligns full length of both sequences.
- Local alignment: Finds the highest-scoring subsequences.
- Semiglobal alignment: Global on one sequence, local on the other (e.g., full query to partial reference).
- Custom alignment: Full control over boundary conditions via clip penalties.
Example: Semiglobal Alignment
Here is an example aligning a query sequence to a longer reference:
library(rustytools)
query <- "ACCGTGGAT"
reference <- "AAAAACCGTTGAT"
# Perform semiglobal alignment
score <- align(query, reference, atype = "semi-global", verbose = F)
score
## [1] 7
This returns an alignment score that considers:
- Matches = +1
- Mismatches = -1
- Gap open = -5
- Gap extend = -1
- No penalty for clipping the start/end of the reference
Custom Alignment Configuration
Custom alignments allow full specification of prefix/suffix penalties. Example:
score <- align(
query,
reference,
atype = "custom",
match_score = 1,
mismatch_score = -3,
gap_open = -5,
gap_extend = -1,
xclip_prefix = -10, # allow prefix skips in query
xclip_suffix = -9999, # require alignment to end
yclip_prefix = 0, # allow local match to ref
yclip_suffix = 0
)
score
This setting mimics semiglobal alignment where the query can skip leading bases but must align to the end.
Use Cases in Genomics
Common applications include:
- Mapping short variable sequences (e.g. CDR3s) to consensus references
- Comparing VDJ segments to known clone sequences
- Imputing mutations into reference haplotypes
See the sequence alignment vignette for an example using single cell RNA seq data in B cell leukemia
Summary
The alignment engine in rustytools
is highly flexible
and efficient. It is particularly well-suited to bioinformatics tasks
involving fuzzy or partial sequence matching. For large datasets or many
pairwise alignments, this Rust backend provides a dramatic speed
advantage over traditional R-native solutions.
Other Tools
rustytools
includes additional fast backends for:
- FASTA reading – random access to large genome files See the Fasta vignette
- MAGIC – imputation of scRNA-seq using diffusion maps See the MAGIC vignette
- PCHA – archetypal decomposition of matrices See the PCHA vignette
Each tool is designed for high performance, especially on large-scale single-cell or genome-wide data.
Support and Citation
Developed by the Furlan Lab at Fred Hutchinson Cancer Center.
For questions or issues, please file a GitHub issue or contact the maintainers directly.