noodles
Efficiently handle bioinformatics data with Rust's noodles library.
Pitch

Introducing noodles, a suite of Rust libraries for reading and writing bioinformatics file formats such as BAM, VCF, and FASTQ. The libraries aim for specification compliance, and optional features add asynchronous I/O and libdeflate-backed compression. Whether you work in research or data processing, you can add only the formats you need to a project.

Description

noodles is a set of bioinformatics I/O libraries written in Rust for reading and writing a range of file formats, complying with the published specifications where applicable. Supported formats currently include BAM 1.6, BCF 2.2, BED, BGZF, CRAM 3.0/3.1, CSI, FASTA, FASTQ, GFF3, GTF 2.2, htsget 1.3, refget 2.0, SAM 1.6, tabix, and VCF 4.3/4.4.

Key Features

  • Modular Architecture: The noodles library is split into multiple crates by file format, allowing users to include only the necessary components in their projects.

  • Experimental API: Early versions of noodles can already be used in projects, but the API is still considered experimental and may change between releases.

  • Top-level Meta Crate: Simplify dependencies by using the top-level noodles crate and enabling specific features for desired formats. For instance, adding BAM support can be done easily with:

    cargo add noodles --features bam
    
  • Flexible Imports: Each enabled feature is exposed as a module of the same name under the top-level noodles crate. For example:

    use noodles::bam;
    
  • Asynchronous I/O Support: Asynchronous I/O is opt-in. Enabling the async feature adds async readers and writers for a range of file formats, including BAM, BCF, and VCF.

  • High Performance Compression with libdeflate: Enabling the optional libdeflate feature uses libdeflate to encode and decode DEFLATE streams, which can improve performance for formats like BGZF and CRAM.
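
The feature-gated layout described in the list above follows a general Cargo pattern: a module only exists in the build when its feature is enabled. The following is a minimal sketch of that pattern, not a copy of the noodles source. The `bam` and `vcf` modules and the `enabled_formats` helper are hypothetical stand-ins; a real meta crate would typically re-export a per-format dependency in their place.

```rust
// Hypothetical stand-ins for per-format crates. A real meta crate would
// re-export a dependency here instead (an assumption about the general
// pattern, not taken from the noodles source).
#[cfg(feature = "bam")]
pub mod bam {
    pub const FORMAT: &str = "BAM";
}

#[cfg(feature = "vcf")]
pub mod vcf {
    pub const FORMAT: &str = "VCF";
}

/// Returns the names of the format features compiled into this build.
pub fn enabled_formats() -> Vec<&'static str> {
    let mut formats = Vec::new();
    // `cfg!` evaluates to a compile-time boolean for the given feature.
    if cfg!(feature = "bam") {
        formats.push("bam");
    }
    if cfg!(feature = "vcf") {
        formats.push("vcf");
    }
    formats
}
```

In a build with no features enabled, `enabled_formats()` returns an empty list and neither module is compiled. This is why an import such as `use noodles::bam;` is expected to compile only after the bam feature has been enabled.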

Usage Examples

Individual crates ship runnable examples that show the library in use. After cloning the repository, run the following command without an example name to list the available examples:

cargo run --release --example

To run an example, pass its name to --example and append any program arguments, such as:

cargo run --release --example bam_write > sample.bam
cargo run --release --example bam_read_header sample.bam
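
BAM files are BGZF-compressed, and each BGZF block starts with a gzip header that carries a "BC" extra subfield. As a quick sanity check on a file such as sample.bam, the following standard-library-only sketch looks for that header. It does not use noodles. The helper names `looks_like_bgzf` and `file_looks_like_bgzf` are hypothetical, and the check inspects only the first block header without decompressing or validating anything else.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Returns true if `buf` starts with a gzip member header that carries the
/// BGZF "BC" extra subfield. Only the header is inspected.
pub fn looks_like_bgzf(buf: &[u8]) -> bool {
    // Fixed gzip header: ID1, ID2, CM (8 = DEFLATE), FLG (bit 2 = FEXTRA).
    if buf.len() < 12 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8 || buf[3] & 0x04 == 0 {
        return false;
    }
    // XLEN is the little-endian length of the extra field that follows.
    let xlen = u16::from_le_bytes([buf[10], buf[11]]) as usize;
    let Some(mut extra) = buf.get(12..12 + xlen) else {
        return false;
    };
    // Walk the subfields: SI1, SI2, SLEN (u16, little-endian), payload.
    while extra.len() >= 4 {
        let slen = u16::from_le_bytes([extra[2], extra[3]]) as usize;
        if extra[0] == b'B' && extra[1] == b'C' && slen == 2 {
            // The 2-byte payload (BSIZE) must be present as well.
            return extra.len() >= 6;
        }
        match extra.get(4 + slen..) {
            Some(rest) => extra = rest,
            None => return false,
        }
    }
    false
}

/// Reads enough of the file at `path` to cover the largest possible gzip
/// extra field, then applies `looks_like_bgzf`.
pub fn file_looks_like_bgzf<P: AsRef<Path>>(path: P) -> io::Result<bool> {
    let mut buf = Vec::new();
    File::open(path)?
        .take(12 + u64::from(u16::MAX))
        .read_to_end(&mut buf)?;
    Ok(looks_like_bgzf(&buf))
}
```

Calling `file_looks_like_bgzf("sample.bam")` on the file written by the bam_write example is expected to return `Ok(true)`. A plain gzip file without the extra subfield yields `Ok(false)`.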

For developers interested in bioinformatics data processing, noodles offers a compelling solution that combines performance with ease of use, making it a valuable addition to any Rust-based bioinformatics toolkit.