PitchHut
rust-bfield
Efficiently store and retrieve key-value pairs with the B-field.
Pitch

B-field is a probabilistic data structure for key-value storage that answers lookups with a small, tunable error rate while using very little space. It is well suited to applications such as bioinformatics, where it can hold billions of associations between nucleotide k-mers and values such as taxonomic IDs. Explore B-field through our Rust implementation.

Description

rust-bfield is a Rust implementation of the B-field, a probabilistic data structure for storing associations between keys and values in very little space. B-fields build on the same principles as Bloom filters: each key is hashed to several positions in a bit array. Unlike a Bloom filter, however, a B-field also encodes the key's value at those positions, so it supports both insertion and value lookup with a controllable error rate.

Key Features:

  • High Efficiency: B-fields can store (k-mer, value) associations in only 6-7 bytes per pair for up to 100,000 unique values, which makes them practical for large bioinformatics datasets.
  • Accuracy in Lookups: Lookups return accurate results for keys that were previously inserted, but may return false positives for keys that were never stored. The error rate is tunable, so it can be traded against space to suit the application.
  • Space Requirements: Because B-fields do not store keys explicitly, very large datasets fit in little memory. For example, a billion k-mers (DNA or RNA subsequences of length k) together with bacterial taxonomic IDs can be stored in as little as 6.93 GB.
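To put the 6.93 GB figure in perspective: 6.93 × 10^9 bytes spread over 10^9 pairs is about 6.93 bytes (roughly 55 bits) per pair. The sketch below compares that with a naive explicit record. The k-mer length of 31, the 2-bit-per-base packing, the 4-byte u32 value, and the decimal gigabyte (10^9 bytes) are all illustrative assumptions, and both helper names are hypothetical.

```rust
/// Average bytes per stored pair, assuming decimal gigabytes (1 GB = 10^9 bytes).
fn bytes_per_pair(total_gb: f64, n_pairs: f64) -> f64 {
    total_gb * 1e9 / n_pairs
}

/// Bytes for a naive explicit record (hypothetical baseline): a k-mer packed
/// at 2 bits per base, rounded up to whole bytes, plus a 4-byte u32 value.
fn explicit_bytes_per_pair(k: u32) -> u32 {
    (2 * k + 7) / 8 + 4
}
```

With k = 31, `explicit_bytes_per_pair(31)` gives 8 + 4 = 12 bytes per pair before any hash-table overhead, whereas `bytes_per_pair(6.93, 1e9)` gives about 6.93 bytes per pair.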

Implementation Details:

A B-field does not store keys or values directly, which is the source of its space efficiency. The trade-off is two kinds of error that must be managed: false positives (a key that was never inserted returns a value) and indeterminacy errors (a lookup that cannot be resolved to a single value). The main operations are:

  • Creating a B-field with customizable parameters that control its size and error rates.
  • Inserting key-value pairs iteratively to build the dataset.
  • Efficiently loading existing B-fields for querying.
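The mechanism behind these operations can be illustrated with a toy, in-memory version of the published B-field design, simplified to a single array with no secondary arrays. Each value is encoded as a ν-bit marker with exactly κ bits set; insertion ORs the marker into the slots chosen by the key's hashes, and lookup ANDs those slots back together. Exactly κ surviving bits decode to a value, fewer means the key is absent, and more is an indeterminacy error. `ToyBField`, `Lookup`, and the use of Rust's `DefaultHasher` are illustrative assumptions, not rust-bfield's API or internals.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Outcome of a toy lookup (hypothetical type, not rust-bfield's API).
#[derive(Debug, PartialEq)]
enum Lookup {
    Value(u32),
    NotFound,
    Indeterminate,
}

/// Exact binomial coefficient C(n, k); after step i the accumulator equals C(n, i + 1).
fn binom(n: u32, k: u32) -> u64 {
    if k > n {
        return 0;
    }
    let k = k.min(n - k);
    (0..k).fold(1u128, |r, i| r * (n - i) as u128 / (i + 1) as u128) as u64
}

/// Toy single-array B-field: each slot is a `nu`-bit marker (nu <= 64), and
/// each value is encoded as a marker with exactly `kappa` bits set.
struct ToyBField {
    slots: Vec<u64>,
    n_hashes: u32,
    nu: u32,
    kappa: u32,
}

impl ToyBField {
    fn new(n_slots: usize, n_hashes: u32, nu: u32, kappa: u32) -> Self {
        assert!(n_slots > 0 && (1..=64).contains(&nu) && (1..=nu).contains(&kappa));
        assert!(binom(nu, kappa) <= 1u64 << 32, "marker ranks must fit in u32");
        ToyBField { slots: vec![0; n_slots], n_hashes, nu, kappa }
    }

    /// Value -> marker via the combinatorial number system:
    /// value = C(c_kappa, kappa) + ... + C(c_1, 1) with c_kappa > ... > c_1.
    fn encode(&self, value: u32) -> u64 {
        let mut v = value as u64;
        assert!(v < binom(self.nu, self.kappa), "value exceeds marker capacity");
        let (mut marker, mut upper) = (0u64, self.nu);
        for j in (1..=self.kappa).rev() {
            // Greedily take the largest bit position c < upper with C(c, j) <= v.
            let mut c = upper - 1;
            while binom(c, j) > v {
                c -= 1;
            }
            marker |= 1u64 << c;
            v -= binom(c, j);
            upper = c;
        }
        marker
    }

    /// Marker -> value: the j-th lowest set bit, at position c, contributes C(c, j).
    fn decode(&self, marker: u64) -> u32 {
        let (mut v, mut j) = (0u64, 1u32);
        for c in 0..self.nu {
            if marker & (1u64 << c) != 0 {
                v += binom(c, j);
                j += 1;
            }
        }
        v as u32
    }

    /// Slot index for the i-th hash of `key`.
    fn slot(&self, key: &[u8], i: u32) -> usize {
        let mut h = DefaultHasher::new();
        (i, key).hash(&mut h);
        (h.finish() % self.slots.len() as u64) as usize
    }

    /// Insert: OR the value's marker into each of the key's slots.
    fn insert(&mut self, key: &[u8], value: u32) {
        let marker = self.encode(value);
        for i in 0..self.n_hashes {
            let s = self.slot(key, i);
            self.slots[s] |= marker;
        }
    }

    /// Lookup: AND the key's slots and classify by how many bits survive.
    fn get(&self, key: &[u8]) -> Lookup {
        let mut acc = if self.nu == 64 { u64::MAX } else { (1u64 << self.nu) - 1 };
        for i in 0..self.n_hashes {
            acc &= self.slots[self.slot(key, i)];
        }
        match acc.count_ones().cmp(&self.kappa) {
            Ordering::Less => Lookup::NotFound,
            Ordering::Equal => Lookup::Value(self.decode(acc)),
            Ordering::Greater => Lookup::Indeterminate,
        }
    }
}
```

For example, after `bf.insert(b"ACGT", 7)` on a `ToyBField::new(1_000, 3, 16, 3)`, `bf.get(b"ACGT")` returns `Lookup::Value(7)` unless other keys have polluted all three of its slots. Because an inserted key's slots always contain its own marker, the AND never loses those bits, so in this sketch an inserted key yields either its correct value or `Indeterminate`, never a different value or `NotFound`. A key that was never inserted can still land on exactly κ surviving bits, which is the false-positive case.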

Example Usage:

The following snippet creates a B-field and inserts key-value pairs into it:

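use bfield::BField;

fn main() {
    // Keep the B-field files in a temporary directory (requires the `tempfile` crate).
    let tmp_dir = tempfile::tempdir().expect("Failed to create temporary directory");

    let mut bfield: BField<String> = BField::create(
        tmp_dir.path(), // directory for the B-field files
        "bfield",       // base file name
        1_000_000,      // size
        10,             // n_hashes
        39,             // marker_width
        4,              // n_marker_bits
        0.1,            // secondary_scaledown
        0.025,          // max_scaledown
        4,              // n_secondaries
        false,          // in_memory
        String::new(),  // other_params
    )
    .expect("Failed to build B-field");

    // Insert 10,000 keys (the big-endian bytes of i), each mapped to the value i,
    // repeating the insertion over 4 passes (the third argument is the pass index).
    for p in 0..4u32 {
        for i in 0..10_000u32 {
            bfield.insert(&i.to_be_bytes(), i, p as usize);
        }
    }
}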
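The two small integers 39 and 4 in the call above bear on how many distinct values the B-field can represent. Assuming they are the marker width ν and the number of bits set per marker κ (an assumption; consult the docs.rs documentation for the exact parameter meanings), the number of encodable values is the binomial coefficient C(ν, κ) = C(39, 4) = 82,251. The hypothetical helpers below compute that capacity and the smallest κ that fits a given number of values, one small piece of the manual parameter selection the library currently requires.

```rust
/// C(nu, kappa): the number of nu-bit markers with exactly kappa bits set.
fn marker_capacity(nu: u32, kappa: u32) -> u64 {
    if kappa > nu {
        return 0;
    }
    let k = kappa.min(nu - kappa);
    let mut r: u128 = 1;
    for i in 0..k {
        // After this step r == C(nu, i + 1), so the division is always exact.
        r = r * (nu - i) as u128 / (i + 1) as u128;
    }
    r as u64
}

/// Smallest kappa whose nu-bit markers can encode `n_values` distinct values,
/// or None if no kappa suffices for this marker width.
fn min_kappa(nu: u32, n_values: u64) -> Option<u32> {
    (1..=nu).find(|&kappa| marker_capacity(nu, kappa) >= n_values)
}
```

Under that assumption, ν = 39 with κ = 3 would give only 9,139 markers, so `min_kappa(39, 10_000)` returns `Some(4)`, enough for the 10,000 values inserted above, while 100,000 values would need κ = 5 (575,757 markers).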

Limitations:

rust-bfield currently has some limitations:

  • The implementation primarily supports u32 values; other value types can be used by mapping them to u32 IDs.
  • Parameters must currently be chosen manually; automatic parameter selection may be added in a future release to improve usability.
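Mapping other value types to u32 can be as simple as assigning each distinct value a dense integer ID before insertion and translating IDs back after lookup. `ValueDictionary` below is a hypothetical helper, not part of rust-bfield; handing out IDs densely from 0 keeps them small, which matters because the number of distinct values a B-field can encode is bounded.

```rust
use std::collections::HashMap;

/// Hypothetical helper: assigns each distinct label a dense u32 ID (0, 1, 2, ...)
/// so labels can be stored as u32 values and recovered after a lookup.
#[derive(Default)]
struct ValueDictionary {
    ids: HashMap<String, u32>,
    labels: Vec<String>,
}

impl ValueDictionary {
    /// Returns the existing ID for `label`, or assigns the next unused one.
    fn id_for(&mut self, label: &str) -> u32 {
        if let Some(&id) = self.ids.get(label) {
            return id;
        }
        assert!(self.labels.len() < u32::MAX as usize, "too many distinct labels");
        let id = self.labels.len() as u32;
        self.ids.insert(label.to_string(), id);
        self.labels.push(label.to_string());
        id
    }

    /// Translates an ID returned by a lookup back into its label.
    fn label(&self, id: u32) -> Option<&str> {
        self.labels.get(id as usize).map(String::as_str)
    }
}
```

In use, each key would be inserted with `dict.id_for(label)` as its value, and a successful lookup's u32 result would be passed to `dict.label(id)`.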

For more details, see the documentation hosted on docs.rs.

Conclusion:

Whether you are working on bioinformatics projects or other data-heavy applications, rust-bfield offers a space-efficient way to store large collections of key-value associations, trading a small, tunable error rate for substantial memory savings.