Bend is a high-level programming language designed for massively parallel computing. Enjoy the expressive power of Python and Haskell while effortlessly scaling across thousands of GPU cores. With Bend, you can focus on your code without the complexity of explicit parallelism, ensuring faster performance for your applications.
Bend - A high-level, massively parallel programming language designed for users seeking to harness the power of parallel computing with ease and efficiency.
Overview
Bend is similar in feel and feature set to expressive languages like Python and Haskell, offering fast object allocations and full support for higher-order functions with closures, unrestricted recursion, and continuations. The language scales seamlessly like CUDA, enabling it to run on massively parallel hardware such as GPUs, achieving nearly linear acceleration based on core count without the need for complex parallelism annotations, such as thread creation or lock management. Bend operates on the robust HVM2 runtime, enhancing its capabilities.
Key Features
- Performance Scaling: Bend is tailored for exceptional scaling performance, accommodating over 10,000 concurrent threads and ensuring efficient resource utilization. While the current version may exhibit lower single-core performance, advancements in code generation and optimization techniques are expected to deliver substantial improvements.
- NVIDIA GPU Support: Currently, Bend only supports NVIDIA GPUs, making it an ideal choice for users invested in NVIDIA's hardware ecosystem.
- Mac and Linux Support: Although Windows support is in development, users can utilize WSL2 as an alternative for running Bend.
Getting Started with Bend
Running a Bend program is straightforward. You can execute your Bend scripts using the C interpreter for parallel execution:
bend run <file.bend>
Or utilize the Rust interpreter for sequential execution:
bend run-rs <file.bend>
For maximum parallel performance, leverage the CUDA interpreter:
bend run-cu <file.bend>
Example Code Snippets
Here's how you can sum numbers using a sequential approach:
# Sequential Sum Function
def Sum(start, target):
if start == target:
return start
else:
return start + Sum(start + 1, target)
# Main function
def main():
return Sum(1, 1_000_000)
In contrast, for a parallelizable approach, consider:
# Parallel Sum Function
def Sum(start, target):
if start == target:
return start
else:
half = (start + target) / 2
left = Sum(start, half)
right = Sum(half + 1, target)
return left + right
# Main function
def main():
return Sum(1, 1_000_000)
Performance Showcase
Experience the impressive performance of Bend through examples such as sorting algorithms. For instance, the Bitonic Sorter executed on a GPU like the NVIDIA RTX 4090 demonstrates significant speed improvements:
- CPU execution (Apple M3 Max): 12.15 seconds
- C execution: 0.96 seconds
- CUDA execution: 0.21 seconds
Additional Resources
For further information on Bend, check out:
- HVM2 Technology Paper for an in-depth understanding of the runtime.
- Access guides and documentation on GUIDE.md and features at FEATURES.md.
- Join the community at HigherOrderCO and engage with peers on Discord!