Bend: Unleashing Next-Level Parallel Programming with Ease

Pitch

Bend is a high-level programming language for massively parallel computing. It offers the expressiveness of languages like Python and Haskell while scaling across thousands of GPU cores. You write ordinary code with no explicit parallelism, and the runtime parallelizes it wherever the structure of the program allows.

Description

Bend is a high-level, massively parallel programming language for users who want to harness parallel hardware easily and efficiently.

Overview

Bend offers the feel and feature set of expressive languages like Python and Haskell: fast object allocation, full support for higher-order functions with closures, unrestricted recursion, and even continuations. At the same time it scales like CUDA, running on massively parallel hardware such as GPUs with nearly linear speedup as the core count grows, and it needs no explicit parallelism annotations such as thread creation or lock management. Bend is powered by the HVM2 runtime.

Key Features

Performance Scaling: Bend is designed to scale with core count and supports over 10,000 concurrent threads. The current version may have lower single-core performance; substantial improvements are expected as its code generation and optimization techniques mature.
NVIDIA GPU Support: GPU execution currently works only on NVIDIA GPUs, so Bend suits users already invested in NVIDIA's hardware ecosystem.
Mac and Linux Support: Bend runs on macOS and Linux. Windows support is in development; in the meantime, Windows users can run Bend under WSL2.

Getting Started with Bend

Running a Bend program is straightforward. To execute a program in parallel with the C interpreter, use:

bend run <file.bend>

To run a program sequentially with the Rust interpreter, use:

bend run-rs <file.bend>

For maximum parallel performance on an NVIDIA GPU, use the CUDA interpreter:

bend run-cu <file.bend>
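To compare the three interpreters on the same program, a small Python harness can shell out to the commands above. This is a sketch, not part of Bend's tooling: it assumes the bend executable is on your PATH, uses only the subcommand names listed in this guide (other Bend releases may spell them differently), and the helper names build_command and time_runner, like the sum.bend filename, are hypothetical.

```python
import shutil
import subprocess
import sys
import time

# Subcommands as listed in this guide; other Bend releases may differ.
RUNNERS = {
    "c": "run",        # C interpreter, parallel
    "rust": "run-rs",  # Rust interpreter, sequential
    "cuda": "run-cu",  # CUDA interpreter, NVIDIA GPUs only
}


def build_command(runner: str, path: str) -> list[str]:
    """Return the argument list that runs `path` with the chosen interpreter."""
    return ["bend", RUNNERS[runner], path]


def time_runner(runner: str, path: str) -> float:
    """Run the program once; return wall-clock seconds, including startup."""
    if shutil.which("bend") is None:
        raise FileNotFoundError("bend was not found on PATH")
    start = time.perf_counter()
    subprocess.run(build_command(runner, path), check=True, capture_output=True)
    return time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    program = sys.argv[1]  # path to a .bend file, e.g. a hypothetical sum.bend
    for name in ("rust", "c", "cuda"):
        print(f"{name:>4}: {time_runner(name, program):.2f} s")
```

Invoked as, for example, python compare_runners.py sum.bend (both file names hypothetical). A single wall-clock timing includes process startup, so repeated runs or longer programs give a fairer comparison.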

Example Code Snippets

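The following program sums the numbers from 1 to 1,000,000 sequentially. Each call has to wait for the result of Sum(start + 1, target) before it can do its own addition, so the calls form a single chain that cannot be split across cores: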

# Sequential Sum Function
def Sum(start, target):
  if start == target:
    return start
  else:
    return start + Sum(start + 1, target)

# Main function
def main():
  return Sum(1, 1_000_000)

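By contrast, the version below splits the range at its midpoint and sums the two halves recursively. Because the left and right calls do not depend on each other, the runtime can evaluate them in parallel, with no annotations in the code: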

# Parallel Sum Function
def Sum(start, target):
  if start == target:
    return start
  else:
    half = (start + target) / 2
    left = Sum(start, half)
    right = Sum(half + 1, target)
    return left + right

# Main function
def main():
  return Sum(1, 1_000_000)
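The difference between the two programs can be made concrete by counting the critical path: the longest chain of calls that must run one after another, which no number of cores can shorten. The Python sketch below is an analogue, not Bend code. It repeats the arithmetic of both Sum functions and reports that depth alongside the total; the names sequential_sum and parallel_sum are hypothetical, and the sequential version uses a loop only to stay under Python's recursion limit.

```python
def sequential_sum(start: int, target: int) -> tuple[int, int]:
    """Arithmetic of the sequential Sum; returns (total, critical-path depth).

    Every call needs Sum(start + 1, target) before it can add, so the chain
    of dependent calls gains one link per number.
    """
    total, depth = target, 1                 # base case: Sum(target, target)
    for n in range(target - 1, start - 1, -1):
        total, depth = n + total, depth + 1  # one more dependent call
    return total, depth


def parallel_sum(start: int, target: int) -> tuple[int, int]:
    """Arithmetic of the divide-and-conquer Sum; returns (total, depth).

    The two halves are independent, so the chain is only as deep as the
    deeper half plus this call.
    """
    if start == target:
        return start, 1
    half = (start + target) // 2             # integer midpoint, as in the Bend code
    left, left_depth = parallel_sum(start, half)
    right, right_depth = parallel_sum(half + 1, target)
    return left + right, 1 + max(left_depth, right_depth)


if __name__ == "__main__":
    print(sequential_sum(1, 1_000_000))  # (500000500000, 1000000)
    print(parallel_sum(1, 1_000_000))    # (500000500000, 21)
```

Both versions do the same additions, but the sequential chain is 1,000,000 calls deep while the split version is only 21 deep, which leaves the runtime room to spread the work across many threads. Python integers never overflow; Bend's native numbers are 24 bits wide, so the total Bend prints for this range may wrap around.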

Performance Showcase

Sorting algorithms are a good showcase. A bitonic sorter, for example, gets dramatically faster as it moves from sequential CPU execution to the parallel C interpreter and then to the CUDA interpreter on an NVIDIA RTX 4090:

CPU execution, sequential (Apple M3 Max): 12.15 seconds
C execution, parallel: 0.96 seconds
CUDA execution (NVIDIA RTX 4090): 0.21 seconds
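The gains are easier to compare as speedup ratios, where speedup is the baseline time divided by each run's time. The Python snippet below only does that arithmetic on the three timings listed; the dictionary keys reuse the list's labels, and the speedups helper is a hypothetical name.

```python
# Bitonic-sorter timings from the list above, in seconds.
TIMINGS = {
    "CPU execution": 12.15,
    "C execution": 0.96,
    "CUDA execution": 0.21,
}


def speedups(times: dict[str, float], baseline: str) -> dict[str, float]:
    """Speedup of each run over the baseline: baseline time / run time."""
    base = times[baseline]
    return {name: base / seconds for name, seconds in times.items()}


if __name__ == "__main__":
    for name, factor in speedups(TIMINGS, "CPU execution").items():
        print(f"{name}: {factor:.1f}x")
```

Against the CPU baseline this prints roughly 12.7x for the C interpreter and 57.9x for CUDA. Passing "C execution" as the baseline instead shows the GPU run is about 4.6x faster than the parallel CPU run.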

Additional Resources

For further information on Bend, check out:

The HVM2 technology paper, for an in-depth understanding of the runtime.
GUIDE.md for guides and documentation, and FEATURES.md for an overview of the language's features.
The HigherOrderCO community, including its Discord server.