Llama3.java is a practical Llama 3 inference engine for Java developers, implemented in a single file with minimal dependencies. It supports Llama 3, 3.1, and 3.2 models, integrates cleanly into Java applications, and uses GraalVM to optimize performance, making it a good fit both for production experimentation and for sharpening compiler skills. Its high-performance features include:
- GGUF Format Parser: Efficiently handle model formats within your Java environment.
- Llama 3 Tokenizer: Utilize a tokenizer based on the minbpe algorithm for optimal text processing.
- Advanced Inference: Execute inference tasks employing Grouped-Query Attention, with support for both Llama 3.1's ad-hoc RoPE scaling and Llama 3.2's tied word embeddings.
- Quantization Support: Leverage Q8_0 and Q4_0 quantizations to enhance model efficiency.
- Vector API Integration: Accelerate matrix-vector multiplication routines using Java's Vector API.
- User-Friendly CLI: Operate in two distinct modes: `--chat` for interactive conversations and `--instruct` for one-shot instructions to the model.
- GraalVM Native Image: Compile to a native executable for ultrafast startup times and reduced memory requirements.
- AOT Model Preloading: Achieve instant inference with minimal time-to-first-token (TTFT) by preloading models.
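To give a feel for the GGUF parsing mentioned above: a GGUF file opens with a small fixed header (magic bytes, version, tensor count, metadata key-value count) that can be read with `java.nio`. The sketch below follows the public GGUF layout but is only an illustration; the class and field names are invented and this is not the project's actual parser.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative reader for the fixed GGUF header fields.
public class GgufHeader {
    static final int GGUF_MAGIC = 0x46554747; // "GGUF" as a little-endian uint32

    final int version;          // uint32 format version (3 for current files)
    final long tensorCount;     // uint64 number of tensors in the file
    final long metadataKvCount; // uint64 number of metadata key-value pairs

    GgufHeader(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN); // GGUF is little-endian on disk
        int magic = buf.getInt();
        if (magic != GGUF_MAGIC) {
            throw new IllegalArgumentException(
                "Not a GGUF file: magic=0x" + Integer.toHexString(magic));
        }
        this.version = buf.getInt();
        this.tensorCount = buf.getLong();
        this.metadataKvCount = buf.getLong();
    }

    public static void main(String[] args) {
        // Build a synthetic 24-byte header in memory to exercise the parser.
        ByteBuffer buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(GGUF_MAGIC).putInt(3).putLong(291).putLong(19);
        buf.flip();
        GgufHeader h = new GgufHeader(buf);
        System.out.println(h.version + " " + h.tensorCount + " " + h.metadataKvCount);
    }
}
```

After the header come the metadata entries and tensor descriptors, whose variable-length encoding the real parser handles; this fixed prefix is just the entry point.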
This project is the successor to llama2.java, which in turn builds on llama2.c and the educational resources created by Andrej Karpathy. Additionally, Llama3.java serves as a platform for experimenting with compiler optimizations, particularly enhancements to the Graal compiler.
Interactive Features
Visit our showcase to see the interactive `--chat` mode in action!
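As a rough sketch of what launching chat mode might look like: only the `--chat` and `--instruct` flags are documented above, while the launcher, the `--model` option, and the model filename below are illustrative assumptions about a typical setup.

```shell
# Run the single-file program and start an interactive session.
# Swap --chat for --instruct to send the model a one-shot instruction instead.
jbang Llama3.java --model Llama-3.2-1B-Instruct-Q8_0.gguf --chat
```

The quantized `.gguf` model file must be downloaded separately; the Q8_0 and Q4_0 variants mentioned earlier trade a little accuracy for a much smaller footprint.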
Engaging Presentation
Learn more in our presentation titled "Practical LLM inference in modern Java" presented at Devoxx Belgium, 2024.
Harness the full potential of AI inference in your Java projects with Llama3.java. Whether for testing advanced features or powering applications, this repository empowers developers to integrate sophisticated AI capabilities efficiently.