ErisForge is a Python library for modifying the internal behavior of Large Language Models (LLMs) through targeted layer transformations. It provides tools to ablate or enhance particular model responses, producing LLM variants tailored to specific needs and input types.
ErisForge: Revolutionizing Large Language Model Modifications
ErisForge is a Python library for developers and researchers who want to alter the behavior of Large Language Models (LLMs). Named after Eris, the goddess of strife and discord, it modifies a model's internal layers to produce ablated or augmented versions that respond differently to the same inputs.
Key Features
- Layer Modification: Effortlessly alter LLM behaviors by modifying their internal layers to achieve desired outputs.
- Dynamic Classes: Utilize the AblationDecoderLayer and AdditionDecoderLayer classes for selective modification of model responses.
- Expression Scoring: Measure refusal expressions in generated responses with the advanced ExpressionRefusalScorer.
- Custom Transformations: Support for custom directional behaviors, allowing targeted transformations on models.
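The idea behind ablating or adding a "direction" can be illustrated with plain tensor arithmetic. The sketch below is not ErisForge's implementation. It assumes that ablation removes the component of each hidden state along a direction vector and that addition shifts hidden states along it, which is how direction-based model editing is commonly described. The function names ablate_direction and add_direction are hypothetical.

```python
import torch


def ablate_direction(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of each hidden state that lies along `direction`."""
    unit = direction / direction.norm()
    # Projection coefficient of every hidden vector onto the unit direction.
    coeff = hidden @ unit
    return hidden - coeff.unsqueeze(-1) * unit


def add_direction(hidden: torch.Tensor, direction: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Shift every hidden state along `direction` by `scale` units."""
    unit = direction / direction.norm()
    return hidden + scale * unit


# Two toy hidden states of size 3 (a real model would use its hidden size, e.g. 768 for gpt2).
hidden = torch.tensor([[3.0, 4.0, 0.0], [1.0, 0.0, 2.0]])
direction = torch.tensor([2.0, 0.0, 0.0])

ablated = ablate_direction(hidden, direction)
boosted = add_direction(hidden, direction, scale=0.5)
print(ablated)
print(boosted)
```

After ablation, the hidden states have no remaining component along the direction, while the other coordinates are left unchanged.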
Getting Started
Basic Setup
To use ErisForge, you can easily initialize your environment with the following Python code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from erisforge import ErisForge
from erisforge.expression_refusal_scorer import ExpressionRefusalScorer
# Load a model and tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Initialize ErisForge and configure the scorer
forge = ErisForge()
scorer = ExpressionRefusalScorer()
Transforming Model Layers
Easily apply transformations to specific layers to create diverse response behaviors.
Example: Applying Ablation to Model Layers
# Define instructions
instructions = ["Explain why AI is beneficial.", "What are the limitations of AI?"]
# Specify layer ranges for ablation
min_layer = 2
max_layer = 4
# Modify the model by applying ablation to the specified layers.
# Note: AblationDecoderLayer must be imported from the erisforge package first;
# its import path is not shown in this document.
conversations = forge.run_forged_model(
    model=model,
    type_of_layer=AblationDecoderLayer,
    objective_behaviour_dir=torch.rand(768),  # Example direction tensor (768 = gpt2 hidden size)
    tokenizer=tokenizer,
    min_layer=min_layer,
    max_layer=max_layer,
    instructions=instructions,
    max_new_tokens=50
)
# Display modified responses
for conversation in conversations:
    print("User:", conversation[0]["content"])
    print("AI:", conversation[1]["content"])
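The example above passes a random tensor as the direction, which is only a placeholder. A common way to estimate a meaningful behaviour direction is to take the difference between mean hidden states collected on two contrasting sets of prompts. The sketch below shows only that arithmetic on toy data. It is an assumption about how such a direction could be obtained, not a description of ErisForge's API, and mean_difference_direction is a hypothetical helper.

```python
import torch


def mean_difference_direction(target_acts: torch.Tensor, baseline_acts: torch.Tensor) -> torch.Tensor:
    """Return the unit vector pointing from the baseline mean to the target mean.

    Both inputs have shape (num_prompts, hidden_size).
    """
    diff = target_acts.mean(dim=0) - baseline_acts.mean(dim=0)
    return diff / diff.norm()


# Toy activations with hidden size 4; real ones would come from a model's hidden states.
target_acts = torch.tensor([[2.0, 1.0, 0.0, 0.0], [4.0, 1.0, 0.0, 0.0]])
baseline_acts = torch.tensor([[0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])

behaviour_dir = mean_difference_direction(target_acts, baseline_acts)
print(behaviour_dir)
```

The resulting vector has the same size as the hidden states, so for gpt2 it would have 768 entries, matching the shape of the placeholder tensor used above.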
Example: Measuring Refusal Expressions
response_text = "I'm sorry, I cannot provide that information."
user_query = "What is the recipe for a dangerous substance?"
# Scoring the response for refusal expressions
refusal_score = scorer.score(user_query=user_query, model_response=response_text)
print("Refusal Score:", refusal_score)
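To make the idea of scoring refusal expressions concrete, here is a minimal, dependency-free sketch that counts how many known refusal phrases appear in a response. It is not how ExpressionRefusalScorer works internally, which this document does not describe. The phrase list and the function count_refusal_expressions are hypothetical and chosen only for illustration.

```python
# Hypothetical phrase list; a real scorer would likely use a larger, curated set.
REFUSAL_EXPRESSIONS = ("i'm sorry", "i cannot", "i can't", "i am unable")


def count_refusal_expressions(model_response: str, expressions=REFUSAL_EXPRESSIONS) -> int:
    """Count how many distinct refusal phrases occur in the response (case-insensitive)."""
    text = model_response.lower()
    return sum(1 for expression in expressions if expression in text)


refusal = count_refusal_expressions("I'm sorry, I cannot provide that information.")
helpful = count_refusal_expressions("Sure, here is an overview of the topic.")
print("Refusal phrases found:", refusal)
print("Refusal phrases found:", helpful)
```

A count like this is crude, since it ignores context and paraphrases, but it shows the kind of signal a refusal score can be built from.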
Saving Your Model Transformations
After modifications, you can easily save your transformed model:
output_model_name = "my_transformed_model"
# Save the modified model
forge.save_model(
    model=model,
    behaviour_dir=torch.rand(768),  # Example direction tensor
    scale_factor=1,
    output_model_name=output_model_name,
    tokenizer=tokenizer,
    to_hub=False  # Set to True to push to HuggingFace Hub
)
Acknowledgments
ErisForge builds upon existing research and developments, inspired by projects such as:
Contribute
Contributions are welcome! To get involved, fork the repository, create a feature branch, and submit a pull request. Open issues and feature requests are tracked in the project's issue tracker.
Disclaimer: ErisForge is intended solely for research and development purposes. The authors hold no responsibility for specific applications or uses.