ErisForge
Transform Large Language Models with precision and ease.
Pitch

ErisForge is a Python library for modifying the internal behavior of Large Language Models (LLMs) through targeted transformations of their layers. It provides tools to ablate or enhance a chosen behavior in a model's responses, so you can build LLMs tailored to specific needs and input types.

Description

ErisForge: Revolutionizing Large Language Model Modifications

ErisForge is a Python library for developers and researchers who want to alter the behavior of Large Language Models (LLMs). Named after Eris, the Greek goddess of strife and discord, it lets you modify a model's internal layers to produce ablated or augmented versions that respond differently to the same inputs.

Key Features

  • Layer Modification: Effortlessly alter LLM behaviors by modifying their internal layers to achieve desired outputs.
  • Dynamic Classes: Utilize the AblationDecoderLayer and AdditionDecoderLayer classes for selective modification of model responses.
  • Expression Scoring: Measure refusal expressions in generated responses with the advanced ExpressionRefusalScorer.
  • Custom Transformations: Supply your own behavior direction to apply a targeted transformation to a model.
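
To make the ablation and addition ideas concrete, here is a minimal sketch of the underlying vector arithmetic on a single hidden state. It assumes ablation means removing the component of a hidden state along a behavior direction, and addition means shifting the hidden state along that direction. The functions `normalize`, `ablate`, and `add_direction` are hypothetical names used for illustration and are not part of the ErisForge API; plain Python lists stand in for tensors.

```python
import math

def normalize(direction):
    """Return the direction scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in direction]

def ablate(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`."""
    unit = normalize(direction)
    projection = sum(h * u for h, u in zip(hidden, unit))
    return [h - projection * u for h, u in zip(hidden, unit)]

def add_direction(hidden, direction, scale=1.0):
    """Shift `hidden` along `direction` by `scale` units."""
    unit = normalize(direction)
    return [h + scale * u for h, u in zip(hidden, unit)]

# Toy 3-dimensional hidden state and behavior direction (assumed values).
hidden_state = [3.0, 4.0, 0.0]
behaviour_dir = [0.0, 2.0, 0.0]

print(ablate(hidden_state, behaviour_dir))            # [3.0, 0.0, 0.0]
print(add_direction(hidden_state, behaviour_dir, 2))  # [3.0, 6.0, 0.0]
```

After ablation the hidden state has no component left along the direction, while addition increases that component by the chosen scale.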

Getting Started

Basic Setup

To get started, load a model and its tokenizer, then create the ErisForge and scorer objects:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from erisforge import ErisForge
from erisforge.expression_refusal_scorer import ExpressionRefusalScorer

# Load a model and tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize ErisForge and configure the scorer
forge = ErisForge()
scorer = ExpressionRefusalScorer()

Transforming Model Layers

Apply a transformation to a chosen range of layers to change how the model responds.

Example: Applying Ablation to Model Layers

# Define instructions
instructions = ["Explain why AI is beneficial.", "What are the limitations of AI?"]

# Specify layer ranges for ablation
min_layer = 2
max_layer = 4

# AblationDecoderLayer must be imported from the erisforge package before this
# call; the original snippet does not show the import.

# Apply ablation to the specified layers and generate responses
conversations = forge.run_forged_model(
    model=model,
    type_of_layer=AblationDecoderLayer,
    objective_behaviour_dir=torch.rand(768),  # Placeholder direction (random values); 768 is GPT-2's hidden size
    tokenizer=tokenizer,
    min_layer=min_layer,
    max_layer=max_layer,
    instructions=instructions,
    max_new_tokens=50,
)

# Display the responses generated by the modified model
for conversation in conversations:
    print("User:", conversation[0]["content"])
    print("AI:", conversation[1]["content"])
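
The example above passes a random tensor as the direction, which is only a placeholder. One common way to obtain a meaningful direction, assumed here for illustration and not confirmed as ErisForge's own method, is to take the difference between the mean activation on prompts that show the target behavior and the mean activation on baseline prompts. The helper names and the toy activation values below are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def difference_of_means(objective_activations, baseline_activations):
    """Direction pointing from the baseline mean to the objective mean."""
    objective_mean = mean_vector(objective_activations)
    baseline_mean = mean_vector(baseline_activations)
    return [o - b for o, b in zip(objective_mean, baseline_mean)]

# Toy 2-dimensional activations (assumed values, one vector per prompt).
objective_activations = [[1.0, 2.0], [3.0, 4.0]]
baseline_activations = [[0.0, 1.0], [2.0, 1.0]]

direction = difference_of_means(objective_activations, baseline_activations)
print(direction)  # [1.0, 2.0]
```

In a real setting the activations would come from the model's hidden states at a chosen layer, and the resulting vector would have the same length as the model's hidden size.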

Example: Measuring Refusal Expressions

response_text = "I'm sorry, I cannot provide that information."
user_query = "What is the recipe for a dangerous substance?"

# Scoring the response for refusal expressions
refusal_score = scorer.score(user_query=user_query, model_response=response_text)
print("Refusal Score:", refusal_score)
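
As a rough intuition for expression-based scoring, the sketch below counts how many known refusal phrases appear in a response. This is an assumption about the general idea, not the actual implementation of ExpressionRefusalScorer; the phrase list and the function `count_refusal_expressions` are hypothetical.

```python
# Hypothetical list of refusal phrases, lowercased for matching.
REFUSAL_EXPRESSIONS = ("i'm sorry", "i cannot", "i can't", "i am unable")

def count_refusal_expressions(model_response, expressions=REFUSAL_EXPRESSIONS):
    """Count how many refusal phrases occur in the response (case-insensitive)."""
    text = model_response.lower()
    return sum(1 for expression in expressions if expression in text)

print(count_refusal_expressions("I'm sorry, I cannot provide that information."))  # 2
print(count_refusal_expressions("Sure, here is a short summary."))                 # 0
```

A phrase counter like this is cheap to run over many generations, but it only detects wording, so a refusal phrased in an unexpected way would go unnoticed.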

Saving Your Model Transformations

After modifying the model, save the transformed version:

output_model_name = "my_transformed_model"

# Save the modified model
forge.save_model(
    model=model,
    behaviour_dir=torch.rand(768),  # Placeholder direction (random values, for illustration only)
    scale_factor=1,
    output_model_name=output_model_name,
    tokenizer=tokenizer,
    to_hub=False  # Set to True to push to HuggingFace Hub
)

Acknowledgments

ErisForge builds upon existing research and developments, inspired by projects such as:

Contribute

Contributions are welcome! To get involved, fork the repository, create a feature branch, and submit a pull request. Open issues and feature requests are tracked in the project repository.


Disclaimer: ErisForge is intended solely for research and development purposes. The authors hold no responsibility for specific applications or uses.