EncypherAI
by encypherai
Effortlessly embed and extract metadata for AI-generated content.
Pitch

EncypherAI is a Python package that embeds metadata in text, and extracts it again, using Unicode variation selectors. This enables provenance tracking, timestamp verification, and custom data embedding without affecting the readability of AI-generated content, which makes it useful for managing and verifying AI outputs.

Description

EncypherAI: Embedding Metadata for AI-Generated Content

EncypherAI embeds metadata in text generated by AI models, and extracts it again, using Unicode variation selectors. The visible text stays readable, and the embedded data supports:

  • Provenance Tracking: Easily identify which AI model produced a specific piece of text.
  • Timestamp Verification: Establish a clear record of when text was generated.
  • Custom Metadata: Integrate additional information necessary for specific use cases.
  • Streaming Support: Works with both streaming and non-streaming output from large language models (LLMs).
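
To make the variation-selector idea concrete, here is a minimal sketch using only the standard library. It is not EncypherAI's actual encoding, which this page does not specify. It assumes a simple one-byte-per-selector mapping over the 256 code points that Unicode reserves for variation selectors (U+FE00 to U+FE0F and U+E0100 to U+E01EF). The function names hide_bytes, reveal_bytes, and strip_selectors are hypothetical.

```python
import json
from typing import Optional


def byte_to_selector(b: int) -> str:
    """Map a byte (0-255) to one of the 256 Unicode variation selectors."""
    if not 0 <= b <= 255:
        raise ValueError("byte out of range")
    # VS1-VS16 occupy U+FE00..U+FE0F; VS17-VS256 occupy U+E0100..U+E01EF.
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + (b - 16))


def selector_to_byte(ch: str) -> Optional[int]:
    """Inverse of byte_to_selector; returns None for ordinary characters."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def hide_bytes(carrier: str, payload: bytes) -> str:
    """Insert the payload, as selectors, right after the first character."""
    if not carrier:
        raise ValueError("carrier text must not be empty")
    hidden = "".join(byte_to_selector(b) for b in payload)
    return carrier[0] + hidden + carrier[1:]


def reveal_bytes(text: str) -> bytes:
    """Collect every variation selector in the text and decode it to bytes."""
    decoded = (selector_to_byte(ch) for ch in text)
    return bytes(b for b in decoded if b is not None)


def strip_selectors(text: str) -> str:
    """Return the text with all variation selectors removed."""
    return "".join(ch for ch in text if selector_to_byte(ch) is None)


# Usage: hide a small JSON payload and read it back.
payload = json.dumps({"model_id": "gpt-4"}).encode("utf-8")
stamped = hide_bytes("This is a sample text.", payload)
recovered = json.loads(reveal_bytes(stamped).decode("utf-8"))
```

In this sketch the stamped string is longer than the original in code points, but removing the selectors returns the original text exactly. Any ordinary character can carry the payload, which is what makes it possible to choose where the data is placed.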

Features and Functionality

The core of EncypherAI is a small API for encoding metadata into text and extracting it again:

Basic Usage

The following example embeds a model ID and timestamp in a string, then reads them back:

from encypher.core.unicode_metadata import UnicodeMetadata
import time

# Encode metadata into text
encoded_text = UnicodeMetadata.embed_metadata(
    text="This is a sample text generated by an AI model.",
    model_id="gpt-4",
    timestamp=int(time.time()),  # Current Unix timestamp
    target="whitespace"  # Embed in whitespace characters
)

# Extract metadata from encoded text
metadata = UnicodeMetadata.extract_metadata(encoded_text)
print(f"Model: {metadata.get('model_id')}")
print(f"Timestamp: {metadata.get('timestamp')}")

Alternative Encoding Method

The MetadataEncoder class accepts an arbitrary metadata dictionary, so additional fields can be encoded, and it verifies the embedded data when decoding:

from encypher.core.metadata_encoder import MetadataEncoder
import time

# Initialize encoder with optional HMAC secret key
encoder = MetadataEncoder(secret_key="your-secret-key")

# Encode metadata
metadata = {
    "model_id": "gpt-4",
    "timestamp": int(time.time()),  # Current Unix timestamp
    "custom_field": "custom value"
}
encoded_text = encoder.encode_metadata(
    text="This is a sample text generated by an AI model.",
    metadata=metadata
)

# Decode and verify metadata
is_valid, extracted_metadata, clean_text = encoder.verify_text(encoded_text)
if is_valid:
    print(f"Model: {extracted_metadata.get('model_id')}")
    print(f"Timestamp: {extracted_metadata.get('timestamp')}")
    print(f"Custom field: {extracted_metadata.get('custom_field')}")

Metadata Target Options

The target parameter controls where in the text the metadata is embedded. The available options are:

  • whitespace: Embed within whitespace characters (default for minimal visibility).
  • punctuation: Embed in punctuation marks.
  • first_letter: Embed in the first letter of each word.
  • last_letter: Embed in the last letter of each word.
  • all_characters: Embed in all characters (not recommended due to visibility).
  • none: Disable embedding for testing or debugging purposes.
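
As a rough illustration of what these targets mean, the sketch below finds the character positions each option could select in a string. The helper find_target_positions is hypothetical and is not part of the EncypherAI API. The rules it applies (str.isspace for whitespace, string.punctuation for punctuation, and \w+ runs for words) are assumptions made for this example.

```python
import re
import string
from typing import List


def find_target_positions(text: str, target: str) -> List[int]:
    """Return indices of characters that could carry embedded metadata.

    Hypothetical helper: the matching rules here are assumptions, not
    EncypherAI's documented behavior.
    """
    if target == "none":
        return []
    if target == "all_characters":
        return list(range(len(text)))
    if target == "whitespace":
        return [i for i, ch in enumerate(text) if ch.isspace()]
    if target == "punctuation":
        return [i for i, ch in enumerate(text) if ch in string.punctuation]
    if target in ("first_letter", "last_letter"):
        positions = []
        for match in re.finditer(r"\w+", text):
            # match.end() is exclusive, so the last letter sits at end() - 1.
            index = match.start() if target == "first_letter" else match.end() - 1
            positions.append(index)
        return positions
    raise ValueError(f"unknown target: {target!r}")


# Usage: list candidate positions for each target in a short sentence.
sample = "Hi, all!"
for name in ("whitespace", "punctuation", "first_letter", "last_letter"):
    print(name, find_target_positions(sample, name))
```

A target with few matching positions, such as whitespace in a short sentence, leaves fewer places to attach data, so each position has to carry more of the payload.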

Security Features

EncypherAI uses HMAC (Hash-Based Message Authentication Code) verification to detect tampering with embedded metadata and to confirm its authenticity. Use the following snippet to validate data:

# Example of verifying metadata with HMAC
from encypher.core.unicode_metadata import UnicodeMetadata

encoder = UnicodeMetadata()  # Uses secret key from environment variables
encoded_text = "AI-generated text with embedded metadata..."

# Returns (is_valid, metadata)
is_valid, metadata = encoder.extract_metadata(encoded_text)

if is_valid:
    print(f"Verified metadata: {metadata}")
else:
    print("Warning: Metadata has been tampered with!")
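
For readers unfamiliar with HMAC, the sketch below shows the general technique with Python's standard hmac module. It is not EncypherAI's implementation: the choice of SHA-256, the canonical JSON serialization, and the function names sign_metadata and verify_metadata are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def sign_metadata(metadata: dict, secret_key: str) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON form of metadata."""
    # Sorting keys and fixing separators makes the serialization deterministic,
    # so the same dictionary always produces the same tag.
    payload = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hmac.new(
        secret_key.encode("utf-8"), payload.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def verify_metadata(metadata: dict, tag: str, secret_key: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_metadata(metadata, secret_key)
    return hmac.compare_digest(expected, tag)


# Usage: a tag made with the right key verifies; altered metadata does not.
key = "your-secret-key"
original = {"model_id": "gpt-4", "timestamp": 1700000000}
tag = sign_metadata(original, key)

tampered = dict(original, model_id="other-model")
print(verify_metadata(original, tag, key))
print(verify_metadata(tampered, tag, key))
```

Because the tag depends on both the metadata and the secret key, anyone without the key cannot produce a valid tag for modified metadata. The constant-time comparison avoids leaking how many leading characters of a guessed tag were correct.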

FastAPI Integration and CLI Support

EncypherAI also integrates with FastAPI and includes a command-line interface for encoding and decoding metadata.

To contribute or get support, see the guidelines in the documentation and reach the community through GitHub issues.
