EncypherAI is a Python package that embeds metadata in text, and extracts it again, using Unicode variation selectors. It supports provenance tracking, timestamp verification, and custom data embedding without changing how AI-generated text reads, which makes it useful for managing and verifying AI outputs.
EncypherAI: Embedding Metadata for AI-Generated Content
EncypherAI embeds metadata in text generated by AI models, and extracts it again, using Unicode variation selectors. These code points are invisible in most renderers, so the text reads exactly as before while supporting:
- Provenance Tracking: Identify which AI model produced a given piece of text.
- Timestamp Verification: Record when the text was generated.
- Custom Metadata: Attach any additional fields a specific use case needs.
- Streaming Support: Works with both streaming and non-streaming output from large language models (LLMs).
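EncypherAI's exact encoding format is not described here. As a rough illustration of the underlying idea, the sketch below assumes a common scheme in which each byte of a payload maps to one of the 256 Unicode variation selectors (VS1 to VS16 at U+FE00 to U+FE0F, and VS17 to VS256 at U+E0100 to U+E01EF). The helpers byte_to_selector, selector_to_byte, hide, and reveal are hypothetical names, not part of the package.

```python
from typing import Optional

# Hypothetical sketch: one variation selector per payload byte.
# This is a common scheme, not necessarily EncypherAI's actual format.

def byte_to_selector(b: int) -> str:
    """Map a byte (0-255) to a variation selector character."""
    if b < 16:
        return chr(0xFE00 + b)        # VS1-VS16: U+FE00..U+FE0F
    return chr(0xE0100 + b - 16)      # VS17-VS256: U+E0100..U+E01EF

def selector_to_byte(ch: str) -> Optional[int]:
    """Inverse of byte_to_selector; returns None for any other character."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None

def hide(carrier: str, payload: bytes) -> str:
    """Insert the payload, as selectors, right after the carrier's first character.

    The carrier must be non-empty.
    """
    selectors = "".join(byte_to_selector(b) for b in payload)
    return carrier[0] + selectors + carrier[1:]

def reveal(text: str) -> bytes:
    """Collect every selector in the text and decode it back to bytes."""
    values = (selector_to_byte(ch) for ch in text)
    return bytes(v for v in values if v is not None)

text = hide("Hello world", b"gpt-4")
print(text == "Hello world")  # False: five invisible selectors were added
print(len(text))              # 16 (11 visible characters + 5 selectors)
print(reveal(text))           # b'gpt-4'
```

In most renderers the encoded string still displays as "Hello world", which is why this family of techniques leaves the text readable.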
Features and Functionality
EncypherAI's core API covers two operations: embedding metadata into text and extracting it back out.
Basic Usage
The following example embeds a model ID and the current Unix timestamp into a sample sentence, then extracts them again:
```python
from encypher.core.unicode_metadata import UnicodeMetadata
import time

# Encode metadata into text
encoded_text = UnicodeMetadata.embed_metadata(
    text="This is a sample text generated by an AI model.",
    model_id="gpt-4",
    timestamp=int(time.time()),  # Current Unix timestamp
    target="whitespace",  # Embed in whitespace characters
)

# Extract metadata from encoded text
metadata = UnicodeMetadata.extract_metadata(encoded_text)
print(f"Model: {metadata.get('model_id')}")
print(f"Timestamp: {metadata.get('timestamp')}")
```
Alternative Encoding Method
The MetadataEncoder class accepts an arbitrary metadata dictionary, including custom fields, and can take an optional secret key used for HMAC verification so that tampering is detected when the text is checked:
```python
from encypher.core.metadata_encoder import MetadataEncoder
import time

# Initialize encoder with optional HMAC secret key
encoder = MetadataEncoder(secret_key="your-secret-key")

# Encode metadata
metadata = {
    "model_id": "gpt-4",
    "timestamp": int(time.time()),  # Current Unix timestamp
    "custom_field": "custom value",
}
encoded_text = encoder.encode_metadata(
    text="This is a sample text generated by an AI model.",
    metadata=metadata,
)

# Decode and verify metadata
is_valid, extracted_metadata, clean_text = encoder.verify_text(encoded_text)
if is_valid:
    print(f"Model: {extracted_metadata.get('model_id')}")
    print(f"Timestamp: {extracted_metadata.get('timestamp')}")
    print(f"Custom field: {extracted_metadata.get('custom_field')}")
```
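verify_text also returns clean_text, which by its name is the text with the embedded metadata removed. How EncypherAI produces it is not documented here; a minimal sketch, assuming the metadata is carried only by variation selectors, is to delete every code point in the two selector ranges. strip_selectors is a hypothetical helper, not part of the package.

```python
import re

# Matches both variation-selector ranges: U+FE00..U+FE0F and U+E0100..U+E01EF.
VARIATION_SELECTORS = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]")

def strip_selectors(text: str) -> str:
    """Remove every variation selector, leaving only the visible text."""
    return VARIATION_SELECTORS.sub("", text)

original = "This is a sample text generated by an AI model."
# Simulate an encoded string by attaching three selectors after the first space.
encoded = original.replace(" ", " \uFE01\uFE02\U000E0100", 1)

print(encoded == original)                   # False
print(strip_selectors(encoded) == original)  # True
```

One caveat of a blanket strip: U+FE0F is also the emoji presentation selector, so removing it can change how some emoji render in otherwise untouched text.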
Metadata Target Options
The target parameter controls which characters carry the embedded metadata. The available options are:
- whitespace: Embed in whitespace characters (the default, for minimal visibility).
- punctuation: Embed in punctuation marks.
- first_letter: Embed in the first letter of each word.
- last_letter: Embed in the last letter of each word.
- all_characters: Embed in all characters (not recommended due to visibility).
- none: Disable embedding, for testing or debugging.
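To make the target options concrete, the sketch below returns the indices of the characters each option would select as carriers. It is a hypothetical helper (candidate_positions is not part of the package) and assumes a "word" is a run of alphabetic characters; EncypherAI's own rules may differ.

```python
import string

def candidate_positions(text: str, target: str) -> list[int]:
    """Return indices of characters that could carry selectors for a target.

    Hypothetical helper; target names mirror the options listed above.
    A "word" here is a maximal run of alphabetic characters.
    """
    if target == "none":
        return []
    if target == "all_characters":
        return list(range(len(text)))
    if target == "whitespace":
        return [i for i, ch in enumerate(text) if ch.isspace()]
    if target == "punctuation":
        return [i for i, ch in enumerate(text) if ch in string.punctuation]
    if target == "first_letter":
        # A letter whose predecessor is not a letter starts a word.
        return [i for i, ch in enumerate(text)
                if ch.isalpha() and (i == 0 or not text[i - 1].isalpha())]
    if target == "last_letter":
        # A letter whose successor is not a letter ends a word.
        return [i for i, ch in enumerate(text)
                if ch.isalpha() and (i == len(text) - 1 or not text[i + 1].isalpha())]
    raise ValueError(f"unknown target: {target!r}")

s = "Hi there, AI."
print(candidate_positions(s, "whitespace"))    # [2, 9]
print(candidate_positions(s, "first_letter"))  # [0, 3, 10]
```

The number of available positions differs sharply between targets, which is one plausible reason whitespace is the low-visibility default while all_characters scatters selectors through every character.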
Security Features
EncypherAI uses HMAC (Hash-based Message Authentication Code) verification to detect tampering with embedded metadata and to confirm its authenticity. Use the following snippet to verify metadata:
```python
# Example of verifying metadata with HMAC
from encypher.core.metadata_encoder import MetadataEncoder

# Use the same secret key that was used when the metadata was encoded
encoder = MetadataEncoder(secret_key="your-secret-key")
encoded_text = "AI-generated text with embedded metadata..."

# Returns (is_valid, metadata, clean_text)
is_valid, metadata, clean_text = encoder.verify_text(encoded_text)
if is_valid:
    print(f"Verified metadata: {metadata}")
else:
    print("Warning: Metadata has been tampered with!")
```
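The signing format EncypherAI uses is not described here. As a generic illustration of how HMAC verification detects tampering, the sketch below signs a JSON-serializable metadata dictionary with HMAC-SHA256 from Python's standard library and verifies it with a constant-time comparison. sign_metadata and verify_metadata are hypothetical names, and the canonical-JSON payload is an assumption, not the package's documented format.

```python
import hashlib
import hmac
import json

def _canonical(metadata: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode()

def sign_metadata(metadata: dict, secret_key: bytes) -> dict:
    """Return a copy of metadata with an HMAC-SHA256 'signature' field added."""
    signature = hmac.new(secret_key, _canonical(metadata), hashlib.sha256).hexdigest()
    return {**metadata, "signature": signature}

def verify_metadata(signed: dict, secret_key: bytes) -> bool:
    """Recompute the HMAC over every field except 'signature' and compare."""
    unsigned = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(secret_key, _canonical(unsigned), hashlib.sha256).hexdigest()
    # compare_digest takes time independent of where the strings differ.
    return hmac.compare_digest(expected, signed.get("signature", ""))

key = b"your-secret-key"
signed = sign_metadata({"model_id": "gpt-4", "timestamp": 1700000000}, key)
print(verify_metadata(signed, key))    # True

tampered = {**signed, "model_id": "other-model"}
print(verify_metadata(tampered, key))  # False
```

Because the signature depends on both the key and every metadata field, changing any field, or verifying with a different key, makes the recomputed HMAC disagree with the stored one.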
FastAPI Integration and CLI Support
EncypherAI also integrates with FastAPI and provides a command-line interface for encoding and decoding metadata.
To contribute or get support, see the guidelines in the project documentation or open an issue on GitHub.