xilt is a command-line utility that parses Common Log Format (CLF) and Combined Log Format log files and stores the parsed entries in an SQLite database for in-depth analysis. It is built with concurrency in mind, which speeds up the processing of large log files.
Features
- Efficient Parsing: Supports parsing of extensive log files, ensuring minimal processing time.
- SQLite Integration: Stores parsed log entries in an SQLite database, enabling seamless data analysis.
- Concurrency Support: Built to utilize Go's concurrency features, allowing optimal performance during log processing.
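To illustrate the kind of concurrency described above, here is a minimal sketch of a batch-and-worker pattern in Go. It is not xilt's actual code: the function name `processInBatches` and its parameters are hypothetical, and the `handle` callback stands in for whatever parses a batch and writes it to the database.

```go
package main

import (
	"fmt"
	"sync"
)

// processInBatches is a hypothetical helper (not part of xilt). It splits
// lines into batches of batchSize, fans them out to a fixed number of worker
// goroutines, and returns how many lines were handled in total.
func processInBatches(lines []string, batchSize, workers int, handle func(batch []string)) int {
	if batchSize < 1 {
		batchSize = 1 // avoid an endless loop on a non-positive batch size
	}
	if workers < 1 {
		workers = 1 // at least one worker must drain the channel
	}

	batches := make(chan []string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	handled := 0

	// Start the workers; each one consumes batches until the channel closes.
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range batches {
				handle(batch)
				mu.Lock()
				handled += len(batch)
				mu.Unlock()
			}
		}()
	}

	// Producer: slice the input into batches and hand them to the workers.
	for start := 0; start < len(lines); start += batchSize {
		end := start + batchSize
		if end > len(lines) {
			end = len(lines)
		}
		batches <- lines[start:end]
	}
	close(batches)
	wg.Wait()
	return handled
}

func main() {
	lines := make([]string, 12)
	n := processInBatches(lines, 5, 3, func(batch []string) {
		// A real handler would parse the batch and insert it into SQLite.
	})
	fmt.Println("lines handled:", n)
}
```

The unbuffered channel means the producer only reads ahead as fast as the workers consume, which keeps the number of batches held in memory bounded by the worker count.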
Usage Instructions
To use xilt, simply run the following command:
xilt [logFilePath] [dbFilePath]
Where logFilePath is the path to the log file (default: ./access.log) and dbFilePath is the path where the SQLite database will be created (default: ./logs.db).
Command-Line Flags
xilt offers several command-line options to customize the log parsing:
$ xilt -h
Usage of xilt:
  -avgLogSize float
        Defines the average size of one log in MB. Used for calculating the number of goroutines to spin up. (default 0.001)
  -batchSize int
        Defines the batch size. Used for calculating the number of goroutines to spin up. (default 5000)
  -i    Indicates whether indexes should be created in the parsed logs' table.
  -maxMemUsage int
        Defines the maximum allowed memory usage in Megabytes. Used for calculating the number of goroutines to spin up. (default 100)
  -v    Activates verbose mode.
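The help text says that -avgLogSize, -batchSize, and -maxMemUsage feed into the number of goroutines, but it does not give the formula. One plausible reading, shown here purely as an assumption, is that each goroutine holds one batch in memory, so the goroutine count is the memory budget divided by the estimated size of a batch. The function `workerCount` is hypothetical and is not taken from xilt.

```go
package main

import "fmt"

// workerCount is a hypothetical helper, NOT xilt's actual formula.
// Assumption: every goroutine holds one batch in memory, so the number of
// goroutines is the memory budget divided by the estimated batch size.
func workerCount(maxMemUsageMB int, batchSize int, avgLogSizeMB float64) int {
	batchMB := float64(batchSize) * avgLogSizeMB // estimated memory per batch
	if batchMB <= 0 {
		return 1
	}
	n := int(float64(maxMemUsageMB) / batchMB)
	if n < 1 {
		return 1 // always run at least one goroutine
	}
	return n
}

func main() {
	// Default flag values from the help output above.
	fmt.Println(workerCount(100, 5000, 0.001))
}
```

Under this assumption the defaults give a 5 MB batch (5000 × 0.001 MB) and therefore 20 goroutines within the 100 MB budget. Raising -batchSize or -avgLogSize would lower that count, and raising -maxMemUsage would increase it.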
Creating Indexes
When the -i flag is used, the following indexes are created to speed up data retrieval:
- IP
- Method
- Referer
- Route
- TimestampUTC
- TimestampUTC + IP
This feature may evolve over time, with improvements based on user feedback and additional configurations.
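As a rough illustration of the index list above, the sketch below generates SQLite `CREATE INDEX` statements, including the composite index on TimestampUTC and IP. The table name `logs`, the snake_case column names, and the `idx_` naming scheme are all assumptions made for this example; the actual schema xilt creates may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// indexedColumns mirrors the index list above. The snake_case column names
// are hypothetical; xilt's real column names may differ.
var indexedColumns = [][]string{
	{"ip"},
	{"method"},
	{"referer"},
	{"route"},
	{"timestamp_utc"},
	{"timestamp_utc", "ip"}, // composite index
}

// indexStatements builds one CREATE INDEX statement per entry in
// indexedColumns for the given (hypothetical) table name.
func indexStatements(table string) []string {
	stmts := make([]string, 0, len(indexedColumns))
	for _, cols := range indexedColumns {
		name := "idx_" + table + "_" + strings.Join(cols, "_")
		stmts = append(stmts, fmt.Sprintf(
			"CREATE INDEX IF NOT EXISTS %s ON %s (%s);",
			name, table, strings.Join(cols, ", ")))
	}
	return stmts
}

func main() {
	for _, stmt := range indexStatements("logs") {
		fmt.Println(stmt)
	}
}
```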
Example Log Entry
Here is an example of a typical log entry:
192.168.1.101 user-identifier alice [12/Nov/2023:14:23:45 -0400] "POST /index.html?query=search HTTP/1.1" 404 1234 "https://example.com" "Mozilla/5.0"
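To show how an entry like this breaks down into fields, here is a minimal Go sketch that parses a Combined Log Format line with a regular expression and converts the timestamp to UTC. The `entry` struct, the `parseLine` function, and the choice to keep the query string as part of the route are assumptions for illustration, not xilt's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// entry is a hypothetical struct; the field names loosely follow the
// indexed columns listed in this document.
type entry struct {
	IP, Ident, User      string
	TimestampUTC         time.Time
	Method, Route, Proto string
	Status               int
	Size                 int64
	Referer, UserAgent   string
}

// combinedRe matches one Combined Log Format line.
var combinedRe = regexp.MustCompile(
	`^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$`)

// clfTime is the Go reference layout for timestamps such as
// 12/Nov/2023:14:23:45 -0400.
const clfTime = "02/Jan/2006:15:04:05 -0700"

func parseLine(line string) (entry, error) {
	m := combinedRe.FindStringSubmatch(line)
	if m == nil {
		return entry{}, fmt.Errorf("not a Combined Log Format line: %q", line)
	}
	ts, err := time.Parse(clfTime, m[4])
	if err != nil {
		return entry{}, err
	}
	status, err := strconv.Atoi(m[8])
	if err != nil {
		return entry{}, err
	}
	var size int64 // a "-" size field means no bytes were sent
	if m[9] != "-" {
		if size, err = strconv.ParseInt(m[9], 10, 64); err != nil {
			return entry{}, err
		}
	}
	return entry{
		IP: m[1], Ident: m[2], User: m[3],
		TimestampUTC: ts.UTC(), // normalize the local offset to UTC
		Method:       m[5], Route: m[6], Proto: m[7],
		Status: status, Size: size,
		Referer: m[10], UserAgent: m[11],
	}, nil
}

func main() {
	line := `192.168.1.101 user-identifier alice [12/Nov/2023:14:23:45 -0400] "POST /index.html?query=search HTTP/1.1" 404 1234 "https://example.com" "Mozilla/5.0"`
	e, err := parseLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(e.IP, e.Method, e.Route, e.Status, e.TimestampUTC.Format(time.RFC3339))
}
```

For the sample entry, the local time 14:23:45 at offset -0400 corresponds to 18:23:45 UTC. A plain Common Log Format line, which lacks the trailing referer and user-agent fields, would need a second pattern or optional groups.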
After running xilt with the index flag:
$ xilt -i logs.txt logs.db
xilt parses logs.txt, stores the entries in logs.db, and creates the indexes listed above, so the data is ready to query.
Performance Benchmarking
In testing with a log file containing over 17 million entries, xilt produced the following results:
- Without Indexes: Average parsing time of 1 minute and 55 seconds.
- With Indexes: Average time of approximately 4 minutes. The extra time goes to index creation, which pays off in faster queries afterwards.
Memory Performance
With default settings, xilt averages around 85 MB of memory while processing large datasets, which stays below the default -maxMemUsage limit of 100 MB.
Future Enhancements
Future updates may include:
- Support for JSON log formats.
- Parsing of individual parameters with dedicated column creation for unique entries.
- Further optimization of memory usage settings.
- Enhanced test coverage to ensure reliability and performance.