HTQL (Hyper Text Query Language) is a SQL-like query language for extracting and filtering data from HTML documents. Queries select elements by tag name and filter them by attribute values, reading from either local files or remote URLs. Because the syntax follows SQL, HTQL is designed to integrate with existing SQL adapters and applications.
Key Features
- SQL-like Syntax: HTQL uses familiar SQL keywords (SELECT, FROM, WHERE, AND, OR, NOT, IS NOT NULL), so anyone who knows SQL can query HTML without learning a new syntax.
- Versatile Data Retrieval: Extract specific elements from HTML documents, filter them by their attributes, and collect the results as structured data.
- Remote Query Support: Query remote HTML documents directly by giving a URL in the FROM clause.
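HTQL queries take the familiar SQL shape: SELECT <tags> FROM <source>, optionally followed by WHERE <condition>. The minimal Python sketch below splits a query into those three clauses to make that shape concrete. It is not HTQL's actual parser: the regular expression, the `ParsedQuery` type, and `split_query` are hypothetical, the WHERE condition is returned as raw text, and trailing `--` comments must be stripped before parsing.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical grammar inferred from the query examples:
#   SELECT <tags> FROM <source> [WHERE <condition>]
# Trailing "--" comments are not handled and must be stripped beforehand.
_QUERY_RE = re.compile(
    r"^\s*SELECT\s+(?P<tags>.+?)\s+FROM\s+(?P<source>[^\s;]+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


class ParsedQuery(NamedTuple):
    tags: list[str]        # element names, or ["*"] for all elements
    source: str            # local path or URL, kept verbatim
    where: Optional[str]   # raw condition text; None when there is no WHERE


def split_query(query: str) -> ParsedQuery:
    """Split a query into its SELECT, FROM and WHERE clauses (no expression parsing)."""
    match = _QUERY_RE.match(query)
    if match is None:
        raise ValueError(f"not a SELECT ... FROM ... query: {query!r}")
    tags = [t.strip() for t in match["tags"].split(",")]
    return ParsedQuery(tags, match["source"], match["where"])
```

The raw `where` text would still need a small expression parser that understands `=`, AND, OR, NOT, and IS NOT NULL; keeping clause splitting separate from condition evaluation keeps each piece simple.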
Usage Examples
HTQL's commands cover the common operations on HTML data. The following examples show the basics:
Basic Select
Use the SELECT statement to pull elements from local HTML files:
SELECT * FROM ./test.html -- Select all elements from a local file
SELECT p, div, h2 FROM ./test.html -- Select specific elements (p, div, h2)
SELECT * FROM ./test.html WHERE attributes.class = 'title' -- Elements whose class attribute is 'title'
SELECT * FROM ./test.html WHERE attributes IS NOT NULL -- Elements that have attributes
SELECT * FROM ./test.html WHERE attributes.class = 'title' OR attributes.id = 'content' -- Match either condition (OR)
SELECT span FROM ./test.html WHERE attributes.class = 'title' AND attributes.id = 'content' -- span elements matching both conditions (AND)
SELECT span FROM ./test.html WHERE attributes.class = 'title' AND NOT attributes.id = 'content' -- Negate a condition with NOT
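To make the semantics of these queries concrete, here is a minimal Python sketch that models them with the standard library's `html.parser`. It illustrates one possible reading under stated assumptions and is not HTQL's implementation: each start tag becomes a record with a `tag` and an `attributes` dict, an element without attributes gets `attributes = None` (one reading of IS NOT NULL), a missing attribute reads as `None`, and WHERE conditions are written by hand as Python predicates rather than parsed. The names `ElementCollector`, `attr`, `select`, and `SAMPLE` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, Optional


class ElementCollector(HTMLParser):
    """Flatten an HTML document into {"tag", "attributes"} records, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[dict] = []

    def handle_starttag(self, tag, attrs):
        # Assumption: an element with no attributes has attributes = None (models IS NULL).
        self.elements.append({"tag": tag, "attributes": dict(attrs) or None})


def attr(element: dict, name: str) -> Optional[str]:
    """Model of attributes.<name>: the attribute value, or None if absent."""
    return (element["attributes"] or {}).get(name)


def select(html: str, tags: list[str],
           where: Callable[[dict], bool] = lambda e: True) -> list[dict]:
    """Model of SELECT <tags> FROM <html> WHERE <where>; "*" matches every tag."""
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    return [e for e in collector.elements
            if ("*" in tags or e["tag"] in tags) and where(e)]


SAMPLE = """
<h2 class="title">Heading</h2>
<div id="content">
  <p>Body text</p>
  <span class="title" id="content">A</span>
  <span class="title">B</span>
</div>
"""

# SELECT p, div, h2 FROM ...
blocks = select(SAMPLE, ["p", "div", "h2"])
# SELECT * FROM ... WHERE attributes.class = 'title'
titled = select(SAMPLE, ["*"], lambda e: attr(e, "class") == "title")
# SELECT * FROM ... WHERE attributes IS NOT NULL
with_attrs = select(SAMPLE, ["*"], lambda e: e["attributes"] is not None)
# SELECT span FROM ... WHERE attributes.class = 'title' AND attributes.id = 'content'
both = select(SAMPLE, ["span"],
              lambda e: attr(e, "class") == "title" and attr(e, "id") == "content")
# SELECT span FROM ... WHERE attributes.class = 'title' AND NOT attributes.id = 'content'
# In this sketch a missing id counts as "not 'content'"; strict SQL NULL
# semantics would exclude such elements instead.
not_content = select(SAMPLE, ["span"],
                     lambda e: attr(e, "class") == "title" and not attr(e, "id") == "content")
```

The last query shows why NULL handling matters: the second span has no `id`, so whether it matches depends on how a missing attribute compares, which this sketch decides by assumption.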
Select from Remote URL
HTQL can also query HTML documents directly from a remote URL:
SELECT p, div, h2 FROM https://example.com -- Select p, div, and h2 elements from a remote page
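How HTQL fetches a remote FROM target (timeouts, redirects, character encodings) is not specified here. As a rough model, the sketch below resolves a FROM target with Python's standard library: `http`, `https`, and `file` URLs go through `urllib.request.urlopen`, and anything else, such as `./test.html`, is read as a local path. The function name `load_source`, the 10-second default timeout, and the UTF-8 fallback are assumptions, not documented HTQL behavior.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def load_source(source: str, timeout: float = 10.0) -> str:
    """Return the HTML text for a FROM target: a URL or a local file path (hypothetical helper)."""
    scheme = urlparse(source).scheme
    if scheme in ("http", "https", "file"):
        with urlopen(source, timeout=timeout) as response:
            # Use the charset from the Content-Type header, else assume UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    # Anything else, e.g. ./test.html, is treated as a local path (assumed UTF-8).
    with open(source, encoding="utf-8") as fh:
        return fh.read()
```

Separating loading from querying means the same element-matching logic can run unchanged on local files and remote pages.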
In summary, HTQL provides a simple way to extract structured data from HTML using SQL-like queries. It suits data scraping, analysis, and integration with other systems that already speak SQL.