nokolexbor
A high-performance HTML5 parser for Ruby using Lexbor.
Pitch

nokolexbor is an efficient HTML5 parser for Ruby, built on Lexbor. It supports both CSS selectors and XPath, which makes it a good fit for applications that need fast, reliable HTML parsing.

Description

Nokolexbor is a high-performance HTML5 parser for Ruby that serves as a drop-in replacement for the popular Nokogiri library. In the project's benchmarks it parses HTML about 5.2x faster than Nokogiri and runs CSS selector queries up to 997x faster, which makes it a strong option for parsing-heavy workloads.

The gem is built on the Lexbor engine, supports both CSS selectors and XPath, and exposes an API that mirrors Nokogiri's, so most code can switch to Nokolexbor without significant changes.
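As a rough illustration of what API compatibility means in practice, the sketch below takes the parser module as an argument, so the same function body can run against either library. The helper name `heading_texts` and the sample markup are assumptions made for this example; only `HTML`, `css`, and `content` come from the API described here.

```ruby
# Hypothetical helper: the parser module is passed in, so the same code
# can be pointed at Nokolexbor or Nokogiri without touching the body.
def heading_texts(parser, html, selector = 'h2')
  doc = parser.HTML(html)                              # module-level HTML() entry point
  doc.css(selector).map { |node| node.content.strip }  # same css/content calls in both
end

# Usage, assuming the corresponding gem is installed:
#   require 'nokolexbor'
#   heading_texts(Nokolexbor, '<h2> Install </h2><h2>Usage</h2>')
#
#   require 'nokogiri'
#   heading_texts(Nokogiri, '<h2> Install </h2><h2>Usage</h2>')
```

Injecting the module keeps the choice of parser in one place, which can make it easier to try Nokolexbor in an existing Nokogiri code base and compare results.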

Key Features:

  • Nokogiri-Compatible APIs: You can use Nokolexbor seamlessly if you're familiar with Nokogiri.
  • High-performance Parsing: Enjoy rapid HTML parsing and DOM manipulation.
  • CSS Selectors Engine: Leverage a fast CSS selectors engine for efficient node searches.
  • XPath Support: Integrated XPath search engine for advanced querying capabilities.
  • Text Nodes Support: Directly select text nodes using ::text within your CSS selectors.

Quick Start

Here’s a brief example of how to get started with Nokolexbor:

require 'nokolexbor'
require 'open-uri'

# Parse an HTML document
doc = Nokolexbor::HTML(URI.open('https://github.com/serpapi/nokolexbor'))

# Search for specific nodes using CSS selectors
doc.css('#readme h1', 'article h2', 'p[dir=auto]').each do |node|
  puts node.content
end

# Search for text nodes
doc.css('#readme p > ::text').each do |text|
  puts text.content
end

# Search nodes using XPath
doc.xpath('//div[@id="readme"]//h1', '//article//h2').each do |node|
  puts node.content
end

Searching Methods Overview:

  • CSS Selectors (css / at_css): Based on Lexbor, significantly faster than libxml2 methods. Use ::text for text nodes.
  • XPath (xpath / at_xpath): Remains consistent with Nokogiri’s API for XPath syntax.
  • Nokogiri CSS: Available only when Nokogiri is also installed; uses Nokogiri's CSS handling, which accepts mixed selector syntax for better compatibility.

Performance Benchmarks

The project's benchmarks show Nokolexbor well ahead of Nokogiri in parsing speed and CSS selector execution, while XPath queries run at roughly the same speed in both:

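| Operation | Nokolexbor (iters/s) | Nokogiri (iters/s) |
|-----------|----------------------|--------------------|
| Parsing   | 487.6                | 93.5               |
| at_css    | 50,798.8             | 50.9               |
| css       | 7,437.6              | 52.3               |
| at_xpath  | 57.1                 | 53.2               |
| xpath     | 51.5                 | 58.4               |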
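
The headline speedups can be checked against the figures in the table. The sketch below simply divides the two columns; the constant and method names are made up for this example, and the inputs are the rounded values shown above, so results may differ slightly from ratios computed on unrounded measurements.

```ruby
# Iterations per second, copied from the benchmark table.
BENCHMARKS = {
  'Parsing'  => { nokolexbor: 487.6,    nokogiri: 93.5 },
  'at_css'   => { nokolexbor: 50_798.8, nokogiri: 50.9 },
  'css'      => { nokolexbor: 7_437.6,  nokogiri: 52.3 },
  'at_xpath' => { nokolexbor: 57.1,     nokogiri: 53.2 },
  'xpath'    => { nokolexbor: 51.5,     nokogiri: 58.4 }
}.freeze

# Ratio above 1.0 means Nokolexbor is faster; below 1.0 means slower.
def speedup(rates)
  rates[:nokolexbor] / rates[:nokogiri]
end

SPEEDUPS = BENCHMARKS.transform_values { |rates| speedup(rates) }

SPEEDUPS.each do |operation, ratio|
  puts format('%-9s %8.2fx', operation, ratio)
end
```

With these rounded inputs, parsing comes out at about 5.2x and at_css at roughly 998x, in line with the stated figures, while the two XPath rows stay close to 1x.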

By choosing Nokolexbor, you get much faster parsing and CSS queries behind a familiar, Nokogiri-style API, which suits applications that process large volumes of HTML.