
@origints/html

HTML parsing with CSS selector queries, attribute extraction, and Markdown conversion.

```shell
npm install @origints/html @origints/core
```

  • Parse HTML with source position tracking
  • CSS selector queries (via hast-util-select)
  • Type-safe extractors for elements and attributes
  • Convert HTML to Markdown
  • Navigation API for tree traversal
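
Hast trees usually carry unist-style source positions: a 1-based `line` and `column` plus a 0-based `offset`. Assuming the parsed tree follows that convention, the sketch below shows what such a position means by scanning raw HTML for opening tags. `pointAt` and `findOpeningTags` are hypothetical helpers, not part of @origints/html, and the regex scan only illustrates the coordinates. It is not a substitute for a parser.

```typescript
// Illustrative sketch only: hypothetical helpers, not @origints/html's implementation.
// Positions follow the unist convention: 1-based line and column, 0-based offset.
interface Point {
  line: number
  column: number
  offset: number
}

// Convert a 0-based character offset into a unist-style Point.
function pointAt(source: string, offset: number): Point {
  let line = 1
  let column = 1
  for (let i = 0; i < offset; i++) {
    if (source[i] === '\n') {
      line++
      column = 1
    } else {
      column++
    }
  }
  return { line, column, offset }
}

// Report where each opening tag with the given name starts (naive string scan).
function findOpeningTags(source: string, tagName: string): Point[] {
  const points: Point[] = []
  const pattern = new RegExp(`<${tagName}[\\s>]`, 'g')
  let match: RegExpExecArray | null
  while ((match = pattern.exec(source)) !== null) {
    points.push(pointAt(source, match.index))
  }
  return points
}

const html = '<main>\n  <h1>Welcome</h1>\n  <p>Hi</p>\n</main>'
console.log(findOpeningTags(html, 'h1')) // [ { line: 2, column: 3, offset: 9 } ]
```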
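
Extract a page title and a link target with `Planner`:

```typescript
import { Planner, loadFile, run } from '@origints/core'
import { parseHtml } from '@origints/html'

const plan = new Planner()
  .in(loadFile('page.html'))
  .mapIn(parseHtml())
  .emit((out, $) => out
    .add('title', $.select('h1').text())
    .add('href', $.select('a').attr('href'))
  )
  .compile()

// `readFile` and `registry` come from your @origints/core setup (not shown here);
// registerHtmlTransforms(registry) registers the HTML transforms on a registry.
const result = await run(plan, { readFile, registry })
// result.value: { title: 'Welcome', href: '/about' }
```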
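
To make `select(...).text()` concrete, here is a sketch of "first match, then its text" on a hast-shaped tree (hast is the tree format hast-util-select queries). The simplified types, `el`, `txt`, `selectFirst`, and `textContent` are hypothetical and match by tag name only; real CSS selector matching is what the package delegates to hast-util-select.

```typescript
// Hypothetical hast-shaped types, simplified (no comments, doctypes, or root node).
type HastText = { type: 'text'; value: string }
type HastElement = {
  type: 'element'
  tagName: string
  properties: Record<string, unknown>
  children: HastNode[]
}
type HastNode = HastElement | HastText

// Helpers for building small trees by hand.
const el = (tagName: string, properties: Record<string, unknown>, ...children: HastNode[]): HastElement =>
  ({ type: 'element', tagName, properties, children })
const txt = (value: string): HastText => ({ type: 'text', value })

// Depth-first, pre-order search: the first element in document order whose tag name matches.
function selectFirst(node: HastNode, tagName: string): HastElement | undefined {
  if (node.type !== 'element') return undefined
  if (node.tagName === tagName) return node
  for (const child of node.children) {
    const found = selectFirst(child, tagName)
    if (found) return found
  }
  return undefined
}

// Concatenate all descendant text, similar to the DOM's textContent.
function textContent(node: HastNode): string {
  return node.type === 'text' ? node.value : node.children.map(textContent).join('')
}

const page = el('body', {},
  el('h1', {}, txt('Welcome')),
  el('a', { href: '/about' }, txt('About')),
)
const heading = selectFirst(page, 'h1')
console.log(heading ? textContent(heading) : '(no match)') // Welcome
console.log(selectFirst(page, 'a')?.properties.href) // /about
```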
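
Map every match with `selectAll`; here each `<li>` inside the selected `<ul>` becomes its text:

```typescript
const plan = new Planner()
  .in(loadFile('page.html'))
  .mapIn(parseHtml())
  .emit((out, $) => out
    .add('items', $.select('ul').selectAll('li', node => node.text()))
  )
  .compile()
```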
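
The "collect all matches, then map each one" step can be sketched on a hast-shaped tree as below. `selectAllMap` and the helper types are hypothetical, match by tag name only, and are not the library's `selectAll`; they only show the document-order traversal a descendant query implies.

```typescript
// Hypothetical hast-shaped types, simplified (no comments, doctypes, or root node).
type HastText = { type: 'text'; value: string }
type HastElement = {
  type: 'element'
  tagName: string
  properties: Record<string, unknown>
  children: HastNode[]
}
type HastNode = HastElement | HastText

const el = (tagName: string, ...children: HastNode[]): HastElement =>
  ({ type: 'element', tagName, properties: {}, children })
const txt = (value: string): HastText => ({ type: 'text', value })

function textContent(node: HastNode): string {
  return node.type === 'text' ? node.value : node.children.map(textContent).join('')
}

// Visit descendants in document (pre-order) order and map every element
// whose tag name matches; the starting node itself is not a candidate.
function selectAllMap<T>(root: HastElement, tagName: string, fn: (match: HastElement) => T): T[] {
  const results: T[] = []
  const visit = (node: HastNode): void => {
    if (node.type !== 'element') return
    if (node.tagName === tagName) results.push(fn(node))
    node.children.forEach(visit)
  }
  root.children.forEach(visit)
  return results
}

const list = el('ul', el('li', txt('One')), el('li', txt('Two')), el('li', txt('Three')))
console.log(selectAllMap(list, 'li', textContent)) // [ 'One', 'Two', 'Three' ]
```

Nested matches are included, as with a CSS descendant selector: an outer match's text also contains its inner match's text.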
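
Build one object per match by returning an object descriptor from the `selectAll` mapper:

```typescript
// Continuing a Planner chain, between .mapIn(parseHtml()) and .compile()
.emit((out, $) => out
  .add('links', $.selectAll('a', node => ({
    kind: 'object',
    properties: {
      href: node.attr('href'),
      text: node.text(),
    },
  })))
)
```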
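
In plain TypeScript, with no Planner involved, the `links` output above has roughly this shape: one `{ href, text }` record per `<a>`. The sketch below is a hypothetical illustration over hast-shaped nodes. `extractLinks` and `LinkRecord` are not library names, and treating a missing `href` as `undefined` is an assumption about how absent attributes surface.

```typescript
// Hypothetical hast-shaped types, simplified (no comments, doctypes, or root node).
type HastText = { type: 'text'; value: string }
type HastElement = {
  type: 'element'
  tagName: string
  properties: Record<string, unknown>
  children: HastNode[]
}
type HastNode = HastElement | HastText

const el = (tagName: string, properties: Record<string, unknown>, ...children: HastNode[]): HastElement =>
  ({ type: 'element', tagName, properties, children })
const txt = (value: string): HastText => ({ type: 'text', value })

function textContent(node: HastNode): string {
  return node.type === 'text' ? node.value : node.children.map(textContent).join('')
}

// One { href, text } record per <a> element, in document order.
// Assumption: an <a> without a string href yields href: undefined.
interface LinkRecord {
  href: string | undefined
  text: string
}

function extractLinks(root: HastNode): LinkRecord[] {
  const links: LinkRecord[] = []
  const visit = (node: HastNode): void => {
    if (node.type !== 'element') return
    if (node.tagName === 'a') {
      const href = node.properties.href
      links.push({ href: typeof href === 'string' ? href : undefined, text: textContent(node) })
    }
    node.children.forEach(visit)
  }
  visit(root)
  return links
}

const nav = el('nav', {},
  el('a', { href: '/about' }, txt('About')),
  el('a', {}, txt('No target')),
)
console.log(extractLinks(nav))
```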
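
Map over the direct children of an element with `children`:

```typescript
// Continuing a Planner chain, between .mapIn(parseHtml()) and .compile()
.emit((out, $) => out
  .add('sections', $.select('main').children(node => node.text()))
)
```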
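
"Direct children" means one level down, not all descendants. The sketch below assumes only element children count and that whitespace text nodes between tags are skipped. Whether the library's `children` also yields text nodes is not stated here. `elementChildren` and the helper types are hypothetical.

```typescript
// Hypothetical hast-shaped types, simplified (no comments, doctypes, or root node).
type HastText = { type: 'text'; value: string }
type HastElement = {
  type: 'element'
  tagName: string
  properties: Record<string, unknown>
  children: HastNode[]
}
type HastNode = HastElement | HastText

const el = (tagName: string, ...children: HastNode[]): HastElement =>
  ({ type: 'element', tagName, properties: {}, children })
const txt = (value: string): HastText => ({ type: 'text', value })

function textContent(node: HastNode): string {
  return node.type === 'text' ? node.value : node.children.map(textContent).join('')
}

// Direct children that are elements; text nodes (e.g. whitespace between tags) are skipped.
function elementChildren(node: HastElement): HastElement[] {
  return node.children.filter((child): child is HastElement => child.type === 'element')
}

const main = el('main',
  txt('\n  '),
  el('section', txt('Intro')),
  txt('\n  '),
  el('section', txt('Usage: '), el('code', txt('npm i'))),
)
console.log(elementChildren(main).map(textContent)) // [ 'Intro', 'Usage: npm i' ]
```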
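
Without the Planner, parse a string directly and query the resulting `HtmlNode`:

```typescript
import { parseHtmlImpl, HtmlNode } from '@origints/html'

const htmlString = '<h1>Welcome</h1><ul><li>One</li><li>Two</li></ul>'
const node = parseHtmlImpl.execute(htmlString) as HtmlNode

// select() returns a result object: check `ok` before reading `value`
const title = node.select('h1')
if (title.ok) console.log(title.value.text())

// selectAll() without a mapper returns the matching nodes, which can be iterated
const items = node.selectAll('li')
for (const item of items) console.log(item.text())
```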
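
The `if (title.ok)` check is the usual discriminated-union pattern: TypeScript only lets you read `value` once `ok` is known to be `true`. The sketch below models that with a hypothetical `SelectResult` type inferred from `title.ok` and `title.value`. The failure branch's `error` field is an assumption, not the library's documented type.

```typescript
// Hypothetical result shape inferred from `title.ok` / `title.value`;
// the failure branch's `error` field is an assumption.
type SelectResult<T> = { ok: true; value: T } | { ok: false; error: string }

// Wrap "first item or nothing" in the result shape.
function firstMatch<T>(items: readonly T[], description: string): SelectResult<T> {
  return items.length > 0
    ? { ok: true, value: items[0] }
    : { ok: false, error: `no match for ${description}` }
}

const hit = firstMatch(['Welcome'], 'h1')
if (hit.ok) console.log(hit.value) // Welcome

const miss = firstMatch<string>([], 'h2')
if (!miss.ok) console.log(miss.error) // no match for h2
```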
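
Convert parsed HTML to Markdown with `toMarkdown`:

```typescript
import { parseHtmlImpl, toMarkdown, HtmlNode } from '@origints/html'

const htmlContent = '<h1>Welcome</h1><p>See the <a href="/about">about page</a>.</p>'
const node = parseHtmlImpl.execute(htmlContent) as HtmlNode
const markdown = toMarkdown(node)
```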
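
HTML-to-Markdown conversion is a recursive walk: convert the children first, then wrap the result according to the element's tag. The sketch below is a hypothetical `toMd` over hast-shaped nodes, not the package's `toMarkdown`, whose exact output rules aren't documented here. It handles only a few tags, flattens nested lists, and does not escape Markdown special characters.

```typescript
// Hypothetical hast-shaped types, simplified (no comments, doctypes, or root node).
type HastText = { type: 'text'; value: string }
type HastElement = {
  type: 'element'
  tagName: string
  properties: Record<string, unknown>
  children: HastNode[]
}
type HastNode = HastElement | HastText

const el = (tagName: string, properties: Record<string, unknown>, ...children: HastNode[]): HastElement =>
  ({ type: 'element', tagName, properties, children })
const txt = (value: string): HastText => ({ type: 'text', value })

// Convert children first, then wrap them according to the element's tag.
function toMd(node: HastNode): string {
  if (node.type === 'text') return node.value
  const inner = node.children.map(toMd).join('')
  switch (node.tagName) {
    case 'h1': case 'h2': case 'h3': case 'h4': case 'h5': case 'h6':
      return `${'#'.repeat(Number(node.tagName[1]))} ${inner}\n\n`
    case 'p': return `${inner}\n\n`
    case 'strong': return `**${inner}**`
    case 'em': return `*${inner}*`
    case 'a': return `[${inner}](${String(node.properties.href ?? '')})`
    case 'li': return `- ${inner}\n`
    case 'ul': return `${inner}\n`
    default: return inner // unknown tags: keep their content, drop the tag
  }
}

const doc = el('body', {},
  el('h1', {}, txt('Welcome')),
  el('p', {}, txt('See the '), el('a', { href: '/about' }, txt('about page')), txt('.')),
  el('ul', {}, el('li', {}, txt('One')), el('li', {}, el('strong', {}, txt('Two')))),
)
console.log(toMd(doc).trimEnd())
```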

| Export | Description |
|---|---|
| `parseHtml(options?)` | Transform AST for `Planner.mapIn()` |
| `parseHtmlImpl` | Sync transform implementation |
| `parseHtmlAsyncImpl` | Async transform implementation |
| `registerHtmlTransforms(registry)` | Register HTML transforms |
| `HtmlNode` | Navigable wrapper with CSS selector support |
| `toMarkdown(node)` | Convert HTML to Markdown |
| `toJson(node, options?)` | Convert to JSON |

License: MIT