Traditional data extraction pipelines silently coerce types, swallow errors, and make it impossible to trace how a value was derived. When something goes wrong, you’re left guessing.
OriginTS treats data extraction like compilation: build an immutable plan, execute it with full lineage tracking, and get structured failures instead of exceptions.
Two-Phase Architecture
Separate planning from execution. Build immutable plans with no I/O, then run them against actual data with full provenance.
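The two-phase idea can be pictured without the library at all. In this sketch a plan is plain frozen data, built before any input is seen, and a separate interpreter applies it to real input and returns a result value instead of throwing. `Step`, `compilePlan`, and `execute` are hypothetical names chosen for illustration; they are not the `@origints/core` API.

```typescript
// Hypothetical two-phase sketch; these names are not the @origints/core API.

// A plan is plain data: a list of steps described before any input is seen.
type Step =
  | { readonly op: "get"; readonly key: string }
  | { readonly op: "expect"; readonly type: "string" | "number" };

type Plan = ReadonlyArray<Step>;

type Outcome = { ok: true; value: unknown } | { ok: false; reason: string };

// Phase 1: compiling only copies and freezes the steps; it performs no I/O.
function compilePlan(steps: Step[]): Plan {
  const frozen: Step[] = steps.map((s): Step => Object.freeze({ ...s }));
  return Object.freeze(frozen);
}

// Phase 2: an interpreter walks the frozen plan against real input.
function execute(plan: Plan, input: unknown): Outcome {
  let current: unknown = input;
  for (const step of plan) {
    if (step.op === "get") {
      if (typeof current !== "object" || current === null || !(step.key in current)) {
        return { ok: false, reason: `missing key "${step.key}"` };
      }
      current = (current as Record<string, unknown>)[step.key];
    } else if (typeof current !== step.type) {
      return { ok: false, reason: `expected ${step.type}, got ${typeof current}` };
    }
  }
  return { ok: true, value: current };
}
```

Because the plan is frozen data, the same plan can be inspected up front and executed against many inputs without being rebuilt.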
Extraction System
A unified ExtractSpec type works across all formats — JSON, XLSX, CSV, YAML, HTML, Markdown, TOML. One API to learn, any format to extract.
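One way to picture a single spec type working across formats: each format adapter normalizes its input into a shared tree, and one extraction function applies the same spec to any of those trees. This is a sketch under that assumption only; `Tree`, `Spec`, `adapters`, and `extract` are hypothetical and do not reflect the real `ExtractSpec` shape. The CSV adapter assumes rows have already been split into cells.

```typescript
// Hypothetical sketch: not the real ExtractSpec type or its adapters.

// The shared shape every format is normalized into.
type Tree = string | number | boolean | null | Tree[] | { [key: string]: Tree };

// One spec shape for every format: a path into the tree plus an expected type.
interface Spec {
  readonly path: ReadonlyArray<string | number>; // string = object key, number = array index
  readonly as: "string" | "number";
}

type Extracted = { ok: true; value: string | number } | { ok: false; reason: string };

// Per-format adapters: each turns its own input into a Tree.
const adapters = {
  json: (text: string): Tree => JSON.parse(text) as Tree,
  // Assumes rows are already split into cells; the first row is the header.
  csvRows: (rows: string[][]): Tree => {
    if (rows.length === 0) return [];
    const [header, ...body] = rows;
    return body.map((cells) =>
      Object.fromEntries(header.map((name, i): [string, string | null] => [name, cells[i] ?? null])),
    );
  },
};

// The same extraction logic runs no matter which adapter produced the tree.
function extract(spec: Spec, tree: Tree): Extracted {
  let node: Tree = tree;
  for (const key of spec.path) {
    let next: Tree | undefined = undefined;
    if (Array.isArray(node) && typeof key === "number") {
      next = node[key];
    } else if (node !== null && typeof node === "object" && !Array.isArray(node) && typeof key === "string") {
      next = node[key];
    }
    if (next === undefined) return { ok: false, reason: `nothing at ${JSON.stringify(key)}` };
    node = next;
  }
  // No coercion: the value must already have the requested type.
  if (spec.as === "string" && typeof node === "string") return { ok: true, value: node };
  if (spec.as === "number" && typeof node === "number") return { ok: true, value: node };
  return { ok: false, reason: `expected ${spec.as}, got ${typeof node}` };
}
```

With this design, the spec `{ path: [0, "name"], as: "string" }` reads the same field from a JSON array or from CSV rows, while a CSV cell such as `"30"` stays a string and a `number` request against it fails explicitly.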
Full Provenance
Every transformation step is recorded. Trace exactly how any output was derived — even when execution fails.
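Lineage tracking of this kind can be sketched as an executor that appends a record for every step it runs, and returns that trace on failure as well as on success. `TraceEntry`, `runTraced`, `getKey`, and `asNumber` are hypothetical names for illustration, not OriginTS types.

```typescript
// Hypothetical lineage sketch; not the OriginTS provenance model.

interface TraceEntry {
  readonly step: string;
  readonly input: unknown;
  readonly output?: unknown; // absent when the step failed
}

type Traced<T> =
  | { ok: true; value: T; trace: TraceEntry[] }
  | { ok: false; failedAt: string; trace: TraceEntry[] };

type StepResult = { ok: true; value: unknown } | { ok: false };

interface NamedStep {
  readonly name: string;
  readonly apply: (input: unknown) => StepResult;
}

// Run steps in order, recording what each one received and produced.
function runTraced(steps: NamedStep[], input: unknown): Traced<unknown> {
  const trace: TraceEntry[] = [];
  let current = input;
  for (const step of steps) {
    const r = step.apply(current);
    if (!r.ok) {
      // The failing step is recorded too, so the trace explains the failure.
      trace.push({ step: step.name, input: current });
      return { ok: false, failedAt: step.name, trace };
    }
    trace.push({ step: step.name, input: current, output: r.value });
    current = r.value;
  }
  return { ok: true, value: current, trace };
}

// Two example steps: read a key, then require a number.
const getKey = (key: string): NamedStep => ({
  name: `get(${key})`,
  apply: (x): StepResult =>
    typeof x === "object" && x !== null && key in x
      ? { ok: true, value: (x as Record<string, unknown>)[key] }
      : { ok: false },
});

const asNumber: NamedStep = {
  name: "number()",
  apply: (x): StepResult => (typeof x === "number" ? { ok: true, value: x } : { ok: false }),
};
```

When `{ age: "thirty" }` goes through `[getKey("age"), asNumber]`, the result reports a failure at `number()` and still carries the trace showing that `get(age)` produced `"thirty"`.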
Explicit Failures
Seven structured failure kinds replace silent coercions and thrown exceptions. Fail fast, fail clearly.
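In TypeScript, structured failures are naturally modeled as a discriminated union that callers must handle explicitly. The seven OriginTS kinds are not named here, so the three kinds below (`missing`, `type_mismatch`, `parse_error`) are invented for illustration, as are `readNumber` and `explainFailure`.

```typescript
// Hypothetical failure kinds for illustration only; OriginTS defines its own seven.
type Failure =
  | { kind: "missing"; path: string }
  | { kind: "type_mismatch"; path: string; expected: string; actual: string }
  | { kind: "parse_error"; path: string; message: string };

type Result<T> = { ok: true; value: T } | { ok: false; failure: Failure };

// Strict number extraction: "30" is reported as a mismatch, never coerced to 30.
function readNumber(record: Record<string, unknown>, key: string): Result<number> {
  if (!(key in record)) return { ok: false, failure: { kind: "missing", path: key } };
  const v = record[key];
  if (typeof v !== "number") {
    return {
      ok: false,
      failure: { kind: "type_mismatch", path: key, expected: "number", actual: typeof v },
    };
  }
  return { ok: true, value: v };
}

// Exhaustive handling: adding a kind without a matching case is a compile error.
function explainFailure(f: Failure): string {
  switch (f.kind) {
    case "missing":
      return `${f.path}: no value`;
    case "type_mismatch":
      return `${f.path}: expected ${f.expected}, got ${f.actual}`;
    case "parse_error":
      return `${f.path}: ${f.message}`;
    default: {
      const unreachable: never = f;
      return unreachable;
    }
  }
}
```

The `never` check in the `default` branch is what turns "fail clearly" into a compile-time guarantee: every failure kind must be handled somewhere.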
```typescript
import { Planner, load, run } from '@origints/core'

// Phase 1: describe the extraction as an immutable plan (no I/O).
const plan = new Planner()
  .in(load({ name: 'Alice', age: 30 }))
  .emit((out, $) =>
    out
      .add('name', $.get('name').string())
      .add('age', $.get('age').number())
  )
  .compile()

// Phase 2: execute the plan against the data.
const result = await run(plan)

if (result.ok) {
  console.log(result.value) // { name: 'Alice', age: 30 }
}
```

OriginTS supports extraction from multiple data formats, each as a separate package:
CSV
RFC 4180 parsing, header detection, column-by-name access, predicate-based filtering.
XLSX
Workbook navigation, cell predicates, eachSlice iteration, header-relative column lookup.
YAML
Single and multi-document parsing, anchor/alias preservation, full source tracking.
HTML
CSS selector queries, attribute extraction, Markdown conversion.
Plus Markdown, TOML, and Mammoth (DOCX).
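As a rough, dependency-free illustration of what RFC 4180 parsing and column-by-name access involve: quoted fields may contain commas, line breaks, and doubled quotes (`""`), and CRLF or LF ends a record. `parseCsv` and `withHeader` are hypothetical names, not the OriginTS CSV package API, and the sketch does not attempt header detection; it simply assumes the first record is the header.

```typescript
// Hypothetical RFC 4180-style sketch; not the OriginTS CSV package API.

// Split CSV text into records of fields, honoring quoted fields.
function parseCsv(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  let atRecordStart = true; // true until the current record has any content

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    atRecordStart = false;
    if (inQuotes) {
      if (ch !== '"') {
        field += ch; // commas and line breaks are literal inside quotes
      } else if (text[i + 1] === '"') {
        field += '"'; // "" is an escaped quote
        i++;
      } else {
        inQuotes = false; // closing quote
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\r" || ch === "\n") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // CRLF counts as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
      atRecordStart = true;
    } else {
      field += ch;
    }
  }
  // Flush the final record unless the text ended with a line break.
  if (!atRecordStart) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Treat the first record as the header and key each later record by column name.
function withHeader(rows: string[][]): Array<Record<string, string>> {
  const [header = [], ...records] = rows;
  return records.map((cells) =>
    Object.fromEntries(header.map((name, i): [string, string] => [name, cells[i] ?? ""])),
  );
}
```

Predicate-based filtering then reduces to an ordinary array filter over the keyed records, for example `withHeader(parseCsv(text)).filter((r) => Number(r.age) > 35)`.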