Extraction System
All data extraction — regardless of format — uses a single generic type. JSON, XLSX, CSV, YAML, HTML, Markdown, and TOML all produce the same spec shape.
ExtractSpec

    interface ExtractSpec<S = unknown, E = unknown> {
      readonly kind: 'extract'
      readonly format: string        // 'json', 'xlsx', 'csv', 'yaml', ...
      readonly steps: readonly S[]   // format-specific navigation
      readonly extract: E            // format-specific terminal extraction
    }

Each format fills in its own step and extraction types:

    // JSON
    { kind: 'extract', format: 'json', steps: ['user', 'name'], extract: 'string' }

    // XLSX
    {
      kind: 'extract',
      format: 'xlsx',
      steps: [{ kind: 'sheet', name: 'Sales' }, { kind: 'cell', ref: 'B2' }],
      extract: 'number'
    }

Spec executors
Each format registers an executor that handles its navigation steps and terminal extraction. JSON is not a special case; it is just another registered executor.

    registerSpecExecutor('json', jsonExecutor)
    registerSpecExecutor('xlsx', xlsxExecutor)
    registerSpecExecutor('csv', csvExecutor)

executeSpec() dispatches on spec.format to the registered executor.
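
The dispatch described above can be sketched as a small registry keyed by format. This is a minimal illustration, not the library's implementation: the SpecExecutor signature, the Map-based registry, and the toy JSON executor are all assumptions; only the ExtractSpec interface and the function names registerSpecExecutor and executeSpec come from this page.

```typescript
// Mirrors the ExtractSpec interface shown earlier on this page.
interface ExtractSpec<S = unknown, E = unknown> {
  readonly kind: 'extract'
  readonly format: string
  readonly steps: readonly S[]
  readonly extract: E
}

// Assumed executor signature; the real one may take different arguments.
type SpecExecutor = (spec: ExtractSpec, data: unknown) => unknown

// Hypothetical registry: one executor per format name.
const executors = new Map<string, SpecExecutor>()

function registerSpecExecutor(format: string, executor: SpecExecutor): void {
  executors.set(format, executor)
}

function executeSpec(spec: ExtractSpec, data: unknown): unknown {
  // Dispatch purely on spec.format; no format is special-cased.
  const executor = executors.get(spec.format)
  if (executor === undefined) {
    throw new Error(`No executor registered for format '${spec.format}'`)
  }
  return executor(spec, data)
}

// Toy JSON executor (an assumption): walks object keys, then checks the
// terminal value's typeof against spec.extract.
const jsonExecutor: SpecExecutor = (spec, data) => {
  let current: unknown = data
  for (const step of spec.steps) {
    if (typeof current !== 'object' || current === null) {
      throw new Error(`Cannot descend into '${String(step)}'`)
    }
    current = (current as Record<string, unknown>)[String(step)]
  }
  if (typeof current !== spec.extract) {
    throw new Error(`Expected ${String(spec.extract)}, got ${typeof current}`)
  }
  return current
}

registerSpecExecutor('json', jsonExecutor)

const nameSpec: ExtractSpec<string, string> = {
  kind: 'extract',
  format: 'json',
  steps: ['user', 'name'],
  extract: 'string',
}

const extractedName = executeSpec(nameSpec, { user: { name: 'Ada' } })
console.log(extractedName)
```

Because the registry is the only coupling point, adding a format in this sketch means registering one more executor; executeSpec itself never changes.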
Compositional specs
Specs compose into larger structures:
| Spec | Purpose |
|---|---|
| ExtractSpec | Terminal extraction from a data source |
| ArraySpec | Map over a collection |
| ObjectSpec | Construct an object from named properties |
| LiteralSpec | Constant value |
| MatchSpec | Conditional extraction based on runtime predicates |
| ConcatSpec | Concatenate multiple array results |
| PanicSpec | Signal an unrecoverable condition |
| TrySpec | Ordered fallback chain |
| MapSpec | Transform an extracted value |
| GuardSpec | Validate an extracted value |
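
To make the composition concrete, here is a minimal sketch of a recursive evaluator over four of these kinds. The field names (properties, alternatives, value), the MiniSpec union, and the evaluate and extractJson functions are assumptions for illustration; this page does not show the library's actual definitions for anything other than ExtractSpec.

```typescript
// Hypothetical, simplified shapes. Only the ExtractSpec fields follow this page.
interface ExtractSpec {
  readonly kind: 'extract'
  readonly format: string
  readonly steps: readonly string[]
  readonly extract: 'string' | 'number'
}
interface LiteralSpec { readonly kind: 'literal'; readonly value: unknown }
interface ObjectSpec {
  readonly kind: 'object'
  readonly properties: Readonly<Record<string, MiniSpec>>
}
interface TrySpec { readonly kind: 'try'; readonly alternatives: readonly MiniSpec[] }
type MiniSpec = ExtractSpec | LiteralSpec | ObjectSpec | TrySpec

// Toy JSON extraction: follow keys, fail loudly on a missing segment.
function extractJson(spec: ExtractSpec, data: unknown): unknown {
  let current: unknown = data
  for (const key of spec.steps) {
    if (typeof current !== 'object' || current === null || !(key in current)) {
      throw new Error(`Missing path segment '${key}'`)
    }
    current = (current as Record<string, unknown>)[key]
  }
  if (typeof current !== spec.extract) {
    throw new Error(`Expected ${spec.extract}, got ${typeof current}`)
  }
  return current
}

function evaluate(spec: MiniSpec, data: unknown): unknown {
  switch (spec.kind) {
    case 'extract':
      return extractJson(spec, data)
    case 'literal':
      return spec.value
    case 'object': {
      // Build an object by evaluating each named child spec.
      const out: Record<string, unknown> = {}
      for (const key of Object.keys(spec.properties)) {
        out[key] = evaluate(spec.properties[key], data)
      }
      return out
    }
    case 'try': {
      // Ordered fallback: return the first alternative that does not throw.
      let lastError: unknown = new Error('TrySpec has no alternatives')
      for (const alternative of spec.alternatives) {
        try {
          return evaluate(alternative, data)
        } catch (error) {
          lastError = error
        }
      }
      throw lastError
    }
  }
}

const userSpec: MiniSpec = {
  kind: 'object',
  properties: {
    name: { kind: 'extract', format: 'json', steps: ['user', 'name'], extract: 'string' },
    nickname: {
      kind: 'try',
      alternatives: [
        { kind: 'extract', format: 'json', steps: ['user', 'nickname'], extract: 'string' },
        { kind: 'literal', value: 'n/a' },
      ],
    },
  },
}

const evaluated = evaluate(userSpec, { user: { name: 'Ada' } })
console.log(JSON.stringify(evaluated))
```
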
The full Spec union:

    type Spec =
      | ExtractSpec
      | ArraySpec
      | ObjectSpec
      | LiteralSpec
      | MatchSpec
      | PanicSpec
      | ConcatSpec
      | TrySpec
      | MapSpec
      | GuardSpec

ConcatSpec
ConcatSpec concatenates multiple ArraySpec results into a single flat array. Useful for combining data from separate regions of a document.

    import { concat } from '@origints/core'

    concat(
      header.down().eachSlice('down', hasData, investmentRow('realized')),
      totalRealized.down().eachSlice('down', hasData, investmentRow('unrealized')),
    )

MatchSpec
MatchSpec enables conditional extraction based on runtime predicates. It evaluates the data at a given path, matches the result against a list of cases, and extracts differently depending on which case matches. A PanicSpec can serve as the default to signal unrecoverable conditions.
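
The behaviour described above can be sketched as follows. The shapes here are hypothetical: the field names path, cases, when, result, and otherwise, and the helper functions valueAt and run, are assumptions made for illustration, and each case resolves to a literal rather than a full extraction to keep the example short.

```typescript
// Hypothetical, simplified shapes; not the library's actual definitions.
interface LiteralSpec { readonly kind: 'literal'; readonly value: unknown }
interface PanicSpec { readonly kind: 'panic'; readonly message: string }
interface MatchCase {
  readonly when: (value: unknown) => boolean   // runtime predicate
  readonly result: ResultSpec                  // spec to run when it holds
}
interface MatchSpec {
  readonly kind: 'match'
  readonly path: readonly string[]
  readonly cases: readonly MatchCase[]
  readonly otherwise: ResultSpec               // default, often a PanicSpec
}
type ResultSpec = LiteralSpec | PanicSpec | MatchSpec

// Read the value at a key path; undefined if the path cannot be followed.
function valueAt(data: unknown, path: readonly string[]): unknown {
  let current: unknown = data
  for (const key of path) {
    if (typeof current !== 'object' || current === null) return undefined
    current = (current as Record<string, unknown>)[key]
  }
  return current
}

function run(spec: ResultSpec, data: unknown): unknown {
  switch (spec.kind) {
    case 'literal':
      return spec.value
    case 'panic':
      // Modelled as a thrown error in this sketch.
      throw new Error(`panic: ${spec.message}`)
    case 'match': {
      const probe = valueAt(data, spec.path)
      // The first case whose predicate accepts the probed value wins.
      const hit = spec.cases.find(candidate => candidate.when(probe))
      return run(hit !== undefined ? hit.result : spec.otherwise, data)
    }
  }
}

const amountKind: MatchSpec = {
  kind: 'match',
  path: ['amount'],
  cases: [
    { when: v => typeof v === 'number', result: { kind: 'literal', value: 'numeric' } },
    { when: v => typeof v === 'string', result: { kind: 'literal', value: 'text' } },
  ],
  otherwise: { kind: 'panic', message: 'amount is neither a number nor a string' },
}

const numericResult = run(amountKind, { amount: 42 })
const textResult = run(amountKind, { amount: '42' })
console.log(numericResult, textResult)
```
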
Building specs
In practice, you rarely construct spec objects directly. The builder API (the $ parameter in .emit()) provides a fluent interface:

    .emit((out, $) => out
      .add('name', $.get('name').string())                   // ExtractSpec
      .add('items', $.get('items').array(i => i.string()))   // ArraySpec
      .addLiteral('version', '1.0.0')                        // LiteralSpec
    )

Each format package provides its own builder that produces ExtractSpec instances with the right step and extraction types.
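
As a rough illustration of how a format-specific builder could yield the JSON spec shape shown earlier, here is a minimal sketch. The JsonPath class and its immutable design are assumptions; only the get() and string() call names and the resulting spec shape are taken from this page.

```typescript
// JSON flavour of ExtractSpec: string keys as steps, a type name as extraction.
interface JsonExtractSpec {
  readonly kind: 'extract'
  readonly format: 'json'
  readonly steps: readonly string[]
  readonly extract: 'string' | 'number'
}

// Hypothetical builder: each get() returns a new builder with one more step,
// and a terminal method such as string() emits the finished spec.
class JsonPath {
  private readonly steps: readonly string[]

  constructor(steps: readonly string[] = []) {
    this.steps = steps
  }

  get(key: string): JsonPath {
    return new JsonPath([...this.steps, key])
  }

  string(): JsonExtractSpec {
    return { kind: 'extract', format: 'json', steps: this.steps, extract: 'string' }
  }

  number(): JsonExtractSpec {
    return { kind: 'extract', format: 'json', steps: this.steps, extract: 'number' }
  }
}

const $ = new JsonPath()
const builtSpec = $.get('user').get('name').string()
console.log(JSON.stringify(builtSpec))
```

Returning a fresh builder from get() keeps the root $ reusable, so several properties can branch from the same starting point without sharing accumulated steps.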