
Lineage, Source Maps & Benchmarking

OriginTS tracks the provenance of every value through its lineage system. Every transformation step is recorded. Lineage is available regardless of success or failure.

Lineage is a provenance graph built during execution. It records the inputs, outputs, and transformation at each step.

import { run } from '@origints/core'
const result = await run(plan)
// result.lineage is always available, even on failure
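To show how lineage can survive a failure, here is a minimal sketch. It is not OriginTS's implementation. The runner appends one record per step before executing it, and when a step throws it returns the records collected so far. The names LineageStep, Step, and runWithLineage are hypothetical.

```typescript
// Hypothetical shapes; OriginTS's real lineage types may differ.
interface LineageStep {
  index: number
  type: string // e.g. 'source', 'transform', 'emit'
  input: unknown
  output?: unknown
  error?: string
}

interface Step {
  type: string
  fn: (input: unknown) => unknown
}

interface RunResult {
  ok: boolean
  value?: unknown
  lineage: LineageStep[] // populated on success and on failure
}

function runWithLineage(steps: Step[], initial: unknown): RunResult {
  const lineage: LineageStep[] = []
  let current = initial
  for (let index = 0; index < steps.length; index++) {
    const step = steps[index]
    const record: LineageStep = { index, type: step.type, input: current }
    lineage.push(record) // record the step before running it
    try {
      current = step.fn(current)
      record.output = current
    } catch (err) {
      record.error = err instanceof Error ? err.message : String(err)
      return { ok: false, lineage } // lineage up to and including the failing step
    }
  }
  return { ok: true, value: current, lineage }
}

// Usage: the second step fails, but the first two steps are still recorded.
const failed = runWithLineage(
  [
    { type: 'source', fn: () => ({ name: 'Alice' }) },
    { type: 'transform', fn: () => { throw new Error('bad input') } },
    { type: 'emit', fn: (x) => x },
  ],
  undefined,
)
console.log(failed.ok, failed.lineage.length)
```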

formatLineage produces a structured summary of each execution step with optional data previews:

import { formatLineage } from '@origints/core'
const formatted = formatLineage(result.lineage, plan.ast)
// formatted.steps[0].type === 'source'
// formatted.steps[0].output === '{"name":"Alice"}'
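The output preview above matches JSON.stringify of the step's data. As an assumption about how such previews might be produced, not a description of formatLineage's internals, a preview helper could serialize the value and truncate it. The name previewValue and its maxLength parameter are hypothetical.

```typescript
// Hypothetical helper: serialize a step's data for display, truncating long values.
function previewValue(value: unknown, maxLength = 80): string {
  let text: string
  try {
    // JSON.stringify(undefined) returns undefined, so fall back to String().
    text = JSON.stringify(value) ?? String(value)
  } catch {
    text = String(value) // e.g. circular structures cannot be serialized
  }
  return text.length > maxLength ? text.slice(0, maxLength - 1) + '…' : text
}

console.log(previewValue({ name: 'Alice' })) // {"name":"Alice"}
```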

Step types include source, transform, emit, merge, and match.

formatLineageAsString returns a multi-line string suitable for logging or debugging:

import { formatLineageAsString } from '@origints/core'
const str = formatLineageAsString(result.lineage, plan.ast)
console.log(str)
// Pipeline Execution (3 steps)
// Step 1: SOURCE ...
// Step 2: TRANSFORM ...
// Step 3: EMIT ...
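The string layout above can be approximated from a list of formatted steps. This is a sketch that only mimics the header and step lines shown; the FormattedStep type and the formatStepsAsString function are hypothetical and do not reproduce the full output of formatLineageAsString.

```typescript
// Hypothetical input shape for the sketch.
interface FormattedStep {
  type: string // 'source' | 'transform' | 'emit' | 'merge' | 'match'
  description: string
}

function formatStepsAsString(steps: FormattedStep[]): string {
  const n = steps.length
  const lines = [`Pipeline Execution (${n} step${n === 1 ? '' : 's'})`]
  steps.forEach((step, i) => {
    // Step numbers are 1-based; the step type is upper-cased as in the example.
    lines.push(`Step ${i + 1}: ${step.type.toUpperCase()} ${step.description}`)
  })
  return lines.join('\n')
}
```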

A source map is a flat record that maps each output path to its origin. Two modes are available:

Static source maps (deriveSourceMap) are computed from the plan's AST alone, so no execution is needed. Array positions are written as [*] wildcards.

import { deriveSourceMap, formatSourceMap } from '@origints/core'
const map = deriveSourceMap(plan.ast)
// map['name'] === { source: 'direct', transforms: [], inputPath: ['name'], extractType: 'string' }
const formatted = formatSourceMap(map)
// formatted['name'] === 'direct → name → as string'
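Why a static map needs wildcards becomes clear from a sketch: without data, the number of array elements is unknown, so every element shares one [*] entry. The SchemaNode tree below is a hypothetical, heavily simplified stand-in for the plan AST, and it assumes output paths mirror input paths (no renaming).

```typescript
// Hypothetical, simplified plan node; OriginTS's real AST is richer.
type SchemaNode =
  | { kind: 'value'; extractType: string }
  | { kind: 'object'; fields: Record<string, SchemaNode> }
  | { kind: 'array'; items: SchemaNode }

interface StaticEntry {
  inputPath: string[]
  extractType: string
}

// Join segments as 'items[*].price': bracket segments attach without a dot.
function joinPath(segments: string[]): string {
  return segments.reduce(
    (acc, seg) => (seg.startsWith('[') || acc === '' ? acc + seg : `${acc}.${seg}`),
    '',
  )
}

function deriveStaticPaths(
  node: SchemaNode,
  path: string[] = [],
  out: Record<string, StaticEntry> = {},
): Record<string, StaticEntry> {
  switch (node.kind) {
    case 'value':
      out[joinPath(path)] = { inputPath: path, extractType: node.extractType }
      break
    case 'object':
      for (const [key, child] of Object.entries(node.fields)) {
        deriveStaticPaths(child, [...path, key], out)
      }
      break
    case 'array':
      // No data is available, so all elements collapse into one wildcard entry.
      deriveStaticPaths(node.items, [...path, '[*]'], out)
      break
  }
  return out
}
```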

Runtime source maps (deriveRuntimeSourceMap) use concrete array indices and include only the branches that actually ran. They require the lineage from an execution.

import { deriveRuntimeSourceMap } from '@origints/core'
const runtimeMap = deriveRuntimeSourceMap(result.lineage, plan.ast)
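Because runtime keys carry concrete indices while static keys use [*], it can be handy to line the two up. The sketch below assumes runtime paths are written with bracketed indices such as items[2].price; that key format is an assumption, not confirmed by this page, and toWildcardPath and groupByStaticPath are hypothetical helpers.

```typescript
// Assumption: array positions appear as bracketed indices, e.g. 'items[2].price'.
function toWildcardPath(runtimePath: string): string {
  return runtimePath.replace(/\[\d+\]/g, '[*]')
}

// Group concrete runtime paths under the wildcard path they instantiate.
function groupByStaticPath(runtimePaths: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>()
  for (const p of runtimePaths) {
    const key = toWildcardPath(p)
    const bucket = groups.get(key) ?? []
    bucket.push(p)
    groups.set(key, bucket)
  }
  return groups
}
```

A static entry whose wildcard path has no group here corresponds to a branch that did not run.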

Source maps are remapped through mapOut transforms, so output paths correctly trace back through grouping, renaming, and other structural changes.
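Remapping can be pictured as composing two lookups: from a post-mapOut path to the pre-mapOut path it came from, and from there to the original origin. This is a sketch of that idea only; the Origin type, the renames record, and remapThroughRename are hypothetical, and real mapOut transforms (such as grouping) are more involved than a one-to-one rename.

```typescript
// Hypothetical, reduced origin record.
interface Origin {
  source: string
  inputPath: string[]
}

// `renames` maps each output path after mapOut to the path it had before mapOut.
function remapThroughRename(
  map: Record<string, Origin>,
  renames: Record<string, string>,
): Record<string, Origin> {
  const remapped: Record<string, Origin> = {}
  for (const [newPath, oldPath] of Object.entries(renames)) {
    const origin = map[oldPath]
    // Paths without a known origin are skipped rather than guessed.
    if (origin) remapped[newPath] = origin
  }
  return remapped
}
```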

Enable per-node execution timing by passing benchmark: true in the run context. When enabled, run() wraps each AST node’s execution with performance.now() calls and includes structured timing data in the result.

import { run, formatBenchmark } from '@origints/core'
const result = await run(plan, { benchmark: true })
// result.benchmark is populated on both success and failure
if (result.benchmark) {
  console.log(formatBenchmark(result.benchmark))
}
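The timing behavior described above (wrap each node with performance.now() only when benchmarking is on, and keep timings even when a node fails) can be sketched as follows. This is an illustration under assumptions, not OriginTS's runner; PlanNode, BenchmarkSketch, and runNodes are hypothetical names.

```typescript
import { performance } from 'node:perf_hooks'

// Hypothetical shapes for the sketch.
interface PlanNode {
  kind: string
  exec: () => void
}

interface NodeTiming {
  nodeId: number
  kind: string
  durationMs: number
}

interface BenchmarkSketch {
  totalMs: number
  nodes: NodeTiming[]
}

interface RunOutcome {
  failed: boolean
  benchmark?: BenchmarkSketch // undefined unless benchmarking was requested
}

function runNodes(nodes: PlanNode[], options: { benchmark?: boolean } = {}): RunOutcome {
  if (!options.benchmark) {
    // Benchmarking off: execute without any performance.now() calls.
    try {
      for (const node of nodes) node.exec()
    } catch {
      return { failed: true }
    }
    return { failed: false }
  }

  const start = performance.now()
  const timings: NodeTiming[] = []
  let failed = false
  for (let i = 0; i < nodes.length && !failed; i++) {
    const node = nodes[i]
    const t0 = performance.now()
    try {
      node.exec()
    } catch {
      failed = true // stop after this node, but keep the timings gathered so far
    }
    timings.push({ nodeId: i + 1, kind: node.kind, durationMs: performance.now() - t0 })
  }
  return { failed, benchmark: { totalMs: performance.now() - start, nodes: timings } }
}
```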

The Benchmark object contains:

  • totalMs — wall-clock time for the entire run() call
  • nodes — one entry per AST node with nodeId, kind, description, and durationMs
  • phases — aggregated timing per node kind (source, transform, emit, mapOut, etc.)
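The phases summary is a group-by over the nodes list, keyed by kind. A minimal sketch follows; the BenchmarkNode fields mirror the list above, while the Phase shape (totalMs, count) and aggregatePhases are assumptions for illustration.

```typescript
// Fields mirror the documented node entry; Phase is a hypothetical shape.
interface BenchmarkNode {
  nodeId: number
  kind: string
  description: string
  durationMs: number
}

interface Phase {
  totalMs: number
  count: number
}

function aggregatePhases(nodes: BenchmarkNode[]): Record<string, Phase> {
  const phases: Record<string, Phase> = {}
  for (const node of nodes) {
    let phase = phases[node.kind]
    if (!phase) {
      phase = { totalMs: 0, count: 0 }
      phases[node.kind] = phase
    }
    // Sum durations and count nodes of the same kind.
    phase.totalMs += node.durationMs
    phase.count += 1
  }
  return phases
}
```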

Emit nodes include an extractions array with per-extraction timing (one entry per .add() call). MapOut nodes include a transforms array with per-transform timing (one entry per output transform spec).

formatBenchmark produces a human-readable string:

Benchmark (245.3ms total)
────────────────────────────────────
#1  source     12.1ms   Direct input
#2  transform  180.4ms  Transform core:parseXlsx
#3  emit       48.2ms   Emit: companies, soiInvestments
      companies       32.1ms
      soiInvestments  16.1ms
#4  mapOut     4.6ms    MapOut: lookup
      lookup          4.6ms

By phase:
  source     12.1ms   (1 node)
  transform  180.4ms  (1 node)
  emit       48.2ms   (1 node)
  mapOut     4.6ms    (1 node)

When benchmark is not set or is false, no performance.now() calls are made and no benchmark data is collected. The benchmark field on the result will be undefined.