@origints/mammoth
DOCX to semantic HTML or plain text conversion with custom style mapping and configurable image handling.
Installation

    npm install @origints/mammoth @origints/core

Features

- Convert DOCX to semantic HTML
- Convert DOCX to plain text
- Custom style mapping for headings, lists, and more
- Configurable image handling
- Conversion warnings and messages
Usage with Planner

Convert DOCX to HTML

    import { Planner, loadFile, run } from '@origints/core'
    import { docxToHtml } from '@origints/mammoth'

    const plan = new Planner()
      .in(loadFile('document.docx'))
      .mapIn(docxToHtml())
      .emit((out, $) => out
        .add('html', $.get('html').string())
      )
      .compile()

    // readFile and registry are not defined in this snippet; supply your own
    const result = await run(plan, { readFile, registry })
    // result.value: { html: '<h1>Title</h1><p>Content...</p>' }

Custom style mapping
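Each `styleMap` entry is a string with a paragraph matcher on the left of `=>` and the HTML element to emit on the right. As a minimal sketch, the hypothetical helper `buildStyleMap` below (not part of @origints/mammoth) generates such entries from a plain object, which may be convenient when the mapping is kept in configuration. It only covers paragraph styles, and rejecting names that contain a single quote is a simplification chosen here rather than a rule of the library.

```typescript
// Hypothetical helper, not exported by @origints/mammoth.
// Turns { 'Word style name': 'html selector' } into style map strings
// of the form "p[style-name='<name>'] => <selector>".
function buildStyleMap(mapping: Record<string, string>): string[] {
  return Object.keys(mapping).map((styleName) => {
    // Simplification: refuse names that would break the quoted matcher
    if (styleName.includes("'")) {
      throw new Error(`Style name must not contain a single quote: ${styleName}`)
    }
    return `p[style-name='${styleName}'] => ${mapping[styleName]}`
  })
}

const styleMap = buildStyleMap({
  Title: 'h1.document-title',
  'Heading 1': 'h1',
  'Heading 2': 'h2',
  Quote: 'blockquote',
})

console.log(styleMap)
```

The resulting array can be passed as the `styleMap` option of `docxToHtml`.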

    const plan = new Planner()
      .in(loadFile('report.docx'))
      .mapIn(docxToHtml({
        styleMap: [
          "p[style-name='Title'] => h1.document-title",
          "p[style-name='Heading 1'] => h1",
          "p[style-name='Heading 2'] => h2",
          "p[style-name='Quote'] => blockquote",
        ],
        idPrefix: 'doc-',
      }))
      .emit((out, $) => out
        .add('content', $.get('html').string())
      )
      .compile()

Extract plain text

    import { docxToText } from '@origints/mammoth'

    const plan = new Planner()
      .in(loadFile('document.docx'))
      .mapIn(docxToText())
      .emit((out, $) => out
        .add('text', $.get('text').string())
      )
      .compile()

Omit images

    const plan = new Planner()
      .in(loadFile('document.docx'))
      .mapIn(docxToHtml({ imageHandling: 'omit' }))
      .emit((out, $) => out.add('html', $.get('html').string()))
      .compile()

Standalone usage

    import * as fs from 'fs'
    import { docxToHtmlImpl, docxToTextImpl } from '@origints/mammoth'

    const buffer = fs.readFileSync('document.docx')

    const htmlResult = await docxToHtmlImpl.execute(buffer)
    console.log(htmlResult.html)

    for (const msg of htmlResult.messages) {
      console.warn(msg.message)
    }

    const textResult = await docxToTextImpl.execute(buffer)
    console.log(textResult.text)

| Export | Description |
|---|---|
| `docxToHtml(options?)` | Transform AST for HTML conversion |
| `docxToText(options?)` | Transform AST for text conversion |
| `docxToHtmlImpl` | Async HTML conversion implementation |
| `docxToTextImpl` | Async text conversion implementation |
| `registerMammothTransforms(registry)` | Register mammoth transforms |
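
The standalone example prints every entry of `htmlResult.messages` with `console.warn`. As a rough sketch of handling them more selectively, the code below splits messages into warnings and errors. The `type` field and its two values are an assumption borrowed from the underlying mammoth library and are not confirmed for this package; `ConversionMessage`, `groupMessages`, and the sample data are hypothetical names made up for illustration.

```typescript
// Assumed message shape: only `message` appears in the example above;
// `type` is an assumption based on the underlying mammoth library.
interface ConversionMessage {
  type: 'warning' | 'error'
  message: string
}

// Hypothetical helper: separates message texts by severity.
function groupMessages(
  messages: ConversionMessage[],
): { warnings: string[]; errors: string[] } {
  const warnings: string[] = []
  const errors: string[] = []
  for (const msg of messages) {
    if (msg.type === 'error') {
      errors.push(msg.message)
    } else {
      warnings.push(msg.message)
    }
  }
  return { warnings, errors }
}

// Made-up sample data standing in for htmlResult.messages
const sampleMessages: ConversionMessage[] = [
  { type: 'warning', message: 'Unrecognised paragraph style: Callout' },
  { type: 'error', message: 'Could not read embedded image' },
]

const { warnings, errors } = groupMessages(sampleMessages)
console.log(`${warnings.length} warning(s), ${errors.length} error(s)`)
```

A caller could, for instance, log the warnings and fail the conversion only when `errors` is non-empty.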
License
MIT