Gathering detailed insights and metrics for hast-util-to-mdast
utility to transform hast (HTML) to mdast (markdown)
npm install hast-util-to-mdast
Supply Chain: 92.3
Quality: 99.6
Maintenance: 84
Vulnerability: 100
License: 100
38 Stars
302 Commits
16 Forks
8 Watching
1 Branches
19 Contributors
Updated on 19 Nov 2024
JavaScript (100%)
Total downloads
Last day: 17,307 (-2.8% compared to previous day)
Last week: 100,795 (1% compared to previous week)
Last month: 420,118 (23.7% compared to previous month)
Last year: 3,067,517 (52.7% compared to previous year)
hast utility to transform to mdast.
This package is a utility that takes a hast (HTML) syntax tree as input and turns it into an mdast (markdown) syntax tree.
This project is useful when you want to turn HTML to markdown.
The mdast utility mdast-util-to-hast does the inverse of this utility.
It turns markdown into HTML.
The rehype plugin rehype-remark wraps this utility to also turn HTML to markdown at a higher-level (easier) abstraction.
This package is ESM only. In Node.js (version 16+), install with npm:
npm install hast-util-to-mdast
In Deno with esm.sh:
import {toMdast} from 'https://esm.sh/hast-util-to-mdast@10'
In browsers with esm.sh:
<script type="module">
  import {toMdast} from 'https://esm.sh/hast-util-to-mdast@10?bundle'
</script>
Say we have the following example.html:
<h2>Hello <strong>world!</strong></h2>
…and next to it a module example.js:
import fs from 'node:fs/promises'
import {fromHtml} from 'hast-util-from-html'
import {toMdast} from 'hast-util-to-mdast'
import {toMarkdown} from 'mdast-util-to-markdown'

const html = String(await fs.readFile('example.html'))
const hast = fromHtml(html, {fragment: true})
const mdast = toMdast(hast)
const markdown = toMarkdown(mdast)

console.log(markdown)
…now running node example.js yields:
## Hello **world!**
This package exports the identifiers defaultHandlers, defaultNodeHandlers, and toMdast.
There is no default export.
toMdast(tree[, options])
Transform hast to mdast.
Returns an mdast tree (MdastNode).
defaultHandlers
Default handlers for elements (Record<string, Handle>).
Each key is an element name, each value is a Handle.
defaultNodeHandlers
Default handlers for nodes (Record<string, NodeHandle>).
Each key is a node type, each value is a NodeHandle.
Handle
Handle a particular element (TypeScript type).
Parameters:
state (State): info passed around about the current state
element (Element): element to transform
parent (HastParent): parent of element
Returns: mdast node or nodes (Array<MdastNode> | MdastNode | undefined).
NodeHandle
Handle a particular node (TypeScript type).
Parameters:
state (State): info passed around about the current state
node (any): node to transform
parent (HastParent): parent of node
Returns: mdast node or nodes (Array<MdastNode> | MdastNode | undefined).
Options
Configuration (TypeScript type).
newlines
Keep line endings when collapsing whitespace (boolean, default: false).
The default collapses to a single space.
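To make the two behaviors concrete, here is a minimal self-contained sketch of whitespace collapsing. The helper name collapseWhitespace is hypothetical, and the rule that a run containing a line ending collapses to a single line feed is an assumption for illustration, not a copy of the library's code.

```javascript
// Illustrative only: a hypothetical helper, not part of hast-util-to-mdast.
function collapseWhitespace(value, newlines) {
  // Match runs of ASCII whitespace (space, tab, line feed, form feed, carriage return).
  return value.replace(/[ \t\n\f\r]+/g, (run) =>
    // Assumption: when `newlines` is on and the run contains a line ending,
    // keep one line feed; otherwise collapse the run to a single space.
    newlines && /[\n\r]/.test(run) ? '\n' : ' '
  )
}

console.log(JSON.stringify(collapseWhitespace('alpha \n  bravo', false))) // "alpha bravo"
console.log(JSON.stringify(collapseWhitespace('alpha \n  bravo', true))) // "alpha\nbravo"
```

With the real package, the equivalent switch would be passed as toMdast(tree, {newlines: true}).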
checked
Value to use for a checked checkbox or radio input (string, default: [x]).
unchecked
Value to use for an unchecked checkbox or radio input (string, default: [ ]).
quotes
List of quotes to use (Array<string>, default: ['"']).
Each value can be one or two characters. When two, the first character determines the opening quote and the second the closing quote at that level. When one, both the opening and closing quote are that character.
The order in which the preferred quotes appear determines which quotes to use at which level of nesting.
So, to prefer ‘’ at the first level of nesting, and “” at the second, pass ['‘’', '“”'].
If <q>s are nested deeper than the given amount of quotes, the markers wrap around: a third level of nesting when using ['«»', '‹›'] should have double guillemets, a fourth single, a fifth double again, etc.
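The wrap-around rule can be sketched in a few lines. This is a self-contained illustration under stated assumptions: pickQuotes is a hypothetical helper (not exported by the package), and depth is assumed to be zero-based.

```javascript
// Hypothetical helper for illustration; not exported by hast-util-to-mdast.
function pickQuotes(quotes, depth) {
  // Wrap around when nesting is deeper than the number of configured quotes.
  const value = quotes[depth % quotes.length]
  // Split by code point so each quote character is handled as one unit.
  const characters = Array.from(value)
  const open = characters[0]
  // A one-character value is used for both the opening and the closing quote.
  const close = characters.length > 1 ? characters[1] : open
  return {open, close}
}

const quotes = ['«»', '‹›']

for (let depth = 0; depth < 5; depth++) {
  const {open, close} = pickQuotes(quotes, depth)
  // Levels 1, 3, and 5 print double guillemets; levels 2 and 4 print single ones.
  console.log(`level ${depth + 1}: ${open}…${close}`)
}
```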
document
Whether the given tree represents a complete document (boolean, default: undefined).
Applies when the tree is a root node.
When the tree represents a complete document, then things are wrapped in paragraphs when needed, and otherwise they’re left as-is.
The default checks for whether there’s mixed content: some phrasing nodes and some non-phrasing nodes.
handlers
Object mapping tag names to functions handling the corresponding elements (Record<string, Handle>).
Merged into the defaults.
See Handle.
nodeHandlers
Object mapping node types to functions handling the corresponding nodes (Record<string, NodeHandle>).
Merged into the defaults.
See NodeHandle.
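One plausible reading of "merged into the defaults" is a shallow merge in which user entries replace defaults with the same key. The sketch below shows that reading with toy stand-ins; toyDefaults and userHandlers are hypothetical names, and the real defaults are the exported defaultHandlers.

```javascript
// Toy stand-ins for illustration; the real defaults are exported as `defaultHandlers`.
const toyDefaults = {
  em: () => ({type: 'emphasis', children: []}),
  strong: () => ({type: 'strong', children: []})
}

// Hypothetical user handlers: returning `undefined` from `em` drops the element.
const userHandlers = {
  em() {}
}

// Assumption: a shallow merge, so user entries win and other defaults stay.
const merged = {...toyDefaults, ...userHandlers}

console.log(merged.em()) // undefined
console.log(merged.strong().type) // strong
```

Under this reading, overriding one tag name leaves the handling of every other element untouched.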
State
Info passed around about the current state (TypeScript type).
patch ((from: HastNode, to: MdastNode) => undefined): copy a node’s positional info
one ((node: HastNode, parent: HastParent | undefined) => Array<MdastNode> | MdastNode | undefined): transform a hast node to mdast
all ((parent: HastParent) => Array<MdastContent>): transform the children of a hast parent to mdast
toFlow ((nodes: Array<MdastContent>) => Array<MdastFlowContent>): transform a list of mdast nodes to flow
toSpecificContent (<ParentType>(nodes: Array<MdastContent>, build: (() => ParentType)) => Array<ParentType>): turn arbitrary content into a list of a particular node type
resolve ((url: string | null | undefined) => string): resolve a URL relative to a base
options (Options): user configuration
elementById (Map<string, Element>): elements by their id
handlers (Record<string, Handle>): applied element handlers (see Handle)
nodeHandlers (Record<string, NodeHandle>): applied node handlers (see NodeHandle)
baseFound (boolean): whether a <base> element was seen
frozenBaseUrl (string | undefined): href of <base>, if any
inTable (boolean): whether we’re in a table
qNesting (number): how deep we’re in <q>s
It’s possible to exclude something from within HTML when turning it into markdown, by wrapping it in an element with a data-mdast attribute set to 'ignore'.
For example:
<p><strong>Strong</strong> and <em data-mdast="ignore">emphasis</em>.</p>
Yields:
**Strong** and .
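The effect of the attribute can be pictured on a hand-built hast tree. This is a self-contained sketch, not the library's implementation: isIgnored and textWithoutIgnored are hypothetical helpers, and it assumes the data-mdast attribute is stored as the dataMdast property, as hast does for data attributes.

```javascript
// Hypothetical helpers for illustration; not part of the package's API.
function isIgnored(node) {
  return (
    node.type === 'element' &&
    Boolean(node.properties) &&
    // Assumption: `data-mdast="ignore"` appears in hast as `dataMdast: 'ignore'`.
    node.properties.dataMdast === 'ignore'
  )
}

// Collect the text of a tree, skipping ignored elements and everything in them.
function textWithoutIgnored(node) {
  if (isIgnored(node)) return ''
  if (node.type === 'text') return node.value
  return (node.children || []).map(textWithoutIgnored).join('')
}

// Hand-built hast for:
// <p><strong>Strong</strong> and <em data-mdast="ignore">emphasis</em>.</p>
const tree = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    {type: 'element', tagName: 'strong', properties: {}, children: [{type: 'text', value: 'Strong'}]},
    {type: 'text', value: ' and '},
    {type: 'element', tagName: 'em', properties: {dataMdast: 'ignore'}, children: [{type: 'text', value: 'emphasis'}]},
    {type: 'text', value: '.'}
  ]
}

console.log(textWithoutIgnored(tree)) // Strong and .
```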
It’s also possible to pass a handler to ignore nodes.
For example, to ignore em elements, pass handlers: {'em': function () {}}:
<p><strong>Strong</strong> and <em>emphasis</em>.</p>
Yields:
**Strong** and .
The goal of this project is to map HTML to plain and readable markdown.
That means that certain elements are ignored (such as <svg>) or “downgraded” (such as <video> to links).
You can change this by passing handlers.
Say we have the following file example.html:
<p>
  Some text with
  <svg viewBox="0 0 1 1" width="1" height="1"><rect fill="black" x="0" y="0" width="1" height="1" /></svg>
  a graphic… Wait is that a dead pixel?
</p>
To keep the <svg> as raw HTML in the markdown instead of dropping it, pass an svg handler in example.js like so:
/**
 * @import {Html} from 'mdast'
 */

import fs from 'node:fs/promises'
import {fromHtml} from 'hast-util-from-html'
import {toHtml} from 'hast-util-to-html'
import {toMdast} from 'hast-util-to-mdast'
import {toMarkdown} from 'mdast-util-to-markdown'

const html = String(await fs.readFile('example.html'))
const hast = fromHtml(html, {fragment: true})
const mdast = toMdast(hast, {
  handlers: {
    svg(state, node) {
      /** @type {Html} */
      const result = {type: 'html', value: toHtml(node, {space: 'svg'})}
      state.patch(node, result)
      return result
    }
  }
})
const markdown = toMarkdown(mdast)

console.log(markdown)
Yields:
Some text with <svg viewBox="0 0 1 1" width="1" height="1"><rect fill="black" x="0" y="0" width="1" height="1"></rect></svg> a graphic… Wait is that a dead pixel?
The algorithm used in this project supports all HTML elements, including ancient elements (xmp) and obscure ones (base).
It’s particularly good at forms, media, and around implicit and explicit paragraphs (see HTML Standard, A. van Kesteren; et al. WHATWG § 3.2.5.4 Paragraphs), such as:
<article>
  An implicit paragraph.
  <h1>An explicit paragraph.</h1>
</article>
Yields:
An implicit paragraph.

# An explicit paragraph.
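The implicit paragraph in the example above can be pictured as wrapping runs of phrasing content once phrasing and non-phrasing nodes are mixed. The following is a rough self-contained sketch of that idea, not the library's algorithm: wrapImplicitParagraphs is a hypothetical helper and the phrasing set is a deliberately small, assumed subset of mdast node types.

```javascript
// Hypothetical subset of mdast phrasing node types, for illustration only.
const phrasing = new Set(['text', 'emphasis', 'strong', 'inlineCode', 'link', 'image', 'break', 'delete'])

// Hypothetical helper: wrap runs of phrasing nodes in paragraphs when content is mixed.
function wrapImplicitParagraphs(nodes) {
  const mixed =
    nodes.some((d) => phrasing.has(d.type)) && nodes.some((d) => !phrasing.has(d.type))
  // Without mixed content, leave the nodes as they are.
  if (!mixed) return nodes

  const result = []
  let run = []

  const flush = () => {
    // Only emit a paragraph when the run has something other than whitespace text.
    if (run.some((d) => d.type !== 'text' || d.value.trim() !== '')) {
      result.push({type: 'paragraph', children: run})
    }
    run = []
  }

  for (const node of nodes) {
    if (phrasing.has(node.type)) {
      run.push(node)
    } else {
      flush()
      result.push(node)
    }
  }

  flush()
  return result
}

const nodes = [
  {type: 'text', value: 'An implicit paragraph.'},
  {type: 'heading', depth: 1, children: [{type: 'text', value: 'An explicit paragraph.'}]}
]

console.log(wrapImplicitParagraphs(nodes).map((d) => d.type)) // [ 'paragraph', 'heading' ]
```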
HTML is handled according to WHATWG HTML (the living standard), which is also followed by browsers such as Chrome and Firefox.
This project creates markdown according to GFM, which is a standard that’s based on CommonMark but adds the strikethrough (~like so~) and tables (| Table header | …) amongst some alternative syntaxes.
The input syntax tree format is hast. Any HTML that can be represented in hast is accepted as input. The output syntax tree format is mdast.
When <table> elements or <del>, <s>, and <strike> exist in the HTML, then the GFM nodes table and delete are used.
This utility does not generate definitions or references, or syntax extensions such as footnotes, frontmatter, or math.
This package is fully typed with TypeScript.
It exports the additional types Handle, NodeHandle, Options, and State.
Projects maintained by the unified collective are compatible with maintained versions of Node.js.
When we cut a new major release, we drop support for unmaintained versions of Node.
This means we try to keep the current release line, hast-util-to-mdast@^10, compatible with Node.js 16.
Use of hast-util-to-mdast is safe by default.
hast-util-to-nlcst: transform hast to nlcst
hast-util-to-xast: transform hast to xast
See contributing.md in syntax-tree/.github for ways to get started.
See support.md for ways to get help.
This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.
No vulnerabilities found.
no dangerous workflow patterns detected
12 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 10
no binaries found in the repo
0 existing vulnerabilities detected
license file detected
security policy file detected
Found 2/30 approved changesets -- score normalized to 0
detected GitHub workflow tokens with excessive permissions
dependency not pinned by hash detected -- score normalized to 0
no effort to earn an OpenSSF best practices badge detected
project is not fuzzed
branch protection not enabled on development/release branches
SAST tool is not run on all commits -- score normalized to 0
Last Scanned on 2024-11-18
The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.