parse-entities
Parse HTML character references.
What is this?
This is a small and powerful decoder of HTML character references (often called
entities).
When should I use this?
You can use this for spec-compliant decoding of character references.
It’s small and fast enough to do that well.
You can also use this when making a linter, because it emits warnings that
carry a reason and positional info on where they occurred.
Install
This package is ESM only.
In Node.js (version 14.14+, 16.0+), install with npm:
npm install parse-entities
In Deno with esm.sh:
import {parseEntities} from 'https://esm.sh/parse-entities@3'
In browsers with esm.sh:
<script type="module">
import {parseEntities} from 'https://esm.sh/parse-entities@3?bundle'
</script>
Use
import {parseEntities} from 'parse-entities'
console.log(parseEntities('alpha &amp bravo'))
// => alpha & bravo
console.log(parseEntities('charlie &copycat; delta'))
// => charlie ©cat; delta
console.log(parseEntities('echo &copy; foxtrot &#8800; golf &#x1D306; hotel'))
// => echo © foxtrot ≠ golf 𝌆 hotel
API
This package exports the identifier parseEntities.
There is no default export.
parseEntities(value[, options])
Parse HTML character references.
options
Configuration (optional).
options.additional
Additional character to accept (string?, default: '').
This allows other characters, without error, when following an ampersand.
options.attribute
Whether to parse value as an attribute value (boolean?, default: false).
This results in slightly different behavior: a nonterminated named reference
that is followed by = or an alphanumerical character is left as-is.
options.nonTerminated
Whether to allow nonterminated references (boolean, default: true).
For example, &copycat for ©cat.
This behavior is compliant with the spec but can lead to unexpected results.
options.position
Starting position of value (Position or Point, optional).
Useful when dealing with values nested in some sort of syntax tree.
The default is:
{line: 1, column: 1, offset: 0}
options.warning
Error handler (Function?).
options.text
Text handler (Function?).
options.reference
Reference handler (Function?).
options.warningContext
Context used when calling warning ('*', optional).
options.textContext
Context used when calling text ('*', optional).
options.referenceContext
Context used when calling reference ('*', optional).
Returns
string: decoded value.
function warning(reason, point, code)
Error handler.
Parameters
this (*): refers to warningContext when given to parseEntities
reason (string): human-readable reason for emitting a parse error
point (Point): place where the error occurred
code (number): machine-readable code for the error
The following codes are used:
| Code | Example | Note |
| --- | --- | --- |
| 1 | foo &amp bar | Missing semicolon (named) |
| 2 | foo &#123 bar | Missing semicolon (numeric) |
| 3 | Foo &bar baz | Empty (named) |
| 4 | Foo &# | Empty (numeric) |
| 5 | Foo &bar; baz | Unknown (named) |
| 6 | Foo &#128; baz | Disallowed reference |
| 7 | Foo &#xD800; baz | Prohibited: outside permissible unicode range |
function text(value, position)
Text handler.
Parameters
this (*): refers to textContext when given to parseEntities
value (string): string of content
position (Position): place where value starts and ends
function reference(value, position, source)
Character reference handler.
Parameters
this (*): refers to referenceContext when given to parseEntities
value (string): decoded character reference
position (Position): place where source starts and ends
source (string): raw source of character reference
Types
This package is fully typed with TypeScript.
It exports the additional types Options, WarningHandler, ReferenceHandler, and
TextHandler.
Compatibility
This package is at least compatible with all maintained versions of Node.js.
As of now, that is Node.js 14.14+ and 16.0+.
It also works in Deno and modern browsers.
Security
This package is safe: it matches the HTML spec to parse character references.
Contribute
Yes please!
See How to Contribute to Open Source.
License
MIT © Titus Wormer