The tiny, regex powered, lenient, almost spec-compliant JavaScript tokenizer that never fails.
const jsTokens = require("@hutechtechnical/nobis-ex-dolor-reprehenderit");

const jsString = 'JSON.stringify({k:3.14**2}, null /*replacer*/, "\\t")';

Array.from(jsTokens(jsString), (token) => token.value).join("|");
// JSON|.|stringify|(|{|k|:|3.14|**|2|}|,| |null| |/*replacer*/|,| |"\t"|)
npm install @hutechtechnical/nobis-ex-dolor-reprehenderit
import jsTokens from "@hutechtechnical/nobis-ex-dolor-reprehenderit";
// or:
const jsTokens = require("@hutechtechnical/nobis-ex-dolor-reprehenderit");
jsTokens(string, options?)
Option | Type | Default | Description |
---|---|---|---|
jsx | boolean | false | Enable JSX support. |
This package exports a generator function, jsTokens, that turns a string of JavaScript code into token objects.
For the empty string, the function yields nothing (which can be turned into an empty list). For any other input, the function always yields something, even for invalid JavaScript, and never throws. Concatenating the token values reproduces the input.
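The round-trip property described above can be checked with a few lines of code. The following is a minimal sketch: `reproducesInput` is a hypothetical helper (not part of the package), and the token array is hand-written to stand in for what the tokenizer would yield for `a + 1`, so that the snippet runs without the package installed.

```javascript
// Hypothetical helper: returns true if concatenating the token values
// gives back exactly the original input string.
function reproducesInput(input, tokens) {
  let rebuilt = "";
  for (const token of tokens) {
    rebuilt += token.value;
  }
  return rebuilt === input;
}

// Hand-written tokens standing in for the output of jsTokens("a + 1").
const sampleTokens = [
  { type: "IdentifierName", value: "a" },
  { type: "WhiteSpace", value: " " },
  { type: "Punctuator", value: "+" },
  { type: "WhiteSpace", value: " " },
  { type: "NumericLiteral", value: "1" },
];

console.log(reproducesInput("a + 1", sampleTokens)); // true
```

With the real package, the second argument would be `jsTokens(input)` instead of the hand-written array.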
The package is very close to being fully spec compliant (it passes all but 3 of test262-parser-tests), but has taken a couple of shortcuts. See the following sections for limitations of some tokens.
// Loop over tokens:
for (const token of jsTokens("hello, !world")) {
  console.log(token);
}

// Get all tokens as an array:
const tokens = Array.from(jsTokens("hello, !world"));
Spec: ECMAScript Language: Lexical Grammar + Additional Syntax
export default function jsTokens(input: string): Iterable<Token>;

type Token =
  | { type: "StringLiteral"; value: string; closed: boolean }
  | { type: "NoSubstitutionTemplate"; value: string; closed: boolean }
  | { type: "TemplateHead"; value: string }
  | { type: "TemplateMiddle"; value: string }
  | { type: "TemplateTail"; value: string; closed: boolean }
  | { type: "RegularExpressionLiteral"; value: string; closed: boolean }
  | { type: "MultiLineComment"; value: string; closed: boolean }
  | { type: "SingleLineComment"; value: string }
  | { type: "HashbangComment"; value: string }
  | { type: "IdentifierName"; value: string }
  | { type: "PrivateIdentifier"; value: string }
  | { type: "NumericLiteral"; value: string }
  | { type: "Punctuator"; value: string }
  | { type: "WhiteSpace"; value: string }
  | { type: "LineTerminatorSequence"; value: string }
  | { type: "Invalid"; value: string };
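Because every token carries a `type`, consumers can filter the stream by type. As a rough sketch, the hypothetical generator `withoutTrivia` below drops whitespace, line terminators and comments; which types count as "trivia" is an assumption made for this example, and the token array is hand-written to stand in for real tokenizer output.

```javascript
// Token types that this example treats as trivia (an assumption, adjust as needed).
const TRIVIA_TYPES = new Set([
  "WhiteSpace",
  "LineTerminatorSequence",
  "MultiLineComment",
  "SingleLineComment",
  "HashbangComment",
]);

// Hypothetical helper: lazily yields only the non-trivia tokens.
function* withoutTrivia(tokens) {
  for (const token of tokens) {
    if (!TRIVIA_TYPES.has(token.type)) {
      yield token;
    }
  }
}

// Hand-written tokens standing in for the output of jsTokens("x /* note */ = 1").
const exampleTokens = [
  { type: "IdentifierName", value: "x" },
  { type: "WhiteSpace", value: " " },
  { type: "MultiLineComment", value: "/* note */", closed: true },
  { type: "WhiteSpace", value: " " },
  { type: "Punctuator", value: "=" },
  { type: "WhiteSpace", value: " " },
  { type: "NumericLiteral", value: "1" },
];

const significant = Array.from(withoutTrivia(exampleTokens), (t) => t.value);
console.log(significant.join(" ")); // x = 1
```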
Spec: StringLiteral
If the ending " or ' is missing, the token has closed: false. JavaScript strings cannot contain (unescaped) newlines, so unclosed strings simply end at the end of the line.
Escape sequences are supported, but may be invalid. For example, "\u" is matched as a StringLiteral even though it contains an invalid escape.
Examples:
"string"
'string'
""
''
"\""
'\''
"valid: \u00a0, invalid: \u"
'valid: \u00a0, invalid: \u'
"multi-\
line"
'multi-\
line'
" unclosed
' unclosed
Spec: NoSubstitutionTemplate / TemplateHead / TemplateMiddle / TemplateTail
A template without interpolations is matched as is. For example:
- `abc`: NoSubstitutionTemplate
- `abc: NoSubstitutionTemplate with closed: false
A template with interpolations is matched as many tokens. For example, `head${1}middle${2}tail` is matched as follows (apart from the two NumericLiterals):
- `head${: TemplateHead
- }middle${: TemplateMiddle
- }tail`: TemplateTail
TemplateMiddle is optional, and TemplateTail can be unclosed. For example, `head${1}tail (note the missing ending `):
- `head${: TemplateHead
- }tail: TemplateTail with closed: false
Templates can contain unescaped newlines, so unclosed templates go on to the end of input.
Just like for StringLiteral, templates can also contain invalid escapes. `\u` is matched as a NoSubstitutionTemplate even though it contains an invalid escape. Also note that in tagged templates, invalid escapes are not syntax errors: x`\u` is syntactically valid JavaScript.
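The point about tagged templates can be seen in plain JavaScript, without the tokenizer. In the sketch below, `inspectTemplate` is a hypothetical tag function written for this example: for an invalid escape, the language leaves the cooked string `undefined` and still exposes the raw source text.

```javascript
// Hypothetical tag function: reports the cooked and raw form of the first chunk.
function inspectTemplate(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

// `\u` is an invalid escape, but inside a tagged template it is not a syntax error.
const inspected = inspectTemplate`\u`;

console.log(inspected.cooked); // undefined
console.log(inspected.raw); // \u
```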
Spec: RegularExpressionLiteral
Regex literals may contain invalid regex syntax. They are still matched as regex literals.
If the ending / is missing, the token has closed: false. JavaScript regex literals cannot contain newlines (not even escaped ones), so unclosed regex literals simply end at the end of the line.
According to the specification, the flags of regular expressions are IdentifierParts (unknown and repeated regex flags become errors at a later stage).
Differentiating between regex and division in JavaScript is really tricky. @hutechtechnical/nobis-ex-dolor-reprehenderit looks at the previous token to tell them apart. As long as the previous tokens are valid, it should do the right thing. For invalid code, @hutechtechnical/nobis-ex-dolor-reprehenderit might be confused and start matching division as regex or vice versa.
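To give a feel for what a previous-token heuristic looks like, here is a deliberately simplified sketch. It is not the package's actual implementation: `slashStartsRegex` and the keyword list are assumptions made for this example, and the sketch always treats `)`, `]` and `}` as ending an expression, which is exactly the kind of shortcut that goes wrong in ambiguous code.

```javascript
// Keywords after which an expression (and therefore a regex) can start.
// This list is illustrative, not exhaustive.
const KEYWORDS_BEFORE_EXPRESSION = new Set([
  "return", "typeof", "instanceof", "in", "of", "new",
  "delete", "void", "throw", "case", "do", "else", "yield",
]);

// Hypothetical helper: given the previous non-whitespace, non-comment token,
// guess whether a following `/` starts a regex (true) or is division (false).
function slashStartsRegex(previousToken) {
  if (previousToken === undefined) {
    return true; // start of input
  }
  switch (previousToken.type) {
    case "NumericLiteral":
    case "StringLiteral":
    case "RegularExpressionLiteral":
    case "NoSubstitutionTemplate":
    case "TemplateTail":
    case "PrivateIdentifier":
      return false; // a value just ended, so `/` is division
    case "IdentifierName":
      return KEYWORDS_BEFORE_EXPRESSION.has(previousToken.value);
    case "Punctuator":
      // Simplification: closing brackets and postfix operators end an expression.
      return ![")", "]", "}", "++", "--"].includes(previousToken.value);
    default:
      return true;
  }
}

console.log(slashStartsRegex({ type: "Punctuator", value: "=" })); // true
console.log(slashStartsRegex({ type: "NumericLiteral", value: "1" })); // false
```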
Examples:
/a/
/a/gimsuy
/a/Inva1id
/+/
/[/]\//
Spec: MultiLineComment
If the ending */ is missing, the token has closed: false. Unclosed multi-line comments go on to the end of the input.
Examples:
/* comment */
/* console.log(
    "commented", out + code);
    */
/**/
/* unclosed
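Since strings, templates, regex literals and multi-line comments all report `closed: false` when their terminator is missing, a caller can scan for that flag to locate likely syntax problems. A minimal sketch follows, where `findUnclosed` is a hypothetical helper and the tokens are hand-written stand-ins for the output of tokenizing `x = "abc`. The offsets rely on the fact that token values concatenate back to the input.

```javascript
// Hypothetical helper: lists the type and start offset (in UTF-16 code units)
// of every token that reports closed: false.
function findUnclosed(tokens) {
  const unclosed = [];
  let offset = 0;
  for (const token of tokens) {
    if (token.closed === false) {
      unclosed.push({ type: token.type, offset });
    }
    offset += token.value.length;
  }
  return unclosed;
}

// Hand-written tokens standing in for the output of jsTokens('x = "abc').
const tokensWithUnclosedString = [
  { type: "IdentifierName", value: "x" },
  { type: "WhiteSpace", value: " " },
  { type: "Punctuator", value: "=" },
  { type: "WhiteSpace", value: " " },
  { type: "StringLiteral", value: '"abc', closed: false },
];

console.log(findUnclosed(tokensWithUnclosedString));
// [ { type: 'StringLiteral', offset: 4 } ]
```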
Spec: SingleLineComment
Examples:
// comment
// console.log("commented", out + code);
//
Spec: HashbangComment
Note that a HashbangComment can only occur at the very start of the string that is being tokenized. Anywhere else you will likely get an Invalid token # followed by a Punctuator token !.
Examples:
#!/usr/bin/env node
#! console.log("commented", out + code);
#!
Spec: IdentifierName
Keywords, reserved words, null, true, false, variable names and property names.
Examples:
if
for
var
instanceof
package
null
true
false
Infinity
undefined
NaN
$variab1e_name
π
℮
ಠ_ಠ
\u006C\u006F\u006C\u0077\u0061\u0074
Spec: PrivateIdentifier
Any IdentifierName preceded by a #.
Examples:
#if
#for
#var
#instanceof
#package
#null
#true
#false
#Infinity
#undefined
#NaN
#$variab1e_name
#π
#℮
#ಠ_ಠ
#\u006C\u006F\u006C\u0077\u0061\u0074
Spec: NumericLiteral
Examples:
0
1.5
1
1_000
12e9
0.123e-32
0xDead_beef
0b110
12n
07
09.5
Spec: Punctuator + DivPunctuator + RightBracePunctuator
All possible values:
&& || ??
-- ++
. ?.
< <= > >=
!= !== == ===
+ - % & | ^ / * ** << >> >>>
= += -= %= &= |= ^= /= *= **= <<= >>= >>>=
( ) [ ] { }
! ? : ; , ~ ... =>
Spec: WhiteSpace
Unlike the specification, multiple whitespace characters in a row are matched as one token, not one token per character.
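As an illustration of matching a whole run at once, the sketch below uses a sticky regex with a `+` quantifier. The pattern is an assumption written for this example (it covers the spec's WhiteSpace code points: tab, vertical tab, form feed, zero width no-break space and the Unicode space separators) and is not necessarily the regex the package uses. `matchWhiteSpaceAt` is likewise a hypothetical helper.

```javascript
// Assumed pattern: one or more WhiteSpace code points, anchored by the sticky flag.
const whiteSpaceRun = /[\t\v\f\ufeff\p{Zs}]+/uy;

// Hypothetical helper: returns the whitespace run starting exactly at `index`,
// or null if there is no whitespace at that position.
function matchWhiteSpaceAt(input, index) {
  whiteSpaceRun.lastIndex = index;
  const match = whiteSpaceRun.exec(input);
  return match === null ? null : match[0];
}

// Two spaces, a tab and another space become a single match of length 4.
console.log(JSON.stringify(matchWhiteSpaceAt("a  \t b", 1))); // "  \t "
```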
Spec: LineTerminatorSequence
CR, LF and CRLF, plus \u2028 and \u2029.
Spec: n/a
Single code points not matched in another token.
Examples:
#
@
💩
Spec: JSX Specification
export default function jsTokens(
  input: string,
  options: { jsx: true },
): Iterable<Token | JSXToken>;

export declare type JSXToken =
  | { type: "JSXString"; value: string; closed: boolean }
  | { type: "JSXText"; value: string }
  | { type: "JSXIdentifier"; value: string }
  | { type: "JSXPunctuator"; value: string }
  | { type: "JSXInvalid"; value: string };
- The output alternates between runs of Token and runs of JSXToken.
- Runs of JSXToken can also contain WhiteSpace, LineTerminatorSequence, MultiLineComment and SingleLineComment.
Spec: " JSXDoubleStringCharacters " + ' JSXSingleStringCharacters '
If the ending " or ' is missing, the token has closed: false. JSX strings can contain unescaped newlines, so unclosed JSX strings go on to the end of input.
Note that JSX strings do not support escape sequences as part of the token grammar. A " or ' always closes the string, even with a backslash before it.
Examples:
"string"
'string'
""
''
"\"
'\'
"multi-
line"
'multi-
line'
" unclosed
' unclosed
Spec: JSXText
Anything but <, >, { and }.
Spec: JSXIdentifier
Examples:
div
class
xml
x-element
x------
$htm1_element
ಠ_ಠ
Spec: n/a
All possible values:
<
>
/
.
:
=
{
}
Spec: n/a
Single code points not matched in another token.
Examples in JSX tags:
1
`
+
,
#
@
💩
All possible values in JSX children:
>
}
The intention is to always support the latest ECMAScript version whose feature set has been finalized.
Currently, ECMAScript 2023 is supported.
Section B: Additional ECMAScript Features for Web Browsers of the spec is optional if the ECMAScript host is not a web browser, and specifies some additional syntax. Section C: The Strict Mode of ECMAScript disallows certain syntax in Strict Mode.
- HTML-like comments: 5<!--x is treated as 5 < !(--x) rather than as 5 //x.
- Regular expression patterns: the tokenizer does not inspect what is between the starting / and ending /, so this is supported.
Supporting TypeScript is not an explicit goal, but @hutechtechnical/nobis-ex-dolor-reprehenderit and Babel both tokenize this TypeScript fixture and this TSX fixture the same way, with one edge case:
type A = Array<Array<string>>
type B = Array<Array<Array<string>>>
Both lines above should end with a couple of > tokens, but @hutechtechnical/nobis-ex-dolor-reprehenderit instead matches the >> and >>> operators.
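A caller that knows it is inside a TypeScript type position could post-process such tokens. The sketch below is one possible workaround, not something the package provides: `splitAngleBrackets` is a hypothetical helper, and it must only be applied where the caller already knows the token closes type arguments, because it would otherwise also split genuine shift operators.

```javascript
// Hypothetical helper: splits a `>>` or `>>>` Punctuator into single `>` tokens.
// Every other token is returned unchanged, wrapped in an array.
function splitAngleBrackets(token) {
  if (token.type === "Punctuator" && /^>{2,3}$/.test(token.value)) {
    return Array.from(token.value, () => ({ type: "Punctuator", value: ">" }));
  }
  return [token];
}

const splitResult = splitAngleBrackets({ type: "Punctuator", value: ">>>" });
console.log(splitResult.map((token) => token.value).join(" ")); // > > >
```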
JSX is supported: jsTokens("<p>Hello, world!</p>", { jsx: true }).
@hutechtechnical/nobis-ex-dolor-reprehenderit should work in any JavaScript runtime that supports Unicode property escapes.
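If you are unsure whether a target runtime qualifies, Unicode property escapes can be feature-detected before loading the tokenizer. This is a small sketch. `supportsUnicodePropertyEscapes` is a hypothetical helper, and it builds the regex from a string so that an old engine throws at run time rather than failing to parse the whole file.

```javascript
// Hypothetical helper: true if the runtime understands \p{...} in regexes.
function supportsUnicodePropertyEscapes() {
  try {
    // Constructed from a string so unsupported engines throw here
    // instead of rejecting the script at parse time.
    return new RegExp("\\p{ID_Start}", "u").test("a");
  } catch (error) {
    return false;
  }
}

console.log(supportsUnicodePropertyEscapes());
```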
Here are a couple of tricky cases:
// Case 1:
switch (x) {
  case x: {}/a/g;
  case x: {}<div>x</div>/g;
}

// Case 2:
label: {}/a/g;
label: {}<div>x</div>/g;

// Case 3:
(function f() {}/a/g);
(function f() {}<div>x</div>/g);
This is what they mean:
// Case 1:
switch (x) {
  case x:
    {
    }
    /a/g;
  case x:
    {
    }
    <div>x</div> / g;
}

// Case 2:
label: {
}
/a/g;
label: {
}
<div>x</div> / g;

// Case 3:
(function f() {}) / a / g;
(function f() {}) < div > x < /div>/g;
But @hutechtechnical/nobis-ex-dolor-reprehenderit thinks they mean:
// Case 1:
switch (x) {
  case x:
    ({}) / a / g;
  case x:
    ({}) < div > x < /div>/g;
}

// Case 2:
label: ({}) / a / g;
label: ({}) < div > x < /div>/g;

// Case 3:
function f() {}
/a/g;
function f() {}
<div>x</div> / g;
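In other words, @hutechtechnical/nobis-ex-dolor-reprehenderit mis-identifies regex as division and JSX as comparison in cases 1 and 2, and mis-identifies division as regex and comparison as JSX in case 3.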
This happens because @hutechtechnical/nobis-ex-dolor-reprehenderit looks at the previous token when deciding between regex and division or JSX and comparison. In these cases, the previous token is }, which either means “end of block” (→ regex/JSX) or “end of object literal” (→ division/comparison). How does @hutechtechnical/nobis-ex-dolor-reprehenderit determine if the } belongs to a block or an object literal? By looking at the token before the matching {.
In case 1 and 2, that’s a :. A : usually means that we have an object literal or ternary:
let some = weird ? { value: {}/a/g } : {}/a/g;
But : is also used for case and labeled statements.
One idea is to look for case before the : as an exception to the rule, but it’s not so easy:
switch (x) {
  case weird ? true : {}/a/g: {}/a/g
}
The first {}/a/g is a division, while the second {}/a/g is an empty block followed by a regex. Both are preceded by a colon with a case on the same line, and it does not seem like you can distinguish between the two without implementing a parser.
Labeled statements are similarly difficult, since they are so similar to object literals:
{
  label: {}/a/g
}

({
  key: {}/a/g
})
Finally, case 3 ((function f() {}/a/g);) is also difficult, because a ) before a { means that the { is part of a block, and blocks are usually statements:
if (x) {
}
/a/g;

function f() {}
/a/g;
But function expressions are of course not statements. It’s difficult to tell a function expression from a function statement without parsing.
Luckily, none of these edge cases are likely to occur in real code.
@hutechtechnical/nobis-ex-dolor-reprehenderit advertises that it “never fails”, but that is not strictly true: it can fail on extreme inputs, because the regex engine of the runtime can eventually give up. @hutechtechnical/nobis-ex-dolor-reprehenderit has worked around this to some extent by changing its regexes to be easier on the regex engine. Solving it completely would require @hutechtechnical/nobis-ex-dolor-reprehenderit to stop using regexes, but then it would no longer be tiny, which is the whole point. Luckily, only extreme inputs can fail, hopefully ones you’ll never encounter.
For example, if you try to parse the string literal "\n\n\n" but with 10 million \n instead of just 3, the regex engine gives up with RangeError: Maximum call stack size exceeded (or similar). Try it out:
Array.from(require("@hutechtechnical/nobis-ex-dolor-reprehenderit")(`"${"\\n".repeat(1e7)}"`));
(Yes, that is the regex engine of the runtime giving up. @hutechtechnical/nobis-ex-dolor-reprehenderit has no recursive functions.)
However, if you repeat a instead of \n 10 million times ("aaaaaa…"), it works:
Array.from(require("@hutechtechnical/nobis-ex-dolor-reprehenderit")(`"${"a".repeat(1e7)}"`));
That’s good, because it’s much more common to have lots of non-escapes in a row in a big string literal, than having mostly escapes. (Obfuscated code might have only escapes though.)
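If you tokenize untrusted or machine-generated input, it may be worth guarding against this failure mode. The following is a minimal sketch: `tokenizeSafely` is a hypothetical wrapper (not part of the package) that takes the tokenizer as a parameter, and `fakeTokenizer` is a stand-in that simulates the regex engine giving up so that the snippet runs without the package installed.

```javascript
// Hypothetical wrapper: collects all tokens, but turns a RangeError from the
// regex engine into a result object instead of letting it propagate.
function tokenizeSafely(tokenize, input) {
  try {
    return { ok: true, tokens: Array.from(tokenize(input)) };
  } catch (error) {
    if (error instanceof RangeError) {
      return { ok: false, error };
    }
    throw error; // anything else is unexpected
  }
}

// Stand-in tokenizer for demonstration only: yields one token per code point
// and simulates the engine giving up on "extreme" (here: longer than 5) inputs.
function* fakeTokenizer(input) {
  if (input.length > 5) {
    throw new RangeError("simulated: Maximum call stack size exceeded");
  }
  for (const codePoint of input) {
    yield { type: "Invalid", value: codePoint };
  }
}

console.log(tokenizeSafely(fakeTokenizer, "abc").ok); // true
console.log(tokenizeSafely(fakeTokenizer, "abcdefgh").ok); // false
```

With the real package, pass `jsTokens` as the first argument instead of `fakeTokenizer`.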
I’ve seen Safari give up instead of throwing an error.
In Safari, Chrome, Firefox and Node.js the following code successfully results in a match:
/(#)(?:a|b)+/.exec("#" + "a".repeat(1e5));
But for the following code (with 1e7 instead of 1e5), the runtimes differ:
/(#)(?:a|b)+/.exec("#" + "a".repeat(1e7));
- Chrome, Firefox and Node.js throw RangeError: Maximum call stack size exceeded (or similar).
- Safari returns null (at the time of writing), silently giving up on matching the regex. It’s kind of lying that the regex did not match, while in reality it would have matched given enough computing resources.
This means that in Safari, @hutechtechnical/nobis-ex-dolor-reprehenderit might not fail but instead give you unexpected tokens.
With @babel/parser for comparison. Node.js 21.6.1 on a MacBook Pro M1 (Sonoma).
Lines of code | Size | @hutechtechnical/nobis-ex-dolor-reprehenderit@8.0.3 | @babel/parser@7.23.9 |
---|---|---|---|
~100 | ~4.0 KiB | ~2 ms | ~10 ms |
~1 000 | ~39 KiB | ~5 ms | ~27 ms |
~10 000 | ~353 KiB | ~44 ms | ~108 ms |
~100 000 | ~5.1 MiB | ~333 ms | ~2.0 s |
~2 400 000 | ~138 MiB | ~7 s | ~4 m 9 s (*) |
(*) Required increasing the Node.js memory limit (I set it to 8 GiB).
See benchmark.js if you want to run benchmarks yourself.