A library for finite automata and regular expressions in the context of JS RegExp
npm install refa
Repository: 22 stars · 502 commits · 3 forks · 2 watching · 3 branches · 4 contributors
Updated on 23 Nov 2024
Languages: TypeScript (99.79%), JavaScript (0.21%)
Downloads:
Last day:   124,015    (-6.6% vs previous day)
Last week:  665,704    (+5% vs previous week)
Last month: 2,650,010  (+26.4% vs previous month)
Last year:  14,941,629 (+232.5% vs previous year)
A library for regular expressions (RE) and finite automata (FA) in the context of JavaScript RegExp.

refa is a general library for DFAs, NFAs, and REs of formal regular languages. It also includes methods to easily convert from JS RegExp to its internal RE AST and vice versa.
Get refa from NPM:
npm i --save refa
or
yarn add refa
Features:

- Conversions
- DFA, NFA, and ENFA operations
- DFA-specific operations
- NFA- and ENFA-specific operations
- AST transformations
- JavaScript RegExp parsing and serialization
See the API documentation for a complete list of all currently implemented operations.
refa uses its own AST format to represent regular expressions. The RE AST format is language agnostic and relatively simple.
It supports:

- concatenation (e.g. ab)
- alternation (e.g. a|b)
- quantifiers (e.g. a{4,6}, a{2,}?, a?, a*)
- assertions (e.g. (?=a), (?<!a))

Some features like atomic groups and capturing groups are not supported (but might be added in the future).
For information on how to parse JS RegExp and convert RE AST to JS RegExp, see the JS namespace.
refa does not use JavaScript strings to represent characters or sequences of characters. Instead, it uses integers to represent characters (see the Char type) and arrays of numbers to represent words/strings (see the Word type). This means that any text encoding can be used.

The Words namespace contains functions to convert JavaScript data into refa-compatible words and characters. For sets of characters, the CharSet class is used.
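To illustrate the idea (this is a sketch, not refa's actual implementation), a UTF-16 word in refa's sense is simply the array of UTF-16 code unit values of a string. The helper below shows what a conversion like Words.fromStringToUTF16 conceptually does:

```typescript
// Sketch only: refa's real conversion functions are more involved, but the
// core idea is mapping each UTF-16 code unit to an integer Char.
function toWord(s: string): number[] {
    const word: number[] = [];
    for (let i = 0; i < s.length; i++) {
        word.push(s.charCodeAt(i)); // each character becomes an integer
    }
    return word;
}

console.log(toWord("abc")); // => [ 97, 98, 99 ]
```

Because words are plain number arrays, the same machinery works regardless of whether the numbers came from UTF-16 code units, Unicode code points, or bytes.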
This library will never be able to support some modern features of regex engines, such as backreferences and recursion, because these features generally cannot be represented by a DFA or NFA.

refa is a relatively low-level library. It only provides the basic building blocks. In the following examples, JS RegExps are used a lot, so we will define a few useful helper functions beforehand.
```ts
import { DFA, FiniteAutomaton, JS, NFA } from "refa";

function toNFA(regex: RegExp): NFA {
    const { expression, maxCharacter } = JS.Parser.fromLiteral(regex).parse();
    return NFA.fromRegex(expression, { maxCharacter });
}
function toDFA(regex: RegExp): DFA {
    return DFA.fromFA(toNFA(regex));
}
function toRegExp(fa: FiniteAutomaton): RegExp {
    const literal = JS.toLiteral(fa.toRegex());
    return new RegExp(literal.source, literal.flags);
}
```
- toNFA parses the given RegExp and constructs a new NFA from the parsed AST.
- toDFA constructs a new NFA from the RegExp first and then converts that NFA into a new DFA.
- toRegExp takes an FA (= NFA or DFA) and converts it into a RegExp.

```ts
import { Words } from "refa";

const regex = /\w+\d+/;
const nfa = toNFA(regex);

console.log(nfa.test(Words.fromStringToUTF16("abc")));
// => false
console.log(nfa.test(Words.fromStringToUTF16("123")));
// => true
console.log(nfa.test(Words.fromStringToUTF16("abc123")));
// => true
console.log(nfa.test(Words.fromStringToUTF16("123abc")));
// => false
```
```ts
const regex1 = /a+B+c+/i;
const regex2 = /Ab*C\d?/;

const intersection = NFA.fromIntersection(toNFA(regex1), toNFA(regex2));

console.log(toRegExp(intersection));
// => /Ab+C/
```
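As a quick sanity check using plain RegExp (not refa's API), every string in the intersection's language must be accepted by both input patterns. The anchored variants below are an assumption made for whole-string matching:

```typescript
// Anchored plain-RegExp versions of the patterns above (assumption):
// a string the intersection accepts must be accepted by both inputs.
const regex1 = /^a+B+c+$/i;
const regex2 = /^Ab*C\d?$/;
const intersection = /^Ab+C$/;

for (const s of ["AbC", "AbbC", "abc", "AC", "AbC5"]) {
    const inBoth = regex1.test(s) && regex2.test(s);
    // The two sides agree for every test string.
    console.log(s, intersection.test(s) === inBoth);
}
```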
```ts
const regex = /a+b*/i;

const dfa = toDFA(regex);
dfa.complement();

console.log(toRegExp(dfa));
// => /(?:(?:[^A]|A+(?:[^AB]|B+[^B]))[^]*)?/i
```
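The defining property of complementation is that the complemented DFA accepts exactly the words the original rejects. This can be checked with plain RegExp by anchoring both patterns (the anchors are an assumption for whole-string matching):

```typescript
// Complement property: for every string, exactly one of the two anchored
// patterns matches (plain-RegExp sketch, not refa API).
const original = /^a+b*$/i;
const complemented = /^(?:(?:[^A]|A+(?:[^AB]|B+[^B]))[^]*)?$/i;

for (const s of ["", "a", "ab", "aab", "ba", "abc", "x"]) {
    // The results are always opposite.
    console.log(JSON.stringify(s), original.test(s) !== complemented.test(s));
}
```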
In the above examples, we have been using the toNFA helper function to parse and convert RegExps. This function assumes that the given RegExp is a pure regular expression without assertions and backreferences and will throw an error if that assumption is not met.

However, the JS parser and NFA.fromRegex provide some options to work around and even solve this problem.

Firstly, the parser will automatically resolve simple backreferences. Even toNFA will do this since it's on by default:
```ts
console.log(toRegExp(toNFA(/("|').*?\1/)));
// => /".*"|'.*'/i
```
But it will throw an error for non-trivial backreferences that cannot be resolved:
```ts
toNFA(/(#+).*\1|foo/);
// Error: Backreferences are not supported.
```
The only way to parse the RegExp despite unresolvable backreferences is to remove the backreferences. This means that the result will be imperfect but it might still be useful.
```ts
const regex = /(#+).*\1|foo/;
const { expression } =
    JS.Parser.fromLiteral(regex).parse({ backreferences: "disable" });

console.log(JS.toLiteral(expression));
// => { source: 'foo', flags: '' }
```
Note that the foo alternative is kept because it is completely unaffected by the unresolvable backreference.
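Conceptually, resolving a simple backreference means expanding the capturing group's alternatives: one copy of the pattern is made per alternative, with the backreference replaced by that alternative's text. The plain-RegExp sketch below illustrates the equivalence (the anchors are an assumption for whole-string matching):

```typescript
// The resolved pattern is the union of one copy per quote character.
const withBackref = /^("|').*?\1$/;
const resolved = /^(?:".*"|'.*')$/;

for (const s of [`"abc"`, `'abc'`, `"abc'`, `abc`, `''`, `""`]) {
    // Both patterns accept and reject exactly the same strings.
    console.log(s, withBackref.test(s) === resolved.test(s));
}
```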
While the parser and AST format can handle assertions, the NFA construction cannot.
```ts
const regex = /\b(?!\d)\w+\b|->/;
const { expression, maxCharacter } = JS.Parser.fromLiteral(regex).parse();

console.log(JS.toLiteral(expression));
// => { source: '\\b(?!\\d)\\w+\\b|->', flags: 'i' }

NFA.fromRegex(expression, { maxCharacter });
// Error: Assertions are not supported yet.
```
Similarly to backreferences, we can let the parser remove them:
```ts
const regex = /\b(?!\d)\w+\b|->/;
const { expression, maxCharacter } =
    JS.Parser.fromLiteral(regex).parse({ assertions: "disable" });

console.log(JS.toLiteral(expression));
// => { source: '->', flags: 'i' }

const nfa = NFA.fromRegex(expression, { maxCharacter });
console.log(toRegExp(nfa));
// => /->/i
```
Or we can let the NFA construction method remove them:
```ts
const regex = /\b(?!\d)\w+\b|->/;
const { expression, maxCharacter } = JS.Parser.fromLiteral(regex).parse();

console.log(JS.toLiteral(expression));
// => { source: '\\b(?!\\d)\\w+\\b|->', flags: 'i' }

const nfa = NFA.fromRegex(expression, { maxCharacter }, { assertions: "disable" });
console.log(toRegExp(nfa));
// => /->/i
```
Prefer using the parser to remove assertions if possible. The parser is quite clever and will optimize based on the fact that assertions can be removed, resulting in faster parse times.
However, simply removing assertions is not ideal since they are a lot more common than backreferences. To work around this, refa has AST transformers. AST transformers can make changes to a given AST. While each transformer is rather simple, they can also work together to accomplish more complex tasks. Applying and removing assertions is one such task.
The simplest transformer to remove assertions (among other things) is the simplify transformer. It will inline expressions, remove dead branches, apply/remove assertions, optimize quantifiers, and more.
```ts
import { JS, NFA, Transformers, transform } from "refa";

const regex = /\b(?!\d)\w+\b|->/;
const { expression, maxCharacter } = JS.Parser.fromLiteral(regex).parse();
console.log(JS.toLiteral(expression));
// => { source: '\\b(?!\\d)\\w+\\b|->', flags: '' }

const modifiedExpression = transform(Transformers.simplify(), expression);
console.log(JS.toLiteral(modifiedExpression));
// => { source: '(?<!\\w)[A-Z_]\\w*(?!\\w)|->', flags: 'i' }

// Most assertions have been removed but the patterns are still equivalent.
// The only assertions left assert characters beyond the edge of the pattern.
// Removing those assertions is easy but slightly changes the pattern.

const finalExpression = transform(Transformers.patternEdgeAssertions({ remove: true }), modifiedExpression);
console.log(JS.toLiteral(finalExpression));
// => { source: '[A-Z_]\\w*|->', flags: 'i' }

const nfa = NFA.fromRegex(finalExpression, { maxCharacter });
console.log(JS.toLiteral(nfa.toRegex()));
// => { source: '->|[A-Z_]\\w*', flags: 'i' }
```
AST transformers can handle a lot of assertions, but there are limitations. Transformers cannot handle assertions that are too complex or require large-scale changes to the AST. Let's take a look at a few examples:
```ts
import { JS, Transformers, transform } from "refa";

function simplify(regex: RegExp): void {
    const { expression } = JS.Parser.fromLiteral(regex).parse();

    const simplifiedExpression = transform(Transformers.simplify(), expression);

    const literal = JS.toLiteral(simplifiedExpression);
    console.log(new RegExp(literal.source, literal.flags));
}

simplify(/\b(?!\d)\b\w+\b\s*\(/);
// => /(?<!\w)[A-Z_]\w*\s*\(/i
simplify(/(?:^|@)\b\w+\b/);
// => /(?:^|@)\w+(?!\w)/
simplify(/"""(?:(?!""").)*"""/s);
// => /"""(?:"{0,2}[^"])*"""/
simplify(/"""((?!""")(?:[^\\]|\\"))*"""/);
// => /"""(?:"{0,2}(?:[^"\\]|\\"))*"""/
simplify(/<title>(?:(?!<\/title>).)*<\/title>/s);
// => /<title>(?:[^<]|<+(?:[^/<]|\/(?!title>)))*<+\/title>/
simplify(/^```$.*?^```$/ms);
// => /^```[\n\r\u2028\u2029](?:[^]*?[\n\r\u2028\u2029])??```$/m
```
Transformers.simplify is very aggressive when it comes to assertions. It will try to remove assertions whenever possible, even if it means that the overall AST will become more complex (within some limits). This may result in longer/more complex regexes, but it also allows NFA and ENFA to support many more regexes.
OpenSSF Scorecard (last scanned on 2024-11-25)

No vulnerabilities found.

- no dangerous workflow patterns detected
- license file detected
- no binaries found in the repo
- dependency not pinned by hash detected -- score normalized to 5
- 7 existing vulnerabilities detected
- 0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
- Found 0/29 approved changesets -- score normalized to 0
- no effort to earn an OpenSSF best practices badge detected
- security policy file not detected
- detected GitHub workflow tokens with excessive permissions
- SAST tool is not run on all commits -- score normalized to 0
- project is not fuzzed

The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.