Gathering detailed insights and metrics for @gerhobbelt/xregexp
npm install @gerhobbelt/xregexp
JavaScript (99.67%)
TypeScript (0.16%)
HTML (0.08%)
Shell (0.08%)
Total Downloads
945,160
Last Day
25
Last Week
683
Last Month
4,971
Last Year
79,814
3 Stars
790 Commits
1 Forks
3 Watching
10 Branches
1 Contributors
Latest Version
4.4.0-32
Package Id
@gerhobbelt/xregexp@4.4.0-32
Size
458.05 kB
NPM Version
6.14.6
Node Version
12.18.4
Published On
07 Nov 2020
XRegExp provides augmented (and extensible) JavaScript regular expressions. You get modern syntax and flags beyond what browsers support natively. XRegExp is also a regex utility belt with tools to make your grepping and parsing easier, while freeing you from regex cross-browser inconsistencies and other annoyances.
XRegExp supports all native ES6 regular expression syntax. It supports ES5+ browsers, and you can use it with Node.js or as a RequireJS module.
XRegExp compiles to native RegExp objects. Therefore regexes built with XRegExp perform just as fast as native regular expressions. There is a tiny extra cost when compiling a pattern for the first time.
// Using named capture and flag x for free-spacing and line comments
const date = XRegExp(
    `(?<year>  [0-9]{4} ) -?  # year
     (?<month> [0-9]{2} ) -?  # month
     (?<day>   [0-9]{2} )     # day`, 'x');

// XRegExp.exec gives you named backreferences on the match result
let match = XRegExp.exec('2017-02-22', date);
match.year; // -> '2017'

// It also includes optional pos and sticky arguments
let pos = 3;
const result = [];
while (match = XRegExp.exec('<1><2><3>4<5>', /<(\d+)>/, pos, 'sticky')) {
    result.push(match[1]);
    pos = match.index + match[0].length;
}
// result -> ['2', '3']

// XRegExp.replace allows named backreferences in replacements
XRegExp.replace('2017-02-22', date, '$<month>/$<day>/$<year>');
// -> '02/22/2017'
XRegExp.replace('2017-02-22', date, (match) => {
    return `${match.month}/${match.day}/${match.year}`;
});
// -> '02/22/2017'

// XRegExps compile to RegExps and work perfectly with native methods
date.test('2017-02-22');
// -> true

// The only caveat is that named captures must be referenced using
// numbered backreferences if used with native methods
'2017-02-22'.replace(date, '$2/$3/$1');
// -> '02/22/2017'

// Use XRegExp.forEach to extract every other digit from a string
const evens = [];
XRegExp.forEach('1a2345', /\d/, (match, i) => {
    if (i % 2) evens.push(+match[0]);
});
// evens -> [2, 4]

// Use XRegExp.matchChain to get numbers within <b> tags
XRegExp.matchChain('1 <b>2</b> 3 <B>4 \n 56</B>', [
    XRegExp('(?is)<b>.*?</b>'),
    /\d+/
]);
// -> ['2', '4', '56']

// You can also pass forward and return specific backreferences
const html =
    `<a href="http://xregexp.com/">XRegExp</a>
     <a href="http://www.google.com/">Google</a>`;
XRegExp.matchChain(html, [
    {regex: /<a href="([^"]+)">/i, backref: 1},
    {regex: XRegExp('(?i)^https?://(?<domain>[^/?#]+)'), backref: 'domain'}
]);
// -> ['xregexp.com', 'www.google.com']

// Merge strings and regexes, with updated backreferences
XRegExp.union(['m+a*n', /(bear)\1/, /(pig)\1/], 'i', {conjunction: 'or'});
// -> /m\+a\*n|(bear)\1|(pig)\2/i
These examples give the flavor of what's possible, but XRegExp has more syntax, flags, methods, options, and browser fixes that aren't shown here. You can also augment XRegExp's regular expression syntax with addons (see below) or write your own. See xregexp.com for details.
You can either load addons individually, or bundle all addons with XRegExp by loading xregexp-all.js from https://unpkg.com/xregexp/xregexp-all.js.
If not using xregexp-all.js, first include the Unicode Base script and then one or more of the addons for Unicode blocks, categories, properties, or scripts.
Then you can do this:
// Test the Unicode category L (Letter)
const unicodeWord = XRegExp('^\\pL+$');
unicodeWord.test('Русский'); // -> true
unicodeWord.test('日本語'); // -> true
unicodeWord.test('العربية'); // -> true

// Test some Unicode scripts
XRegExp('^\\p{Hiragana}+$').test('ひらがな'); // -> true
XRegExp('^[\\p{Latin}\\p{Common}]+$').test('Über Café.'); // -> true
By default, \p{…} and \P{…} support the Basic Multilingual Plane (i.e. code points up to U+FFFF). You can opt in to full 21-bit Unicode support (with code points up to U+10FFFF) on a per-regex basis by using flag A. This is called astral mode. You can automatically add flag A for all new regexes by running XRegExp.install('astral'). When in astral mode, \p{…} and \P{…} always match a full code point rather than a code unit, using surrogate pairs for code points above U+FFFF.
// Using flag A to match astral code points
XRegExp('^\\pS$').test('💩'); // -> false
XRegExp('^\\pS$', 'A').test('💩'); // -> true
XRegExp('(?A)^\\pS$').test('💩'); // -> true
// Using surrogate pair U+D83D U+DCA9 to represent U+1F4A9 (pile of poo)
XRegExp('(?A)^\\pS$').test('\uD83D\uDCA9'); // -> true

// Implicit flag A
XRegExp.install('astral');
XRegExp('^\\pS$').test('💩'); // -> true
Opting in to astral mode disables the use of \p{…} and \P{…} within character classes. In astral mode, use e.g. (\pL|[0-9_])+ instead of [\pL0-9_]+.
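The distinction between code units and code points that astral mode addresses can also be observed with native regexes. The following is a minimal sketch that uses the native ES6 flag u rather than XRegExp's flag A; it only illustrates the concept and assumes an engine with ES6 regex support.

```javascript
// Native illustration (not XRegExp): code units vs. code points.
const astralChar = '\uD83D\uDCA9'; // U+1F4A9 encoded as a surrogate pair

// The string holds one code point but two UTF-16 code units.
const codeUnits = astralChar.length;
const codePoints = Array.from(astralChar).length;

// Without flag u, the dot matches a single code unit, so it cannot span the pair.
const matchesAsCodeUnit = /^.$/.test(astralChar);

// With flag u, the dot matches a full code point.
const matchesAsCodePoint = /^.$/u.test(astralChar);

console.log(codeUnits, codePoints, matchesAsCodeUnit, matchesAsCodePoint);
```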
XRegExp uses Unicode 13.0.0.
Build regular expressions using named subpatterns, for readability and pattern reuse:
const time = XRegExp.build('(?x)^ {{hours}} ({{minutes}}) $', {
    hours: XRegExp.build('{{h12}} : | {{h24}}', {
        h12: /1[0-2]|0?[1-9]/,
        h24: /2[0-3]|[01][0-9]/
    }),
    minutes: /^[0-5][0-9]$/
});

time.test('10:59'); // -> true
XRegExp.exec('10:59', time).minutes; // -> '59'
Named subpatterns can be provided as strings or regex objects. A leading ^ and trailing unescaped $ are stripped from subpatterns if both are present, which allows embedding independently-useful anchored patterns. {{…}} tokens can be quantified as a single unit. Any backreferences in the outer pattern or provided subpatterns are automatically renumbered to work correctly within the larger combined pattern. The syntax ({{name}}) works as shorthand for named capture via (?<name>{{name}}). Named subpatterns cannot be embedded within character classes.
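The anchor-stripping rule described above can be sketched with native regexes. This is an illustration only, not XRegExp.build's implementation: stripAnchors and embed are hypothetical helper names, and backreference renumbering is not handled.

```javascript
// Hypothetical helpers (not part of XRegExp) illustrating anchor stripping.
function stripAnchors(source) {
    const hasLeading = source.startsWith('^');
    // A trailing $ is unescaped when preceded by an even number of backslashes.
    const trailing = /(\\*)\$$/.exec(source);
    const hasTrailing = trailing !== null && trailing[1].length % 2 === 0;
    // Strip only when both anchors are present.
    return hasLeading && hasTrailing ? source.slice(1, -1) : source;
}

// Wrap the subpattern in a non-capturing group so it can be quantified as a unit.
function embed(regex) {
    return `(?:${stripAnchors(regex.source)})`;
}

const minutesPart = /^[0-5][0-9]$/; // independently useful, anchored
const hoursPart = /1[0-2]|0?[1-9]/;
const timeSketch = new RegExp(`^${embed(hoursPart)}:(?<minutes>${embed(minutesPart)})$`);

console.log(timeSketch.test('10:59'));
console.log(timeSketch.exec('10:59').groups.minutes);
```

Wrapping each subpattern in a non-capturing group is one simple way to keep alternations such as 1[0-2]|0?[1-9] from leaking into the surrounding pattern.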
Provides tagged template literals that create regexes with XRegExp syntax and flags:
const h12 = /1[0-2]|0?[1-9]/;
const h24 = /2[0-3]|[01][0-9]/;
const hours = XRegExp.tag('x')`${h12} : | ${h24}`;
const minutes = /^[0-5][0-9]$/;
// Note that explicitly naming the 'minutes' group is required for named backreferences
const time = XRegExp.tag('x')`^ ${hours} (?<minutes>${minutes}) $`;
time.test('10:59'); // -> true
XRegExp.exec('10:59', time).minutes; // -> '59'
XRegExp.tag does more than just basic interpolation. For starters, you get all the XRegExp syntax and flags. Even better, since XRegExp.tag uses your pattern as a raw string, you no longer need to escape all your backslashes. And since it relies on XRegExp.build under the hood, you get all of its extras for free. Leading ^ and trailing unescaped $ are stripped from interpolated patterns if both are present (to allow embedding independently useful anchored regexes), interpolating into a character class is an error (to avoid unintended meaning in edge cases), interpolated patterns are treated as atomic units when quantified, interpolated strings have their special characters escaped, and any backreferences within an interpolated regex are rewritten to work within the overall pattern.
Match recursive constructs using XRegExp pattern strings as left and right delimiters:
const str1 = '(t((e))s)t()(ing)';
XRegExp.matchRecursive(str1, '\\(', '\\)', 'g');
// -> ['t((e))s', '', 'ing']

// Extended information mode with valueNames
const str2 = 'Here is <div> <div>an</div></div> example';
XRegExp.matchRecursive(str2, '<div\\s*>', '</div>', 'gi', {
    valueNames: ['between', 'left', 'match', 'right']
});
/* -> [
{name: 'between', value: 'Here is ', start: 0, end: 8},
{name: 'left', value: '<div>', start: 8, end: 13},
{name: 'match', value: ' <div>an</div>', start: 13, end: 27},
{name: 'right', value: '</div>', start: 27, end: 33},
{name: 'between', value: ' example', start: 33, end: 41}
] */

// Omitting unneeded parts with null valueNames, and using escapeChar
const str3 = '...{1}.\\{{function(x,y){return {y:x}}}';
XRegExp.matchRecursive(str3, '{', '}', 'g', {
    valueNames: ['literal', null, 'value', null],
    escapeChar: '\\'
});
/* -> [
{name: 'literal', value: '...', start: 0, end: 3},
{name: 'value', value: '1', start: 4, end: 5},
{name: 'literal', value: '.\\{', start: 6, end: 9},
{name: 'value', value: 'function(x,y){return {y:x}}', start: 10, end: 37}
] */

// Sticky mode via flag y
const str4 = '<1><<<2>>><3>4<5>';
XRegExp.matchRecursive(str4, '<', '>', 'gy');
// -> ['1', '<<2>>', '3']
XRegExp.matchRecursive throws an error if it scans past an unbalanced delimiter in the target string.
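The idea behind balanced-delimiter matching can be sketched with a simple depth counter. The function below is a hypothetical, simplified stand-in (single-character delimiters only, no regex delimiters, no valueNames or escapeChar), not the addon's implementation.

```javascript
// Hypothetical simplified sketch (not XRegExp.matchRecursive itself):
// collect the contents of top-level balanced delimiter pairs.
function matchBalanced(str, open, close) {
    const results = [];
    let depth = 0;   // current nesting level
    let start = -1;  // index just after the outermost opening delimiter
    for (let i = 0; i < str.length; i++) {
        const ch = str[i];
        if (ch === open) {
            if (depth === 0) start = i + 1;
            depth++;
        } else if (ch === close) {
            if (depth === 0) throw new Error(`Unbalanced delimiter at index ${i}`);
            depth--;
            if (depth === 0) results.push(str.slice(start, i));
        }
    }
    if (depth !== 0) throw new Error('Unbalanced delimiter: missing closing delimiter');
    return results;
}

console.log(matchBalanced('(t((e))s)t()(ing)', '(', ')'));
```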
In browsers (bundle XRegExp with all of its addons):
<script src="https://unpkg.com/xregexp/xregexp-all.js"></script>
Using npm:
npm install xregexp
In Node.js:
const XRegExp = require('xregexp');
In an AMD loader like RequireJS:
require({paths: {xregexp: 'xregexp-all'}}, ['xregexp'], (XRegExp) => {
    console.log(XRegExp.version);
});
XRegExp copyright 2007-2017 by Steven Levithan. Unicode data generators by Mathias Bynens, adapted from unicode-data. XRegExp's syntax extensions and flags come from Perl, .NET, etc.
All code, including addons, tools, and tests, is released under the terms of the MIT License.
Fork me to show support, fix, and extend.
Learn more at xregexp.com.
XRegExp internally detects if the JS engine supports any of these RegExp flags:
u (defined in ES6 standard)
y (defined in ES6 standard)
g
i
m
These (and other flags registered by XRegExp addons) can be queried via the XRegExp._registeredFlags() API, e.g. when you want to include this information in a system diagnostics report which accompanies a user or automated bug report.
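Flag support can also be probed directly against the engine. The sketch below is a generic feature-detection approach using only the native RegExp constructor; supportsFlag is a hypothetical helper, and this is not necessarily how XRegExp performs its own detection.

```javascript
// Hypothetical helper (not part of XRegExp): feature-detect a native RegExp flag.
function supportsFlag(flag) {
    try {
        new RegExp('', flag); // throws a SyntaxError for unsupported flags
        return true;
    } catch (err) {
        return false;
    }
}

// Build a small report, e.g. for inclusion in a diagnostics dump.
const flagReport = {};
for (const flag of ['g', 'i', 'm', 'u', 'y']) {
    flagReport[flag] = supportsFlag(flag);
}
console.log(flagReport);
```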
Creates an extended regular expression object for matching text with a pattern. Differs from a native regular expression in that additional syntax and flags are supported. The returned object is in fact a native RegExp and works with all native methods.
pattern: {String|RegExp} Regex pattern string, or an existing regex object to copy.
flags: {String} (optional) Any combination of flags.
Native flags:
g - global
i - ignore case
m - multiline anchors
u - unicode (ES6)
y - sticky (Firefox 3+, ES6)
Additional XRegExp flags:
n - explicit capture
s - dot matches all (aka singleline)
x - free-spacing and line comments (aka extended)
A - astral (requires the Unicode Base addon)
Flags cannot be provided when constructing one RegExp from another.
Returns {RegExp} Extended regular expression object.
RegExp is part of the XRegExp prototype chain (XRegExp.prototype = new RegExp()).
// With named capture and flag x
XRegExp('(?<year> [0-9]{4} ) -? # year \n\
(?<month> [0-9]{2} ) -? # month \n\
(?<day> [0-9]{2} ) # day ', 'x');
// Providing a regex object copies it. Native regexes are recompiled using native (not XRegExp)
// syntax. Copies maintain extended data, are augmented with `XRegExp.prototype` properties, and
// have fresh `lastIndex` properties (set to zero).
XRegExp(/regex/);
The XRegExp version number as a string containing three dot-separated parts. For example, '2.0.0-beta-3'.
Extends XRegExp syntax and allows custom flags. This is used internally and can be used to create XRegExp addons. If more than one token can match the same string, the last added wins.
regex: {RegExp} Regex object that matches the new token.
handler: {Function} Function that returns a new pattern string (using native regex syntax) to replace the matched token within all future XRegExp regexes. Has access to persistent properties of the regex being built, through this. Invoked with three arguments:
The handler function becomes part of the XRegExp construction process, so be careful not to construct XRegExps within the function or you will trigger infinite recursion.
options: {Object} (optional) Options object with optional properties:
scope {String} Scope where the token applies: 'default', 'class', or 'all'.
flag {String} Single-character flag that triggers the token. This also registers the flag, which prevents XRegExp from throwing an 'unknown flag' error when the flag is used.
optionalFlags {String} Any custom flags checked for within the token handler that are not required to trigger the token. This registers the flags, to prevent XRegExp from throwing an 'unknown flag' error when any of the flags are used.
reparse {Boolean} Whether the handler function's output should not be treated as final, and instead be reparseable by other tokens (including the current token). Allows token chaining or deferring.
leadChar {String} Single character that occurs at the beginning of any successful match of the token (not always applicable). This doesn't change the behavior of the token unless you provide an erroneous value. However, providing it can increase the token's performance since the token can be skipped at any positions where this character doesn't appear.
// Basic usage: Add \a for the ALERT control code
XRegExp.addToken(
/\\a/,
function() {return '\\x07';},
{scope: 'all'}
);
XRegExp('\\a[\\a-\\n]+').test('\x07\n\x07'); // -> true
// Add the U (ungreedy) flag from PCRE and RE2, which reverses greedy and lazy quantifiers.
// Since `scope` is not specified, it uses 'default' (i.e., transformations apply outside of
// character classes only)
XRegExp.addToken(
/([?*+]|{\d+(?:,\d*)?})(\??)/,
function(match) {return match[1] + (match[2] ? '' : '?');},
{flag: 'U'}
);
XRegExp('a+', 'U').exec('aaa')[0]; // -> 'a'
XRegExp('a+?', 'U').exec('aaa')[0]; // -> 'aaa'
Caches and returns the result of calling XRegExp(pattern, flags). On any subsequent call with the same pattern and flag combination, the cached copy of the regex is returned.
pattern: {String} Regex pattern string.
flags: {String} (optional) Any combination of XRegExp flags.
Returns {RegExp} Cached XRegExp object.
while (match = XRegExp.cache('.', 'gs').exec(str)) {
// The regex is compiled once only
}
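The caching behavior can be approximated with a Map keyed by flags and pattern. This is a minimal sketch over native RegExp, assuming a hypothetical cachedRegExp helper; it is not XRegExp.cache's actual implementation.

```javascript
// Hypothetical sketch (not XRegExp.cache itself): memoize compiled regexes.
const regexCache = new Map();

function cachedRegExp(pattern, flags = '') {
    // Flags never contain '/', so this key is unambiguous.
    const key = `${flags}/${pattern}`;
    let regex = regexCache.get(key);
    if (!regex) {
        regex = new RegExp(pattern, flags);
        regexCache.set(key, regex);
    }
    return regex;
}

const firstLookup = cachedRegExp('.', 'g');
const secondLookup = cachedRegExp('.', 'g');
console.log(firstLookup === secondLookup);
```

Because the same object is returned each time, a cached regex with flag g shares its lastIndex between callers, which is what lets a loop such as the one above advance through the string.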
Intentionally undocumented; used in tests
Escapes any regular expression metacharacters, for use when matching literal strings. The result can safely be used at any point within a regex that uses any flags.
str: {String} String to escape.
Returns {String} String with regex metacharacters escaped.
XRegExp.escape('Escaped? <.>');
// -> 'Escaped\?\ <\.>'
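A comparable escaping helper can be written with a single native replace call. The sketch below assumes a hypothetical escapeRegExp function whose character list was chosen to reproduce the output shown above (including escaped whitespace); the library's exact character set may differ.

```javascript
// Hypothetical sketch (not XRegExp.escape itself): escape regex metacharacters
// and whitespace so the string matches literally.
function escapeRegExp(str) {
    return str.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
}

const escapedText = escapeRegExp('Escaped? <.>');
console.log(escapedText);

// The escaped string can be embedded in a larger pattern and matches only itself.
const literalMatcher = new RegExp('^' + escapeRegExp('a.b') + '$');
console.log(literalMatcher.test('a.b'), literalMatcher.test('axb'));
```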
Executes a regex search in a specified string. Returns a match array or null. If the provided regex uses named capture, named backreference properties are included on the match array. Optional pos and sticky arguments specify the search start position, and whether the match must start at the specified position only. The lastIndex property of the provided regex is not used, but is updated for compatibility. Also fixes browser bugs compared to the native RegExp.prototype.exec and can be used reliably cross-browser.
str: {String} String to search.
regex: {RegExp} Regex to search with.
pos: {Number} [default: pos=0] Zero-based index at which to start the search.
sticky: {Boolean|String} [default: sticky=false] Whether the match must start at the specified position only. The string 'sticky' is accepted as an alternative to true.
Returns the match array with named backreference properties, or null.
// Basic use, with named backreference
var match = XRegExp.exec('U+2620', XRegExp('U\\+(?<hex>[0-9A-F]{4})'));
match.hex; // -> '2620'

// With pos and sticky, in a loop
var pos = 3, result = [], match;
while (match = XRegExp.exec('<1><2><3><4>5<6>', /<(\d)>/, pos, 'sticky')) {
    result.push(match[1]);
    pos = match.index + match[0].length;
}
// result -> ['2', '3', '4']
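The pos and sticky arguments correspond closely to the native lastIndex property and flag y. The following sketch shows that correspondence with a hypothetical execAt helper built on native RegExp; it assumes an engine that supports flag y and does not reproduce XRegExp's named backreference properties or browser fixes.

```javascript
// Hypothetical sketch (not XRegExp.exec itself): search from a given position,
// optionally requiring the match to start exactly there.
function execAt(str, regex, pos = 0, sticky = false) {
    // Work on a copy so the caller's regex and its lastIndex are untouched.
    const flags = regex.flags.replace(/[gy]/g, '') + (sticky ? 'y' : 'g');
    const copy = new RegExp(regex.source, flags);
    copy.lastIndex = pos;
    return copy.exec(str);
}

const found = [];
let position = 3;
let hit;
while ((hit = execAt('<1><2><3>4<5>', /<(\d+)>/, position, true))) {
    found.push(hit[1]);
    position = hit.index + hit[0].length;
}
console.log(found);
```

The loop stops at the bare 4 because a sticky match must begin exactly at the current position.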
Executes a provided function once per regex match. Searches always start at the beginning of the string and continue until the end, regardless of the state of the regex's global property and initial lastIndex.
str: {String} String to search.
regex: {RegExp} Regex to search with.
callback: {Function} Function to execute for each match. Invoked with four arguments:
// Extracts every other digit from a string
var evens = [];
XRegExp.forEach('1a2345', /\d/, function(match, i) {
    if (i % 2) evens.push(+match[0]);
});
// evens -> [2, 4]
Copies a regex object and adds flag g. The copy maintains extended data, is augmented with XRegExp.prototype properties, and has a fresh lastIndex property (set to zero). Native regexes are not recompiled using XRegExp syntax.
regex: {RegExp} Regex to globalize.
Returns a copy of the provided regex with flag g added.
var globalCopy = XRegExp.globalize(/regex/);
globalCopy.global; // -> true
Installs optional features according to the specified options. Can be undone using XRegExp.uninstall.
options: {Object|String} Feature options object or feature string.
Enables or disables implicit astral mode opt-in. When enabled, flag A is automatically added to all new regexes created by XRegExp. This causes an error to be thrown when creating regexes if the Unicode Base addon is not available, since flag A is registered by that addon.
astral: {Boolean} true to enable; false to disable.
Native methods to use and restore ('native' is an ES3 reserved keyword).
These native methods are overridden:
exec: RegExp.prototype.exec
test: RegExp.prototype.test
match: String.prototype.match
replace: String.prototype.replace
split: String.prototype.split
// With an options object
XRegExp.install({
    // Enables support for astral code points in Unicode addons (implicitly sets flag A)
    astral: true,

    // DEPRECATED: Overrides native regex methods with fixed/extended versions
    natives: true
});

// With an options string
XRegExp.install('astral natives');
Checks whether an individual optional feature is installed.
feature: {String} Name of the feature to check. One of:
astral
natives
Returns a {Boolean} value indicating whether the feature is installed.
XRegExp.isInstalled('astral');
Returns true if an object is a regex; false if it isn't. This works correctly for regexes created in another frame, when instanceof and constructor checks would fail.
value: {any type allowed} The object to check.
Returns a {Boolean} value indicating whether the object is a RegExp object.
XRegExp.isRegExp('string'); // -> false
XRegExp.isRegExp(/regex/i); // -> true
XRegExp.isRegExp(RegExp('^', 'm')); // -> true
XRegExp.isRegExp(XRegExp('(?s).')); // -> true
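A common way to get a frame-independent check is to inspect the internal class tag rather than the constructor. The sketch below uses a hypothetical isRegExpLike helper; it shows the general technique and is not taken from XRegExp's source.

```javascript
// Hypothetical sketch (not XRegExp.isRegExp itself): a type check that does not
// depend on which frame's RegExp constructor created the object.
function isRegExpLike(value) {
    return Object.prototype.toString.call(value) === '[object RegExp]';
}

console.log(isRegExpLike('string'));
console.log(isRegExpLike(/regex/i));
console.log(isRegExpLike(new RegExp('^', 'm')));
```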
Returns the first matched string, or in global mode, an array containing all matched strings. This is essentially a more convenient re-implementation of String.prototype.match that gives the result types you actually want (string instead of exec-style array in match-first mode, and an empty array instead of null when no matches are found in match-all mode). It also lets you override flag g and ignore lastIndex, and fixes browser bugs.
str: {String} String to search.
regex: {RegExp} Regex to search with.
scope: {String} [default: scope='one'] Use 'one' to return the first match as a string. Use 'all' to return an array of all matched strings. If not explicitly specified and regex uses flag g, scope is 'all'.
Returns a {String} in match-first mode: First match as a string, or null.
Returns an {Array} in match-all mode: Array of all matched strings, or an empty array.
// Match first
XRegExp.match('abc', /\w/); // -> 'a'
XRegExp.match('abc', /\w/g, 'one'); // -> 'a'
XRegExp.match('abc', /x/g, 'one'); // -> null

// Match all
XRegExp.match('abc', /\w/g); // -> ['a', 'b', 'c']
XRegExp.match('abc', /\w/, 'all'); // -> ['a', 'b', 'c']
XRegExp.match('abc', /x/, 'all'); // -> []
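The scope rules can be expressed as a thin wrapper over the native String.prototype.match. This is a sketch under the assumption of a hypothetical matchScoped helper; it mirrors the documented return types but none of the library's browser fixes.

```javascript
// Hypothetical sketch (not XRegExp.match itself): scope-aware matching.
function matchScoped(str, regex, scope) {
    // An explicit scope wins; otherwise fall back to the regex's own flag g.
    const all = scope === 'all' || (scope === undefined && regex.global);
    const flags = regex.flags.replace('g', '') + (all ? 'g' : '');
    const copy = new RegExp(regex.source, flags); // fresh lastIndex
    const result = str.match(copy);
    if (all) return result || [];       // empty array instead of null
    return result ? result[0] : null;   // plain string instead of a match array
}

console.log(matchScoped('abc', /\w/g, 'one'));
console.log(matchScoped('abc', /\w/, 'all'));
```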
Retrieves the matches from searching a string using a chain of regexes that successively search within previous matches. The provided chain array can contain regexes and/or objects with regex and backref properties. When a backreference is specified, the named or numbered backreference is passed forward to the next regex or returned.
str: {String} String to search.
chain: {Array} Regexes that each search for matches within preceding results.
Returns an {Array} of matches by the last regex in the chain, or an empty array.
// Basic usage; matches numbers within <b> tags
XRegExp.matchChain('1 <b>2</b> 3 <b>4 a 56</b>', [
    XRegExp('(?is)<b>.*?</b>'),
    /\d+/
]);
// -> ['2', '4', '56']

// Passing forward and returning specific backreferences
html = '<a href="http://xregexp.com/api/">XRegExp</a>\
        <a href="http://www.google.com/">Google</a>';
XRegExp.matchChain(html, [
    {regex: /<a href="([^"]+)">/i, backref: 1},
    {regex: XRegExp('(?i)^https?://(?<domain>[^/?#]+)'), backref: 'domain'}
]);
// -> ['xregexp.com', 'www.google.com']
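The chaining idea, where each regex searches only within the previous level's matches, can be sketched with native regexes and String.prototype.matchAll. The chainMatches function is a hypothetical illustration that assumes an engine with matchAll and native named groups; it is not the library's implementation.

```javascript
// Hypothetical sketch (not XRegExp.matchChain itself).
function chainMatches(str, chain) {
    let inputs = [str];
    for (const item of chain) {
        const regex = item instanceof RegExp ? item : item.regex;
        const backref = item instanceof RegExp ? 0 : item.backref;
        // matchAll requires flag g; add it to a copy without mutating the input.
        const globalCopy = new RegExp(regex.source, regex.flags.replace('g', '') + 'g');
        const next = [];
        for (const input of inputs) {
            for (const m of input.matchAll(globalCopy)) {
                // Numbers select numbered groups (0 is the whole match); strings select named groups.
                const value = typeof backref === 'number' ? m[backref] : m.groups[backref];
                if (value !== undefined) next.push(value);
            }
        }
        inputs = next;
    }
    return inputs;
}

const numbersInBold = chainMatches('1 <b>2</b> 3 <B>4 \n 56</B>', [
    /<b>[\s\S]*?<\/b>/i,
    /\d+/
]);
console.log(numbersInBold);
```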
Returns a new string with one or all matches of a pattern replaced. The pattern can be a string or regex, and the replacement can be a string or a function to be called for each match. To perform a global search and replace, use the optional scope argument or include flag g if using a regex. Replacement strings can use ${n} for named and numbered backreferences. Replacement functions can use named backreferences via arguments[0].name. Also fixes browser bugs compared to the native String.prototype.replace and can be used reliably cross-browser.
str: {String} String to search.
search: {RegExp|String} Search pattern to be replaced.
replacement: {String|Function} Replacement string or a function invoked to create it.
Replacement strings can include special replacement syntax:
$$ - Inserts a literal $ character.
$&, $0 - Inserts the matched substring.
$' - Inserts the string that follows the matched substring (right context).
$n, $nn - Where n/nn are digits referencing an existent capturing group, inserts backreference n/nn.
${n} - Where n is a name or any number of digits that reference an existent capturing group, inserts backreference n.
Replacement functions are invoked with three or more arguments:
scope: {String} [default: scope='one'] Use 'one' to replace the first match only, or 'all'. If not explicitly specified and using a regex with flag g, scope is 'all'.
Returns a new string with one or all matches replaced.
// Regex search, using named backreferences in replacement string
var name = XRegExp('(?<first>\\w+) (?<last>\\w+)');
XRegExp.replace('John Smith', name, '${last}, ${first}');
// -> 'Smith, John'

// Regex search, using named backreferences in replacement function
XRegExp.replace('John Smith', name, function(match) {
    return match.last + ', ' + match.first;
});
// -> 'Smith, John'

// String search, with replace-all
XRegExp.replace('RegExp builds RegExps', 'RegExp', 'XRegExp', 'all');
// -> 'XRegExp builds XRegExps'
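For comparison, engines with native named groups accept $<name> in replacement strings. The sketch below translates the ${name} form into that native form; replaceNamed is a hypothetical helper that handles named references only (numbered ${n} references and the scope argument are out of scope), and it is not how XRegExp.replace is implemented.

```javascript
// Hypothetical sketch (not XRegExp.replace itself): map ${name} tokens onto the
// native $<name> replacement syntax.
function replaceNamed(str, regex, replacement) {
    // In the replacement pattern, '$$' yields a literal '$' and '$1' the captured name.
    const nativeReplacement = replacement.replace(/\$\{(\w+)\}/g, '$$<$1>');
    return str.replace(regex, nativeReplacement);
}

const fullName = /(?<first>\w+) (?<last>\w+)/;
const swapped = replaceNamed('John Smith', fullName, '${last}, ${first}');
console.log(swapped);
```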
Performs batch processing of string replacements. Used like XRegExp.replace, but accepts an array of replacement details. Later replacements operate on the output of earlier replacements. Replacement details are accepted as an array with a regex or string to search for, the replacement string or function, and an optional scope of 'one' or 'all'. Uses the XRegExp replacement text syntax, which supports named backreference properties via ${name}.
str: {String} String to search.
replacements: {Array} Array of replacement detail arrays.
Returns a new string with all replacements applied.
str = XRegExp.replaceEach(str, [
    [XRegExp('(?<name>a)'), 'z${name}'],
    [/b/gi, 'y'],
    [/c/g, 'x', 'one'], // scope 'one' overrides /g
    [/d/, 'w', 'all'], // scope 'all' overrides lack of /g
    ['e', 'v', 'all'], // scope 'all' allows replace-all for strings
    [/f/g, function($0) {
        return $0.toUpperCase();
    }]
]);
Splits a string into an array of strings using a regex or string separator. Matches of the separator are not included in the result array. However, if separator is a regex that contains capturing groups, backreferences are spliced into the result each time separator is matched. Fixes browser bugs compared to the native String.prototype.split and can be used reliably cross-browser.
str: {String} String to split.
separator: {RegExp|String} Regex or string to use for separating the string.
limit: {Number} (optional) Maximum number of items to include in the result array.
Returns an array of substrings.
// Basic use
XRegExp.split('a b c', ' ');
// -> ['a', 'b', 'c']

// With limit
XRegExp.split('a b c', ' ', 2);
// -> ['a', 'b']

// Backreferences in result array
XRegExp.split('..word1..', /([a-z]+)(\d+)/i);
// -> ['..', 'word', '1', '..']
Executes a regex search in a specified string. Returns true or false. Optional pos and sticky arguments specify the search start position, and whether the match must start at the specified position only. The lastIndex property of the provided regex is not used, but is updated for compatibility. Also fixes browser bugs compared to the native RegExp.prototype.test and can be used reliably cross-browser.
str: {String} String to search.
regex: {RegExp} Regex to search with.
pos: {Number} [default: pos=0] Zero-based index at which to start the search.
sticky: {Boolean|String} [default: sticky=false] Whether the match must start at the specified position only. The string 'sticky' is accepted as an alternative to true.
Returns a {Boolean} value indicating whether the regex matched the provided value.
// Basic use
XRegExp.test('abc', /c/); // -> true

// With pos and sticky
XRegExp.test('abc', /c/, 0, 'sticky'); // -> false
XRegExp.test('abc', /c/, 2, 'sticky'); // -> true
Uninstalls optional features according to the specified options. All optional features start out uninstalled, so this is used to undo the actions of XRegExp.install.
options: {Object|String} Feature options object or features string. These features are supported:
astral
natives
// With an options object
XRegExp.uninstall({
    // Disables support for astral code points in Unicode addons
    astral: true,

    // DEPRECATED: Restores native regex methods
    natives: true
});

// With an options string
XRegExp.uninstall('astral natives');
Returns an XRegExp object that is the concatenation of the given patterns. Patterns can be provided as regex objects or strings. Metacharacters are escaped in patterns provided as strings. Backreferences in provided regex objects are automatically renumbered to work correctly within the larger combined pattern. Native flags used by provided regexes are ignored in favor of the flags argument.
patterns: {Array} Regexes and strings to combine.
separator: {String|RegExp} Regex or string to use as the joining separator.
flags: {String} (optional) Any combination of XRegExp flags.
Returns the concatenated regexp of the provided regexes and strings.
XRegExp.join(['a+b*c', /(dogs)\1/, /(cats)\1/], 'i');
// -> /a\+b\*c(dogs)\1(cats)\2/i
Returns an XRegExp object that is the union of the given patterns. Patterns can be provided as regex objects or strings. Metacharacters are escaped in patterns provided as strings. Backreferences in provided regex objects are automatically renumbered to work correctly within the larger combined pattern. Native flags used by provided regexes are ignored in favor of the flags argument.
patterns: {Array} Regexes and strings to combine.
flags: {String} (optional) Any combination of XRegExp flags.
Returns the union regexp of the provided regexes and strings.
XRegExp.union(['a+b*c', /(dogs)\1/, /(cats)\1/], 'i');
// -> /a\+b\*c|(dogs)\1|(cats)\2/i
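The renumbering step is the subtle part of building a union: each pattern's backreferences must be shifted by the number of capturing groups that precede it. The sketch below illustrates this with native regexes only. unionSketch and escapeLiteral are hypothetical names, named groups are not handled, and this is not the library's implementation.

```javascript
// Hypothetical sketch (not XRegExp.union itself).
function escapeLiteral(str) {
    return str.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
}

function unionSketch(patterns, flags = '') {
    let groupOffset = 0; // capturing groups contributed by earlier patterns
    // Tokens: numbered backreference | other escape | character class | capturing "("
    const tokens = /\\(\d+)|\\[\s\S]|\[(?:[^\\\]]|\\[\s\S])*\]|\((?!\?)/g;
    const parts = patterns.map((pattern) => {
        if (typeof pattern === 'string') return escapeLiteral(pattern);
        let groupCount = 0;
        const source = pattern.source.replace(tokens, (token, backref) => {
            if (token === '(') {
                groupCount++;
                return token;
            }
            if (backref !== undefined && Number(backref) > 0) {
                return '\\' + (Number(backref) + groupOffset); // shift the backreference
            }
            return token; // escapes and character classes pass through unchanged
        });
        groupOffset += groupCount;
        return source;
    });
    return new RegExp(parts.join('|'), flags);
}

const animals = unionSketch(['a+b*c', /(dogs)\1/, /(cats)\1/], 'i');
console.log(animals.source);
```

Without the shift, the third pattern's \1 would refer to the (dogs) group of the second pattern rather than to its own (cats) group.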
Calling XRegExp.install('natives') uses this to override the native methods.
Adds named capture support (with backreferences returned as result.name), and fixes browser bugs in the native RegExp.prototype.exec. Calling XRegExp.install('natives') uses this to override the native method. Use via XRegExp.exec without overriding natives.
str: {String} String to search.
Returns the match array with named backreference properties, or null.
Fixes browser bugs in the native RegExp.prototype.test. Calling XRegExp.install('natives') uses this to override the native method.
str: {String} String to search.
Returns a {Boolean} value indicating whether the regex matched the provided value.
Adds named capture support (with backreferences returned as result.name), and fixes browser bugs in the native String.prototype.match. Calling XRegExp.install('natives') uses this to override the native method.
regex: {RegExp|*} Regex to search with. If not a regex object, it is passed to the RegExp constructor.
Returns an array of match strings or null, if regex uses flag g.
Returns the result of calling regex.exec(this), if regex was without flag g.
Adds support for ${n} tokens for named and numbered backreferences in replacement text, and provides named backreferences to replacement functions as arguments[0].name. Also fixes browser bugs in replacement text syntax when performing a replacement using a nonregex search value, and the value of a replacement regex's lastIndex property during replacement iterations and upon completion. Calling XRegExp.install('natives') uses this to override the native method. Note that this doesn't support SpiderMonkey's proprietary third (flags) argument. Use via XRegExp.replace without overriding natives.
search: {RegExp|String} Search pattern to be replaced.
replacement: {String|Function} Replacement string or a function invoked to create it.
Returns a new string with one or all matches replaced.
Fixes browser bugs in the native String.prototype.split. Calling XRegExp.install('natives') uses this to override the native method. Use via XRegExp.split without overriding natives.
separator: {RegExp|String} Regex or string to use for separating the string.
limit: {Number} (optional) Maximum number of items to include in the result array.
Returns an array of substrings.
Letter escapes that natively match literal characters: \a, \A, etc. These should be SyntaxErrors but are allowed in web reality. XRegExp makes them errors for cross-browser consistency and to reserve their syntax, but lets them be superseded by addons.
XRegExp.addToken(
    /\\([ABCE-RTUVXYZaeg-mopqyz]|c(?![A-Za-z])|u(?![\dA-Fa-f]{4}|{[\dA-Fa-f]+})|x(?![\dA-Fa-f]{2}))/, ...
Unicode code point escape with curly braces: \u{N..}. N.. is any one or more digit hexadecimal number from 0-10FFFF, and can include leading zeros. Requires the native ES6 u flag to support code points greater than U+FFFF. Avoids converting code points above U+FFFF to surrogate pairs (which could be done without flag u), since that could lead to broken behavior if you follow a \u{N..} token that references a code point above U+FFFF with a quantifier, or if you use the same in a character class.
XRegExp.addToken(
    /\\u{([\dA-Fa-f]+)}/, ...
Empty character class: [] or [^]. This fixes a critical cross-browser syntax inconsistency. Unless this is standardized (per the ES spec), regex syntax can't be accurately parsed because character class endings can't be determined.
XRegExp.addToken(
    /\[(\^?)\]/, ...
Comment pattern: (?# ). Inline comments are an alternative to the line comments allowed in free-spacing mode (flag x).
XRegExp.addToken(
    /\(\?#[^)]*\)/, ...
Whitespace and line comments, in free-spacing mode (aka extended mode, flag x) only.
XRegExp.addToken(
    /\s+|#[^\n]*\n?/, ...
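To see what the free-spacing token accomplishes, the preprocessing can be approximated as a single pass that drops whitespace and line comments outside of escapes and character classes. The stripFreeSpacing function is a hypothetical, simplified sketch: it deletes the matched text outright, whereas a full implementation has to take more care where removal would join adjacent tokens.

```javascript
// Hypothetical sketch (not XRegExp's token handler): remove free-spacing
// whitespace and # line comments from a pattern string.
function stripFreeSpacing(pattern) {
    // Alternatives: escape sequence | character class | whitespace run | line comment
    const tokens = /\\[\s\S]|\[(?:[^\\\]]|\\[\s\S])*\]|\s+|#[^\n]*\n?/g;
    return pattern.replace(tokens, (token) => {
        // Escapes and character classes are kept verbatim; the rest is dropped.
        return token[0] === '\\' || token[0] === '[' ? token : '';
    });
}

const spacedDate =
    '(?<year> [0-9]{4} ) -? # year\n' +
    '(?<month> [0-9]{2} ) -? # month\n' +
    '(?<day> [0-9]{2} ) # day';
const compactDate = stripFreeSpacing(spacedDate);
console.log(compactDate);
console.log(new RegExp(compactDate).exec('2017-02-22').groups.year);
```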
Dot, in dotall mode (aka singleline mode, flag s) only.
    XRegExp.addToken(
        /\./,
        function() {
            return '[\\s\\S]';
        },
        {
            flag: 's',
            leadChar: '.'
        }
    );
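The handler above replaces the dot with [\s\S]. A short native-only sketch shows why that replacement gives dotall behavior:

```javascript
// A plain dot does not match line terminators.
const plainDot = /a.c/.test('a\nc');
// plainDot -> false

// [\s\S] is "whitespace or non-whitespace", i.e. any character at all.
const emulatedDotAll = /a[\s\S]c/.test('a\nc');
// emulatedDotAll -> true
```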
Named backreference: \k<name>. Backreference names can use the characters A-Z, a-z, 0-9, _, and $ only. Also allows numbered backreferences as \k<n>.
    XRegExp.addToken(
        /\\k<([\w$]+)>/, ...
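Engines that implement ES2018 support the same \k<name> syntax natively, which allows a dependency-free illustration of what a named backreference does. This sketch assumes such an engine and does not exercise XRegExp's own token; in particular, the numbered form \k<n> is an XRegExp extension and is not shown.

```javascript
// Match a quoted string whose closing quote is the same character
// as its opening quote.
const quoted = /(?<quote>['"]).*?\k<quote>/;

const match = quoted.exec(`He said "it's fine" today`);
const whole = match[0];            // the full quoted string
const opener = match.groups.quote; // the quote character that was reused
// whole  -> '"it\'s fine"'
// opener -> '"'
```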
Numbered backreference or octal, plus any following digits: \0, \11, etc. Octals except \0 not followed by 0-9 and backreferences to unopened capture groups throw an error. Other matches are returned unaltered. IE < 9 doesn't support backreferences above \99 in regex syntax.
    XRegExp.addToken(
        /\\(\d+)/, ...
Named capturing group; match the opening delimiter only: (?<name>. Capture names can use the characters A-Z, a-z, 0-9, _, and $ only. Names can't be integers. Supports Python-style (?P<name> as an alternate syntax to avoid issues in some older versions of Opera which natively supported the Python-style syntax. Otherwise, XRegExp might treat numbered backreferences to Python-style named capture as octals.
    XRegExp.addToken(
        /\(\?P?<([\w$]+)>/, ...
Capturing group; match the opening parenthesis only. Required for support of named capturing groups. Also adds explicit capture mode (flag n).
    XRegExp.addToken(
        /\((?!\?)/,
        {
            optionalFlags: 'n',
            leadChar: '('
        }
Builds regexes using named subpatterns, for readability and pattern reuse. Backreferences in the outer pattern and provided subpatterns are automatically renumbered to work correctly. Native flags used by provided subpatterns are ignored in favor of the flags argument.
pattern
: {String} XRegExp pattern using {{name}} for embedded subpatterns. Allows ({{name}}) as shorthand for (?<name>{{name}}). Patterns cannot be embedded within character classes.
subs
: {Object} Lookup object for named subpatterns. Values can be strings or regexes. A leading ^ and trailing unescaped $ are stripped from subpatterns, if both are present.
flags
: {String} (optional) Any combination of XRegExp flags.
Returns a regexp with interpolated subpatterns.
    var time = XRegExp.build('(?x)^ {{hours}} ({{minutes}}) $', {
        hours: XRegExp.build('{{h12}} : | {{h24}}', {
            h12: /1[0-2]|0?[1-9]/,
            h24: /2[0-3]|[01][0-9]/
        }, 'x'),
        minutes: /^[0-5][0-9]$/
    });
    time.test('10:59'); // -> true
    XRegExp.exec('10:59', time).minutes; // -> '59'
Returns an array of match strings between outermost left and right delimiters, or an array of objects with detailed match parts and position data. An error is thrown if delimiters are unbalanced within the data.
str
: {String} String to search.
left
: {String} Left delimiter as an XRegExp pattern.
right
: {String} Right delimiter as an XRegExp pattern.
flags
: {String} (optional) Any native or XRegExp flags, used for the left and right delimiters.
options
: {Object} (optional) Lets you specify valueNames and escapeChar options.
Returns an array of matches, or an empty array.
    // Basic usage
    var str = '(t((e))s)t()(ing)';
    XRegExp.matchRecursive(str, '\\(', '\\)', 'g');
    // -> ['t((e))s', '', 'ing']

    // Extended information mode with valueNames
    str = 'Here is <div> <div>an</div></div> example';
    XRegExp.matchRecursive(str, '<div\\s*>', '</div>', 'gi', {
        valueNames: ['between', 'left', 'match', 'right']
    });
    // -> [
    // {name: 'between', value: 'Here is ', start: 0, end: 8},
    // {name: 'left', value: '<div>', start: 8, end: 13},
    // {name: 'match', value: ' <div>an</div>', start: 13, end: 27},
    // {name: 'right', value: '</div>', start: 27, end: 33},
    // {name: 'between', value: ' example', start: 33, end: 41}
    // ]

    // Omitting unneeded parts with null valueNames, and using escapeChar
    str = '...{1}.\\{{function(x,y){return {y:x}}}';
    XRegExp.matchRecursive(str, '{', '}', 'g', {
        valueNames: ['literal', null, 'value', null],
        escapeChar: '\\'
    });
    // -> [
    // {name: 'literal', value: '...', start: 0, end: 3},
    // {name: 'value', value: '1', start: 4, end: 5},
    // {name: 'literal', value: '.\\{', start: 6, end: 9},
    // {name: 'value', value: 'function(x,y){return {y:x}}', start: 10, end: 37}
    // ]

    // Sticky mode via flag y
    str = '<1><<<2>>><3>4<5>';
    XRegExp.matchRecursive(str, '<', '>', 'gy');
    // -> ['1', '<<2>>', '3']
Unicode Base (\p{..}, \P{..}, \p{^..}, \pC) & astral mode (A flag)
XRegExp adds base support for Unicode matching:
Adds syntax \p{..} for matching Unicode tokens. Tokens can be inverted using \P{..} or \p{^..}. Token names ignore case, spaces, hyphens, and underscores. You can omit the braces for token names that are a single letter (e.g. \pL or \PL).
Adds flag A (astral), which enables 21-bit Unicode support.
Adds the XRegExp.addUnicodeData method used by other addons to provide character data.
Unicode Base relies on externally provided Unicode character data. Official addons are available to provide data for Unicode categories, scripts, blocks, and properties via XRegExp.addToken() API.
Adds to the list of Unicode tokens that XRegExp regexes can match via \p or \P.
data
: {Array} Objects with named character ranges. Each object may have properties name, alias, isBmpLast, inverseOf, bmp, and astral. All but name are optional, although one of bmp or astral is required (unless inverseOf is set). If astral is absent, the bmp data is used for BMP and astral modes. If bmp is absent, the name errors in BMP mode but works in astral mode. If both bmp and astral are provided, the bmp data only is used in BMP mode, and the combination of bmp and astral data is used in astral mode. isBmpLast is needed when a token matches orphan high surrogates and uses surrogate pairs to match astral code points. The bmp and astral data should be a combination of literal characters and \xHH or \uHHHH escape sequences, with hyphens to create ranges. Any regex metacharacters in the data should be escaped, apart from range-creating hyphens. The astral data can additionally use character classes and alternation, and should use surrogate pairs to represent astral code points. inverseOf can be used to avoid duplicating character data if a Unicode token is defined as the exact inverse of another token.
    // Basic use
    XRegExp.addUnicodeData([{
        name: 'XDigit',
        alias: 'Hexadecimal',
        bmp: '0-9A-Fa-f'
    }]);
    XRegExp('\\p{XDigit}:\\p{Hexadecimal}+').test('0:3D'); // -> true
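The format of the bmp data (literal characters, escapes, and range-creating hyphens) reads like the contents of a character class. Under that assumption, the following native-only sketch shows what the XDigit entry above would amount to in BMP mode; it is an illustration of the data format, not of how XRegExp compiles tokens internally.

```javascript
// The same data object as in the example above.
const xdigit = { name: 'XDigit', alias: 'Hexadecimal', bmp: '0-9A-Fa-f' };

// Assumption: bmp data is usable as the body of a character class.
const xdigitClass = new RegExp('^[' + xdigit.bmp + ']+$');

const acceptsHex = xdigitClass.test('3D');
// acceptsHex -> true
const rejectsNonHex = xdigitClass.test('3G');
// rejectsNonHex -> false
```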
'Unofficial/Unsupported API': interface may be subject to change between any XRegExp releases; used in tests and addons; suitable for advanced users of the library only.
Returns a reference to the internal registered flags object, where each flag is a hash key:
    var flags = XRegExp._registeredFlags();
    assert(flags['u'], 'expected native Unicode support');
'Unofficial/Unsupported API': interface may be subject to change between any XRegExp releases; used in tests and addons; suitable for advanced users of the library only.
Enables or disables native method overrides.
on
: {Boolean} true to enable; false to disable.
Used internally by the XRegExp.install() and XRegExp.uninstall() APIs; setNatives() is itself not accessible externally (it is a private function).
'Unofficial/Unsupported API': interface may be subject to change between any XRegExp releases; used in tests and addons; suitable for advanced users of the library only.
Check if the regex flag is supported natively in your environment.
Returns {Boolean}.
Developer Note:
Can't check based on the presence of properties/getters since browsers might support such properties even when they don't support the corresponding flag in regex construction (tested in Chrome 48, where 'unicode' in /x/ is true but trying to construct a regex with flag u throws an error).
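The developer note implies that the dependable check is to actually try constructing a regex with the flag. A minimal sketch of that idea follows; the function name hasNativeFlag is chosen here for illustration and is not confirmed by this README.

```javascript
// Hypothetical name: report whether the engine accepts a given regex flag.
function hasNativeFlag(flag) {
  try {
    // Construction throws a SyntaxError for flags the engine does not know.
    new RegExp('', flag);
    return true;
  } catch (exception) {
    return false;
  }
}

const supportsGlobal = hasNativeFlag('g');
// supportsGlobal -> true
const supportsBogus = hasNativeFlag('Q');
// supportsBogus -> false ('Q' is not a regex flag)
```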
'Unofficial/Unsupported API': interface may be subject to change between any XRegExp releases; used in tests and addons; suitable for advanced users of the library only.
Converts hexadecimal to decimal.
hex
: {String}
Returns {Number}
'Unofficial/Unsupported API': interface may be subject to change between any XRegExp releases; used in tests and addons; suitable for advanced users of the library only.
Converts decimal to hexadecimal.
dec
: {Number|String}
Returns {String}
'Unofficial/Unsupported API': interface may be subject to change between any XRegExp releases; used in tests and addons; suitable for advanced users of the library only.
Adds leading zeros if shorter than four characters. Used for fixed-length hexadecimal values.
str
: {String}
Returns {String}
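The three small conversion helpers described above (hexadecimal to decimal, decimal to hexadecimal, and zero-padding to four characters) could look roughly like this. The function names are placeholders picked for this sketch, since the headings naming the real functions are not shown here, and the bodies are one plausible implementation rather than the library's code.

```javascript
// Hypothetical names; each mirrors one of the helpers described above.
function hexToDec(hex) {
  return parseInt(hex, 16);
}

function decToHex(dec) {
  // Accepts a number or a numeric string.
  return parseInt(dec, 10).toString(16);
}

function padHex4(str) {
  // Add leading zeros until the string is at least four characters long.
  while (str.length < 4) {
    str = '0' + str;
  }
  return str;
}

const asNumber = hexToDec('ff');          // 255
const asHex = decToHex('255');            // 'ff'
const padded = padHex4(decToHex(255));    // '00ff'
const untouched = padHex4('1f600');       // '1f600' (already long enough)
```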
'Unofficial/Unsupported API': interface may be subject to change between any XRegExp releases; used in tests and addons; suitable for advanced users of the library only.
Returns a reference to the internal Unicode definition structure for the given Unicode Property if the given name is a legal Unicode Property for use in XRegExp \p or \P regex constructs.
name
: {String} Name by which the Unicode Property may be recognized (case-insensitive), e.g. 'N' or 'Number'.
The given name is matched against all registered Unicode Properties and Property Aliases.
Token names are case insensitive, and any spaces, hyphens, and underscores are ignored.
Returns {Object} reference to definition structure when the name matches a Unicode Property; false when the name does not match any Unicode Property or Property Alias.
For more info on Unicode Properties, see also http://unicode.org/reports/tr18/#Categories.
This method is not part of the officially documented and published API and is meant 'for advanced use only' where userland code wishes to re-use the (large) internal Unicode structures set up by XRegExp as a single point of Unicode 'knowledge' in the application.
See some example usage of this functionality, used as a boolean check if the given name is legal and to obtain internal structural data:
function prepareMacros(...) in https://github.com/GerHobbelt/jison-lex/blob/master/regexp-lexer.js#L885
function generateRegexesInitTableCode(...) in https://github.com/GerHobbelt/jison-lex/blob/master/regexp-lexer.js#L1999
Note that the second function in the example (function generateRegexesInitTableCode(...)) uses an approach without using this API to obtain a Unicode range spanning regex for use in environments which do not support XRegExp by simply expanding the XRegExp instance to a String through the map() mapping action and subsequent join().
XRegExp adds support for all Unicode blocks. Block names use the prefix 'In'. E.g. \p{InBasicLatin}. Token names are case insensitive, and any spaces, hyphens, and underscores are ignored.
Currently XRegExp supports the Unicode 8.0.0 block names listed below:
InAegean_Numbers
InAhom
InAlchemical_Symbols
InAlphabetic_Presentation_Forms
InAnatolian_Hieroglyphs
InAncient_Greek_Musical_Notation
InAncient_Greek_Numbers
InAncient_Symbols
InArabic
InArabic_Extended_A
InArabic_Mathematical_Alphabetic_Symbols
InArabic_Presentation_Forms_A
InArabic_Presentation_Forms_B
InArabic_Supplement
InArmenian
InArrows
InAvestan
InBalinese
InBamum
InBamum_Supplement
InBasic_Latin
InBassa_Vah
InBatak
InBengali
InBlock_Elements
InBopomofo
InBopomofo_Extended
InBox_Drawing
InBrahmi
InBraille_Patterns
InBuginese
InBuhid
InByzantine_Musical_Symbols
InCarian
InCaucasian_Albanian
InChakma
InCham
InCherokee
InCherokee_Supplement
InCJK_Compatibility
InCJK_Compatibility_Forms
InCJK_Compatibility_Ideographs
InCJK_Compatibility_Ideographs_Supplement
InCJK_Radicals_Supplement
InCJK_Strokes
InCJK_Symbols_and_Punctuation
InCJK_Unified_Ideographs
InCJK_Unified_Ideographs_Extension_A
InCJK_Unified_Ideographs_Extension_B
InCJK_Unified_Ideographs_Extension_C
InCJK_Unified_Ideographs_Extension_D
InCJK_Unified_Ideographs_Extension_E
InCombining_Diacritical_Marks
InCombining_Diacritical_Marks_Extended
InCombining_Diacritical_Marks_for_Symbols
InCombining_Diacritical_Marks_Supplement
InCombining_Half_Marks
InCommon_Indic_Number_Forms
InControl_Pictures
InCoptic
InCoptic_Epact_Numbers
InCounting_Rod_Numerals
InCuneiform
InCuneiform_Numbers_and_Punctuation
InCurrency_Symbols
InCypriot_Syllabary
InCyrillic
InCyrillic_Extended_A
InCyrillic_Extended_B
InCyrillic_Supplement
InDeseret
InDevanagari
InDevanagari_Extended
InDingbats
InDomino_Tiles
InDuployan
InEarly_Dynastic_Cuneiform
InEgyptian_Hieroglyphs
InElbasan
InEmoticons
InEnclosed_Alphanumeric_Supplement
InEnclosed_Alphanumerics
InEnclosed_CJK_Letters_and_Months
InEnclosed_Ideographic_Supplement
InEthiopic
InEthiopic_Extended
InEthiopic_Extended_A
InEthiopic_Supplement
InGeneral_Punctuation
InGeometric_Shapes
InGeometric_Shapes_Extended
InGeorgian
InGeorgian_Supplement
InGlagolitic
InGothic
InGrantha
InGreek_and_Coptic
InGreek_Extended
InGujarati
InGurmukhi
InHalfwidth_and_Fullwidth_Forms
InHangul_Compatibility_Jamo
InHangul_Jamo
InHangul_Jamo_Extended_A
InHangul_Jamo_Extended_B
InHangul_Syllables
InHanunoo
InHatran
InHebrew
InHigh_Private_Use_Surrogates
InHigh_Surrogates
InHiragana
InIdeographic_Description_Characters
InImperial_Aramaic
InInscriptional_Pahlavi
InInscriptional_Parthian
InIPA_Extensions
InJavanese
InKaithi
InKana_Supplement
InKanbun
InKangxi_Radicals
InKannada
InKatakana
InKatakana_Phonetic_Extensions
InKayah_Li
InKharoshthi
InKhmer
InKhmer_Symbols
InKhojki
InKhudawadi
InLao
InLatin_1_Supplement
InLatin_Extended_A
InLatin_Extended_Additional
InLatin_Extended_B
InLatin_Extended_C
InLatin_Extended_D
InLatin_Extended_E
InLepcha
InLetterlike_Symbols
InLimbu
InLinear_A
InLinear_B_Ideograms
InLinear_B_Syllabary
InLisu
InLow_Surrogates
InLycian
InLydian
InMahajani
InMahjong_Tiles
InMalayalam
InMandaic
InManichaean
InMathematical_Alphanumeric_Symbols
InMathematical_Operators
InMeetei_Mayek
InMeetei_Mayek_Extensions
InMende_Kikakui
InMeroitic_Cursive
InMeroitic_Hieroglyphs
InMiao
InMiscellaneous_Mathematical_Symbols_A
InMiscellaneous_Mathematical_Symbols_B
InMiscellaneous_Symbols
InMiscellaneous_Symbols_and_Arrows
InMiscellaneous_Symbols_and_Pictographs
InMiscellaneous_Technical
InModi
InModifier_Tone_Letters
InMongolian
InMro
InMultani
InMusical_Symbols
InMyanmar
InMyanmar_Extended_A
InMyanmar_Extended_B
InNabataean
InNew_Tai_Lue
InNKo
InNumber_Forms
InOgham
InOl_Chiki
InOld_Hungarian
InOld_Italic
InOld_North_Arabian
InOld_Permic
InOld_Persian
InOld_South_Arabian
InOld_Turkic
InOptical_Character_Recognition
InOriya
InOrnamental_Dingbats
InOsmanya
InPahawh_Hmong
InPalmyrene
InPau_Cin_Hau
InPhags_pa
InPhaistos_Disc
InPhoenician
InPhonetic_Extensions
InPhonetic_Extensions_Supplement
InPlaying_Cards
InPrivate_Use_Area
InPsalter_Pahlavi
InRejang
InRumi_Numeral_Symbols
InRunic
InSamaritan
InSaurashtra
InSharada
InShavian
InShorthand_Format_Controls
InSiddham
InSinhala
InSinhala_Archaic_Numbers
InSmall_Form_Variants
InSora_Sompeng
InSpacing_Modifier_Letters
InSpecials
InSundanese
InSundanese_Supplement
InSuperscripts_and_Subscripts
InSupplemental_Arrows_A
InSupplemental_Arrows_B
InSupplemental_Arrows_C
InSupplemental_Mathematical_Operators
InSupplemental_Punctuation
InSupplemental_Symbols_and_Pictographs
InSupplementary_Private_Use_Area_A
InSupplementary_Private_Use_Area_B
InSutton_SignWriting
InSyloti_Nagri
InSyriac
InTagalog
InTagbanwa
InTags
InTai_Le
InTai_Tham
InTai_Viet
InTai_Xuan_Jing_Symbols
InTakri
InTamil
InTelugu
InThaana
InThai
InTibetan
InTifinagh
InTirhuta
InTransport_and_Map_Symbols
InUgaritic
InUnified_Canadian_Aboriginal_Syllabics
InUnified_Canadian_Aboriginal_Syllabics_Extended
InVai
InVariation_Selectors
InVariation_Selectors_Supplement
InVedic_Extensions
InVertical_Forms
InWarang_Citi
InYi_Radicals
InYi_Syllables
InYijing_Hexagram_Symbols
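The name-matching rule stated above (case insensitive; spaces, hyphens, and underscores ignored) can be sketched as a small normalization step. The helper name normalizeTokenName is hypothetical and the function is only an illustration of the rule. The second part shows, with a native character class, the code point range that the Basic Latin block covers (U+0000 to U+007F).

```javascript
// Hypothetical helper: reduce a token name to a canonical lookup key.
function normalizeTokenName(name) {
  return name.replace(/[- _]+/g, '').toLowerCase();
}

const keys = ['InBasic_Latin', 'in basic-latin', 'INBASICLATIN']
  .map(normalizeTokenName);
// keys -> ['inbasiclatin', 'inbasiclatin', 'inbasiclatin']

// Native equivalent of the Basic Latin block as an explicit range.
const basicLatin = /^[\u0000-\u007F]+$/;
const plainAscii = basicLatin.test('Hello, world!');
// plainAscii -> true
const accented = basicLatin.test('h\u00E9llo');
// accented -> false (U+00E9 lies in the Latin-1 Supplement block)
```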
XRegExp adds support for Unicode's general categories. E.g., \p{Lu} or \p{Uppercase Letter}. See category descriptions in UAX #44 http://unicode.org/reports/tr44/#GC_Values_Table. Token names are case insensitive, and any spaces, hyphens, and underscores are ignored.
Currently XRegExp supports the Unicode 8.0.0 category names listed below:
Close_Punctuation
Connector_Punctuation
Control
Currency_Symbol
Dash_Punctuation
Decimal_Number
Enclosing_Mark
Final_Punctuation
Format
Initial_Punctuation
Letter
Letter_Number
Line_Separator
Lowercase_Letter
Mark
Math_Symbol
Modifier_Letter
Modifier_Symbol
Nonspacing_Mark
Number
Open_Punctuation
Other
Other_Letter
Other_Number
Other_Punctuation
Other_Symbol
Paragraph_Separator
Private_Use
Punctuation
Separator
Space_Separator
Spacing_Mark
Surrogate
Symbol
Titlecase_Letter
Unassigned
Uppercase_Letter
C
Cc
Cf
Cn
Co
Cs
L
Ll
Lm
Lo
Lt
Lu
M
Mc
Me
Mn
N
Nd
Nl
No
P
Pc
Pd
Pe
Pf
Pi
Po
Ps
S
Sc
Sk
Sm
So
Z
Zl
Zp
Zs
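Engines that implement ES2018 Unicode property escapes accept these category names natively under flag u, which allows a dependency-free illustration of what the tokens match. This sketch assumes such an engine. Note the difference from XRegExp: native names are case sensitive and need underscores, so a spelling like \p{Uppercase Letter} relies on XRegExp's looser name matching.

```javascript
// Short and long names refer to the same category.
const shortName = /^\p{Lu}+$/u.test('\u00C9A');          // 'ÉA'
// shortName -> true
const longName = /^\p{Uppercase_Letter}$/u.test('a');
// longName -> false (lowercase letter)

// Nd covers decimal digits in any script, not only 0-9.
const arabicIndicDigit = /^\p{Nd}$/u.test('\u0663');
// arabicIndicDigit -> true

// A one-letter name selects the whole group (here: every kind of letter).
const anyLetter = /^\p{L}+$/u.test('\u65E5\u672C\u8A9E'); // '日本語'
// anyLetter -> true
```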
XRegExp adds properties to meet the UTS #18 Level 1 RL1.2 requirements for Unicode regex support. See http://unicode.org/reports/tr18/#RL1.2. Following are definitions of these properties from UAX #44 http://unicode.org/reports/tr44/:
Alphabetic
Characters with the Alphabetic property. Generated from: Lowercase + Uppercase + Lt + Lm + Lo + Nl + Other_Alphabetic.
Default_Ignorable_Code_Point
For programmatic determination of default ignorable code points. New characters that should be ignored in rendering (unless explicitly supported) will be assigned in these ranges, permitting programs to correctly handle the default rendering of such characters when not otherwise supported.
Lowercase
Characters with the Lowercase property. Generated from: Ll + Other_Lowercase.
Noncharacter_Code_Point
Code points permanently reserved for internal use.
Uppercase
Characters with the Uppercase property. Generated from: Lu + Other_Uppercase.
White_Space
Spaces, separator characters and other control characters which should be treated by programming languages as "white space" for the purpose of parsing elements.
The properties ASCII, Any, and Assigned are also included but are not defined in UAX #44.
UTS #18 RL1.2 additionally requires support for Unicode scripts and general categories. These are
included in XRegExp's Unicode Categories and Unicode Scripts addons.
Token names are case insensitive, and any spaces, hyphens, and underscores are ignored.
Currently XRegExp supports the Unicode 8.0.0 property names listed below:
Alphabetic
Any
ASCII
Default_Ignorable_Code_Point
Lowercase
Noncharacter_Code_Point
Uppercase
White_Space
Next to these, this property name is available as well:
Assigned
This is defined as the inverse of Unicode category Cn (Unassigned).
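The relationship between Assigned and Cn can be checked with native ES2018 property escapes, which happen to support the same property names. The sketch below assumes such an engine and does not go through XRegExp; the sample characters are arbitrary.

```javascript
const alphabetic = /^\p{Alphabetic}+$/u.test('\u00E9t\u00E9'); // 'été'
// alphabetic -> true
const noBreakSpace = /^\p{White_Space}$/u.test('\u00A0');
// noBreakSpace -> true

// Assigned should agree with "not Cn" for every code point.
const assigned = /\p{Assigned}/u;
const notUnassigned = /\P{Cn}/u;
const samples = ['a', ' ', '\u00A0', '\uFFFF', '\u{1F600}'];
const definitionsAgree = samples.every(
  (ch) => assigned.test(ch) === notUnassigned.test(ch)
);
// definitionsAgree -> true

// U+FFFF is a noncharacter, which falls under Cn and is therefore not Assigned.
const noncharacterAssigned = assigned.test('\uFFFF');
// noncharacterAssigned -> false
```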
XRegExp adds support for all Unicode scripts. E.g., \p{Latin}. Token names are case insensitive, and any spaces, hyphens, and underscores are ignored.
Currently XRegExp supports the Unicode 8.0.0 script names listed below:
Ahom
Anatolian_Hieroglyphs
Arabic
Armenian
Avestan
Balinese
Bamum
Bassa_Vah
Batak
Bengali
Bopomofo
Brahmi
Braille
Buginese
Buhid
Canadian_Aboriginal
Carian
Caucasian_Albanian
Chakma
Cham
Cherokee
Common
Coptic
Cuneiform
Cypriot
Cyrillic
Deseret
Devanagari
Duployan
Egyptian_Hieroglyphs
Elbasan
Ethiopic
Georgian
Glagolitic
Gothic
Grantha
Greek
Gujarati
Gurmukhi
Han
Hangul
Hanunoo
Hatran
Hebrew
Hiragana
Imperial_Aramaic
Inherited
Inscriptional_Pahlavi
Inscriptional_Parthian
Javanese
Kaithi
Kannada
Katakana
Kayah_Li
Kharoshthi
Khmer
Khojki
Khudawadi
Lao
Latin
Lepcha
Limbu
Linear_A
Linear_B
Lisu
Lycian
Lydian
Mahajani
Malayalam
Mandaic
Manichaean
Meetei_Mayek
Mende_Kikakui
Meroitic_Cursive
Meroitic_Hieroglyphs
Miao
Modi
Mongolian
Mro
Multani
Myanmar
Nabataean
New_Tai_Lue
Nko
Ogham
Ol_Chiki
Old_Hungarian
Old_Italic
Old_North_Arabian
Old_Permic
Old_Persian
Old_South_Arabian
Old_Turkic
Oriya
Osmanya
Pahawh_Hmong
Palmyrene
Pau_Cin_Hau
Phags_Pa
Phoenician
Psalter_Pahlavi
Rejang
Runic
Samaritan
Saurashtra
Sharada
Shavian
Siddham
SignWriting
Sinhala
Sora_Sompeng
Sundanese
Syloti_Nagri
Syriac
Tagalog
Tagbanwa
Tai_Le
Tai_Tham
Tai_Viet
Takri
Tamil
Telugu
Thaana
Thai
Tibetan
Tifinagh
Tirhuta
Ugaritic
Vai
Warang_Citi
Yi
Additional token names may be defined via the XRegExp.addUnicodeData(unicodeData) API.
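For comparison, ES2018 engines expose scripts natively through the Script= prefix under flag u. The sketch below uses that native syntax, not XRegExp's shorter \p{Latin} form, and assumes an engine with property escape support.

```javascript
const greekOnly = /^\p{Script=Greek}+$/u.test('\u03B1\u03B2\u03B3'); // 'αβγ'
// greekOnly -> true

const latinInGreek = /\p{Script=Latin}/u.test('\u03B1\u03B2\u03B3');
// latinInGreek -> false

const hanCharacter = /^\p{Script=Han}$/u.test('\u6F22'); // '漢'
// hanCharacter -> true
```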
To regenerate the xregexp-all.js source file you can simply run the command
    npm run build
in the base directory of the repository.
XRegExp project collaborators are:
Thanks to all contributors and others who have submitted code, provided feedback, reported bugs, and inspired new features.
XRegExp is released under the MIT License. Learn more at xregexp.com.