Gathering detailed insights and metrics for @naytev/grapheme-splitter
A JavaScript library that breaks strings into their individual user-perceived characters.
npm install @naytev/grapheme-splitter
JavaScript (100%)
Total Downloads: 0
Last Day: 0
Last Week: 0
Last Month: 0
Last Year: 0
MIT License
965 Stars
42 Commits
47 Forks
18 Watchers
1 Branches
9 Contributors
Updated on Jul 08, 2025
Latest Version: 1.0.0
Package Id: @naytev/grapheme-splitter@1.0.0
Size: 26.49 kB
NPM Version: 3.8.9
Node Version: 6.2.0
In JavaScript there is not always a one-to-one relationship between the characters of a string and what a user would call a separate visual "letter". Some symbols are represented by several characters. This causes problems when splitting a string, because a multi-char letter can inadvertently be cut in half, and when you need the actual number of letters in a string.
For example, emoji characters like "🌷","🎁","💩","😜" and "👍" are represented by two JavaScript characters each (high surrogate and low surrogate). That is,
"🌷".length == 2
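The surrogate-pair point can be checked directly with built-in string methods. The following is a minimal sketch; the variable names are illustrative and the emoji is written as escape sequences so the two code units are visible:

```javascript
// "\uD83C\uDF37" is the tulip emoji (U+1F337) written as its surrogate pair.
var tulip = "\uD83C\uDF37";

// .length counts UTF-16 code units, not user-perceived characters.
var codeUnits = tulip.length;

// The string iterator used by Array.from walks code points instead.
var codePoints = Array.from(tulip).length;

// The two halves of the pair, and the code point they encode together.
var high = tulip.charCodeAt(0).toString(16);
var low = tulip.charCodeAt(1).toString(16);
var codePoint = tulip.codePointAt(0).toString(16);

console.log(codeUnits, codePoints, high, low, codePoint);
// prints: 2 1 d83c df37 1f337
```

Counting code points is enough for a lone emoji, but not for the combining-mark sequences described next, which span several code points.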
What's more, many languages use combining marks: characters that modify the letter before them. Common examples are the German letter ü and the Spanish letter ñ. Sometimes a letter can be represented both as a single character and as a letter + combining mark, with both forms equally valid:
var two = "\u006E\u0303"; // unnormalized two-char n + combining tilde, displayed as "ñ"
var one = "\u00F1"; // normalized single-char "ñ"
console.log(one != two); // prints 'true'
Unicode normalization, as performed by the popular punycode.js library or ECMAScript 6's String.prototype.normalize, can sometimes fix those differences and turn two-char sequences into single characters. But it is not enough in all cases. Some languages, like Hindi, make extensive use of combining marks on their letters, and the resulting combinations have no dedicated single-codepoint Unicode representation because there are simply too many of them. For example, the Hindi word "अनुच्छेद" consists of 5 letters and 3 combining marks:
अ + न + ु + च + ् + छ + े + द
which is in fact just 5 user-perceived letters:
अ + नु + च् + छे + द
and which Unicode normalization would not combine properly. There are also the unusual letter+combining mark combinations which have no dedicated Unicode codepoint. The string Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘ obviously has 5 separate letters, but is in fact comprised of 58 JavaScript characters, most of which are combining marks.
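As a rough illustration of where normalization helps and where it does not, the sketch below uses only the standard String.prototype.normalize method. The strings are written as escape sequences to make the code points explicit, and the variable names are illustrative:

```javascript
// Case 1: n + combining tilde has a precomposed form, so NFC merges it.
var decomposed = "\u006E\u0303";
var composed = "\u00F1";
var fixedByNfc = decomposed.normalize("NFC") === composed;

// Case 2: the Hindi word from the text, code point by code point:
// A, NA, vowel sign U, CA, virama, CHA, vowel sign E, DA.
var hindi = "\u0905\u0928\u0941\u091A\u094D\u091B\u0947\u0926";

// No precomposed code points exist for these letter + mark pairs,
// so NFC leaves the string at 8 code units, not 5.
var hindiNfcLength = hindi.normalize("NFC").length;

console.log(fixedByNfc, hindiNfcLength);
// prints: true 8
```

Normalization only changes which code points represent the text; finding the boundaries between user-perceived letters is a separate problem.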
Enter the grapheme-splitter.js library. It can be used to properly split JavaScript strings into what a human user would call separate letters (or "extended grapheme clusters" in Unicode terminology), no matter what their internal representation is. It is an implementation of the Unicode UAX-29 standard.
To install grapheme-splitter to your project, use the NPM command below:
$ npm install --save grapheme-splitter
To run the tests on grapheme-splitter, use the command below:
$ npm test
Just initialize and use:
var splitter = new GraphemeSplitter();

// split the string to an array of grapheme clusters (one string each)
var graphemes = splitter.splitGraphemes(string);

// or do this if you just need their number
var graphemeCount = splitter.countGraphemes(string);
var splitter = new GraphemeSplitter();

// plain latin alphabet - nothing spectacular
splitter.splitGraphemes("abcd"); // returns ["a", "b", "c", "d"]

// two-char emojis and four-char country flag
splitter.splitGraphemes("🌷🎁💩😜👍🇺🇸"); // returns ["🌷","🎁","💩","😜","👍","🇺🇸"]

// diacritics as combining marks, 10 JavaScript chars
splitter.splitGraphemes("Ĺo͂ře᷒m̅"); // returns ["Ĺ","o͂","ř","e᷒","m̅"]

// individual Korean characters (Jamo), 4 JavaScript chars
splitter.splitGraphemes("뎌쉐"); // returns ["뎌","쉐"]

// Hindi text with combining marks, 8 JavaScript chars
splitter.splitGraphemes("अनुच्छेद"); // returns ["अ","नु","च्","छे","द"]

// demonic multiple combining marks, 75 JavaScript chars
splitter.splitGraphemes("Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞"); // returns ["Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍","A̴̵̜̰͔ͫ͗͢","L̠ͨͧͩ͘","G̴̻͈͍͔̹̑͗̎̅͛́","Ǫ̵̹̻̝̳͂̌̌͘","!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞"]
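One place where this matters in practice is truncating text without cutting a letter in half. The sketch below is not part of the library: truncateGraphemes is a hypothetical helper that accepts any object with a splitGraphemes(string) method, such as a GraphemeSplitter instance. So that the example runs without the package installed, it uses a stand-in object built on the built-in Intl.Segmenter, which is assumed to be available in the runtime.

```javascript
// Hypothetical helper (not part of grapheme-splitter): keep at most `max`
// grapheme clusters of `text`. `splitter` is any object that exposes
// splitGraphemes(string) and returns an array of strings.
function truncateGraphemes(splitter, text, max) {
  return splitter.splitGraphemes(text).slice(0, max).join("");
}

// Stand-in with the same method name, used only to keep this sketch
// self-contained. ASSUMPTION: the runtime provides Intl.Segmenter.
var standInSplitter = {
  splitGraphemes: function (text) {
    var segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    return Array.from(segmenter.segment(text), function (part) {
      return part.segment;
    });
  }
};

// US flag (two regional indicators, four code units) followed by "abc".
var label = "\uD83C\uDDFA\uD83C\uDDF8" + "abc";

// Naive truncation counts code units and ends on a lone high surrogate.
var naive = label.slice(0, 3);

// Grapheme-aware truncation keeps the flag whole, then "a" and "b".
var safe = truncateGraphemes(standInSplitter, label, 3);

console.log(naive.length, safe.length);
// prints: 3 6
```

With the library installed, passing a GraphemeSplitter instance in place of standInSplitter should work the same way, since the helper relies only on the splitGraphemes method shown above.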
This library is heavily influenced by Devon Govett's excellent grapheme-breaker CoffeeScript library at https://github.com/devongovett/grapheme-breaker with an emphasis on ease of integration and pure JavaScript implementation.
No vulnerabilities found.
no binaries found in the repo
0 existing vulnerabilities detected
license file detected
Found 7/21 approved changesets -- score normalized to 3
0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
no effort to earn an OpenSSF best practices badge detected
security policy file not detected
project is not fuzzed
branch protection not enabled on development/release branches
SAST tool is not run on all commits -- score normalized to 0
Last Scanned on 2025-07-07
The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.