Installations

npm install @naytev/grapheme-splitter

Developer Guide

BETA

TypeScript

No

Module System

CommonJS

Node Version

6.2.0

NPM Version

3.8.9

Pull Requests

Open

1

Total

12

Closed

2

Merged

9

Issues

Open

7

Total

21

Closed

14

Languages

1

JavaScript

JavaScript (100%)

Developer

orling

GitHub Statistics

MIT License

965 Stars

42 Commits

47 Forks

18 Watchers

1 Branches

9 Contributors

Updated on Jul 08, 2025

Maintainers

1

Package Meta Information

Latest Version

1.0.0

Package Id

@naytev/grapheme-splitter@1.0.0

Size

26.49 kB

Dev Dependencies

1

tape

Background

In JavaScript there is not always a one-to-one relationship between string characters and what a user would call a separate visual "letter". Some symbols are represented by several characters. This can cause issues when splitting strings and inadvertently cutting a multi-char letter in half, or when you need the actual number of letters in a string.

For example, emoji characters like "🌷","🎁","💩","😜" and "👍" are represented by two JavaScript characters each (high surrogate and low surrogate). That is,

"🌷".length == 2
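
As a small illustration of the surrogate pair mentioned above, the following sketch inspects the tulip emoji (U+1F337) using only built-in string methods. The variable names are illustrative and not part of the library.

```javascript
// The tulip emoji (U+1F337), written as its two UTF-16 code units.
var tulip = "\uD83C\uDF37";

// .length counts UTF-16 code units; Array.from iterates by code point.
var codeUnitCount = tulip.length;
var codePointCount = Array.from(tulip).length;

// The two halves of the surrogate pair, and the code point they encode.
var highSurrogate = tulip.charCodeAt(0).toString(16);
var lowSurrogate = tulip.charCodeAt(1).toString(16);
var codePoint = tulip.codePointAt(0).toString(16);

console.log(codeUnitCount, codePointCount);          // 2 1
console.log(highSurrogate, lowSurrogate, codePoint); // d83c df37 1f337
```

Counting code points handles this case, but it is not sufficient once combining marks are involved, as the next paragraphs describe.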

What's more, some languages often include combining marks - characters that are used to modify the letters before them. Common examples are the German letter ü and the Spanish letter ñ. Sometimes they can be represented alternatively both as a single character and as a letter + combining mark, with both forms equally valid:

var two = "\u006E\u0303"; // unnormalized two-char n + combining tilde
var one = "\u00F1"; // normalized single-char ñ
console.log(one != two); // prints 'true'
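
For this particular pair, the two forms can be reconciled with the built-in `String.prototype.normalize`. The sketch below is a minimal illustration; the variable names are assumptions made for the example.

```javascript
var decomposed = "\u006E\u0303"; // n followed by a combining tilde
var precomposed = "\u00F1";      // single-code-point ñ

// NFC composes sequences where a precomposed form exists; NFD decomposes them.
var equalAfterNFC = decomposed.normalize("NFC") === precomposed;
var equalAfterNFD = precomposed.normalize("NFD") === decomposed;

console.log(decomposed.length, precomposed.length); // 2 1
console.log(equalAfterNFC, equalAfterNFD);          // true true
```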

Unicode normalization, as performed by ECMAScript 6's String.prototype.normalize, can sometimes fix those differences and turn two-char sequences into single characters, but it is not enough in all cases. Some languages, such as Hindi, make extensive use of combining marks on their letters, and the sheer number of possible combinations means most of them have no dedicated single-code-point form. For example, the Hindi word "अनुच्छेद" consists of 5 letters and 3 combining marks:

अ + न + ु + च + ् + छ + े + द

which is in fact just 5 user-perceived letters:

अ + नु + च् + छे + द

and which Unicode normalization would not combine properly. There are also the unusual letter+combining mark combinations which have no dedicated Unicode codepoint. The string Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘ obviously has 5 separate letters, but is in fact comprised of 58 JavaScript characters, most of which are combining marks.
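
The Hindi case can be checked with built-ins alone. The sketch below spells the word with escape sequences so the code points are unambiguous; it only shows that normalization leaves the string at 8 code units and does not attempt any grapheme splitting.

```javascript
// "अनुच्छेद" written code point by code point:
// अ, न, ु (mark), च, ् (mark), छ, े (mark), द
var word = "\u0905\u0928\u0941\u091A\u094D\u091B\u0947\u0926";

// NFC has nothing to compose here: no precomposed forms exist for these pairs.
var composed = word.normalize("NFC");

console.log(word.length);       // 8
console.log(composed.length);   // 8
console.log(composed === word); // true
```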

Enter the grapheme-splitter.js library. It can be used to properly split JavaScript strings into what a human user would call separate letters (or "extended grapheme clusters" in Unicode terminology), no matter what their internal representation is. It is an implementation of the Unicode UAX-29 standard.
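
To give a rough feel for what grouping into clusters means, here is a deliberately naive sketch that attaches every combining mark to the character before it. It is not the library's algorithm and not a UAX-29 implementation: `naiveClusters` is a hypothetical helper, and it relies on Unicode property escapes (`\p{M}`), which assumes a newer JavaScript engine than the one listed for this package.

```javascript
// Hypothetical helper: groups each combining mark with the preceding character.
// This is a simplification for illustration, NOT the UAX-29 rules.
function naiveClusters(text) {
  var clusters = [];
  var isMark = /^\p{M}$/u; // matches a single code point in the Mark category

  // for...of iterates by code point, so surrogate pairs stay intact.
  for (var ch of text) {
    if (clusters.length > 0 && isMark.test(ch)) {
      clusters[clusters.length - 1] += ch; // attach mark to previous cluster
    } else {
      clusters.push(ch); // start a new cluster
    }
  }
  return clusters;
}

// The Hindi word from the Background section, as escape sequences.
var hindi = "\u0905\u0928\u0941\u091A\u094D\u091B\u0947\u0926";
console.log(naiveClusters(hindi).length); // 5
```

This simple rule already breaks on input such as country flags, which are two regional-indicator code points with no combining mark between them, so it returns two clusters for "🇺🇸". Handling such cases is what the full UAX-29 rules are for.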

Installation

To install grapheme-splitter in your project, use the npm command below:

$ npm install --save grapheme-splitter

Tests

To run the tests on grapheme-splitter, use the command below:

$ npm test

Usage

Just initialize and use:

// load the module (CommonJS); assumes it was installed as "grapheme-splitter"
var GraphemeSplitter = require("grapheme-splitter");
var splitter = new GraphemeSplitter();

// split the string to an array of grapheme clusters (one string each)
var graphemes = splitter.splitGraphemes(string);

// or do this if you just need their number
var graphemeCount = splitter.countGraphemes(string);
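
One common use of the split result is truncating text without cutting a letter in half. The sketch below is an illustration only: `truncateGraphemes` is a hypothetical helper, and it accepts any object exposing a `splitGraphemes(string)` method like the one shown above, so the block itself does not load the library.

```javascript
// Hypothetical helper: keep at most maxGraphemes user-perceived letters.
// `splitter` is assumed to expose splitGraphemes(string) returning an array.
function truncateGraphemes(splitter, text, maxGraphemes) {
  var graphemes = splitter.splitGraphemes(text);
  if (graphemes.length <= maxGraphemes) {
    return text; // already short enough, return unchanged
  }
  // Joining whole clusters guarantees no cluster is split.
  return graphemes.slice(0, maxGraphemes).join("");
}
```

With the library loaded, a call might look like `truncateGraphemes(new GraphemeSplitter(), someText, 10)`. By contrast, `someText.slice(0, 10)` counts JavaScript characters and may cut through a surrogate pair or strip a combining mark.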

Examples

var splitter = new GraphemeSplitter();

// plain latin alphabet - nothing spectacular
splitter.splitGraphemes("abcd"); // returns ["a", "b", "c", "d"]

// two-char emojis and four-char country flag
splitter.splitGraphemes("🌷🎁💩😜👍🇺🇸"); // returns ["🌷","🎁","💩","😜","👍","🇺🇸"]

// diacritics as combining marks, 10 JavaScript chars
splitter.splitGraphemes("Ĺo͂ře᷒m̅"); // returns ["Ĺ","o͂","ř","e᷒","m̅"]

// individual Korean characters (Jamo), 4 JavaScript chars
splitter.splitGraphemes("뎌쉐"); // returns ["뎌","쉐"]

// Hindi text with combining marks, 8 JavaScript chars
splitter.splitGraphemes("अनुच्छेद"); // returns ["अ","नु","च्","छे","द"]

// demonic multiple combining marks, 75 JavaScript chars
splitter.splitGraphemes("Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞"); // returns ["Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍","A̴̵̜̰͔ͫ͗͢","L̠ͨͧͩ͘","G̴̻͈͍͔̹̑͗̎̅͛́","Ǫ̵̹̻̝̳͂̌̌͘","!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞"]
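
Another place where cluster-aware splitting matters is string reversal. The sketch below contrasts a code-unit reversal with one built on `splitGraphemes`; `reverseGraphemes` is a hypothetical helper that accepts any object with a `splitGraphemes(string)` method, so the block does not load the library itself.

```javascript
// Naive reversal works on UTF-16 code units and swaps surrogate halves.
function reverseCodeUnits(text) {
  return text.split("").reverse().join("");
}

// Hypothetical helper: reverse whole grapheme clusters instead.
// `splitter` is assumed to expose splitGraphemes(string) returning an array.
function reverseGraphemes(splitter, text) {
  return splitter.splitGraphemes(text).reverse().join("");
}

// "ab" followed by the tulip emoji (U+1F337) as a surrogate pair.
var sample = "ab\uD83C\uDF37";

// The naive result starts with a lone low surrogate, so the emoji is broken.
var broken = reverseCodeUnits(sample);
console.log(broken === "\uDF37\uD83Cba"); // true
```

With the library loaded, `reverseGraphemes(new GraphemeSplitter(), sample)` would be expected to keep the emoji intact, given the split behaviour shown in the examples above.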

Acknowledgements

This library is heavily influenced by Devon Govett's excellent grapheme-breaker CoffeeScript library at https://github.com/devongovett/grapheme-breaker with an emphasis on ease of integration and pure JavaScript implementation.

No vulnerabilities found.

10

Binary-Artifacts

Determines if the project has generated executable (binary) artifacts in the source repository.

10

Vulnerabilities

Determines if the project has open, known unfixed vulnerabilities.

10

License

Determines if the project has defined a license.

3

Code-Review

Determines if the project requires human code review before pull requests (aka merge requests) are merged.

0

Maintained

Determines if the project is "actively maintained".

0

CII-Best-Practices

Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.

0

Security-Policy

Determines if the project has published a security policy.

0

Fuzzing

Determines if the project uses fuzzing.

0

Branch-Protection

Determines if the default and release branches are protected with GitHub's branch protection settings.

0

SAST

Determines if the project uses static code analysis.

Score

3.4/10

Last Scanned on 2025-07-07

The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.
