npmpackage.info

Gathering detailed insights and metrics for string-comparison

Other packages similar to string-comparison

string-natural-compare

3.0.1

Compare alphanumeric strings the same way a human would, using a natural order algorithm

leven

4.0.0

Measure the difference between two strings using the Levenshtein distance algorithm

fastest-levenshtein

1.0.16

Fastest Levenshtein distance implementation in JS.

string-similarity

4.0.4

Finds degree of similarity between strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance.

string-comparison - 1.3.0 | npmpackage.info

string-comparison

🤠A library implementing different string similarity using JavaScript.

1.3.0

MIT

TypeScript

1.53 kB

2,298,702

Installations

npm install string-comparison

Developer Guide

BETA

Typescript

Yes

Module System

CommonJS, ESM

Min. Node Version

^16.0.0 || >=18.0.0

Score

Supply Chain: 99.5
Quality: 99.4
Maintenance: 75.7
Vulnerability: 100
License: 100

Pull Requests

Open

0

Total

13

Closed

0

Merged

13

Issues

Open

1

Total

6

Closed

5

Releases

v1.1.0

Updated on Feb 18, 2022

v1.0.9

Updated on Sep 28, 2020

v1.0.8

Updated on Sep 28, 2020

Languages

TypeScript

TypeScript (100%)

Developer

Rabbitzzc

Download Statistics

Total Downloads

2,298,702

Last Day

2,591

Last Week

50,218

Last Month

207,742

Last Year

1,626,681

GitHub Statistics

MIT License

54 Stars

74 Commits

5 Forks

2 Watchers

2 Branches

3 Contributors

Updated on Apr 22, 2025

Bundle Size

4.77 kB

Minified

1.53 kB

Minified + Gzipped

Package Meta Information

Latest Version

1.3.0

Package Id

string-comparison@1.3.0

Unpacked Size

32.99 kB

Size

9.14 kB

File Count

Published on

Nov 29, 2023

Total Downloads

Cumulative downloads: 2,298,702
Last Day: 2,591 (-3.4% compared to previous day)
Last Week: 50,218 (-3.8% compared to previous week)
Last Month: 207,742 (9.6% compared to previous month)
Last Year: 1,626,681 (343.2% compared to previous year)

Dev Dependencies

@swc/core @types/mocha @types/node @typescript-eslint/eslint-plugin @typescript-eslint/parser async eslint eslint-config-alloy eslint-config-prettier eslint-plugin-prettier mocha npm-run-all prettier ts-node tsup typescript

string-comparison

JavaScript implementation of tdebatty/java-string-similarity

A library implementing different string similarity, distance and sortMatch measures. A dozen algorithms (including Levenshtein edit distance and siblings, Longest Common Subsequence, cosine similarity etc.) are currently implemented. Check the summary table below for the complete list...

Download & Usage

download

npm install string-comparison --save
yarn add string-comparison
pnpm add string-comparison

usage

let stringComparison = require('string-comparison')
// or import stringComparison from 'string-comparison'

const Thanos = 'healed'
const Rival = 'sealed'
const Avengers = ['edward', 'sealed', 'theatre']

// use by cosine (lowercase, matching the name listed in the API section)
let cos = stringComparison.cosine

console.log(cos.similarity(Thanos, Rival))
console.log(cos.distance(Thanos, Rival))
console.log(cos.sortMatch(Thanos, Avengers))

Overview

The main characteristics of each implemented algorithm are presented below. The "cost" column gives an estimation of the computational cost to compute the similarity between two strings of length m and n respectively.

| Algorithm | Measure(s) | Normalized? | Metric? | Type | Cost | Typical usage |
|---|---|---|---|---|---|---|
| Jaccard index | similarity, distance, sortMatch | Yes | Yes | Set | O(m+n) | |
| Cosine similarity | similarity, distance, sortMatch | Yes | No | Profile | O(m+n) | |
| Sorensen-Dice coefficient | similarity, distance, sortMatch | Yes | No | Set | O(m+n) | |
| Levenshtein | similarity, distance, sortMatch | No | Yes | | O(m*n) | |
| Jaro-Winkler | similarity, distance, sortMatch | Yes | No | | O(m*n) | typo correction |
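
The "Type" column separates set-based measures from profile-based ones. As a rough illustration only (this is not the library's code, and the helper names `ngrams`, `profile` and `cosineSimilarity` are hypothetical), a profile keeps the count of each n-gram, whereas a set would discard the counts. Assuming bigrams, cosine similarity over two profiles could be sketched as:

```typescript
// Hypothetical helper: split a string into overlapping n-grams (bigrams by default).
function ngrams(text: string, n: number = 2): string[] {
  const grams: string[] = [];
  for (let i = 0; i + n <= text.length; i++) {
    grams.push(text.slice(i, i + n));
  }
  return grams;
}

// A "profile" maps each n-gram to the number of times it occurs.
// (A set-based measure would use new Set(ngrams(text)) and ignore the counts.)
function profile(text: string, n: number = 2): Map<string, number> {
  const counts = new Map<string, number>();
  ngrams(text, n).forEach((gram) => {
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  });
  return counts;
}

// Cosine of the angle between the two count vectors: V1 . V2 / (|V1| * |V2|).
function cosineSimilarity(a: string, b: string, n: number = 2): number {
  const pa = profile(a, n);
  const pb = profile(b, n);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  pa.forEach((count, gram) => {
    dot += count * (pb.get(gram) ?? 0);
    normA += count * count;
  });
  pb.forEach((count) => {
    normB += count * count;
  });
  // Strings shorter than n have an empty profile; treat them as dissimilar.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Both profiles are built in one pass over each string, which is where the O(m+n) cost in the table comes from.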

Normalized, metric, similarity and distance

Although the topic might seem simple, a lot of different algorithms exist to measure text similarity or distance. Therefore the library defines some interfaces to categorize them.

(Normalized) similarity and distance

StringSimilarity : Implementing algorithms define a similarity between strings (0 means strings are completely different).
NormalizedStringSimilarity : Implementing algorithms define a similarity between 0.0 and 1.0, like Jaro-Winkler for example.
StringDistance : Implementing algorithms define a distance between strings (0 means strings are identical), like Levenshtein for example. The maximum distance value depends on the algorithm.
NormalizedStringDistance : This interface extends StringDistance. For implementing classes, the computed distance value is between 0.0 and 1.0. NormalizedLevenshtein is an example of NormalizedStringDistance.
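
To make the four categories concrete, here is a hypothetical TypeScript rendering of them. The interface names follow the descriptions above, but these declarations are an illustration, not confirmed exports of string-comparison. That NormalizedStringSimilarity extends StringSimilarity is an assumption made by analogy with the distance interfaces.

```typescript
// Hypothetical declarations mirroring the categories described above.
interface StringSimilarity {
  // 0 means the strings are completely different.
  similarity(a: string, b: string): number;
}

// Assumption: same shape, but the result is guaranteed to be in [0.0, 1.0].
interface NormalizedStringSimilarity extends StringSimilarity {}

interface StringDistance {
  // 0 means the strings are identical; the maximum depends on the algorithm.
  distance(a: string, b: string): number;
}

// Extends StringDistance; the result is guaranteed to be in [0.0, 1.0].
interface NormalizedStringDistance extends StringDistance {}

// Toy measure that satisfies both normalized interfaces:
// identical strings score 1, anything else scores 0.
const exactMatch: NormalizedStringSimilarity & NormalizedStringDistance = {
  similarity: (a, b) => (a === b ? 1 : 0),
  distance: (a, b) => (a === b ? 0 : 1),
};

console.log(exactMatch.similarity("healed", "healed"));
console.log(exactMatch.distance("healed", "sealed"));
```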

Levenshtein

The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.

It is a metric string distance. This implementation uses dynamic programming (Wagner–Fischer algorithm), with only 2 rows of data. The space requirement is thus O(m) and the algorithm runs in O(m.n).

import { levenshtein } from "string-comparison"
import type {SortMatchResultType} from "string-comparison"

const Thanos = 'healed'
const Rival = 'sealed'
const Avengers = ['edward', 'sealed', 'theatre']

console.log(levenshtein.similarity(Thanos, Rival))
console.log(levenshtein.distance(Thanos, Rival))
console.log(levenshtein.sortMatch(Thanos, Avengers) as SortMatchResultType)

// output
// 0.8333333333333334
// 1
// [
//   { member: 'edward', index: 0, rating: 0.16666666666666663 },
//   { member: 'theatre', index: 2, rating: 0.4285714285714286 },
//   { member: 'sealed', index: 1, rating: 0.8333333333333334 }
// ]
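
The two-row dynamic programming approach described in this section can be sketched as follows. This is an independent illustration, not the library's source; the function names are hypothetical, and the normalization `1 - distance / max(length)` is an assumption that happens to be consistent with the sample ratings shown above.

```typescript
// Wagner-Fischer with two rows: O(m*n) time, O(n) extra space.
function levenshteinDistance(a: string, b: string): number {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // previous[j] holds the distance between the first i-1 chars of a
  // and the first j chars of b; current[j] is the row being filled.
  let previous: number[] = [];
  for (let j = 0; j <= b.length; j++) previous.push(j);
  let current: number[] = new Array<number>(b.length + 1).fill(0);

  for (let i = 1; i <= a.length; i++) {
    current[0] = i; // deleting i characters from a
    for (let j = 1; j <= b.length; j++) {
      const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      const deletion = previous[j] + 1;
      const insertion = current[j - 1] + 1;
      current[j] = Math.min(substitution, deletion, insertion);
    }
    // Swap the rows so the freshly computed row becomes "previous".
    const swap = previous;
    previous = current;
    current = swap;
  }
  return previous[b.length];
}

// Assumed normalization: 1 means identical, 0 means nothing in common.
function levenshteinSimilarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshteinDistance(a, b) / longest;
}

console.log(levenshteinDistance("healed", "sealed"));
console.log(levenshteinSimilarity("healed", "sealed"));
```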

Longest Common Subsequence

The longest common subsequence (LCS) problem consists in finding the longest subsequence common to two (or more) sequences. It differs from problems of finding common substrings: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences.

It is used by the diff utility, by Git for reconciling multiple changes, etc.

The LCS distance between strings X (of length n) and Y (of length m) is n + m - 2 |LCS(X, Y)|, with a minimum of 0 and a maximum of n + m.

LCS distance is equivalent to Levenshtein distance when only insertions and deletions are allowed (no substitution), or when the cost of a substitution is twice the cost of an insertion or deletion.

This class implements the dynamic programming approach, which has a space requirement O(m.n), and computation cost O(m.n).

In "Length of Maximal Common Subsequences", K.S. Larsen proposed an algorithm that computes the length of LCS in time O(log(m).log(n)). But the algorithm has a memory requirement O(m.n²) and was thus not implemented here.

import { lcs } from "string-comparison"
// or: import { longestCommonSubsequence } from "string-comparison"

const Thanos = 'healed'
const Rival = 'sealed'
const Avengers = ['edward', 'sealed', 'theatre']

console.log(lcs.similarity(Thanos, Rival))
console.log(lcs.distance(Thanos, Rival))
console.log(lcs.sortMatch(Thanos, Avengers))

// output
// 0.8333333333333334
// 2
// [
//   { member: 'edward', index: 0, rating: 0.5 },
//   { member: 'theatre', index: 2, rating: 0.6153846153846154 },
//   { member: 'sealed', index: 1, rating: 0.8333333333333334 }
// ]
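
As an illustration of the dynamic programming approach and the distance formula n + m - 2 |LCS(X, Y)| given in this section, here is a minimal self-contained sketch. The function names are hypothetical and the code is not taken from the library.

```typescript
// Length of the longest common subsequence, using a full (n+1) x (m+1) table:
// O(m*n) time and O(m*n) space, as described above.
function lcsLength(x: string, y: string): number {
  // table[i][j] = LCS length of the first i chars of x and the first j chars of y.
  const table: number[][] = [];
  for (let i = 0; i <= x.length; i++) {
    table.push(new Array<number>(y.length + 1).fill(0));
  }
  for (let i = 1; i <= x.length; i++) {
    for (let j = 1; j <= y.length; j++) {
      if (x[i - 1] === y[j - 1]) {
        table[i][j] = table[i - 1][j - 1] + 1; // extend the common subsequence
      } else {
        table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
      }
    }
  }
  return table[x.length][y.length];
}

// LCS distance: n + m - 2 * |LCS(X, Y)|, ranging from 0 to n + m.
function lcsDistance(x: string, y: string): number {
  return x.length + y.length - 2 * lcsLength(x, y);
}

console.log(lcsLength("healed", "sealed"));   // common subsequence "ealed"
console.log(lcsDistance("healed", "sealed"));
```

For 'healed' and 'sealed' the common subsequence is 'ealed' (length 5), so the distance is 6 + 6 - 10 = 2, in line with the sample output above.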

Metric Longest Common Subsequence

Distance metric based on Longest Common Subsequence, from the notes "An LCS-based string metric" by Daniel Bakkelund. http://heim.ifi.uio.no/~danielry/StringMetric.pdf

The distance is computed as 1 - |LCS(s1, s2)| / max(|s1|, |s2|)

import { metricLcs } from "string-comparison"
// or: import { mlcs } from "string-comparison"

const Thanos = 'healed'
const Rival = 'sealed'
const Avengers = ['edward', 'sealed', 'theatre']

console.log(metricLcs.similarity(Thanos, Rival))
console.log(metricLcs.distance(Thanos, Rival))
console.log(metricLcs.sortMatch(Thanos, Avengers))

// output
// 0.8333333333333334
// 0.16666666666666663
// [
//   { member: 'edward', index: 0, rating: 0.5 },
//   { member: 'theatre', index: 2, rating: 0.5714285714285714 },
//   { member: 'sealed', index: 1, rating: 0.8333333333333334 }
// ]
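
The formula 1 - |LCS(s1, s2)| / max(|s1|, |s2|) can be sketched on its own as below. This is an illustration with hypothetical names, not the library's code; returning 0 for two empty strings is an assumption made here to avoid dividing by zero.

```typescript
// LCS length using two rows instead of a full table.
function lcsLen(s1: string, s2: string): number {
  let previous: number[] = new Array<number>(s2.length + 1).fill(0);
  let current: number[] = new Array<number>(s2.length + 1).fill(0);
  for (let i = 1; i <= s1.length; i++) {
    for (let j = 1; j <= s2.length; j++) {
      current[j] =
        s1[i - 1] === s2[j - 1]
          ? previous[j - 1] + 1
          : Math.max(previous[j], current[j - 1]);
    }
    const swap = previous;
    previous = current;
    current = swap;
  }
  return previous[s2.length];
}

// Metric LCS distance: 1 - |LCS(s1, s2)| / max(|s1|, |s2|), always in [0, 1].
function metricLcsDistance(s1: string, s2: string): number {
  const longest = Math.max(s1.length, s2.length);
  if (longest === 0) return 0; // assumption: two empty strings are identical
  return 1 - lcsLen(s1, s2) / longest;
}

console.log(metricLcsDistance("healed", "sealed")); // 1 - 5/6
```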

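Cosine similarity

The input strings are first converted into vectors (profiles) of n-gram counts. The similarity between the two strings is the cosine of the angle between these two vector representations, computed as V1 . V2 / (|V1| * |V2|).

Distance is computed as 1 - cosine similarity.

import { cosine } from "string-comparison"

Jaccard index

The input strings are first converted into sets of n-grams (sequences of n characters, also called k-shingles); the number of occurrences of each n-gram is not taken into account. Each input string is simply a set of n-grams. The Jaccard index is then computed as |V1 inter V2| / |V1 union V2|.

Distance is computed as 1 - similarity. The Jaccard distance is a metric.

import { jaccardIndex } from "string-comparison"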
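
The Jaccard formula |V1 inter V2| / |V1 union V2| described above can be sketched as follows. This is a hedged illustration, not the library's implementation: the helper names are hypothetical, and bigrams (n = 2) are an assumed default.

```typescript
// Hypothetical helper: the set of distinct n-grams in a string.
function ngramSet(text: string, n: number = 2): Set<string> {
  const grams = new Set<string>();
  for (let i = 0; i + n <= text.length; i++) {
    grams.add(text.slice(i, i + n));
  }
  return grams;
}

// Jaccard index: |V1 inter V2| / |V1 union V2|.
function jaccardSimilarity(a: string, b: string, n: number = 2): number {
  const setA = ngramSet(a, n);
  const setB = ngramSet(b, n);
  if (setA.size === 0 && setB.size === 0) return 1; // assumption: nothing to compare
  let intersection = 0;
  setA.forEach((gram) => {
    if (setB.has(gram)) intersection++;
  });
  const union = setA.size + setB.size - intersection;
  return intersection / union;
}

// Distance is 1 - similarity.
function jaccardDistance(a: string, b: string, n: number = 2): number {
  return 1 - jaccardSimilarity(a, b, n);
}

console.log(jaccardSimilarity("healed", "sealed")); // 4 shared bigrams out of 6 distinct
```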

Sorensen-Dice coefficient

Similar to Jaccard index, but this time the similarity is computed as 2 * |V1 inter V2| / (|V1| + |V2|).

Distance is computed as 1 - similarity.

import { diceCoefficient } from "string-comparison"
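
For comparison with the Jaccard index, the formula 2 * |V1 inter V2| / (|V1| + |V2|) can be sketched like this. The names are hypothetical, bigram sets are an assumption, and the library's own tokenization may differ.

```typescript
// Hypothetical helper: the set of distinct bigrams in a string.
function bigramSet(text: string): Set<string> {
  const grams = new Set<string>();
  for (let i = 0; i + 2 <= text.length; i++) {
    grams.add(text.slice(i, i + 2));
  }
  return grams;
}

// Sorensen-Dice coefficient: 2 * |V1 inter V2| / (|V1| + |V2|).
function diceSimilarity(a: string, b: string): number {
  const setA = bigramSet(a);
  const setB = bigramSet(b);
  if (setA.size + setB.size === 0) return 1; // assumption: nothing to compare
  let shared = 0;
  setA.forEach((gram) => {
    if (setB.has(gram)) shared++;
  });
  return (2 * shared) / (setA.size + setB.size);
}

console.log(diceSimilarity("healed", "sealed")); // 2 * 4 / (5 + 5)
```

Because the shared n-grams are counted twice in the numerator, Dice rates partially overlapping strings higher than Jaccard does on the same sets.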

Jaro-Winkler similarity

The Jaro-Winkler similarity is a string metric measuring an edit distance between two strings. It is very similar to the Jaro similarity; the two differ only when the strings share a common prefix. Jaro-Winkler uses a prefix scale p that boosts the score of strings with a common prefix, up to a defined maximum prefix length l.

import { jaroWinkler } from "string-comparison"
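
To show how the prefix scale p and the maximum prefix length l enter the computation, here is a sketch of the textbook Jaro and Jaro-Winkler formulas. It is not the library's source. The function names are hypothetical, and the defaults p = 0.1 and l = 4 are the commonly used values, assumed here.

```typescript
// Jaro similarity: based on matching characters within a sliding window
// and on the number of transpositions among those matches.
function jaroSimilarity(s1: string, s2: string): number {
  if (s1 === s2) return 1;
  if (s1.length === 0 || s2.length === 0) return 0;

  const matchWindow = Math.max(0, Math.floor(Math.max(s1.length, s2.length) / 2) - 1);
  const matched1: boolean[] = new Array<boolean>(s1.length).fill(false);
  const matched2: boolean[] = new Array<boolean>(s2.length).fill(false);

  let matches = 0;
  for (let i = 0; i < s1.length; i++) {
    const start = Math.max(0, i - matchWindow);
    const end = Math.min(s2.length - 1, i + matchWindow);
    for (let j = start; j <= end; j++) {
      if (!matched2[j] && s1[i] === s2[j]) {
        matched1[i] = true;
        matched2[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both strings' matched characters in order; each out-of-order pair
  // counts as half a transposition.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < s1.length; i++) {
    if (!matched1[i]) continue;
    while (!matched2[k]) k++;
    if (s1[i] !== s2[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;

  return (matches / s1.length + matches / s2.length + (matches - transpositions) / matches) / 3;
}

// Jaro-Winkler: jaro + prefixLength * p * (1 - jaro), with the prefix capped at l.
function jaroWinklerSimilarity(s1: string, s2: string, p: number = 0.1, l: number = 4): number {
  const jaro = jaroSimilarity(s1, s2);
  const limit = Math.min(l, s1.length, s2.length);
  let prefixLength = 0;
  while (prefixLength < limit && s1[prefixLength] === s2[prefixLength]) prefixLength++;
  return jaro + prefixLength * p * (1 - jaro);
}

console.log(jaroSimilarity("MARTHA", "MARHTA"));
console.log(jaroWinklerSimilarity("MARTHA", "MARHTA"));
```

With p = 0.1 and l = 4 the prefix bonus can never push the result above 1, since prefixLength * p is at most 0.4.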

API

cosine
diceCoefficient
jaccardIndex
levenshtein
lcs = longestCommonSubsequence
mlcs = metricLcs
jaroWinkler

Methods

similarity
distance
sortMatch

similarity

Implementing algorithms define a similarity between strings

params

thanos [String]
rival [String]

return

Return a similarity between 0.0 and 1.0

distance

Implementing algorithms define a distance between strings (0 means strings are identical)

params

thanos [String]
rival [String]

return

Return a number

sortMatch

params

thanos [String]
avengers [...String]

return

Returns an array of SortMatchResultType objects, for example:

[
  { member: 'edward', rating: 0.16666666666666663 },
  { member: 'theatre', rating: 0.4285714285714286 },
  { member: 'mailed', rating: 0.5 },
  { member: 'sealed', rating: 0.8333333333333334 }
]
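
The behaviour suggested by the sample outputs in this document (each candidate rated against the target, results ordered from lowest to highest rating) can be sketched generically. Everything below is hypothetical: `sortMatchSketch`, `SortMatchResult` and the toy `positionalSimilarity` are illustration names, and the `index` field follows the earlier sample outputs rather than a confirmed type definition.

```typescript
// Hypothetical result shape, modelled on the sample outputs above.
interface SortMatchResult {
  member: string;
  index: number; // position of the member in the input array
  rating: number;
}

type SimilarityFn = (a: string, b: string) => number;

// Rate every candidate against the target, then sort ascending by rating,
// so the best match ends up last.
function sortMatchSketch(
  target: string,
  candidates: string[],
  similarity: SimilarityFn
): SortMatchResult[] {
  return candidates
    .map((member, index) => ({ member, index, rating: similarity(target, member) }))
    .sort((left, right) => left.rating - right.rating);
}

// Toy similarity for the demo: fraction of positions holding the same character.
function positionalSimilarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  let same = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] === b[i]) same++;
  }
  return same / longest;
}

const ranked = sortMatchSketch("healed", ["edward", "sealed", "theatre"], positionalSimilarity);
console.log(ranked[ranked.length - 1].member); // best match is the last element
```

Passing the similarity function as a parameter keeps the ranking logic independent of the algorithm, which is how one method name can be shared by every measure in the API list.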

CHANGELOG

MIT

No vulnerabilities found.
