Gathering detailed insights and metrics for string-comparison
Gathering detailed insights and metrics for string-comparison
Gathering detailed insights and metrics for string-comparison
Gathering detailed insights and metrics for string-comparison
compare-timing-safe
String comparison in length constant time
hash-equals
Timing attack safe string comparison
locale-compare
Locale-aware string comparison with Intl.Collator and localeCompare fallback
@medly-components/utils
This library consists of the most commonly used functionalities and components for the rest of the library. Eg String Helper has functions to convert a string to camelCase, case insensitive string comparison etc. Check docs [here](https://medly.github.io/
🤠A library implementing different string similarity using JavaScript.
npm install string-comparison
Module System
Min. Node Version
Typescript Support
Node Version
NPM Version
50 Stars
73 Commits
5 Forks
2 Watching
2 Branches
3 Contributors
Updated on 25 Nov 2024
Minified
Minified + Gzipped
TypeScript (100%)
Cumulative downloads
Total Downloads
Last day
3%
5,101
Compared to previous day
Last week
18.9%
33,134
Compared to previous week
Last month
10.5%
122,561
Compared to previous month
Last year
432.5%
904,288
Compared to previous year
JavaScript implementation of tdebatty/java-string-similarity
A library implementing different string similarity, distance and sortMatch measures. A dozen of algorithms (including Levenshtein edit distance and sibblings, Longest Common Subsequence, cosine similarity etc.) are currently implemented. Check the summary table below for the complete list...
download
1npm install string-comparison --save 2yarn add string-comparison 3pnpm add string-comparison
usage
1let stringComparison = require('string-comparison') 2// or import stringComparison from 'string-comparison' 3 4const Thanos = 'healed' 5const Rival = 'sealed' 6const Avengers = ['edward', 'sealed', 'theatre'] 7 8// use by cosine 9let cos = stringComparison.Cosine 10 11console.log(cos.similarity(Thanos, Rival)) 12console.log(cos.distance(Thanos, Rival)) 13console.log(cos.sortMatch(Thanos, Avengers)) 14
The main characteristics of each implemented algorithm are presented below. The "cost" column gives an estimation of the computational cost to compute the similarity between two strings of length m and n respectively.
Measure(s) | Normalized? | Metric? | Type | Cost | Typical usage | |
---|---|---|---|---|---|---|
Jaccard index | similarity distance sortMatch | Yes | Yes | Set | O(m+n) | |
Cosine similarity | similarity distance sortMatch | Yes | No | Profile | O(m+n) | |
Sorensen-Dice coefficient | similarity distance sortMatch | Yes | No | Set | O(m+n) | |
Levenshtein | similarity distance sortMatch | No | Yes | O(m*n) | ||
Jaro-Winkler | similarity distance sortMatch | Yes | No | O(m*n) | typo correction |
Although the topic might seem simple, a lot of different algorithms exist to measure text similarity or distance. Therefore the library defines some interfaces to categorize them.
The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
It is a metric string distance. This implementation uses dynamic programming (Wagner–Fischer algorithm), with only 2 rows of data. The space requirement is thus O(m) and the algorithm runs in O(m.n).
1import { levenshtein } from "string-comparison" 2import type {SortMatchResultType} from "string-comparison" 3 4const Thanos = 'healed' 5const Rival = 'sealed' 6const Avengers = ['edward', 'sealed', 'theatre'] 7 8console.log(levenshtein.similarity(Thanos, Rival)) 9console.log(levenshtein.distance(Thanos, Rival)) 10console.log(levenshtein.sortMatch(Thanos, Avengers) as SortMatchResultType) 11 12// output 130.8333333333333334 141 15[ 16 { member: 'edward', index: 0, rating: 0.16666666666666663 }, 17 { member: 'theatre', index: 2, rating: 0.4285714285714286 }, 18 { member: 'sealed', index: 1, rating: 0.8333333333333334 } 19]
The longest common subsequence (LCS) problem consists in finding the longest subsequence common to two (or more) sequences. It differs from problems of finding common substrings: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences.
It is used by the diff utility, by Git for reconciling multiple changes, etc.
The LCS distance between strings X (of length n) and Y (of length m) is n + m - 2 |LCS(X, Y)| min = 0 max = n + m
LCS distance is equivalent to Levenshtein distance when only insertion and deletion is allowed (no substitution), or when the cost of the substitution is the double of the cost of an insertion or deletion.
This class implements the dynamic programming approach, which has a space requirement O(m.n), and computation cost O(m.n).
In "Length of Maximal Common Subsequences", K.S. Larsen proposed an algorithm that computes the length of LCS in time O(log(m).log(n)). But the algorithm has a memory requirement O(m.n²) and was thus not implemented here.
1import { longestCommonSubsequence } from "string-comparison" 2or 3import { lcs } from "string-comparison" 4 5 6const Thanos = 'healed' 7const Rival = 'sealed' 8const Avengers = ['edward', 'sealed', 'theatre'] 9 10console.log(lcs.similarity(Thanos, Rival)) 11console.log(lcs.distance(Thanos, Rival)) 12console.log(lcs.sortMatch(Thanos, Avengers)) 13 14// output 150.8333333333333334 162 17[ 18 { member: 'edward', index: 0, rating: 0.5 }, 19 { member: 'theatre', index: 2, rating: 0.6153846153846154 }, 20 { member: 'sealed', index: 1, rating: 0.8333333333333334 } 21]
Distance metric based on Longest Common Subsequence, from the notes "An LCS-based string metric" by Daniel Bakkelund. http://heim.ifi.uio.no/~danielry/StringMetric.pdf
The distance is computed as 1 - |LCS(s1, s2)| / max(|s1|, |s2|)
1import { metricLcs } from "string-comparison" 2or 3import { mlcs } from "string-comparison" 4 5const Thanos = 'healed' 6const Rival = 'sealed' 7const Avengers = ['edward', 'sealed', 'theatre'] 8 9console.log(metricLcs.similarity(Thanos, Rival)) 10console.log(metricLcs.distance(Thanos, Rival)) 11console.log(metricLcs.sortMatch(Thanos, Avengers)) 12 13// output 140.8333333333333334 150.16666666666666663 16[ 17 { member: 'edward', index: 0, rating: 0.5 }, 18 { member: 'theatre', index: 2, rating: 0.5714285714285714 }, 19 { member: 'sealed', index: 1, rating: 0.8333333333333334 } 20]
Like Q-Gram distance, the input strings are first converted into sets of n-grams (sequences of n characters, also called k-shingles), but this time the cardinality of each n-gram is not taken into account. Each input string is simply a set of n-grams. The Jaccard index is then computed as |V1 inter V2| / |V1 union V2|.
Distance is computed as 1 - similarity. Jaccard index is a metric distance.
1import { cosine } from "string-comparison"
Similar to Jaccard index, but this time the similarity is computed as 2 * |V1 inter V2| / (|V1| + |V2|).
Distance is computed as 1 - similarity.
1import { diceCoefficient } from "string-comparison"
The Jaro-Winkler similarity is a string metric measuring edit distance between two strings. Jaro – Winkler Similarity is much similar to Jaro Similarity. They both differ when the prefix of two string match. Jaro – Winkler Similarity uses a prefix scale ‘p’ which gives a more accurate answer when the strings have a common prefix up to a defined maximum length l.
1import { jaroWinkler } from "string-comparison"
cosine
diceCoefficient
jaccardIndex
levenshtein
lcs
= longestCommonSubsequence
mlcs
= metricLcs
jaroWinkler
similarity
.distance
.sortMatch
Implementing algorithms define a similarity between strings
Return a similarity between 0.0 and 1.0
Implementing algorithms define a distance between strings (0 means strings are identical)
thanos
[String]rival
[String]Return a number
Return an array of objects - SortMatchResultType
ex:
1[ 2 { member: 'edward', rating: 0.16666666666666663 }, 3 { member: 'theatre', rating: 0.4285714285714286 }, 4 { member: 'mailed', rating: 0.5 }, 5 { member: 'sealed', rating: 0.8333333333333334 } 6]
No vulnerabilities found.
No security vulnerabilities found.