JS implementation of probabilistic data structures: Bloom Filter (and its derivatives), HyperLogLog, Count-Min Sketch, Top-K and MinHash
Installation
npm install bloom-filters
Releases
- v3.0.4, published on 21 Nov 2024
- v3.0.3, published on 13 Nov 2024
- v3.0.2, published on 11 Jun 2024
- v3.0.1 (bugfixes), published on 03 Oct 2023
- v2.1.0 (Scalable Bloom Filters, optimized Partitioned BF, and bug fixes), published on 30 Mar 2022
- v2.0.0 (XOR filters, BitSet-backed Bloom filters, and import/export backward compatibility), published on 18 Feb 2022
Developer Guide
- Module System: CommonJS
- Min. Node Version: >=12
- TypeScript Support: No
Statistics
378 Stars
228 Commits
43 Forks
4 Watching
6 Branches
9 Contributors
Updated on 26 Nov 2024
Languages
TypeScript (62.07%)
JavaScript (37.93%)
Total Downloads
Cumulative downloads: 12,238,463

Period | Downloads | Change vs. previous period
---|---|---
Last day | 39,560 | -6.1%
Last week | 217,084 | +4.8%
Last month | 879,731 | +23.2%
Last year | 7,979,616 | +105.2%
Dependencies: 8
Bloom-Filters
JavaScript/TypeScript implementation of probabilistic data structures: Bloom Filter (and its derivatives), HyperLogLog, Count-Min Sketch, Top-K and MinHash. This package relies on non-cryptographic hash functions.
Keywords: bloom filter, cuckoo filter, HyperLogLog, MinHash, Top-K, probabilistic data-structures, XOR-Filter.
❗️Compatibility❗️
- Be careful when migrating from one version to another.
- Bug fixes were introduced in 1.3.7, and from 1.3.9 to 2.0.0+, for hashing and indexing data. As a consequence, you must completely rebuild your filters from scratch to be compatible with the new versions.
- To respect the breaking-changes rule of npm versioning, since 1.3.9 we release a new major version whenever a modification touches the hashing/indexing system or breaks the current API.
Table of contents
- Installation
- Data structures
- Export and import
- Seeding and Hashing
- Documentation
- Tests
- References
- Changelog
- License
Installation
```bash
npm install bloom-filters --save
```
Supported platforms
- Node.js: v4.0.0 or higher
- Google Chrome: v41 or higher
- Mozilla Firefox: v34 or higher
- Microsoft Edge: v12 or higher
Data structures
Classic Bloom Filter
A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not.
Reference: Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426. (Full text article)
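To make the mechanism concrete before the package examples, here is a minimal from-scratch sketch (not this package's implementation): `add` sets k bits chosen by k hash functions, and `has` reports a possible match only if all k bits are set. The seeded string hash below is a hypothetical stand-in for the library's hashing.

```js
// Minimal from-scratch Bloom filter sketch (illustration only; use the
// package's BloomFilter class, shown below, for real work).
class TinyBloom {
  constructor(size, nbHashes) {
    this.size = size
    this.nbHashes = nbHashes
    this.bits = new Uint8Array(size)
  }
  // hypothetical seeded string hash (NOT the library's XXH64)
  _hash(value, seed) {
    let h = seed
    for (const c of value) h = (h * 31 + c.charCodeAt(0)) >>> 0
    return h % this.size
  }
  add(value) {
    for (let i = 0; i < this.nbHashes; i++) {
      this.bits[this._hash(value, i)] = 1
    }
  }
  has(value) {
    // false means "definitively absent"; true means "maybe present"
    for (let i = 0; i < this.nbHashes; i++) {
      if (this.bits[this._hash(value, i)] === 0) return false
    }
    return true
  }
}
```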
Methods
- `add(element: HashableInput) -> void`: add an element to the filter.
- `has(element: HashableInput) -> boolean`: test an element for membership, returning false if the element is definitively not in the filter and true if the element might be in the filter.
- `equals(other: BloomFilter) -> boolean`: test if two filters are equal.
- `rate() -> number`: compute the filter's false positive rate (or error rate).
```js
const { BloomFilter } = require('bloom-filters')
// create a Bloom Filter with a size of 10 and 4 hash functions
let filter = new BloomFilter(10, 4)
// insert data
filter.add('alice')
filter.add('bob')

// lookup for some data
console.log(filter.has('bob')) // output: true
console.log(filter.has('daniel')) // output: false

// print the error rate
console.log(filter.rate())

// alternatively, create a bloom filter optimal for a number of items and a desired error rate
const items = ['alice', 'bob']
const errorRate = 0.04 // 4 % error rate
filter = BloomFilter.create(items.length, errorRate)

// or create a bloom filter optimal for a collection of items and a desired error rate
filter = BloomFilter.from(items, errorRate)
```
Partitioned Bloom Filter
A Partitioned Bloom Filter is a variation of a classic Bloom Filter.
This filter works by partitioning the M-sized bit array into k slices of size m = M/k bits, where k is the number of hash functions in the filter. Each hash function produces an index over m for its respective slice. Thus, each element is described by exactly k bits, meaning the distribution of false positives is uniform across all elements.
Be careful, as a Partitioned Bloom Filter has much higher collision risks than a classic Bloom Filter on small sets of data.
Reference: Chang, F., Feng, W. C., & Li, K. (2004, March). Approximate caches for packet classification. In INFOCOM 2004. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies (Vol. 4, pp. 2196-2207). IEEE. (Full text article)
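A rough sketch of the indexing scheme described above (illustration only, taking a hypothetical precomputed hash value as input): hash function i addresses only its own slice of m = M/k bits, so slices never overlap.

```js
// Hypothetical illustration of partitioned indexing: the global bit index for
// hash function i is (i * m) + (hash_i(element) mod m).
function sliceIndex(hashValue, i, M, k) {
  const m = Math.floor(M / k) // slice size
  return i * m + (hashValue % m) // index inside slice i
}
```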
Methods
- `add(element: HashableInput) -> void`: add an element to the filter.
- `has(element: HashableInput) -> boolean`: test an element for membership, returning false if the element is definitively not in the filter and true if the element might be in the filter.
- `equals(other: PartitionedBloomFilter) -> boolean`: test if two filters are equal.
- `rate() -> number`: compute the filter's false positive rate (or error rate).
```js
const { PartitionedBloomFilter } = require('bloom-filters')

// create a PartitionedBloomFilter of size 10, with 5 hash functions and a load factor of 0.5
let filter = new PartitionedBloomFilter(10, 5, 0.5)

// add some values to the filter
filter.add('alice')
filter.add('bob')

// lookup for some data
console.log(filter.has('bob')) // output: true
console.log(filter.has('daniel')) // output: false

// now use it like a classic bloom filter!
// ...

// alternatively, create a PartitionedBloomFilter optimal for a number of items and a desired error rate
const items = ['alice', 'bob']
const errorRate = 0.04 // 4 % error rate
filter = PartitionedBloomFilter.create(items.length, errorRate)

// or create a PartitionedBloomFilter optimal for a collection of items and a desired error rate
filter = PartitionedBloomFilter.from(items, errorRate)
```
Scalable Bloom Filter
A Scalable Bloom Filter is a variant of Bloom Filters that can adapt dynamically to the number of elements stored, while assuring a maximum false positive probability.
Reference: ALMEIDA, Paulo Sérgio, BAQUERO, Carlos, PREGUIÇA, Nuno, et al. Scalable bloom filters. Information Processing Letters, 2007, vol. 101, no 6, p. 255-261. (Full text article)
Internally, this filter uses Partitioned Bloom Filters.
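Conceptually, following Almeida et al. rather than this package's internals, a scalable filter adds a new internal filter with a tighter error rate each time the current one fills up; a hedged sketch of that error-rate schedule:

```js
// Conceptual sketch of the scalable strategy (not the package's actual code):
// successive stages get error rates r0, r0*s, r0*s^2, ... so the compound
// false-positive probability stays bounded by roughly r0 / (1 - s).
function nextStageErrorRate(r0, tighteningRatio, stage) {
  return r0 * Math.pow(tighteningRatio, stage)
}
console.log(nextStageErrorRate(0.01, 0.5, 0)) // 0.01
console.log(nextStageErrorRate(0.01, 0.5, 2)) // 0.0025
```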
Methods
- `add(element: HashableInput) -> void`: add an element to the filter.
- `has(element: HashableInput) -> boolean`: test an element for membership, returning false if the element is definitively not in the filter and true if the element might be in the filter.
- `equals(other: ScalableBloomFilter) -> boolean`: test if two filters are equal.
- `capacity() -> number`: return the total capacity of this filter.
- `rate() -> number`: compute the filter's false positive rate (or error rate).
```js
const { ScalableBloomFilter } = require('bloom-filters')

// by default it creates an ideally scalable bloom filter for 8 elements with an error rate of 0.01 and a load factor of 0.5
const filter = new ScalableBloomFilter()
filter.add('alice')
filter.add('bob')
filter.add('carl')
for (let i = 0; i < 10000; i++) {
  filter.add('elem:' + i)
}
filter.has('somethingwrong') // false

filter.capacity() // total capacity
filter.rate() // current rate of the current internal filter used
```
Cuckoo Filter
Cuckoo filters improve on Bloom filters by supporting deletion, limited counting, and a bounded false positive rate, with storage efficiency similar to a standard Bloom Filter.
Reference: Fan, B., Andersen, D. G., Kaminsky, M., & Mitzenmacher, M. D. (2014, December). Cuckoo filter: Practically better than bloom. In Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies (pp. 75-88). ACM. (Full text article)
Methods
- `add(element: HashableInput) -> void`: add an element to the filter.
- `remove(element: HashableInput) -> boolean`: delete an element from the filter, returning true if the deletion was a success and false otherwise.
- `has(element: HashableInput) -> boolean`: test an element for membership, returning false if the element is definitively not in the filter and true if the element might be in the filter.
- `equals(other: CuckooFilter) -> boolean`: test if two filters are equal.
- `rate() -> number`: compute the filter's false positive rate (or error rate).
```js
const { CuckooFilter } = require('bloom-filters')

// create a Cuckoo Filter with size = 15, fingerprint length = 3 and bucket size = 2
let filter = new CuckooFilter(15, 3, 2)
filter.add('alice')
filter.add('bob')

// lookup for some data
console.log(filter.has('bob')) // output: true
console.log(filter.has('daniel')) // output: false

// remove something
filter.remove('bob')
console.log(filter.has('bob')) // output: false

// alternatively, create a Cuckoo Filter optimal for a number of items and a desired error rate
const items = ['alice', 'bob']
const errorRate = 0.04 // 4 % error rate
filter = CuckooFilter.create(items.length, errorRate)

// or create a Cuckoo Filter optimal for a collection of items and a desired error rate
filter = CuckooFilter.from(items, errorRate)
```
WARNING: The error rate cannot be lower than 1 * 10^-18. Below this value, you will get an exception stating that the fingerprint length is higher than the hash length.
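This limit follows from the fingerprint length: per Fan et al. (2014), a target error rate ε with bucket size b requires roughly log2(1/ε) + log2(2b) fingerprint bits, which must fit in the 64-bit hash. A sketch of that arithmetic (illustration only; the package computes this internally):

```js
// Approximate fingerprint length (bits) for a target error rate, following
// Fan et al. (2014): f ≈ log2(1/errorRate) + log2(2 * bucketSize).
function fingerprintLength(errorRate, bucketSize) {
  return Math.ceil(Math.log2(1 / errorRate) + Math.log2(2 * bucketSize))
}
console.log(fingerprintLength(0.04, 4)) // 8 bits
console.log(fingerprintLength(1e-18, 4)) // 63 bits, close to the 64-bit hash limit
```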
Counting Bloom Filter
A Counting Bloom filter works in a similar manner as a regular Bloom filter; however, it is able to keep track of insertions and deletions. In a counting Bloom filter, each entry in the Bloom filter is a small counter associated with a basic Bloom filter bit.
Reference: F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “An Improved Construction for Counting Bloom Filters,” in 14th Annual European Symposium on Algorithms, LNCS 4168, 2006
Methods
- `add(element: HashableInput) -> void`: add an element to the filter.
- `remove(element: HashableInput) -> boolean`: delete an element from the filter, returning true if the deletion was a success and false otherwise.
- `has(element: HashableInput) -> boolean`: test an element for membership, returning false if the element is definitively not in the filter and true if the element might be in the filter.
- `equals(other: CountingBloomFilter) -> boolean`: test if two filters are equal.
- `rate() -> number`: compute the filter's false positive rate (or error rate).
```js
const CountingBloomFilter = require('bloom-filters').CountingBloomFilter

// create a Counting Bloom Filter with capacity = 15 and 4 hash functions
let filter = new CountingBloomFilter(15, 4)

// add some values to the filter
filter.add('alice')
filter.add('bob')
filter.add('carole')

// remove some value
filter.remove('carole')

// lookup for some data
console.log(filter.has('bob')) // output: true
console.log(filter.has('carole')) // output: false
console.log(filter.has('daniel')) // output: false

// print false positive rate (around 0.1)
console.log(filter.rate())

// alternatively, create a Counting Bloom Filter optimal for a number of items and a desired error rate
const items = ['alice', 'bob']
const errorRate = 0.04 // 4 % error rate
filter = CountingBloomFilter.create(items.length, errorRate)

// or create a Counting Bloom Filter optimal for a collection of items and a desired error rate
filter = CountingBloomFilter.from(items, errorRate)
```
Count Min Sketch
The Count Min Sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies but, unlike a hash table, uses only sub-linear space, at the expense of overcounting some events due to collisions.
Reference: Cormode, G., & Muthukrishnan, S. (2005). An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1), 58-75. (Full text article)
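To illustrate the mechanism behind this API (not the package's internal code): a sketch is a d × w matrix of counters, `update` increments one counter per row, and `count` takes the minimum across rows. The row hash below is a hypothetical stand-in.

```js
// Minimal count-min sketch illustration (w columns, d rows); the package's
// CountMinSketch class below uses the library's seeded hashing instead.
class TinyCMS {
  constructor(w, d) {
    this.w = w
    this.d = d
    this.table = Array.from({ length: d }, () => new Array(w).fill(0))
  }
  // hypothetical per-row string hash
  _hash(value, seed) {
    let h = seed + 1
    for (const c of value) h = (h * 31 + c.charCodeAt(0)) >>> 0
    return h % this.w
  }
  update(value, count = 1) {
    for (let row = 0; row < this.d; row++) {
      this.table[row][this._hash(value, row)] += count
    }
  }
  count(value) {
    // taking the minimum over rows limits overcounting caused by collisions
    let min = Infinity
    for (let row = 0; row < this.d; row++) {
      min = Math.min(min, this.table[row][this._hash(value, row)])
    }
    return min
  }
}
```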
Methods
- `update(element: HashableInput, count = 1) -> void`: add `count` occurrences of an element to the sketch.
- `count(element: HashableInput) -> number`: estimate the number of occurrences of an element.
- `merge(other: CountMinSketch) -> CountMinSketch`: merge the occurrences of two sketches.
- `equals(other: CountMinSketch) -> boolean`: test if two sketches are equal.
- `clone() -> CountMinSketch`: clone the sketch.
```js
const { CountMinSketch } = require('bloom-filters')

// create a new Count Min sketch with 2048 columns and 1 row
let sketch = new CountMinSketch(2048, 1)

// push some occurrences in the sketch
sketch.update('alice')
sketch.update('alice')
sketch.update('bob')

// count occurrences
console.log(sketch.count('alice')) // output: 2
console.log(sketch.count('bob')) // output: 1
console.log(sketch.count('daniel')) // output: 0

// alternatively, create a Count Min sketch optimal for a target error rate and probability of accuracy
const items = ['alice', 'bob']
const errorRate = 0.04 // 4 % error rate
const accuracy = 0.99 // 99% accuracy
sketch = CountMinSketch.create(errorRate, accuracy)

// or create a Count Min Sketch optimal for a collection of items,
// a target error rate and probability of accuracy
sketch = CountMinSketch.from(items, errorRate, accuracy)
```
HyperLogLog
HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset. Calculating the exact cardinality of a multiset requires an amount of memory proportional to the cardinality, which is impractical for very large data sets. Probabilistic cardinality estimators, such as the HyperLogLog algorithm, use significantly less memory than this, at the cost of obtaining only an approximation of the cardinality.
The HyperLogLog algorithm is able to estimate cardinalities greater than 10^9 with a typical accuracy (standard error) of 2%, using around 1.5 kB of memory (see reference).
Reference: Philippe Flajolet, Éric Fusy, Olivier Gandouet and Frédéric Meunier (2007). "Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm". Discrete Mathematics and Theoretical Computer Science Proceedings. (Full text article)
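The core trick, sketched below under simplifying assumptions (a 32-bit hash stand-in, no bias or range corrections): the first p bits of a hash select one of 2^p registers, and the register keeps the maximum rank (position of the first 1-bit) seen in the remaining bits.

```js
// Simplified HyperLogLog register update (illustration only, not the
// package's implementation).
function updateRegister(registers, hash32, p) {
  const index = hash32 >>> (32 - p) // first p bits select the register
  const rest = (hash32 << p) >>> 0 // remaining (32 - p) bits
  // rank = position of the first 1-bit among the remaining bits
  const rank = rest === 0 ? 32 - p + 1 : Math.clz32(rest) + 1
  registers[index] = Math.max(registers[index], rank)
}
```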
Methods
- `update(element: HashableInput) -> void`: add a new occurrence of an element to the sketch.
- `count() -> number`: estimate the number of distinct elements in the sketch.
- `merge(other: HyperLogLog) -> HyperLogLog`: merge the occurrences of two sketches.
- `equals(other: HyperLogLog) -> boolean`: test if two sketches are equal.
```js
const { HyperLogLog } = require('bloom-filters')

// create a new HyperLogLog with 100 registers
const sketch = new HyperLogLog(100)

// push some occurrences in the sketch
sketch.update('alice')
sketch.update('alice')
sketch.update('bob')

// count occurrences
console.log(sketch.count())

// print accuracy
console.log(sketch.accuracy())
```
MinHash
MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The goal of MinHash is to estimate the Jaccard similarity coefficient, a commonly used indicator of the similarity between two sets, without explicitly computing the intersection and union of the two sets. It does so by computing fixed sized signatures for a set of numbers using randomly generated hash functions.
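As a from-scratch illustration of that idea (not this package's implementation), using hypothetical affine hash functions of the form (a * x + b) mod prime: each hash function contributes the minimum value it sees, and the fraction of matching signature slots estimates the Jaccard similarity.

```js
// MinHash illustration with hypothetical random hash parameters; the
// package's MinHashFactory below generates its own set of hash functions.
const PRIME = 4294967311 // a prime larger than any expected input

function signature(set, hashParams) {
  // one slot per hash function: the minimum hash value over the set
  return hashParams.map(([a, b]) => Math.min(...set.map(x => (a * x + b) % PRIME)))
}

function jaccardEstimate(sigA, sigB) {
  const matches = sigA.filter((v, i) => v === sigB[i]).length
  return matches / sigA.length
}

const params = Array.from({ length: 128 }, () => [
  1 + Math.floor(Math.random() * 1000), // hypothetical random coefficients
  Math.floor(Math.random() * 1000),
])
// true Jaccard of {1,2,3} and {1,2,4} is 0.5; the estimate should be close
console.log(jaccardEstimate(signature([1, 2, 3], params), signature([1, 2, 4], params)))
```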
❗️WARNINGS❗️
- A `MinHash` class only accepts `numbers` (integers and floats) as inputs.
- Two MinHash can only be compared if they share the same set of randomly generated hash functions. To ease the creation of MinHash sets, we introduce a `MinHashFactory` class that creates MinHash structures sharing the same set of hash functions. We recommend that most users rely on the factory, but the `MinHash` class remains importable for advanced usage.
Reference: Andrei Z. Broder, "On the resemblance and containment of documents", in Compression and Complexity of Sequences: Proceedings (1997). (Full text article)
MinHashFactory methods
- `create() -> MinHash`: create a new empty MinHash structure, using the parameters of the factory.
MinHash methods
- `add(element: number) -> void`: add a new element to the set.
- `bulkLoad(elements: number[]) -> void`: efficiently add several new elements to the set.
- `isEmpty() -> boolean`: test if the signature of the MinHash is empty.
- `compareWith(other: MinHash) -> number`: estimate the Jaccard similarity coefficient with another MinHash set.
```js
const { MinHashFactory } = require('bloom-filters')

// create the MinHashFactory, to create several comparable MinHash sets
// it uses 10 random hash functions and expects to see a maximum value of 999
const factory = new MinHashFactory(10, 999)

// create two empty MinHash
const firstSet = factory.create()
const secondSet = factory.create()

// push some occurrences in the first set
firstSet.add(1)
firstSet.add(2)

// the MinHash class also supports bulk loading
secondSet.bulkLoad([1, 3, 4])

// estimate the Jaccard similarity between the two sets
const jaccardSim = firstSet.compareWith(secondSet)
console.log(`The estimated Jaccard similarity is ${jaccardSim}`)
```
Top-K
Given a multiset of elements, the Top-K problem is to compute the ranking of these elements (by an arbitrary score) and return the k results with the highest scores.
This package provides an implementation of the Top-K problem that sorts items based on their estimated cardinality in the multiset. It is based on a Count Min Sketch, for estimating the cardinality of items, and a MinHeap, for implementing a sliding window over the k results with the highest scores, as sketched below.
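A conceptual sketch of that combination (illustration only, using a sorted array where the package uses a real MinHeap):

```js
// Conceptual pattern behind TopK (not the package's code): estimate each
// element's frequency with a sketch, then keep only the k highest-frequency
// candidates in a small ordered structure.
function addToTopK(topk, k, value, estimatedFreq) {
  const existing = topk.findIndex(e => e.value === value)
  if (existing >= 0) topk.splice(existing, 1) // refresh the old entry
  topk.push({ value, frequency: estimatedFreq })
  topk.sort((a, b) => b.frequency - a.frequency) // a real MinHeap avoids this full sort
  if (topk.length > k) topk.pop() // evict the lowest-frequency candidate
}
```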
Items produced by the `TopK` class are JavaScript objects with the following content (shown in TypeScript notation).
```ts
interface TopkElement {
  // The element's value
  value: string
  // The element's frequency
  frequency: number
  // The element's rank in the TopK, ranging from 1 to k
  rank: number
}
```
Methods
- `add(element: string, count: number = 1) -> void`: add one or more new occurrences of an element to the sketch.
- `values() -> Array<TopkElement>`: get the top-k values as an array of objects.
- `iterator() -> Iterator<TopkElement>`: get the top-k values as an iterator that yields objects.
```js
const { TopK } = require('bloom-filters')

// create a new TopK with k = 10, an error rate of 0.001 and an accuracy of 0.99
const topk = new TopK(10, 0.001, 0.99)

// push occurrences one-at-a-time in the multiset
topk.add('alice')
topk.add('bob')
topk.add('alice')

// or, equally, push multiple occurrences at-once in the multiset
// topk.add('alice', 2)
// topk.add('bob', 1)

// print the top k values
for (let item of topk.values()) {
  console.log(
    `Item "${item.value}" is in position ${item.rank} with an estimated frequency of ${item.frequency}`
  )
}
// Output:
// Item "alice" is in position 1 with an estimated frequency of 2
// Item "bob" is in position 2 with an estimated frequency of 1
```
Invertible Bloom Filters
An Invertible Bloom Filter (IBLT), also called an Invertible Bloom Lookup Table, is a space-efficient and probabilistic data structure for solving the set-difference problem efficiently, without the use of logs or other prior context. It computes the set difference with communication proportional to the size of the difference between the sets being compared. Two IBLTs can simultaneously compute D(A−B) and D(B−A) using O(d) space. This data structure encodes sets in a fashion similar in spirit to Tornado codes' construction, in that it randomly combines elements using the XOR function.
❗️WARNING❗️ An IBLT only accepts `Buffer` as inputs. If you are using `bloom-filters` in a Web browser, you might consider using the `feross/buffer` package, which provides a polyfill for `Buffer` in the browser.
Reference: Eppstein, D., Goodrich, M. T., Uyeda, F., & Varghese, G. (2011). What's the difference?: efficient set reconciliation without prior context. ACM SIGCOMM Computer Communication Review, 41(4), 218-229. (Full text article)
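To make the XOR-based encoding concrete, here is a hedged sketch of a single IBLT cell (hypothetical layout, assuming fixed-length keys for brevity): each cell stores a count, an XOR of keys, and an XOR of key checksums, and removal is the same XOR update as insertion, which is what makes subtraction and decoding possible.

```js
// Illustration of a single IBLT cell update (not the package's code):
// XOR is its own inverse, so removing an element undoes its insertion.
function toggle(cell, keyBuffer, checksum, delta) {
  cell.count += delta // +1 on insert, -1 on remove
  for (let i = 0; i < keyBuffer.length; i++) {
    cell.keySum[i] ^= keyBuffer[i] // assumes fixed-length keys for brevity
  }
  cell.hashSum ^= checksum
}

const cell = { count: 0, keySum: Buffer.alloc(5), hashSum: 0 }
toggle(cell, Buffer.from('alice'), 42, 1) // insert
toggle(cell, Buffer.from('alice'), 42, -1) // remove restores the empty cell
```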
Methods
- `add(element: Buffer) -> void`: add an element to the filter.
- `remove(element: Buffer) -> void`: delete an element from the filter.
- `has(element: Buffer) -> boolean`: test an element for membership, returning false if the element is definitively not in the filter and true if the element might be in the filter.
- `equals(other: InvertibleBloomFilter) -> boolean`: test if two filters are equal.
- `substract(remote: InvertibleBloomFilter)`: perform the XOR subtraction of two IBLTs.
- `decode() -> {additional: Buffer[], missing: Buffer[]}`: decode an IBLT.
- `listEntries() -> Generator<Buffer, number, void>`: list all entries in the IBLT using a Generator.
```js
const { InvertibleBloomFilter } = require('bloom-filters')

const hashcount = 3
const size = 50
const iblt = new InvertibleBloomFilter(size, hashcount)

// push some data in the IBLT
iblt.add(Buffer.from('alice'))
iblt.add(Buffer.from('42'))
iblt.add(Buffer.from('help'))
iblt.add(Buffer.from('meow'))
iblt.add(Buffer.from('json'))

console.log(iblt.has(Buffer.from('alice'))) // output: true
console.log(iblt.has(Buffer.from('daniel'))) // output: false

iblt.remove(Buffer.from('alice'))
console.log(iblt.has(Buffer.from('alice'))) // output: false

// Now, let's demonstrate the decoding power of IBLTs!
const remote = new InvertibleBloomFilter(size, hashcount)
remote.add(Buffer.from('alice'))
remote.add(Buffer.from('car'))
remote.add(Buffer.from('meow'))
remote.add(Buffer.from('help'))

// decode the difference between the two filters
const result = iblt.substract(remote).decode()

console.log(
  `Did we successfully decode the subtracted IBLTs? ${result.success}. Why? ${result.reason}`
)
console.log(`Elements of iblt missing from remote: ${result.additional}`)
console.log(`Elements of remote missing from iblt: ${result.missing}`)

// alternatively, create an IBLT optimal for a number of items and a desired error rate
const items = [Buffer.from('alice'), Buffer.from('bob')]
const errorRate = 0.04 // 4 % error rate
let filter = InvertibleBloomFilter.create(items.length, errorRate)

// or create an IBLT optimal for a collection of items and a desired error rate
filter = InvertibleBloomFilter.from(items, errorRate)
```
Tuning the IBLT: we recommend using a hashcount of at least 3 and an alpha of at least 1.5 for up to 50 differences, which amounts to 1.5 * 50 = 75 cells. Then, even if you insert a huge number of values, decoding will work (whatever the number of differences, as long as it stays below 50), but testing the presence of a value remains probabilistic, based on the number of elements inserted (even for functions like listEntries). For more details, you should read the seminal research paper on IBLTs (full-text article).
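A hypothetical helper that applies this sizing rule:

```js
// Sizing rule from the paragraph above: cells = alpha * expected differences.
function ibltSize(expectedDifferences, alpha = 1.5) {
  return Math.ceil(alpha * expectedDifferences)
}
console.log(ibltSize(50)) // 75 cells
```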
XOR Filter
Available with 8-bit or 16-bit fingerprint lengths.
An XOR Filter is a probabilistic data structure that is more space-efficient than a Bloom Filter. It is very useful for read-only sets where space efficiency matters.
Reference: Graf, Thomas Mueller, and Daniel Lemire. "Xor filters: Faster and smaller than bloom and cuckoo filters." Journal of Experimental Algorithmics (JEA) 25 (2020): 1-16. (Full text article)
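For intuition, the membership test described by Graf and Lemire XORs three table slots and compares the result to the element's fingerprint; a hedged sketch of just that check, with hypothetical hash and fingerprint functions (building the table is the hard part and is handled by the package):

```js
// XOR filter membership check (illustration only): x might be present iff
// its fingerprint equals the XOR of the three slots selected by h0, h1, h2.
function mightContain(table, x, h0, h1, h2, fingerprint) {
  return fingerprint(x) === (table[h0(x)] ^ table[h1(x)] ^ table[h2(x)])
}
```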
Methods
- `add(elements: XorHashableInput[]) -> void`: add elements to the filter. Calling this method more than once will override the current filter with the new elements.
- `has(element: XorHashableInput) -> boolean`: return true/false whether the element is in the set or not.

- Extended input types: `type XorHashableInput = HashableInput | Long`
- We use Buffers internally, which are exported/imported to/from `base64` strings.
```js
const { XorFilter } = require('bloom-filters')
const xor8 = new XorFilter(1)
xor8.add(['a'])
xor8.has('a') // true
xor8.has('b') // false
// or the combined
const filter = XorFilter.create(['a'])
filter.has('a') // true
// using a 16-bit fingerprint length
XorFilter.create(['a'], 16).has('a') // true
const a = new XorFilter(1, 16)
a.add(['a'])
a.has('a') // true
```
Export and import
All data structures exposed by this package can be exported and imported to/from JSON:
- Use the method `saveAsJSON()` to export any data structure into a JSON object.
- Use the static method `fromJSON(json)` to load a data structure from a JSON object.
```js
const { BloomFilter } = require('bloom-filters')

// create a filter optimal for 15 items and a 1% error rate
const filter = BloomFilter.create(15, 0.01)
filter.add('alice')

// export a bloom filter to JSON
const exported = filter.saveAsJSON()

// do something with the JSON object (save it as a file, send it to a server, etc.)
// ...

// import the same filter from its JSON export
const importedFilter = BloomFilter.fromJSON(exported)
console.log(importedFilter.has('alice')) // output: true
console.log(importedFilter.has('bob')) // output: false
```
Seeding and Hashing
By default, every hash function is seeded with an internal seed equal to 0x1234567890. If you want to change it:
```js
const { BloomFilter } = require('bloom-filters')
const bl = new BloomFilter(10, 4)
console.log(bl.seed) // 78187493520
bl.seed = 0xabcd
console.log(bl.seed) // 43981
```
By default, we hash elements using the `XXH.h64` function from `xxhashjs`.
If you want to use your own hash functions, you can provide your own Hashing class by extending the default one. Example:
```js
const { BloomFilter, Hashing } = require('bloom-filters')

class CustomHashing extends Hashing {
  serialize(_element, _seed) {
    return Number(1)
  }
}

const bl = BloomFilter.create(2, 0.01)
// override the hashing system just for this structure
bl._hashing = new CustomHashing()
bl.add('a')
```
For an example, see the "Use different hash functions" describe block in `test/utils-test.js`.
Documentation
See the documentation online, or generate it in the `doc/` directory with: `npm run doc`
Tests and Development
- Tests are performed using mocha and nyc (code coverage) on Node 12.x, 14.x, 15.x and 16.x for the moment.
- Linting and formatting are done using `prettier` and `eslint`.

When submitting pull requests, please follow this guidance:
- Open pull requests on the develop branch. Direct contributions to the master branch will be refused without comment.
- Add tests, when possible, in the `test` folder.
- Functions, methods, variables and types must be documented using typedoc annotations.
- Run `yarn test` (build, lint and run the mocha test suite).
References
- Classic Bloom Filter: Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426.
- Partitioned Bloom Filter: Chang, F., Feng, W. C., & Li, K. (2004, March). Approximate caches for packet classification. In INFOCOM 2004. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies (Vol. 4, pp. 2196-2207). IEEE.
- Cuckoo Filter: Fan, B., Andersen, D. G., Kaminsky, M., & Mitzenmacher, M. D. (2014, December). Cuckoo filter: Practically better than bloom. In Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies (pp. 75-88). ACM.
- Counting Bloom Filter: F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, "An Improved Construction for Counting Bloom Filters," in 14th Annual European Symposium on Algorithms, LNCS 4168, 2006.
- Count Min Sketch: Cormode, G., & Muthukrishnan, S. (2005). An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1), 58-75.
- HyperLogLog: Philippe Flajolet, Éric Fusy, Olivier Gandouet and Frédéric Meunier (2007). "Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm". Discrete Mathematics and Theoretical Computer Science Proceedings.
- MinHash: Andrei Z. Broder, "On the resemblance and containment of documents", in Compression and Complexity of Sequences: Proceedings (1997).
- Invertible Bloom Filters: Eppstein, D., Goodrich, M. T., Uyeda, F., & Varghese, G. (2011). What's the difference?: efficient set reconciliation without prior context. ACM SIGCOMM Computer Communication Review, 41(4), 218-229.
- XOR Filter: Graf, Thomas Mueller, and Daniel Lemire. "Xor filters: Faster and smaller than bloom and cuckoo filters." Journal of Experimental Algorithmics (JEA) 25 (1), 2020. DOI: 10.1145/3376122
- Scalable Bloom Filters: Almeida, Paulo Sérgio, Baquero, Carlos, Preguiça, Nuno, et al. Scalable bloom filters. Information Processing Letters, 2007, vol. 101, no 6, p. 255-261.
Changelog
Version | Release date | Major changes |
---|---|---|
v2.1.0 | 03/2022 | - Add Scalable Bloom filters - Use an array of BitSets for the Partitioned Bloom Filter - Fix wrong MinHash comparison |
v2.0.0 | 02/2022 | - Use double hashing correctly #issue43. - Move all hashing-related functions to a specific Hashing class, a component of the BaseFilter class. This also allows overriding the serialize function for using custom hash functions. - Add #PR44, optimizing the BloomFilter internal storage with Uint arrays. - Disable 10.x and 15.x node tests. - Add XorFilter #29. - Add a .nextInt32() function to get a new random seeded 32-bit integer from the current seed. - Make all properties public to allow developers to override everything. |
v1.3.0 | 10/04/2020 | Added the MinHash set |
v1.2.0 | 08/04/2020 | Add the TopK class |
v1.1.0 | 03/04/2020 | Add the HyperLogLog sketch |
v1.0.0 | 23/03/2020 | Rework the whole library using TypeScript, unify the API and fix the documentation |
v0.8.0 | 11/11/2019 | Fix some issues with the cuckoo filter (performance). Fix the global API: it now allows customizing each filter. If you want to use the old API, use the .create() or .from() functions to match it. |
v0.7.1 | 11/09/2019 | Add the Counting Bloom Filter |
v0.7.0 | 01/07/2019 | Move to XXHASH for hashing elements in the library. One property has been added to the exported JSON, _seed, which is used to seed every hash of every element. Update Invertible Bloom Filters with #add, #has, #delete, #listEntries, #substract, #Static.decode methods. Updated the way distinct indices are generated, which could collide in many cases. |
v0.6.1 | 18/06/2019 | Add Invertible Bloom Filters (only #encode/#substract/#Static.decode methods) |
License
MIT License (see the LICENSE file).
OpenSSF Scorecard

No vulnerabilities found.
- Reason: no binaries found in the repo
- Reason: no dangerous workflow patterns detected
- Reason: license file detected
Details
- Info: project has a license file: LICENSE:0
- Info: FSF or OSI recognized license: MIT License: LICENSE:0
- Reason: 0 existing vulnerabilities detected
- Reason: SAST tool detected but not run on all commits
Details
- Info: SAST configuration detected: CodeQL
- Warn: 8 commits out of 26 are checked with a SAST tool
- Reason: Found 11/18 approved changesets -- score normalized to 6
- Reason: 5 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 5
- Reason: detected GitHub workflow tokens with excessive permissions
Details
- Info: jobLevel 'actions' permission set to 'read': .github/workflows/codeql-analysis.yml:28
- Info: jobLevel 'contents' permission set to 'read': .github/workflows/codeql-analysis.yml:29
- Warn: no topLevel permission defined: .github/workflows/codeql-analysis.yml:1
- Warn: no topLevel permission defined: .github/workflows/gh_pages_doc.yml:1
- Warn: no topLevel permission defined: .github/workflows/npm_release.yml:1
- Warn: no topLevel permission defined: .github/workflows/npm_test_doc.yml:1
- Info: no jobLevel write permissions found
- Reason: dependency not pinned by hash detected -- score normalized to 0
Details
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codeql-analysis.yml:42: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/codeql-analysis.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codeql-analysis.yml:46: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/codeql-analysis.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codeql-analysis.yml:54: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/codeql-analysis.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codeql-analysis.yml:65: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/codeql-analysis.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/gh_pages_doc.yml:9: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/gh_pages_doc.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/gh_pages_doc.yml:11: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/gh_pages_doc.yml/master?enable=pin
- Warn: third-party GitHubAction not pinned by hash: .github/workflows/gh_pages_doc.yml:21: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/gh_pages_doc.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/npm_release.yml:12: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/npm_release.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/npm_release.yml:14: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/npm_release.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/npm_test_doc.yml:16: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/npm_test_doc.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/npm_test_doc.yml:18: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/npm_test_doc.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/npm_test_doc.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/Callidon/bloom-filters/npm_test_doc.yml/master?enable=pin
- Warn: npmCommand not pinned by hash: .github/workflows/npm_test_doc.yml:30
- Info: 0 out of 11 GitHub-owned GitHubAction dependencies pinned
- Info: 0 out of 1 third-party GitHubAction dependencies pinned
- Info: 0 out of 1 npmCommand dependencies pinned
- Reason: no effort to earn an OpenSSF best practices badge detected
- Reason: security policy file not detected
Details
- Warn: no security policy file detected
- Warn: no security file to analyze
- Reason: project is not fuzzed
Details
- Warn: no fuzzer integrations found
Score: 5.4/10
Last Scanned on 2024-11-25
The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.