Gathering detailed insights and metrics for daq-proc
Simple document and query processor that makes search in the browser and Node.js a little better
npm install daq-proc
Breaking change in words-n-numbers: Extracting single emojis instead of emoji words
Published on 20 Apr 2023
New test framework + more transparent functions
Published on 07 Sept 2022
email extraction
Published on 06 May 2021
More robust document processor demo
Published on 30 Apr 2021
Fixing both demos
Published on 30 Apr 2021
Emoji, tags and username extraction + bug-fixes / security issues
Published on 22 Dec 2020
11 Stars
500 Commits
2 Forks
2 Watching
3 Branches
2 Contributors
Updated on 27 Sept 2024
JavaScript (100%)
Downloads
Last day: 1 (0% compared to the previous day)
Last week: 24 (300% compared to the previous week)
Last month: 56 (409.1% compared to the previous month)
Last year: 465 (-65.4% compared to the previous year)
Simple document and query processor that makes search in the browser and Node.js a little better. It removes stopwords (for a smaller index and fewer irrelevant hits), extracts keywords to filter on, and prepares n-grams for autocomplete functionality.
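The stopword removal and n-gram preparation described above can be illustrated with a minimal, dependency-free sketch. It assumes a naive letter/digit regex tokenizer and a tiny hard-coded stopword list; `extractWords`, `dropStopwords`, `ngrams` and `STOPWORDS` are hypothetical names, not daq-proc's API. In daq-proc itself this work is done by `extract`, `removeStopwords` and `ngraminator`.

```javascript
// Dependency-free sketch: extract words, remove stopwords, build n-grams.
// ASSUMPTION: the regex, the tiny stopword list and all names below are
// hypothetical illustrations, not daq-proc's API or its bundled libraries.
const STOPWORDS = new Set(['a', 'and', 'for', 'is', 'it', 'the', 'we'])

// Lowercase the text and keep runs of Unicode letters and digits.
function extractWords (text) {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? []
}

// Drop words that carry little information on their own.
function dropStopwords (words) {
  return words.filter(word => !STOPWORDS.has(word))
}

// For each size n, join every run of n consecutive words with a space.
function ngrams (words, sizes) {
  const result = []
  for (const n of sizes) {
    for (let i = 0; i + n <= words.length; i++) {
      result.push(words.slice(i, i + n).join(' '))
    }
  }
  return result
}

const tokens = dropStopwords(extractWords('Document and query processing for the browser!'))
// tokens: ['document', 'query', 'processing', 'browser']
const phrases = ngrams(tokens, [2, 3])
// phrases: ['document query', 'query processing', 'processing browser',
//           'document query processing', 'query processing browser']
console.log(tokens, phrases)
```

The multi-word n-grams can be indexed next to the single words, so a user who starts typing a phrase such as "query proc" can be offered "query processing" as an autocomplete suggestion.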
This library does not create anything new; it just packages 6 libraries that go well together into one browser distribution file, and shows how they can be useful through tests and the interactive demo.
- cheerio: Used here specifically to extract text from all or parts of some HTML.
- eklem-headline-parser: Determines the most relevant keywords in a headline by considering the article context.
- hit-highlighter: Highlights hits from a query in a result item.
- leven-match: Calculates Levenshtein matches between words in two arrays within a given distance. Good for fuzzy matching.
- ngraminator: Generates n-grams.
- stopword: Removes stopwords from an array of words, to keep your index small and drop words without a scent of information, and/or to remove stopwords from the query so the search engine works less hard to find relevant results.
- words'n'numbers: Extracts words, and optionally numbers, from a string of text into arrays that can be fed to stopword, eklem-headline-parser, leven-match, ngraminator and hit-highlighter.

```html
<script src="https://cdn.jsdelivr.net/npm/daq-proc/dist/daq-proc.umd.min.js"></script>

<script>
  // exposing the underlying libraries in a transparent way
  const {
    load,
    removeStopwords, _123, afr, ara, hye, eus, ben, bre, bul, cat, zho, hrv, ces, dan, nld, eng, epo, est, fin, fra, glg, deu, ell, guj, hau, heb, hin, hun, ind, gle, ita, jpn, kor, kur, lat, lav, lit, lgg, lggNd, msa, mar, mya, nob, fas, pol, por, porBr, panGu, ron, rus, slk, slv, som, sot, spa, swa, swe, tha, tgl, tur, urd, ukr, vie, yor, zul,
    extract, words, numbers, emojis, tags, usernames, email,
    ngraminator,
    findKeywords,
    highlight,
    levenMatch
  } = dqp

  // input
  const headlineString = 'Document and query processing for the browser!'
  const bodyString = 'Yay! The day is here =) We now have document and query processing for the browser. It is mostly packaging 4 modules together in a browser distribution file. The modules are words-n-numbers, stopword, ngraminator and eklem-headline-parser'

  // extracting word arrays
  let headlineArray = extract(headlineString, {regex: [words, numbers], toLowercase: true})
  let bodyArray = extract(bodyString, {regex: [words, numbers], toLowercase: true})
  console.log('Word arrays: ')
  console.dir(headlineArray)
  console.dir(bodyArray)

  // removing stopwords
  let headlineStopped = removeStopwords(headlineArray)
  let bodyStopped = removeStopwords(bodyArray)
  console.log('Stopword removed arrays: ')
  console.dir(headlineStopped)
  console.dir(bodyStopped)

  // n-grams
  let headlineNgrams = ngraminator(headlineStopped, [2,3,4])
  let bodyNgrams = ngraminator(bodyStopped, [2,3,4])
  console.log('Ngram arrays: ')
  console.dir(headlineNgrams)
  console.dir(bodyNgrams)

  // calculating important keywords
  let keywords = findKeywords(headlineStopped, bodyStopped, 5)
  console.log('Keyword array: ')
  console.dir(keywords)
</script>
```
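The `findKeywords(headlineStopped, bodyStopped, 5)` call ranks headline words using the article body as context. As a rough illustration only, the sketch below swaps in a deliberately simple frequency heuristic (an assumption; eklem-headline-parser's actual scoring is not reproduced here): each unique headline word is scored by how often it occurs in the body, and the `limit` best-scoring words are returned. `rankHeadlineKeywords` and the sample arrays are hypothetical.

```javascript
// Simplified stand-in for headline keyword extraction.
// ASSUMPTION: hypothetical frequency heuristic, not eklem-headline-parser's algorithm.
function rankHeadlineKeywords (headlineWords, bodyWords, limit) {
  // Count how often each word appears in the body text.
  const counts = new Map()
  for (const word of bodyWords) {
    counts.set(word, (counts.get(word) ?? 0) + 1)
  }
  // Keep unique headline words that occur in the body, most frequent first.
  // Array.prototype.sort is stable, so ties keep their headline order.
  return [...new Set(headlineWords)]
    .filter(word => counts.has(word))
    .sort((a, b) => counts.get(b) - counts.get(a))
    .slice(0, limit)
}

// Hypothetical, already stopword-filtered sample input.
const headline = ['document', 'query', 'processing', 'browser']
const body = ['yay', 'day', 'document', 'query', 'processing', 'browser',
  'packaging', 'modules', 'browser', 'distribution', 'file', 'modules']

console.log(rankHeadlineKeywords(headline, body, 2))
// 'browser' occurs twice in the body, the other headline words once,
// so the result is ['browser', 'document']
```

Using the body as context means a headline word is only promoted if the article actually talks about it, which is the intuition behind passing both arrays to `findKeywords`.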
```html
<script src="https://cdn.jsdelivr.net/npm/daq-proc/dist/daq-proc.umd.min.js"></script>

<script>
  // exposing the underlying libraries in a transparent way
  const {
    highlight,
    levenMatch
  } = dqp

  // query words to highlight in a result item
  const query = ['interesting', 'words']
  const searchResult = ['some', 'interesting', 'words', 'to', 'remember']

  highlight(query, searchResult)
  // returns:
  // 'some <span class="highlighted">interesting words</span> to remember'

  const index = ['return', 'all', 'word', 'matches', 'between', 'two', 'arrays', 'within', 'given', 'levenshtein', 'distance', 'intended', 'use', 'is', 'to', 'words', 'in', 'a', 'query', 'that', 'has', 'an', 'index', 'good', 'for', 'autocomplete', 'type', 'functionality,', 'and', 'some', 'cases', 'also', 'searching']
  // misspelled query words to match fuzzily against the index
  // (a separate name, since `query` is already declared above)
  const fuzzyQuery = ['qvery', 'words', 'levensthein']

  levenMatch(fuzzyQuery, index, {distance: 2})
  // returns:
  // [ [ 'query' ], [ 'word', 'words' ], [ 'levenshtein' ] ]
</script>
```
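To make the two calls above less of a black box, here is a self-contained approximation of both ideas: wrapping consecutive query hits in one `<span class="highlighted">`, and matching query words to index words within a maximum Levenshtein distance. `highlightHits`, `levenshtein` and `fuzzyMatch` are hypothetical names; this is a sketch, not hit-highlighter's or leven-match's code, and it may differ from them in edge cases.

```javascript
// Hypothetical sketch, not hit-highlighter's implementation:
// wrap each run of consecutive query hits in a single highlight span.
function highlightHits (queryWords, resultWords) {
  const hits = new Set(queryWords)
  const out = []
  let run = []
  // Emit the pending run of hits (if any) as one span.
  const flush = () => {
    if (run.length > 0) {
      out.push(`<span class="highlighted">${run.join(' ')}</span>`)
      run = []
    }
  }
  for (const word of resultWords) {
    if (hits.has(word)) {
      run.push(word)
    } else {
      flush()
      out.push(word)
    }
  }
  flush()
  return out.join(' ')
}

// Classic dynamic-programming Levenshtein distance
// (insertion, deletion and substitution each cost 1), keeping two rows.
function levenshtein (a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1, // delete a[i - 1]
        curr[j - 1] + 1, // insert b[j - 1]
        prev[j - 1] + cost // substitute (or keep) a character
      )
    }
    prev = curr
  }
  return prev[b.length]
}

// Hypothetical sketch, not leven-match's implementation:
// for each query word, list the index words within maxDistance edits.
function fuzzyMatch (queryWords, indexWords, maxDistance) {
  return queryWords.map(q => indexWords.filter(w => levenshtein(q, w) <= maxDistance))
}

console.log(highlightHits(['interesting', 'words'], ['some', 'interesting', 'words', 'to', 'remember']))
// → some <span class="highlighted">interesting words</span> to remember
console.log(fuzzyMatch(['qvery', 'words'], ['query', 'word', 'words', 'index', 'array'], 2))
// → [ [ 'query' ], [ 'word', 'words' ] ]
```

Keeping only two rows of the distance table needs memory proportional to the shorter loop dimension instead of the full matrix. Note that a transposition such as "levensthein" → "levenshtein" costs 2 in plain Levenshtein distance, which is why a distance of 2 is needed to match it. `highlightHits` does not escape HTML in the result words; code that renders untrusted text should.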
It can be used in Node.js too; the tests run in both Node.js and the browser. The package only wraps 6 libraries for ease of use in the browser, but it could come in handy in Node.js for, e.g., simple crawler scenarios.
No vulnerabilities found.