npmpackage.info

Gathering detailed insights and metrics for daq-proc


daq-proc

Simple document and query processor that makes search running in the browser and node.js a little better

8.0.0

MIT

JavaScript

4,801

Installations

npm install daq-proc

Pull Requests

Open

1

Total

277

Closed

73

Merged

203

Issues

Open

7

Total

57

Closed

50

Releases

Breaking change in words-n-numbers: Extracting single emojis instead of emoji words

v8.0.0

Published on 20 Apr 2023

New test framework + more transparent functions

v7.0.1

Published on 07 Sept 2022

email extraction

v6.0.0

Published on 06 May 2021

Developer

eklem

Developer Guide

BETA

Module System

CommonJS, UMD

Min. Node Version

Typescript Support

No

Node Version

18.10.0

NPM Version

9.6.4

Statistics

11 Stars

500 Commits

2 Forks

2 Watching

3 Branches

2 Contributors

Updated on 27 Sept 2024

Languages

JavaScript (100%)

Total Downloads

4,801

Last week: 300% compared to previous week

Last month: 409.1% compared to previous month

Last year: -65.4% (465) compared to previous year

Dependencies

cheerio eklem-headline-parser hit-highlighter leven-match ngraminator stopword words-n-numbers

Dev Dependencies

batr

Versions

daq-proc

Simple document and query processor that makes search running in the browser and node.js a little better. It removes stopwords (smaller index and fewer irrelevant hits), extracts keywords to filter on, and prepares ngrams for auto-complete functionality.

Demo

Document processor: showcases the document processing side. Just add some words and figure it out.
Query processor: showcases hit highlighting and truncating text if needed. Fuzzy matching can be turned on or off.

This library does not create anything new; it just packages 6 libraries that go well together into one browser distribution file. It also shows how they may be useful through tests and the interactive demo.

Libraries that daq-proc is depending on

cheerio - Here specifically used to extract text from all or parts of some HTML.
eklem-headline-parser - Determines the most relevant keywords in a headline by considering article context
hit-highlighter - Highlighting hits from a query in a result item.
leven-match - Calculating Levenshtein match between words in two arrays within given distance. Good for fuzzy matching.
ngraminator - Generate n-grams.
stopword - Removes stopwords from an array of words. To keep your index small and remove all words without a scent of information and/or remove stopwords from the query, making the search engine work less hard to find relevant results.
words'n'numbers - Extract words and optionally numbers from a string of text into arrays. Arrays that can be fed to stopword, eklem-headline-parser, leven-match, ngraminator and hit-highlighter.

Browser

Example - document processing side

<script src="https://cdn.jsdelivr.net/npm/daq-proc/dist/daq-proc.umd.min.js"></script>

<script>
  // exposing the underlying libraries in a transparent way
  const {
    load,
    removeStopwords, _123, afr, ara, hye, eus, ben, bre, bul, cat, zho, hrv, ces, dan, nld, eng, epo, est, fin, fra, glg, deu, ell, guj, hau, heb, hin, hun, ind, gle, ita, jpn, kor, kur, lat, lav, lit, lgg, lggNd, msa, mar, mya, nob, fas, pol, por, porBr, panGu, ron, rus, slk, slv, som, sot, spa, swa, swe, tha, tgl, tur, urd, ukr, vie, yor, zul,
    extract, words, numbers, emojis, tags, usernames, email,
    ngraminator,
    findKeywords,
    highlight,
    levenMatch
  } = dqp

  // input
  const headlineString = 'Document and query processing for the browser!'
  const bodyString = 'Yay! The day is here =) We now have document and query processing for the browser. It is mostly packaging 4 modules together in a browser distribution file. The modules are words-n-numbers, stopword, ngraminator and eklem-headline-parser'

  // extracting word arrays
  let headlineArray = extract(headlineString, {regex: [words, numbers], toLowercase: true})
  let bodyArray = extract(bodyString, {regex: [words, numbers], toLowercase: true})
  console.log('Word arrays: ')
  console.dir(headlineArray)
  console.dir(bodyArray)

  // removing stopwords
  let headlineStopped = removeStopwords(headlineArray)
  let bodyStopped = removeStopwords(bodyArray)
  console.log('Stopword removed arrays: ')
  console.dir(headlineStopped)
  console.dir(bodyStopped)

  // n-grams
  let headlineNgrams = ngraminator(headlineStopped, [2,3,4])
  let bodyNgrams = ngraminator(bodyStopped, [2,3,4])
  console.log('Ngram arrays: ')
  console.dir(headlineNgrams)
  console.dir(bodyNgrams)

  // calculating important keywords
  let keywords = findKeywords(headlineStopped, bodyStopped, 5)
  console.log('Keyword array: ')
  console.dir(keywords)
</script>
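
To picture what the three document-side steps do without loading the bundle, here is a minimal sketch in plain JavaScript. It is not how words-n-numbers, stopword or ngraminator are implemented; the function names (`extractWords`, `dropStopwords`, `makeNgrams`) and the tiny stopword list are hypothetical and chosen only for illustration.

```javascript
// Illustrative stand-ins only: these are NOT the implementations used by
// words-n-numbers, stopword or ngraminator.

// Hypothetical, tiny stopword list for the demo.
const STOPWORDS = new Set(['and', 'for', 'the', 'is', 'a', 'in', 'it', 'we'])

// Split text into lowercase tokens made of ASCII letters and digits.
function extractWords (text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || []
}

// Drop every token that appears in the stopword set.
function dropStopwords (tokens, stopwords = STOPWORDS) {
  return tokens.filter(token => !stopwords.has(token))
}

// Build all n-grams of the requested sizes as arrays of tokens.
function makeNgrams (tokens, sizes) {
  const ngrams = []
  for (const n of sizes) {
    for (let i = 0; i + n <= tokens.length; i++) {
      ngrams.push(tokens.slice(i, i + n))
    }
  }
  return ngrams
}

const headline = 'Document and query processing for the browser!'
const tokens = extractWords(headline)
const stopped = dropStopwords(tokens)
const ngrams = makeNgrams(stopped, [2, 3])

console.log(tokens)
console.log(stopped)
console.log(ngrams)
```

With this stopword list the headline shrinks from seven tokens to four, which is the kind of index reduction the stopword step is meant to give. The real libraries handle many languages and character classes that this sketch ignores.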

Example - Query side

<script src="https://cdn.jsdelivr.net/npm/daq-proc/dist/daq-proc.umd.min.js"></script>

<script>
  // exposing the underlying libraries in a transparent way
  const {
    highlight,
    levenMatch
  } = dqp

  // hit highlighting
  const query = ['interesting', 'words']
  const searchResult = ['some', 'interesting', 'words', 'to', 'remember']

  highlight(query, searchResult)
  // returns:
  // 'some <span class="highlighted">interesting words</span> to remember'

  // fuzzy matching (named fuzzyQuery because `query` is already declared above)
  const index = ['return', 'all', 'word', 'matches', 'between', 'two', 'arrays', 'within', 'given', 'levenshtein', 'distance', 'intended', 'use', 'is', 'to', 'words', 'in', 'a', 'query', 'that', 'has', 'an', 'index', 'good', 'for', 'autocomplete', 'type', 'functionality,', 'and', 'some', 'cases', 'also', 'searching']
  const fuzzyQuery = ['qvery', 'words', 'levensthein']

  levenMatch(fuzzyQuery, index, {distance: 2})
  // returns:
  //[ [ 'query' ], [ 'word', 'words' ], [ 'levenshtein' ] ]
</script>
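
The fuzzy matching idea, returning for each query word the index words within a given Levenshtein distance, can be sketched in plain JavaScript as below. This is an assumption-laden illustration, not leven-match's actual code: `levenshtein` and `fuzzyMatch` are hypothetical names, and the index is shortened to keep the result easy to check by hand.

```javascript
// Illustrative sketch only: NOT the implementation used by leven-match.

// Classic dynamic-programming Levenshtein distance between two strings
// (insertions, deletions and substitutions each cost 1).
function levenshtein (a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
    }
    prev = curr
  }
  return prev[b.length]
}

// For each query word, collect the index words within maxDistance edits.
function fuzzyMatch (queryWords, indexWords, maxDistance) {
  return queryWords.map(q =>
    indexWords.filter(w => levenshtein(q, w) <= maxDistance)
  )
}

const smallIndex = ['query', 'word', 'words', 'levenshtein', 'distance']
const typedQuery = ['qvery', 'words', 'levensthein']
const matches = fuzzyMatch(typedQuery, smallIndex, 2)

console.log(matches)
// [ [ 'query' ], [ 'word', 'words' ], [ 'levenshtein' ] ]
```

Note that the swapped letters in 'levensthein' count as two substitutions under plain Levenshtein distance, so a maximum distance of 2 is what lets that typo match.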

Node.js

It's fully possible to use daq-proc on Node.js too. The tests cover both Node.js and the browser. It only wraps 6 libraries for ease of use in the browser, but could come in handy for e.g. simple crawler scenarios.

Something missing?

Create an issue so we can discuss =).

No vulnerabilities found.