Installations
npm install @eastuni/lunr-languages-ko
- Developer: eastuni
- Module system: CommonJS
- TypeScript support: No
- Node version: 14.16.1
- NPM version: 6.14.12
- Statistics: 91 commits, 3 branches, 1 contributor
- Updated on 30 Apr 2021
- Languages: JavaScript (100%)
Lunr Languages
Lunr Languages is a Lunr addon that helps you search in documents written in the following languages:
- German
- French
- Spanish
- Italian
- Japanese
- Dutch
- Danish
- Portuguese
- Finnish
- Romanian
- Hungarian
- Russian
- Norwegian
- Thai
- Vietnamese
- Arabic
- Contribute with a new language
Lunr Languages is compatible with Lunr versions 0.6, 0.7, 1.0 and 2.X.
How to use
Lunr-languages works well with script loaders (Webpack, requirejs) and can be used in the browser and on the server.
In a web browser
The following example is for the German language (de).
Add the following JS files to the page:
```html
<script src="lunr.js"></script> <!-- lunr.js library -->
<script src="lunr.stemmer.support.js"></script>
<script src="lunr.de.js"></script> <!-- or any other language you want -->
```
Then, use the language when initializing lunr:
```javascript
var idx = lunr(function () {
  // use the language (de)
  this.use(lunr.de);
  // then, the normal lunr index initialization
  this.field('title', { boost: 10 });
  this.field('body');
  // now you can call this.add(...) to add documents written in German
});
```
That's it. Just add the documents and you're done. When searching, the stemmer and stop word list applied are the ones for the language you used.
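As a rough sketch of the "add the documents" step, the helper below assumes Lunr 2.x (where documents are added inside the builder function) and documents shaped like `{ id, title, body }`. Both the document shape and the name `buildIndex` are assumptions made for illustration; the README does not prescribe them. The `lunr` object is passed in as a parameter so the same function can be used with the browser global or a `require`d module.

```javascript
// Hypothetical helper (not part of Lunr or Lunr Languages):
// builds a single-language index using the Lunr 2.x builder API.
function buildIndex(lunr, language, docs) {
  return lunr(function () {
    this.use(language);                  // e.g. lunr.de
    this.ref('id');                      // field used as the document reference
    this.field('title', { boost: 10 });
    this.field('body');
    docs.forEach(function (doc) {
      this.add(doc);                     // in Lunr 2.x, add() is called inside the builder
    }, this);
  });
}

// Usage (assumes lunr.js, lunr.stemmer.support.js and lunr.de.js are loaded):
// var idx = buildIndex(lunr, lunr.de, [
//   { id: '1', title: 'Deutschland', body: 'An Deutschland grenzen neun Staaten.' }
// ]);
// var results = idx.search('Deutschland'); // array of results, each with a ref and a score
```

With Lunr versions before 2.0 the documents are added on the returned index instead, so this sketch would need adjusting there.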
In a web browser, with RequireJS
Add require.js to the page:
```html
<script src="lib/require.js"></script>
```
Then, use the language when initializing lunr:
```javascript
require(['lib/lunr.js', '../lunr.stemmer.support.js', '../lunr.de.js'], function (lunr, stemmerSupport, de) {
  // since stemmerSupport and de add keys on the lunr object, we pass it to them by reference
  // in the end, we will only need lunr.
  stemmerSupport(lunr); // adds lunr.stemmerSupport
  de(lunr); // adds lunr.de key

  // at this point, lunr can be used
  var idx = lunr(function () {
    // use the language (de)
    this.use(lunr.de);
    // then, the normal lunr index initialization
    this.field('title', { boost: 10 });
    this.field('body');
    // now you can call this.add(...) to add documents written in German
  });
});
```
With node.js
```javascript
var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.de.js')(lunr); // or any other language you want

var idx = lunr(function () {
  // use the language (de)
  this.use(lunr.de);
  // then, the normal lunr index initialization
  this.field('title', { boost: 10 });
  this.field('body');
  // now you can call this.add(...) to add documents written in German
});
```
Indexing multi-language content
If your documents are written in more than one language, you can enable multi-language indexing. This ensures every word is properly trimmed and stemmed, every stopword is removed, and no words are lost (indexing in just one language would remove words from every other one.)
```javascript
var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.ru.js')(lunr);
require('./lunr.multi.js')(lunr);

var idx = lunr(function () {
  // "en" does not appear in the requires above because English is built into lunr.js
  this.use(lunr.multiLanguage('en', 'ru'));
  // then, the normal lunr index initialization
  // ...
});
```
You can combine any number of supported languages this way. The corresponding lunr language scripts must be loaded (English is built in).
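To illustrate why indexing in a single language can lose words from the others, here is a toy sketch. It is not Lunr Languages' actual code: it assumes that each language's trimmer keeps only that language's word characters, and the character ranges and the `makeTrimmer` helper are simplified stand-ins chosen for this example.

```javascript
// Toy trimmer factory (illustrative only): strips leading and trailing
// characters that are not in the given set of word characters.
function makeTrimmer(wordChars) {
  var leading = new RegExp('^[^' + wordChars + ']+');
  var trailing = new RegExp('[^' + wordChars + ']+$');
  return function (token) {
    return token.replace(leading, '').replace(trailing, '');
  };
}

var EN_CHARS = 'A-Za-z0-9';      // simplified English word characters (assumption)
var RU_CHARS = '\u0400-\u04FF';  // Unicode Cyrillic block (assumption)

var trimEnglish = makeTrimmer(EN_CHARS);
var trimMulti = makeTrimmer(EN_CHARS + RU_CHARS);

var tokens = ['Hello,', 'привет!'];

// English-only trimming removes the Russian token entirely.
console.log(tokens.map(trimEnglish));
// Trimming with the combined character set keeps both words and drops punctuation.
console.log(tokens.map(trimMulti));
```

The combined trimmer only strips punctuation, which is the behavior the multi-language setup is meant to provide across all of the languages you pass to it.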
If you serialize the index and load it in another script, you'll have to initialize the multi-language support in that script, too, like this:
```javascript
lunr.multiLanguage('en', 'ru');
var idx = lunr.Index.load(serializedIndex);
```
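A minimal sketch of the full round trip, assuming the index is serialized to a JSON string with `JSON.stringify` and that the language files and `lunr.multi.js` have already been applied to `lunr` in the loading script. `restoreIndex` is a hypothetical helper name, not part of either library.

```javascript
// Hypothetical helper: restores a serialized multi-language index.
// `lunr` is passed in so the function works with the browser global
// or with a require'd module.
function restoreIndex(lunr, json, languages) {
  // Initialize multi-language support before loading,
  // mirroring the two-line snippet above.
  lunr.multiLanguage.apply(lunr, languages);
  return lunr.Index.load(JSON.parse(json));
}

// In the indexing script:
// var json = JSON.stringify(idx);
//
// In the script that loads the index:
// var idx = restoreIndex(lunr, json, ['en', 'ru']);
```

The language list passed when restoring should match the one used when the index was built.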
How to add a new language
Check the Contributing section
How does Lunr Languages work?
Searching inside documents is not as straightforward as using `indexOf()`, since there are many things to consider in order to get quality search results:
- Tokenization
  - Given a string like "Hope you like using Lunr Languages!", the tokenizer splits it into individual words, producing an array like `['Hope', 'you', 'like', 'using', 'Lunr', 'Languages!']`.
  - Though it seems a trivial task for Latin characters (just splitting by the space), it gets more complicated for languages like Japanese. Lunr Languages includes a tokenizer for the Japanese language.
- Trimming
  - After tokenization, trimming ensures that the words contain just what is needed in them. In our example above, the trimmer would convert `Languages!` into `Languages`.
  - So, the trimmer basically removes special characters that do not add value for the search purpose.
- Stemming
  - What happens if our text contains the word `consignment` but we want to search for `consigned`? It should find it, since its meaning is the same, only the form is different.
  - A stemmer extracts the root of words that can have many forms and stores it in the index. Then, any search is also stemmed and searched in the index.
  - Lunr Languages does stemming for all the included languages, so you can capture all the forms of words in your documents.
- Stop words
  - There's no point in adding or searching words like `the`, `it`, `so`, etc. These words are called stop words.
  - Stop words are removed so your index will only contain meaningful words.
  - Lunr Languages includes stop words for all the included languages.
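The four steps above can be sketched as a toy English pipeline. This is purely illustrative and is not how Lunr or Lunr Languages implement them: the stop word list is a tiny made-up sample, and the `stem` function is a naive suffix stripper standing in for a real Snowball stemmer.

```javascript
// Tiny sample stop word list (assumption; real lists are much longer).
var STOP_WORDS = ['the', 'it', 'so', 'you'];

// Tokenization: split on whitespace, which is only adequate for
// space-separated languages.
function tokenize(text) {
  return text.split(/\s+/).filter(Boolean);
}

// Trimming: strip leading and trailing non-word characters.
function trim(token) {
  return token.replace(/^\W+/, '').replace(/\W+$/, '');
}

function isStopWord(token) {
  return STOP_WORDS.indexOf(token) !== -1;
}

// Stemming: naive suffix stripping, far cruder than a real stemmer.
function stem(token) {
  return token.replace(/(ment|ed|ing|s)$/, '');
}

function analyze(text) {
  return tokenize(text)
    .map(trim)
    .map(function (t) { return t.toLowerCase(); })
    .filter(function (t) { return t.length > 0 && !isStopWord(t); })
    .map(stem);
}

console.log(analyze('Hope you like using Lunr Languages!'));
// Different forms reduce to the same stem, so a search for one finds the other.
console.log(analyze('consignment')[0] === analyze('consigned')[0]); // true
```

The same `analyze` function would be applied to both documents and queries, which is why a search for one word form can match another.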
Technical details & Credits
I created this project by compiling and wrapping stemmers together with stop words from various sources so they can be used directly with all the current versions of Lunr.
- https://github.com/fortnightlabs/snowball-js (the stemmers for all languages, ported from snowball-js)
- https://github.com/brenes/stopwords-filter (the stop words list for the other languages)
- http://chasen.org/~taku/software/TinySegmenter/ (the TinySegmenter Japanese tokenizer, tinyseg)
I am providing code in the repository to you under an open source license. Because this is my personal repository, the license you receive to my code is from me and not my employer (Facebook).