Gathering detailed insights and metrics for ts-content-based-recommender
This is a TypeScript content-based recommender with enhanced multilingual support, forked from stanleyfok/content-based-recommender by Stanley Fok.
ProcessingPipelineFactory for easy component creation
test/ directory with improved coverage
trainBidirectional(collectionA, collectionB) to allow recommendations between two different datasets
Upgrade dependencies to fix security alerts
Introduce the use of unigram, bigrams and trigrams when constructing the word vector
Simplify the implementation by no longer using a sorted set data structure to store the similar-document data. Also support the maxSimilarDocuments and minScore options to reduce the memory used by the recommender.
Update to a newer version of vector-object
npm install ts-content-based-recommender
And then import the ContentBasedRecommender class
    const ContentBasedRecommender = require('ts-content-based-recommender')
For TypeScript projects:
    import ContentBasedRecommender from 'ts-content-based-recommender'
    // or import individual components
    import {
      ProcessingPipelineFactory,
      EnglishTokenizer,
      JapaneseTokenizer,
      EnglishTokenFilter,
      JapaneseTokenFilter
    } from 'ts-content-based-recommender'
This is a content-based recommender implemented in TypeScript to illustrate the concept of content-based recommendation. Content-based recommendation is a popular technique for showing similar items to users, and is especially useful for e-commerce sites, news sites, and similar content-heavy websites.
After the recommender is trained on an array of documents, it can return the documents that are most similar to a given input document.
The training process involves 3 main steps:
Special thanks to the natural library, which helps a lot by providing many NLP functions, such as tf-idf and word stemming.
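As a rough illustration of how tf-idf vectors can be turned into similarity scores, here is a minimal sketch. It assumes the common pattern of tokenizing, weighting terms by tf-idf, and comparing vectors with cosine similarity; the use of cosine similarity is an assumption that this document does not state. All names (`tokenize`, `tfidfVectors`, `cosine`) are hypothetical and are not part of the package, and the sketch skips stemming and stopword removal.

```typescript
// Hypothetical helpers for illustration only; not the package's internal code.
type Vector = Record<string, number>;

// Very rough preprocessing: lowercase and split on anything that is not a-z or 0-9.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Build one tf-idf vector per document.
function tfidfVectors(docs: string[]): Vector[] {
  const tokenized = docs.map(tokenize);

  // Document frequency: in how many documents does each term appear?
  const docFreq: Vector = Object.create(null);
  for (const tokens of tokenized) {
    const seen: Record<string, boolean> = Object.create(null);
    for (const term of tokens) {
      if (!seen[term]) {
        seen[term] = true;
        docFreq[term] = (docFreq[term] || 0) + 1;
      }
    }
  }

  return tokenized.map((tokens) => {
    const vec: Vector = Object.create(null);
    for (const term of tokens) {
      vec[term] = (vec[term] || 0) + 1; // raw term frequency
    }
    for (const term of Object.keys(vec)) {
      vec[term] *= Math.log(1 + docs.length / docFreq[term]); // smoothed idf weight
    }
    return vec;
  });
}

// Cosine similarity between two sparse vectors; 0 when either vector is empty.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const term of Object.keys(a)) {
    dot += a[term] * (b[term] || 0);
    normA += a[term] * a[term];
  }
  for (const term of Object.keys(b)) {
    normB += b[term] * b[term];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

const vectors = tfidfVectors([
  'Introduction to Machine Learning',
  'Machine learning and its application',
  'The future of Bitcoin technology',
]);
console.log(cosine(vectors[0], vectors[1])); // shares "machine" and "learning", so greater than 0
console.log(cosine(vectors[0], vectors[2])); // no shared terms, so 0
```

Because term weights are never negative here, the scores fall between 0 and 1, which matches the score range described for getSimilarDocuments.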
⚠️ Note:
I haven't tested how this recommender performs with a large dataset. I will share more results after further testing.
    import ContentBasedRecommender from 'ts-content-based-recommender'

    const recommender = new ContentBasedRecommender({
      minScore: 0.1,
      maxSimilarDocuments: 100
    });

    // prepare documents data
    const documents = [
      { id: '1000001', content: 'Why studying javascript is fun?' },
      { id: '1000002', content: 'The trend for javascript in machine learning' },
      { id: '1000003', content: 'The most insightful stories about JavaScript' },
      { id: '1000004', content: 'Introduction to Machine Learning' },
      { id: '1000005', content: 'Machine learning and its application' },
      { id: '1000006', content: 'Python vs Javascript, which is better?' },
      { id: '1000007', content: 'How Python saved my life?' },
      { id: '1000008', content: 'The future of Bitcoin technology' },
      { id: '1000009', content: 'Is it possible to use javascript for machine learning?' }
    ];

    // start training (now async)
    await recommender.train(documents);

    // get top 10 similar items to document 1000002
    const similarDocuments = recommender.getSimilarDocuments('1000002', 0, 10);

    console.log(similarDocuments);
    /*
      the higher the score, the more similar the item is
      documents with score < 0.1 are filtered because options minScore is set to 0.1
      [
        { id: '1000004', score: 0.5114304586412038 },
        { id: '1000009', score: 0.45056313558918837 },
        { id: '1000005', score: 0.37039308109283564 },
        { id: '1000003', score: 0.10896767690747626 }
      ]
    */
This example shows how to automatically match posts with related tags
    import ContentBasedRecommender from 'ts-content-based-recommender'

    const posts = [
      { id: '1000001', content: 'Why studying javascript is fun?' },
      { id: '1000002', content: 'The trend for javascript in machine learning' },
      { id: '1000003', content: 'The most insightful stories about JavaScript' },
      { id: '1000004', content: 'Introduction to Machine Learning' },
      { id: '1000005', content: 'Machine learning and its application' },
      { id: '1000006', content: 'Python vs Javascript, which is better?' },
      { id: '1000007', content: 'How Python saved my life?' },
      { id: '1000008', content: 'The future of Bitcoin technology' },
      { id: '1000009', content: 'Is it possible to use javascript for machine learning?' },
    ];

    const tags = [
      { id: '1', content: 'Javascript' },
      { id: '2', content: 'machine learning' },
      { id: '3', content: 'application' },
      { id: '4', content: 'introduction' },
      { id: '5', content: 'future' },
      { id: '6', content: 'Python' },
      { id: '7', content: 'Bitcoin' },
    ];

    // look-up table from tag id to tag (typed so that indexing by id compiles in strict mode)
    const tagMap = tags.reduce((acc, tag) => {
      acc[tag.id] = tag;
      return acc;
    }, {} as Record<string, { id: string; content: string }>);

    const recommender = new ContentBasedRecommender();

    // Training is now async
    await recommender.trainBidirectional(posts, tags);

    for (const post of posts) {
      const relatedTags = recommender.getSimilarDocuments(post.id);
      const tagNames = relatedTags.map(t => tagMap[t.id].content);
      console.log(post.content, 'related tags:', tagNames);
    }

    /*
    Why studying javascript is fun? related tags: [ 'Javascript' ]
    The trend for javascript in machine learning related tags: [ 'machine learning', 'Javascript' ]
    The most insightful stories about JavaScript related tags: [ 'Javascript' ]
    Introduction to Machine Learning related tags: [ 'machine learning', 'introduction' ]
    Machine learning and its application related tags: [ 'machine learning', 'application' ]
    Python vs Javascript, which is better? related tags: [ 'Python', 'Javascript' ]
    How Python saved my life? related tags: [ 'Python' ]
    The future of Bitcoin technology related tags: [ 'future', 'Bitcoin' ]
    Is it possible to use javascript for machine learning? related tags: [ 'machine learning', 'Javascript' ]
    */
    import ContentBasedRecommender from 'ts-content-based-recommender'

    const recommender = new ContentBasedRecommender({
      language: 'ja', // enable Japanese support
      minScore: 0.1,
      maxSimilarDocuments: 100
    });

    // prepare Japanese document data
    const japaneseDocuments = [
      { id: '1', content: 'JavaScriptプログラミングは楽しいです。フロントエンドの開発に最適です。' },
      { id: '2', content: 'プログラミング言語の比較検討。PythonとJavaScriptの違いについて。' },
      { id: '3', content: '機械学習の基礎知識。データサイエンスへの応用。' },
      { id: '4', content: 'ウェブ開発のベストプラクティス。モダンなJavaScript技術。' },
      { id: '5', content: 'データ分析とビジュアライゼーション。統計学の活用。' }
    ];

    // start training (async)
    await recommender.train(japaneseDocuments);

    // get the top 5 documents similar to document '1'
    const similarDocuments = recommender.getSimilarDocuments('1', 0, 5);

    console.log(similarDocuments);
    /*
      Japanese morphological analysis allows a more precise similarity calculation
      [
        { id: '4', score: 0.45123456789 },
        { id: '2', score: 0.32456789012 }
      ]
    */
The library now provides modular components that can be used independently:
    import {
      ProcessingPipelineFactory,
      EnglishTokenizer,
      JapaneseTokenizer,
      EnglishTokenFilter,
      JapaneseTokenFilter
    } from 'ts-content-based-recommender'

    // Using factory pattern to create processing pipelines
    const englishPipeline = ProcessingPipelineFactory.createPipeline('en', {
      minTokenLength: 2,
      removeStopwords: true,
      customStopWords: ['custom', 'words']
    });

    const japanesePipeline = ProcessingPipelineFactory.createPipeline('ja', {
      allowedPos: ['名詞', '動詞', '形容詞'], // part-of-speech filtering
      minTokenLength: 1
    });

    // Using tokenizers directly
    const englishTokenizer = ProcessingPipelineFactory.createTokenizer('en');
    const japaneseTokenizer = ProcessingPipelineFactory.createTokenizer('ja');

    const englishTokens = await englishTokenizer.tokenize('machine learning algorithm');
    const japaneseTokens = await japaneseTokenizer.tokenize('機械学習アルゴリズム');

    // Using filters directly
    const englishFilter = new EnglishTokenFilter({
      removeDuplicates: true,
      removeStopwords: true,
      minTokenLength: 2
    });

    const japaneseFilter = new JapaneseTokenFilter({
      allowedPos: ['名詞', '動詞'],
      removeDuplicates: false
    });

    const filteredEnglishTokens = englishFilter.filter(englishTokens);
    const filteredJapaneseTokens = japaneseFilter.filter(japaneseTokens);
    import ContentBasedRecommender from 'ts-content-based-recommender'

    // Example with advanced token filtering options
    const recommender = new ContentBasedRecommender({
      language: 'ja',
      minScore: 0.1,
      maxSimilarDocuments: 50,
      tokenFilterOptions: {
        removeDuplicates: false,          // Keep duplicate tokens for frequency analysis
        removeStopwords: true,            // Remove Japanese stopwords
        minTokenLength: 2,                // Exclude tokens shorter than 2 characters
        allowedPos: ['名詞', '動詞'],      // Only extract nouns and verbs
        customStopWords: ['です', 'ます']  // Additional custom stopwords
      }
    });

    const documents = [
      { id: '1', content: 'JavaScriptプログラミングはとても楽しいです' },
      { id: '2', content: 'Pythonによる機械学習の勉強をします' },
      { id: '3', content: 'ウェブ開発の最新技術トレンド' }
    ];

    await recommender.train(documents);
    const similar = recommender.getSimilarDocuments('1');
The main class for content-based recommendations.
To create the recommender instance
Supported options:
Tells the recommender about your documents; it then starts training itself.
Note: This method is now asynchronous and returns a Promise. Use await or .then() to handle the async operation.
Works like the normal train function, but it creates recommendations between two different collections instead of within one collection.
Note: This method is now asynchronous and returns a Promise. Use await or .then() to handle the async operation.
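To make the idea of recommending across two collections concrete, here is a small sketch that scores every item of one collection against every item of the other, and never against items of its own collection. It is not the package's implementation: it uses Jaccard token overlap as a simple stand-in for the real similarity measure, it assumes ids are unique across both collections, and storing matches in both directions is also an assumption. The names `crossMatch`, `jaccard`, and `toTokens` are hypothetical.

```typescript
// Hypothetical sketch; Jaccard overlap stands in for the package's similarity measure.
interface Doc { id: string; content: string; }
interface Match { id: string; score: number; }

function toTokens(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Jaccard overlap: shared unique tokens divided by all unique tokens.
function jaccard(a: string[], b: string[]): number {
  const uniqueA = a.filter((t, i) => a.indexOf(t) === i);
  const uniqueB = b.filter((t, i) => b.indexOf(t) === i);
  const shared = uniqueA.filter((t) => uniqueB.indexOf(t) !== -1).length;
  const union = uniqueA.length + uniqueB.length - shared;
  return union === 0 ? 0 : shared / union;
}

// Compare items of A only with items of B, and record each match in both directions.
function crossMatch(collectionA: Doc[], collectionB: Doc[]): Record<string, Match[]> {
  const result: Record<string, Match[]> = Object.create(null);
  for (const a of collectionA) result[a.id] = [];
  for (const b of collectionB) result[b.id] = [];

  for (const a of collectionA) {
    for (const b of collectionB) {
      const score = jaccard(toTokens(a.content), toTokens(b.content));
      if (score > 0) {
        result[a.id].push({ id: b.id, score });
        result[b.id].push({ id: a.id, score });
      }
    }
  }
  // Highest score first, as in the getSimilarDocuments examples.
  for (const id of Object.keys(result)) {
    result[id].sort((x, y) => y.score - x.score);
  }
  return result;
}

const samplePosts: Doc[] = [
  { id: '1000004', content: 'Introduction to Machine Learning' },
  { id: '1000007', content: 'How Python saved my life?' },
];
const sampleTags: Doc[] = [
  { id: '2', content: 'machine learning' },
  { id: '6', content: 'Python' },
];
const crossMatches = crossMatch(samplePosts, sampleTags);
console.log(crossMatches['1000004'].map((m) => m.id)); // → [ '2' ]
console.log(crossMatches['6'].map((m) => m.id));       // → [ '1000007' ]
```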
Gets an array of items similar to the document with the given id.
It returns an array of objects with the fields id and score (ranging from 0 to 1).
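The sketch below illustrates how the minScore and maxSimilarDocuments options and the two numeric arguments seen in the earlier call `getSimilarDocuments('1000002', 0, 10)` might interact. It assumes the second argument is a start offset and the third a page size, which the usage example suggests but this document does not spell out. `pruneSimilar` and `paginate` are hypothetical names, and the scores are made-up values.

```typescript
// Hypothetical sketch of score filtering and paging; not the package's code.
interface SimilarDocument { id: string; score: number; }

// Drop scores below minScore, sort by descending score, keep at most maxSimilarDocuments.
function pruneSimilar(
  candidates: SimilarDocument[],
  minScore: number,
  maxSimilarDocuments: number
): SimilarDocument[] {
  return candidates
    .filter((d) => d.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxSimilarDocuments);
}

// Return `size` items beginning at offset `start`; everything from `start` on if size is omitted.
function paginate(stored: SimilarDocument[], start: number = 0, size?: number): SimilarDocument[] {
  const end = size === undefined ? undefined : start + size;
  return stored.slice(start, end);
}

const storedSimilar = pruneSimilar(
  [
    { id: '1000003', score: 0.11 },
    { id: '1000004', score: 0.51 },
    { id: '1000008', score: 0.02 }, // below minScore, so it is dropped
    { id: '1000009', score: 0.45 },
  ],
  0.1,
  100
);
console.log(paginate(storedSimilar, 0, 2).map((d) => d.id)); // → [ '1000004', '1000009' ]
console.log(paginate(storedSimilar, 2).map((d) => d.id));    // → [ '1000003' ]
```

Pruning at training time rather than at query time is what would save memory: only the surviving entries need to be kept per document.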
Exports the recommender as a JSON object.
    const recommender = new ContentBasedRecommender();
    await recommender.train(documents);

    const object = recommender.export();
    // can save the object to disk, database or otherwise
Updates the recommender by importing a JSON object previously produced by the export() method.
    const recommender = new ContentBasedRecommender();
    recommender.import(object); // object can be loaded from disk, database or otherwise
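A possible way to persist a trained model is to serialize the exported object to a JSON string and feed the parsed string back into import() later. The sketch below shows that round trip. It assumes the exported object survives JSON.stringify and JSON.parse unchanged, and it uses a stand-in class (`StubRecommender`) so that it runs without the package; `toJson`, `fromJson`, and `Persistable` are hypothetical names.

```typescript
// Structural type covering only the two methods described above.
interface Persistable {
  export(): unknown;
  import(state: unknown): void;
}

// Serialize the exported state; the string could be written to disk or a database.
function toJson(recommender: Persistable): string {
  return JSON.stringify(recommender.export());
}

// Restore state from a string produced by toJson.
function fromJson(recommender: Persistable, serialized: string): void {
  recommender.import(JSON.parse(serialized));
}

type SimilarityTable = Record<string, { id: string; score: number }[]>;

// Stand-in used only to make this sketch runnable; it is not the real recommender.
class StubRecommender implements Persistable {
  private table: SimilarityTable = {};

  setSimilar(id: string, similar: { id: string; score: number }[]): void {
    this.table[id] = similar;
  }

  getSimilarDocuments(id: string): { id: string; score: number }[] {
    return this.table[id] || [];
  }

  export(): unknown {
    return { table: this.table };
  }

  import(state: unknown): void {
    this.table = (state as { table: SimilarityTable }).table;
  }
}

const trainedStub = new StubRecommender();
trainedStub.setSimilar('1000002', [{ id: '1000004', score: 0.51 }]);
const serializedModel = toJson(trainedStub);

const restoredStub = new StubRecommender();
fromJson(restoredStub, serializedModel);
console.log(restoredStub.getSimilarDocuments('1000002')); // → [ { id: '1000004', score: 0.51 } ]
```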
Factory class for creating processing pipelines and individual components.
Creates a complete processing pipeline with tokenizer and filter.
Creates a tokenizer for the specified language.
Creates an English-specific processing pipeline.
Creates a Japanese-specific processing pipeline.
Tokenizes English text with stemming and N-gram support.
    const tokenizer = new EnglishTokenizer();
    const tokens = await tokenizer.tokenize('machine learning algorithm');
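For readers unfamiliar with N-grams, the following sketch shows how unigrams, bigrams, and trigrams can be generated from a token list. It is a generic illustration, not the tokenizer's actual code: the function name `ngrams` is hypothetical, and joining tokens with a space is an assumption (stemming, which the tokenizer also applies, is left out).

```typescript
// Build all n-grams of sizes 1..maxN from a token list, each joined with a space.
function ngrams(tokens: string[], maxN: number = 3): string[] {
  const result: string[] = [];
  for (let n = 1; n <= maxN; n++) {
    // Slide a window of n tokens over the list.
    for (let i = 0; i + n <= tokens.length; i++) {
      result.push(tokens.slice(i, i + n).join(' '));
    }
  }
  return result;
}

console.log(ngrams(['machine', 'learning', 'algorithm']));
// → [ 'machine', 'learning', 'algorithm',
//     'machine learning', 'learning algorithm',
//     'machine learning algorithm' ]
```

Including bigrams and trigrams lets phrases such as "machine learning" count as their own terms in the word vector, rather than only as two unrelated words.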
Tokenizes Japanese text using the kuromoji morphological analyzer.
    const tokenizer = new JapaneseTokenizer();
    const tokens = await tokenizer.tokenize('機械学習アルゴリズム');
    const detailedTokens = await tokenizer.getDetailedTokens('機械学習アルゴリズム');
Filters English tokens with stopword removal, N-gram support, and more.
    const filter = new EnglishTokenFilter({
      removeDuplicates: true,
      removeStopwords: true,
      minTokenLength: 2,
      customStopWords: ['custom', 'words']
    });
    const filtered = filter.filter(tokens);
    const ngramFiltered = filter.filterWithNgrams(tokens);
Filters Japanese tokens with part-of-speech filtering and Japanese-specific processing.
    const filter = new JapaneseTokenFilter({
      allowedPos: ['名詞', '動詞', '形容詞'],
      removeDuplicates: true,
      removeStopwords: true,
      minTokenLength: 1
    });
    const filtered = filter.filter(tokens);
    const posFiltered = filter.filterWithPos(detailedTokens);
Common filter options for both English and Japanese:
Japanese-specific options:
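The option names used in the examples earlier in this document (removeDuplicates, removeStopwords, minTokenLength, customStopWords, allowedPos) suggest roughly the behaviour sketched below. This is an illustrative guess, not the package's implementation: the default values, the tiny stopword list, the decision to apply customStopWords even when removeStopwords is off, and the `DetailedToken` shape are all assumptions.

```typescript
// Hypothetical sketch of how such filter options might be applied.
interface FilterOptions {
  removeDuplicates?: boolean;
  removeStopwords?: boolean;
  minTokenLength?: number;
  customStopWords?: string[];
}

// Tiny illustrative stopword list; the package's real list is not shown in this document.
const SAMPLE_STOPWORDS = ['the', 'a', 'is', 'of', 'to', 'and'];

function filterTokens(tokens: string[], options: FilterOptions = {}): string[] {
  const minLength = options.minTokenLength ?? 1;
  const stopwords = (options.removeStopwords ? SAMPLE_STOPWORDS : [])
    .concat(options.customStopWords ?? []);

  // Drop tokens that are too short or appear in the stopword list.
  const kept = tokens.filter((t) => t.length >= minLength && stopwords.indexOf(t) === -1);

  // Optionally keep only the first occurrence of each token.
  return options.removeDuplicates
    ? kept.filter((t, i) => kept.indexOf(t) === i)
    : kept;
}

// Assumed shape for a token carrying its part of speech.
interface DetailedToken { surface: string; pos: string; }

// Keep only tokens whose part of speech is in allowedPos, and return their surface forms.
function filterByPos(tokens: DetailedToken[], allowedPos: string[]): string[] {
  return tokens
    .filter((t) => allowedPos.indexOf(t.pos) !== -1)
    .map((t) => t.surface);
}

console.log(
  filterTokens(['the', 'machine', 'learning', 'is', 'a', 'machine', 'custom'], {
    removeDuplicates: true,
    removeStopwords: true,
    minTokenLength: 2,
    customStopWords: ['custom'],
  })
); // → [ 'machine', 'learning' ]

console.log(
  filterByPos(
    [
      { surface: '機械', pos: '名詞' },
      { surface: 'を', pos: '助詞' },
      { surface: '学ぶ', pos: '動詞' },
    ],
    ['名詞', '動詞']
  )
); // → [ '機械', '学ぶ' ]
```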
├── src/ # Source code
│ ├── lib/ # Main library code
│ │ ├── tokenizers/ # Tokenizer implementations
│ │ │ ├── EnglishTokenizer.ts
│ │ │ └── JapaneseTokenizer.ts
│ │ ├── filters/ # Token filter implementations
│ │ │ ├── EnglishTokenFilter.ts
│ │ │ └── JapaneseTokenFilter.ts
│ │ ├── factories/ # Factory classes
│ │ │ └── ProcessingPipelineFactory.ts
│ │ ├── ContentBasedRecommender.ts # Main recommender class
│ │ └── index.ts # Library exports
│ ├── types/ # TypeScript type definitions
│ │ └── index.ts
│ └── index.ts # Main export file
├── test/ # Test files
│ ├── tokenizers/ # Tokenizer tests
│ ├── filters/ # Filter tests
│ ├── factories/ # Factory tests
│ └── *.ts # Integration and main tests
├── fixtures/ # Test data
│ ├── sample-documents.ts
│ ├── sample-document-tags.ts
│ ├── sample-target-documents.ts
│ └── sample-japanese-documents.ts
├── example/ # Usage examples
│ └── example.ts
├── index.ts # Package entry point
├── tsconfig.json # TypeScript configuration
└── eslint.config.js # ESLint configuration
The test suite includes comprehensive unit tests and integration tests for all components:
    # Install dependencies
    npm install

    # Run all tests
    npm test

    # Run specific test categories
    npm test -- --grep "EnglishTokenizer"
    npm test -- --grep "JapaneseTokenizer"
    npm test -- --grep "EnglishTokenFilter"
    npm test -- --grep "JapaneseTokenFilter"
    npm test -- --grep "ProcessingPipelineFactory"
    npm test -- --grep "ContentBasedRecommender"

    # Run example
    npm run example

    # Run development mode with ts-node
    npm run dev
    # Build TypeScript
    npm run build

    # Run linting
    npm run lint

    # Fix linting issues
    npm run lint:fix
This package is based on the original work by Stanley Fok. For historical changes before the fork, see: https://github.com/stanleyfok/content-based-recommender