
ts-content-based-recommender - 1.6.2 | npmpackage.info

Package: ts-content-based-recommender
Latest Version: 1.6.2
Package Id: ts-content-based-recommender@1.6.2
Published on: Jul 01, 2025
Unpacked Size: 204.63 kB
Size: 38.45 kB
TypeScript: Yes
Module System: ESM
Min. Node Version: >=18.0.0
Node Version: 20.19.2
NPM Version: 10.8.2

TypeScript Content Based Recommender

A content-based recommender written in TypeScript with enhanced multilingual support, forked from stanleyfok/content-based-recommender.

Credits

This package is forked from stanleyfok/content-based-recommender by Stanley Fok.

Original Author

Stanley Fok - Original implementation and concept

Enhancements in this fork

Full TypeScript support with comprehensive type definitions
Japanese language support using kuromoji morphological analyzer
Enhanced multilingual text processing capabilities
Improved testing coverage with better error handling
Updated dependencies and modern build system with ESLint v9
Performance optimizations in similarity calculations
Modular architecture with separated tokenizers and filters
Factory pattern implementation for easy component creation

What's New

Latest Version

Modular Architecture: Separated tokenizers and filters into independent classes
Factory Pattern: Introduced ProcessingPipelineFactory for easy component creation
Enhanced Testing: Moved all tests to test/ directory with improved coverage
Improved Japanese Support: Advanced morphological analysis with part-of-speech filtering
Better TypeScript Support: Comprehensive type definitions for all components

1.5.0

Added trainBidirectional(collectionA, collectionB) to allow recommendations between two different datasets

1.4.0

Upgrade dependencies to fix security alerts

1.3.0

Introduce the use of unigram, bigrams and trigrams when constructing the word vector

1.2.0

Simplify the implementation by no longer using a sorted set data structure to store the similar-documents data. Also add the maxSimilarDocuments and minScore options to reduce the memory used by the recommender.

1.1.0

Update to newer version of vector-object

Installation

npm install ts-content-based-recommender

And then import the ContentBasedRecommender class

const ContentBasedRecommender = require('ts-content-based-recommender')

For TypeScript projects:

import ContentBasedRecommender from 'ts-content-based-recommender'
// or import individual components
import {
  ProcessingPipelineFactory,
  EnglishTokenizer,
  JapaneseTokenizer,
  EnglishTokenFilter,
  JapaneseTokenFilter
} from 'ts-content-based-recommender'

Overview

This is a content-based recommender implemented in TypeScript to illustrate the concept of content-based recommendation. Content-based recommendation is a popular technique for showing users similar items, and is especially useful for e-commerce sites, news sites and the like.

Once the recommender has been trained on an array of documents, it can return the documents most similar to a given document.

The training process involves 3 main steps:

1. content pre-processing, such as HTML tag stripping, stopword removal and stemming
2. document vector formation using tf-idf
3. calculation of the cosine similarity scores between all document vectors
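The three training steps above can be sketched in plain TypeScript. This is a minimal illustration under stated assumptions, not the package's implementation: the tokenizer is a naive lowercase word split (no HTML stripping, stopword removal or stemming), the idf formula is one common smoothed variant, and the names `tokenize`, `tfidfVectors`, `cosine` and `similarityMatrix` are made up for this example.

```typescript
type Vector = Map<string, number>;

// Step 1 (heavily simplified): lowercase and split on non-letters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z]+/).filter((t) => t.length > 0);
}

// Step 2: build one tf-idf vector per document.
function tfidfVectors(docs: string[]): Vector[] {
  const tokenized = docs.map(tokenize);
  // Document frequency: in how many documents does each term occur?
  const docFreq = new Map<string, number>();
  tokenized.forEach((tokens) => {
    new Set(tokens).forEach((t) => docFreq.set(t, (docFreq.get(t) ?? 0) + 1));
  });
  return tokenized.map((tokens) => {
    const vec: Vector = new Map();
    for (const t of tokens) vec.set(t, (vec.get(t) ?? 0) + 1); // raw term frequency
    vec.forEach((tf, t) => {
      // Smoothed idf, log(1 + N / df): an arbitrary choice for this sketch.
      vec.set(t, tf * Math.log(1 + docs.length / (docFreq.get(t) ?? 1)));
    });
    return vec;
  });
}

// Step 3: cosine similarity between two sparse vectors.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, t) => {
    dot += w * (b.get(t) ?? 0);
    normA += w * w;
  });
  b.forEach((w) => {
    normB += w * w;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Similarity of every document against every other document.
function similarityMatrix(vectors: Vector[]): number[][] {
  return vectors.map((a) => vectors.map((b) => cosine(a, b)));
}

const docs = [
  'Introduction to Machine Learning',
  'Machine learning and its application',
  'The future of Bitcoin technology',
];
const matrix = similarityMatrix(tfidfVectors(docs));
console.log(matrix);
```

In this sketch the first two documents share the terms "machine" and "learning", so their score is positive, while the third document shares no terms with the first and scores 0 against it.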

Special thanks to the natural library, which provides much of the NLP functionality used here, such as tf-idf and word stemming.

⚠️ Note:

I haven't tested how this recommender performs with a large dataset. I will share more results after further testing.

Language Support

English

Tokenization using natural.WordTokenizer
Porter Stemmer for word stemming
Stopword removal
N-gram support (unigram, bigram, trigram)
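As a rough sketch of what unigram, bigram and trigram support means, the helper below expands a token list into all contiguous n-grams up to size 3. The function names `ngrams` and `upToTrigrams`, and the choice to join tokens with a space, are assumptions made for illustration; the package may build its n-grams differently.

```typescript
// Build all contiguous n-grams of size n from a token list.
function ngrams(tokens: string[], n: number): string[] {
  const out: string[] = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(' '));
  }
  return out;
}

// Combine unigrams, bigrams and trigrams into a single term list.
function upToTrigrams(tokens: string[]): string[] {
  return [1, 2, 3].reduce<string[]>((acc, n) => acc.concat(ngrams(tokens, n)), []);
}

const terms = upToTrigrams(['machine', 'learning', 'algorithm']);
console.log(terms);
// [ 'machine', 'learning', 'algorithm',
//   'machine learning', 'learning algorithm',
//   'machine learning algorithm' ]
```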

Japanese

Morphological analysis using kuromoji
Part-of-speech filtering (nouns, verbs, adjectives)
Japanese-specific text processing
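Part-of-speech filtering can be pictured as keeping only the tokens whose tag is on an allow-list. The sketch below uses a simplified, hypothetical token shape (`surface`, `pos`) and hand-written analysis results instead of calling kuromoji, so it only illustrates the filtering idea.

```typescript
// Simplified token shape for this sketch; real analyzer output has more fields.
interface PosToken {
  surface: string; // the token text as it appears in the sentence
  pos: string;     // part-of-speech tag, e.g. 名詞 (noun), 助詞 (particle)
}

// Keep only the surface forms whose part of speech is allowed.
function filterByPos(tokens: PosToken[], allowedPos: string[]): string[] {
  return tokens
    .filter((t) => allowedPos.indexOf(t.pos) !== -1)
    .map((t) => t.surface);
}

// Hand-written analysis of "機械学習は楽しい", standing in for analyzer output.
const analyzed: PosToken[] = [
  { surface: '機械', pos: '名詞' },
  { surface: '学習', pos: '名詞' },
  { surface: 'は', pos: '助詞' },
  { surface: '楽しい', pos: '形容詞' },
];

const kept = filterByPos(analyzed, ['名詞', '動詞', '形容詞']);
console.log(kept); // [ '機械', '学習', '楽しい' ]
```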

Usage

Single collection

import ContentBasedRecommender from 'ts-content-based-recommender'

const recommender = new ContentBasedRecommender({
  minScore: 0.1,
  maxSimilarDocuments: 100
});

// prepare documents data
const documents = [
  { id: '1000001', content: 'Why studying javascript is fun?' },
  { id: '1000002', content: 'The trend for javascript in machine learning' },
  { id: '1000003', content: 'The most insightful stories about JavaScript' },
  { id: '1000004', content: 'Introduction to Machine Learning' },
  { id: '1000005', content: 'Machine learning and its application' },
  { id: '1000006', content: 'Python vs Javascript, which is better?' },
  { id: '1000007', content: 'How Python saved my life?' },
  { id: '1000008', content: 'The future of Bitcoin technology' },
  { id: '1000009', content: 'Is it possible to use javascript for machine learning?' }
];

// start training (now async)
await recommender.train(documents);

// get the top 10 items similar to document 1000002
const similarDocuments = recommender.getSimilarDocuments('1000002', 0, 10);

console.log(similarDocuments);
/*
  the higher the score, the more similar the item is
  documents with score < 0.1 are filtered out because the minScore option is set to 0.1
  [
    { id: '1000004', score: 0.5114304586412038 },
    { id: '1000009', score: 0.45056313558918837 },
    { id: '1000005', score: 0.37039308109283564 },
    { id: '1000003', score: 0.10896767690747626 }
  ]
*/

Multi collection

This example shows how to automatically match posts with related tags.

import ContentBasedRecommender from 'ts-content-based-recommender'

const posts = [
  { id: '1000001', content: 'Why studying javascript is fun?' },
  { id: '1000002', content: 'The trend for javascript in machine learning' },
  { id: '1000003', content: 'The most insightful stories about JavaScript' },
  { id: '1000004', content: 'Introduction to Machine Learning' },
  { id: '1000005', content: 'Machine learning and its application' },
  { id: '1000006', content: 'Python vs Javascript, which is better?' },
  { id: '1000007', content: 'How Python saved my life?' },
  { id: '1000008', content: 'The future of Bitcoin technology' },
  { id: '1000009', content: 'Is it possible to use javascript for machine learning?' },
];

const tags = [
  { id: '1', content: 'Javascript' },
  { id: '2', content: 'machine learning' },
  { id: '3', content: 'application' },
  { id: '4', content: 'introduction' },
  { id: '5', content: 'future' },
  { id: '6', content: 'Python' },
  { id: '7', content: 'Bitcoin' },
];

// index tags by id so that recommendation results can be mapped back to tag names
// (the explicit Record type keeps the accumulator indexable under strict TypeScript)
const tagMap = tags.reduce<Record<string, { id: string; content: string }>>((acc, tag) => {
  acc[tag.id] = tag;
  return acc;
}, {});

const recommender = new ContentBasedRecommender();

// Training is now async
await recommender.trainBidirectional(posts, tags);

for (const post of posts) {
  const relatedTags = recommender.getSimilarDocuments(post.id);
  const tagNames = relatedTags.map(t => tagMap[t.id].content);
  console.log(post.content, 'related tags:', tagNames);
}

/*
Why studying javascript is fun? related tags: [ 'Javascript' ]
The trend for javascript in machine learning related tags: [ 'machine learning', 'Javascript' ]
The most insightful stories about JavaScript related tags: [ 'Javascript' ]
Introduction to Machine Learning related tags: [ 'machine learning', 'introduction' ]
Machine learning and its application related tags: [ 'machine learning', 'application' ]
Python vs Javascript, which is better? related tags: [ 'Python', 'Javascript' ]
How Python saved my life? related tags: [ 'Python' ]
The future of Bitcoin technology related tags: [ 'future', 'Bitcoin' ]
Is it possible to use javascript for machine learning? related tags: [ 'machine learning', 'Javascript' ]
*/

Japanese Language Example

import ContentBasedRecommender from 'ts-content-based-recommender'

const recommender = new ContentBasedRecommender({
  language: 'ja', // enable Japanese support
  minScore: 0.1,
  maxSimilarDocuments: 100
});

// prepare the Japanese document data
const japaneseDocuments = [
  { id: '1', content: 'JavaScriptプログラミングは楽しいです。フロントエンドの開発に最適です。' },
  { id: '2', content: 'プログラミング言語の比較検討。PythonとJavaScriptの違いについて。' },
  { id: '3', content: '機械学習の基礎知識。データサイエンスへの応用。' },
  { id: '4', content: 'ウェブ開発のベストプラクティス。モダンなJavaScript技術。' },
  { id: '5', content: 'データ分析とビジュアライゼーション。統計学の活用。' }
];

// start training (asynchronous)
await recommender.train(japaneseDocuments);

// get the top 5 documents similar to the document with id '1'
const similarDocuments = recommender.getSimilarDocuments('1', 0, 5);

console.log(similarDocuments);
/*
  Japanese morphological analysis allows a more precise similarity calculation
  [
    { id: '4', score: 0.45123456789 },
    { id: '2', score: 0.32456789012 }
  ]
*/

Using Individual Components

The library now provides modular components that can be used independently:

import {
  ProcessingPipelineFactory,
  EnglishTokenizer,
  JapaneseTokenizer,
  EnglishTokenFilter,
  JapaneseTokenFilter
} from 'ts-content-based-recommender'

// Using factory pattern to create processing pipelines
const englishPipeline = ProcessingPipelineFactory.createPipeline('en', {
  minTokenLength: 2,
  removeStopwords: true,
  customStopWords: ['custom', 'words']
});

const japanesePipeline = ProcessingPipelineFactory.createPipeline('ja', {
  allowedPos: ['名詞', '動詞', '形容詞'],  // part-of-speech filtering
  minTokenLength: 1
});

// Using tokenizers directly
const englishTokenizer = ProcessingPipelineFactory.createTokenizer('en');
const japaneseTokenizer = ProcessingPipelineFactory.createTokenizer('ja');

const englishTokens = await englishTokenizer.tokenize('machine learning algorithm');
const japaneseTokens = await japaneseTokenizer.tokenize('機械学習アルゴリズム');

// Using filters directly
const englishFilter = new EnglishTokenFilter({
  removeDuplicates: true,
  removeStopwords: true,
  minTokenLength: 2
});

const japaneseFilter = new JapaneseTokenFilter({
  allowedPos: ['名詞', '動詞'],
  removeDuplicates: false
});

const filteredEnglishTokens = englishFilter.filter(englishTokens);
const filteredJapaneseTokens = japaneseFilter.filter(japaneseTokens);

Advanced Configuration Example

import ContentBasedRecommender from 'ts-content-based-recommender'

// Example with advanced token filtering options
const recommender = new ContentBasedRecommender({
  language: 'ja',
  minScore: 0.1,
  maxSimilarDocuments: 50,
  tokenFilterOptions: {
    removeDuplicates: false,           // Keep duplicate tokens for frequency analysis
    removeStopwords: true,             // Remove Japanese stopwords
    minTokenLength: 2,                 // Exclude tokens shorter than 2 characters
    allowedPos: ['名詞', '動詞'],       // Only extract nouns and verbs
    customStopWords: ['です', 'ます']   // Additional custom stopwords
  }
});

const documents = [
  { id: '1', content: 'JavaScriptプログラミングはとても楽しいです' },
  { id: '2', content: 'Pythonによる機械学習の勉強をします' },
  { id: '3', content: 'ウェブ開発の最新技術トレンド' }
];

await recommender.train(documents);
const similar = recommender.getSimilarDocuments('1');

API Reference

ContentBasedRecommender

The main class for content-based recommendations.

constructor([options])

Creates the recommender instance.

options (optional): an object to configure the recommender

Supported options:

language - the language to use for text processing. Supported values: 'en' (English), 'ja' (Japanese). Default is 'en'.
maxVectorSize - the maximum size of the word vector after tf-idf processing. A smaller vector size speeds up training, typically without hurting recommendation quality. Defaults to 100.
minScore - the minimum score a document must reach to be considered similar. Filtering out low-scoring documents saves memory. Allowed values range from 0 to 1. Default is 0.
maxSimilarDocuments - the maximum number of similar documents to keep for each document. Default is the maximum safe integer in JavaScript.
debug - print progress messages so that you can monitor training progress.
tokenFilterOptions - advanced filtering options for token processing:
- removeDuplicates - remove duplicate tokens (default: true)
- removeStopwords - remove stopwords (default: true)
- customStopWords - additional custom stopwords (default: empty array)
- minTokenLength - minimum token length (default: 1)
- allowedPos - for Japanese: allowed part-of-speech tags (default: 名詞, 動詞, 形容詞, i.e. nouns, verbs, adjectives)
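To make the effect of minScore and maxSimilarDocuments concrete, here is a minimal sketch of how a list of candidate scores could be pruned before it is stored. The helper `pruneSimilar` is hypothetical and not part of the package; the `>=` comparison is an assumption, chosen to match the usage example's remark that documents with score < 0.1 are filtered out.

```typescript
interface SimilarDocument {
  id: string;
  score: number;
}

// Drop low-scoring candidates, sort best-first, then cap the list length.
function pruneSimilar(
  candidates: SimilarDocument[],
  minScore: number,
  maxSimilarDocuments: number,
): SimilarDocument[] {
  return candidates
    .filter((d) => d.score >= minScore)    // filter() copies, so the input is not mutated
    .sort((a, b) => b.score - a.score)     // highest score first
    .slice(0, maxSimilarDocuments);
}

const candidates: SimilarDocument[] = [
  { id: '1000003', score: 0.109 },
  { id: '1000004', score: 0.511 },
  { id: '1000008', score: 0.02 },
  { id: '1000009', score: 0.451 },
];

const pruned = pruneSimilar(candidates, 0.1, 2);
console.log(pruned.map((d) => d.id)); // [ '1000004', '1000009' ]
```

Both options trade recall for memory: a higher minScore or a lower maxSimilarDocuments means fewer entries are kept per document.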

train(documents)

Passes your documents to the recommender and starts training.

documents - an array of objects, each with the fields id and content

Note: This method is now asynchronous and returns a Promise. Use await or .then() to handle the async operation.

trainBidirectional(collectionA, collectionB)

Works like the normal train function, but it creates recommendations between two different collections instead of within one collection.

Note: This method is now asynchronous and returns a Promise. Use await or .then() to handle the async operation.
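Conceptually, matching across two collections means scoring every document of one collection against every document of the other, rather than against its own siblings. The sketch below illustrates that idea only: it swaps in a simple Jaccard token overlap for the tf-idf cosine similarity, it is synchronous, and `matchAcross`, `tokenSet` and `jaccard` are made-up names. It also assumes that "bidirectional" means either side can be used as the query side.

```typescript
interface Doc {
  id: string;
  content: string;
}
interface Scored {
  id: string;
  score: number;
}

// Naive tokenization: lowercase words as a set.
function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z]+/).filter((t) => t.length > 0));
}

// Jaccard overlap, used here as a stand-in similarity measure.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// For each document in `from`, rank the documents of `to` by similarity.
function matchAcross(from: Doc[], to: Doc[]): Map<string, Scored[]> {
  const result = new Map<string, Scored[]>();
  for (const source of from) {
    const sourceTokens = tokenSet(source.content);
    const scored = to
      .map((target) => ({ id: target.id, score: jaccard(sourceTokens, tokenSet(target.content)) }))
      .filter((s) => s.score > 0)
      .sort((x, y) => y.score - x.score);
    result.set(source.id, scored);
  }
  return result;
}

const samplePosts: Doc[] = [
  { id: 'p1', content: 'Introduction to Machine Learning' },
  { id: 'p2', content: 'How Python saved my life?' },
];
const sampleTags: Doc[] = [
  { id: 't1', content: 'machine learning' },
  { id: 't2', content: 'Python' },
  { id: 't3', content: 'Bitcoin' },
];

const postToTags = matchAcross(samplePosts, sampleTags); // posts queried for tags
const tagToPosts = matchAcross(sampleTags, samplePosts); // tags queried for posts
console.log(postToTags.get('p1'), tagToPosts.get('t2'));
```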

getSimilarDocuments(id, [start], [size])

Returns an array of items similar to the document with the given id.

id - the id of the document
start - the start index, inclusive. Defaults to 0
size - the maximum number of similar documents to return. If omitted, the whole list after the start index is returned

It returns an array of objects, each with the fields id and score (ranging from 0 to 1).
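The start and size parameters behave like a page window over the ranked list. The following sketch mirrors the documented semantics with a hypothetical `page` helper built on Array.prototype.slice; it is an illustration of the parameters, not the package's code.

```typescript
interface SimilarDocument {
  id: string;
  score: number;
}

// Return `size` items beginning at index `start`; without `size`, return the rest.
function page(sorted: SimilarDocument[], start = 0, size?: number): SimilarDocument[] {
  const end = size === undefined ? undefined : start + size;
  return sorted.slice(start, end);
}

const ranked: SimilarDocument[] = [
  { id: 'a', score: 0.9 },
  { id: 'b', score: 0.8 },
  { id: 'c', score: 0.7 },
  { id: 'd', score: 0.6 },
];

console.log(page(ranked, 1, 2).map((d) => d.id)); // [ 'b', 'c' ]
console.log(page(ranked, 2).map((d) => d.id));    // [ 'c', 'd' ]
```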

export()

Exports the recommender as a JSON object.

const recommender = new ContentBasedRecommender();
await recommender.train(documents);

const object = recommender.export();
// the object can be saved to disk, a database or elsewhere

import(object)

Updates the recommender by importing a JSON object that was produced by the export() method.

const recommender = new ContentBasedRecommender();
recommender.import(object); // object can be loaded from disk, a database or elsewhere

ProcessingPipelineFactory

Factory class for creating processing pipelines and individual components.

ProcessingPipelineFactory.createPipeline(language, options)

Creates a complete processing pipeline with tokenizer and filter.

language - 'en' for English or 'ja' for Japanese
options - filter options (optional)

ProcessingPipelineFactory.createTokenizer(language)

Creates a tokenizer for the specified language.

language - 'en' for English or 'ja' for Japanese

ProcessingPipelineFactory.createEnglishPipeline(options)

Creates an English-specific processing pipeline.

options - filter options (optional)

ProcessingPipelineFactory.createJapanesePipeline(options)

Creates a Japanese-specific processing pipeline.

options - filter options (optional)
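For readers unfamiliar with the factory pattern, the sketch below shows the general shape: callers ask for a pipeline by language code and never construct the concrete tokenizer themselves. Everything here is hypothetical (`SketchPipelineFactory`, `WhitespaceTokenizer`, `PunctuationTokenizer`, the `Pipeline` interface), the tokenizers are crude placeholders, and the methods are synchronous, whereas the package's tokenizers are awaited in the examples above.

```typescript
type Language = 'en' | 'ja';

interface Tokenizer {
  tokenize(text: string): string[];
}
interface Pipeline {
  process(text: string): string[];
}

// Placeholder English tokenizer: lowercase and split on whitespace.
class WhitespaceTokenizer implements Tokenizer {
  tokenize(text: string): string[] {
    return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  }
}

// Placeholder for a morphological analyzer: splits on 、 and 。 only.
class PunctuationTokenizer implements Tokenizer {
  tokenize(text: string): string[] {
    return text.split(/[、。]/).filter((t) => t.length > 0);
  }
}

class SketchPipelineFactory {
  // Pick the concrete tokenizer from the language code.
  static createTokenizer(language: Language): Tokenizer {
    return language === 'ja' ? new PunctuationTokenizer() : new WhitespaceTokenizer();
  }

  // Bundle tokenizer and a minimal length filter behind one interface.
  static createPipeline(language: Language, minTokenLength = 1): Pipeline {
    const tokenizer = SketchPipelineFactory.createTokenizer(language);
    return {
      process: (text) => tokenizer.tokenize(text).filter((t) => t.length >= minTokenLength),
    };
  }
}

const enResult = SketchPipelineFactory.createPipeline('en', 2).process('a machine learning');
const jaResult = SketchPipelineFactory.createPipeline('ja').process('機械学習、統計学。');
console.log(enResult, jaResult);
```

The benefit of this design is that adding a language only touches the factory, while calling code keeps working against the shared interface.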

Tokenizers

EnglishTokenizer

Tokenizes English text with stemming and N-gram support.

const tokenizer = new EnglishTokenizer();
const tokens = await tokenizer.tokenize('machine learning algorithm');

JapaneseTokenizer

Tokenizes Japanese text using kuromoji morphological analyzer.

const tokenizer = new JapaneseTokenizer();
const tokens = await tokenizer.tokenize('機械学習アルゴリズム');
const detailedTokens = await tokenizer.getDetailedTokens('機械学習アルゴリズム');

Filters

EnglishTokenFilter

Filters English tokens with stopword removal, N-gram support, and more.

const filter = new EnglishTokenFilter({
  removeDuplicates: true,
  removeStopwords: true,
  minTokenLength: 2,
  customStopWords: ['custom', 'words']
});
const filtered = filter.filter(tokens);
const ngramFiltered = filter.filterWithNgrams(tokens);

JapaneseTokenFilter

Filters Japanese tokens with part-of-speech filtering and Japanese-specific processing.

const filter = new JapaneseTokenFilter({
  allowedPos: ['名詞', '動詞', '形容詞'],
  removeDuplicates: true,
  removeStopwords: true,
  minTokenLength: 1
});
const filtered = filter.filter(tokens);
const posFiltered = filter.filterWithPos(detailedTokens);

Filter Options

Common filter options for both English and Japanese:

removeDuplicates - remove duplicate tokens (default: true)
removeStopwords - remove stopwords (default: true)
customStopWords - additional custom stopwords (default: empty array)
minTokenLength - minimum token length (default: 1)

Japanese-specific options:

allowedPos - allowed part-of-speech tags (default: 名詞, 動詞, 形容詞, i.e. nouns, verbs, adjectives)
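The common options can be read as a sequence of checks applied to each token. The sketch below is one possible interpretation with the documented defaults, not the package's implementation: `filterTokens` and the tiny `BASE_STOPWORDS` list are invented for this example, and it assumes that customStopWords only take effect while removeStopwords is true.

```typescript
interface FilterOptions {
  removeDuplicates?: boolean;
  removeStopwords?: boolean;
  customStopWords?: string[];
  minTokenLength?: number;
}

// Tiny illustrative stopword list; a real list would be much longer.
const BASE_STOPWORDS = ['the', 'is', 'a', 'of'];

function filterTokens(tokens: string[], options: FilterOptions = {}): string[] {
  const {
    removeDuplicates = true,
    removeStopwords = true,
    customStopWords = [],
    minTokenLength = 1,
  } = options;

  // Assumption: custom stopwords are ignored when stopword removal is off.
  const stopwords = new Set<string>(
    removeStopwords ? BASE_STOPWORDS.concat(customStopWords) : [],
  );
  const seen = new Set<string>();

  return tokens.filter((token) => {
    if (token.length < minTokenLength) return false; // too short
    if (stopwords.has(token)) return false;          // stopword
    if (removeDuplicates) {
      if (seen.has(token)) return false;             // already emitted
      seen.add(token);
    }
    return true;
  });
}

const defaults = filterTokens(['the', 'future', 'of', 'bitcoin', 'future']);
console.log(defaults); // [ 'future', 'bitcoin' ]

const custom = filterTokens(['the', 'future', 'of', 'bitcoin', 'future', 'ai'], {
  removeDuplicates: false,
  customStopWords: ['bitcoin'],
  minTokenLength: 3,
});
console.log(custom); // [ 'future', 'future' ]
```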

Development

Project Structure

├── src/                    # Source code
│   ├── lib/               # Main library code
│   │   ├── tokenizers/    # Tokenizer implementations
│   │   │   ├── EnglishTokenizer.ts
│   │   │   └── JapaneseTokenizer.ts
│   │   ├── filters/       # Token filter implementations
│   │   │   ├── EnglishTokenFilter.ts
│   │   │   └── JapaneseTokenFilter.ts
│   │   ├── factories/     # Factory classes
│   │   │   └── ProcessingPipelineFactory.ts
│   │   ├── ContentBasedRecommender.ts  # Main recommender class
│   │   └── index.ts       # Library exports
│   ├── types/             # TypeScript type definitions
│   │   └── index.ts
│   └── index.ts           # Main export file
├── test/                  # Test files
│   ├── tokenizers/        # Tokenizer tests
│   ├── filters/           # Filter tests
│   ├── factories/         # Factory tests
│   └── *.ts              # Integration and main tests
├── fixtures/              # Test data
│   ├── sample-documents.ts
│   ├── sample-document-tags.ts
│   ├── sample-target-documents.ts
│   └── sample-japanese-documents.ts
├── example/               # Usage examples
│   └── example.ts
├── index.ts               # Package entry point
├── tsconfig.json          # TypeScript configuration
└── eslint.config.js       # ESLint configuration

Running Tests

The test suite includes comprehensive unit tests and integration tests for all components:

# Install dependencies
npm install

# Run all tests
npm test

# Run specific test categories
npm test -- --grep "EnglishTokenizer"
npm test -- --grep "JapaneseTokenizer"
npm test -- --grep "EnglishTokenFilter"
npm test -- --grep "JapaneseTokenFilter"
npm test -- --grep "ProcessingPipelineFactory"
npm test -- --grep "ContentBasedRecommender"

# Run example
npm run example

# Run development mode with ts-node
npm run dev

Building

# Build TypeScript
npm run build

# Run linting
npm run lint

# Fix linting issues
npm run lint:fix

Authors

Current Maintainer

Ken Sakurai - TypeScript migration and Japanese language support

Original Author

Stanley Fok - Original implementation

Contributors

Marian Klühspies

License

MIT

Historical Changes (from upstream)

This package is based on the original work by Stanley Fok. For historical changes before the fork, see: https://github.com/stanleyfok/content-based-recommender

No vulnerabilities found.
