Gathering detailed insights and metrics for office-text-extractor
office-text-extractor-browser
Fork of office-text-extractor with unreleased changes that include browser support
@mtamayo/office-text-extractor
Yet another library to extract text from MS Office and PDF files
doxtract
Very Fast Pure JS Text Extractor For Your Office Files
doc-textify
A Node.js library to extract text from office documents (docx, pptx, xlsx, odt, odp, ods, pdf, text, html ...)
Yet another library to extract text from MS Office and PDF files
npm install office-text-extractor
Supply Chain: 64.5
Quality: 93.4
Maintenance: 76.9
Vulnerability: 100
License: 96.2
TypeScript (100%)
Total Downloads: 473,424
Last Day: 412
Last Week: 10,863
Last Month: 36,468
Last Year: 298,423
ISC License
78 Stars
86 Commits
7 Forks
1 Watchers
1 Branches
1 Contributors
Updated on May 14, 2025
Latest Version: 3.0.3
Package Id: office-text-extractor@3.0.3
Unpacked Size: 33.37 kB
Size: 9.36 kB
File Count: 25
NPM Version: 10.5.0
Node Version: 20.12.2
Published on: Apr 19, 2024
Cumulative downloads
Last Day: 412 (-18.3% compared to previous day)
Last Week: 10,863 (31.5% compared to previous week)
Last Month: 36,468 (-14% compared to previous month)
Last Year: 298,423 (133.5% compared to previous year)
yet another library to extract text from docx, pptx, xlsx, and pdf files.
there are other great libraries that do the same job and have inspired this project, such as:
however, office-text-extractor has the following differences:
this package uses some amazing existing libraries that perform better than the ones that originally existed in this module, and are therefore used instead:
a big thank you to the contributors of these projects!
from version 2.0.0 onwards, this package is pure esm. please read this article for a guide on how to ensure your project can import this library.
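if your project is still commonjs, one commonly used workaround is a dynamic `import()`, which node supports from both module systems. the sketch below is not part of this package: `loadEsm` is a hypothetical helper name chosen for illustration. note that typescript configured with `module: commonjs` rewrites `import()` into `require()`, so such projects typically need `module: node16` or `nodenext` before this can reach an esm-only package.

```typescript
// hypothetical helper (not part of office-text-extractor): loads a module
// through dynamic import(), which works from both esm and commonjs code in node.
async function loadEsm<T = unknown>(specifier: string): Promise<T> {
  // the result is the module namespace object; named exports are its properties.
  return (await import(specifier)) as T
}

// usage, assuming office-text-extractor is installed in your project:
//
//   const { getTextExtractor } = await loadEsm<any>('office-text-extractor')
//   const extractor = getTextExtractor()
```

the helper only wraps the import itself; everything after that is the same api shown in the usage example.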
to use office-text-extractor in a Node project, install it using npm, pnpm, or yarn:

    npm install office-text-extractor
    pnpm add office-text-extractor
    yarn add office-text-extractor
the library currently cannot be used in the browser due to its usage of the node:buffer library. pull requests that can replace node:buffer with a different library are welcome!
an example of using the library to extract text is as follows:
    import { readFile } from 'node:fs/promises'
    import { getTextExtractor } from 'office-text-extractor'

    // this function returns a new instance of the `TextExtractor` class, with the default
    // extraction methods (docx, pptx, xlsx, pdf) registered.
    const extractor = getTextExtractor()

    // extract text from a url, because that's a neat first example :p
    const url = 'https://raw.githubusercontent.com/gamemaker1/office-text-extractor/rewrite/test/fixtures/docs/pptx.pptx'
    const textFromUrl = await extractor.extractText({ input: url, type: 'url' })

    // you can extract text from a file too, like so:
    const path = 'stuff/boring.pdf'
    const textFromFile = await extractor.extractText({ input: path, type: 'file' })

    // if you have a buffer with the file in it, you can pass that too:
    const buffer = await readFile(path)
    const textFromBuffer = await extractor.extractText({ input: buffer, type: 'buffer' })

    // each result gets its own name, since `const` cannot be redeclared in one scope
    console.log(textFromUrl, textFromFile, textFromBuffer)
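the three `type` values all describe the same thing: where the file's bytes come from. as a rough mental model only (this is an assumption about the idea, not the library's actual code), each input kind can be reduced to a buffer before any extraction happens. `ExtractionInput` and `toBuffer` below are hypothetical names, and the global `fetch` assumes node 18 or later.

```typescript
import { Buffer } from 'node:buffer'
import { readFile } from 'node:fs/promises'

// hypothetical type mirroring the `{ input, type }` objects used in the example above.
type ExtractionInput =
  | { input: string; type: 'url' }
  | { input: string; type: 'file' }
  | { input: Buffer; type: 'buffer' }

// hypothetical helper: turns any of the three input kinds into a buffer.
async function toBuffer(source: ExtractionInput): Promise<Buffer> {
  switch (source.type) {
    case 'url': {
      // download the file and fail loudly on a non-2xx response.
      const response = await fetch(source.input)
      if (!response.ok) {
        throw new Error(`request failed with status ${response.status}`)
      }
      return Buffer.from(await response.arrayBuffer())
    }
    case 'file':
      // read the file from disk.
      return readFile(source.input)
    case 'buffer':
      // already in memory, nothing to do.
      return source.input
  }
}
```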
the following is an example of how to create and use your own text extraction method:
    import { type Buffer } from 'node:buffer'
    import { TextExtractor, type TextExtractionMethod } from 'office-text-extractor'

    /**
     * Extracts text from images.
     */
    class ImageExtractor implements TextExtractionMethod {
      /**
       * The mime types of the file that the extractor accepts.
       */
      mimes = ['image/png', 'image/jpeg']

      /**
       * Extracts text from the image file passed by the user.
       * note: `processImage` is a placeholder for your own image-to-text function.
       */
      apply = async (input: Buffer): Promise<string> => {
        const text = await processImage(input)
        return text
      }
    }

    // create a new extractor and register our extraction method
    const extractor = new TextExtractor()
    extractor.addMethod(new ImageExtractor())

    // then use it like you would normally
    const text = await extractor.extractText({ input: '...', type: '...' })
    console.log(text)
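to make the `mimes` and `apply` contract more concrete, here is a self-contained sketch of mime-based dispatch. it is an illustration only and not how `TextExtractor` is implemented: `MiniExtractor` and `PlainTextExtractor` are hypothetical names, the interface is redeclared locally from the shape used in the example above, and the mime type is passed in by hand here.

```typescript
import { Buffer } from 'node:buffer'

// local copy of the shape used in the example above (assumption: `mimes` + `apply`).
interface TextExtractionMethod {
  mimes: string[]
  apply(input: Buffer): Promise<string>
}

// hypothetical stand-in for an extractor: picks a method by mime type.
class MiniExtractor {
  private methods: TextExtractionMethod[] = []

  // register a method; returns `this` so calls can be chained.
  addMethod(method: TextExtractionMethod): this {
    this.methods.push(method)
    return this
  }

  // run the first registered method that accepts the given mime type.
  async extract(input: Buffer, mime: string): Promise<string> {
    const method = this.methods.find((m) => m.mimes.includes(mime))
    if (!method) {
      throw new Error(`no extraction method registered for ${mime}`)
    }
    return method.apply(input)
  }
}

// a trivial method: plain text files are decoded as utf-8.
class PlainTextExtractor implements TextExtractionMethod {
  mimes = ['text/plain']
  apply = async (input: Buffer): Promise<string> => input.toString('utf8')
}
```

with this sketch, `new MiniExtractor().addMethod(new PlainTextExtractor())` resolves text for `'text/plain'` input and rejects for any mime type that no registered method lists.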
this project is licensed under the ISC license. please see license.md for more details.
No vulnerabilities found.
no dangerous workflow patterns detected
no binaries found in the repo
license file detected
4 existing vulnerabilities detected
0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Found 0/19 approved changesets -- score normalized to 0
detected GitHub workflow tokens with excessive permissions
no effort to earn an OpenSSF best practices badge detected
dependency not pinned by hash detected -- score normalized to 0
security policy file not detected
project is not fuzzed
branch protection not enabled on development/release branches
SAST tool is not run on all commits -- score normalized to 0
Last Scanned on 2025-06-30
The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.