Gathering detailed insights and metrics for office-text-extractor
Yet another library to extract text from MS Office and PDF files
npm install office-text-extractor
50.1
Supply Chain
93.7
Quality
78.3
Maintenance
100
Vulnerability
96.8
License
TypeScript (100%)
Total Downloads
0
Last Day
0
Last Week
0
Last Month
0
Last Year
0
68 Stars
86 Commits
7 Forks
2 Watching
1 Branch
1 Contributor
Latest Version
3.0.3
Package Id
office-text-extractor@3.0.3
Unpacked Size
33.37 kB
Size
9.36 kB
File Count
25
NPM Version
10.5.0
Node Version
20.12.2
Published On
19 Apr 2024
yet another library to extract text from docx, pptx, xlsx, and pdf files.
there are other great libraries that do the same job and have inspired this project, such as:
however, office-text-extractor has the following differences:
this package uses some amazing existing libraries that perform better than the ones that originally existed in this module, and are therefore used instead:
a big thank you to the contributors of these projects!
from version 2.0.0 onwards, this package is pure esm. please read this article for a guide on how to ensure your project can import this library.
to use office-text-extractor in a Node project, install it using npm/pnpm/yarn:

```sh
npm install office-text-extractor
pnpm add office-text-extractor
yarn add office-text-extractor
```
the library currently cannot be used in the browser due to its usage of the node:buffer library. pull requests that can replace node:buffer with a different library are welcome!
an example of using the library to extract text is as follows:
```ts
import { readFile } from 'node:fs/promises'
import { getTextExtractor } from 'office-text-extractor'

// this function returns a new instance of the `TextExtractor` class, with the default
// extraction methods (docx, pptx, xlsx, pdf) registered.
const extractor = getTextExtractor()

// extract text from a url, because that's a neat first example :p
const url = 'https://raw.githubusercontent.com/gamemaker1/office-text-extractor/rewrite/test/fixtures/docs/pptx.pptx'
const textFromUrl = await extractor.extractText({ input: url, type: 'url' })

// you can extract text from a file too, like so:
const path = 'stuff/boring.pdf'
const textFromFile = await extractor.extractText({ input: path, type: 'file' })

// if you have a buffer with the file in it, you can pass that too:
const buffer = await readFile(path)
const textFromBuffer = await extractor.extractText({ input: buffer, type: 'buffer' })

console.log(textFromUrl, textFromFile, textFromBuffer)
```
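the example above passes a `type` of 'url', 'file', or 'buffer' alongside the input. if you would rather not spell that out at every call site, a small helper can guess it from the value. the sketch below is only an illustration: `toExtractionInput` and the `ExtractionInput` type are hypothetical names that are not part of office-text-extractor, and the rule "anything that is not an http(s) url is a file path" is an assumption you may want to tighten.

```typescript
import { Buffer } from 'node:buffer'

// the three input kinds shown in the usage example
type ExtractionInput =
  | { input: string; type: 'url' }
  | { input: string; type: 'file' }
  | { input: Buffer; type: 'buffer' }

// hypothetical helper (not provided by the package): picks the `type`
// field based on the shape of the value it is given
function toExtractionInput(source: string | Buffer): ExtractionInput {
  if (Buffer.isBuffer(source)) {
    return { input: source, type: 'buffer' }
  }

  // assumption: strings starting with http:// or https:// are urls
  if (/^https?:\/\//i.test(source)) {
    return { input: source, type: 'url' }
  }

  // assumption: every other string is a path on disk
  return { input: source, type: 'file' }
}
```

the returned object has the same shape as the argument used in the example, so it could be handed to `extractor.extractText(...)` as is.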
the following is an example of how to create and use your own text extraction method:
```ts
import { type Buffer } from 'node:buffer'
import { TextExtractor, type TextExtractionMethod } from 'office-text-extractor'

/**
 * Extracts text from images.
 */
class ImageExtractor implements TextExtractionMethod {
  /**
   * The mime types of the file that the extractor accepts.
   */
  mimes = ['image/png', 'image/jpeg']

  /**
   * Extracts text from the image file passed by the user.
   */
  apply = async (input: Buffer): Promise<string> => {
    // `processImage` is not defined in this example; supply your own
    // function that turns an image buffer into text.
    const text = await processImage(input)
    return text
  }
}

// create a new extractor and register our extraction method
const extractor = new TextExtractor()
extractor.addMethod(new ImageExtractor())

// then use it like you would normally
const text = await extractor.extractText({ input: '...', type: '...' })
console.log(text)
```
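to make the idea of "register a method, then pick one by mime type" concrete, here is a minimal self-contained sketch. it is not the library's implementation: `MiniExtractor`, `Method`, and `plainText` are hypothetical names, and the caller passes the mime type explicitly, whereas the real `TextExtractor` may determine it differently.

```typescript
import { Buffer } from 'node:buffer'

// mirrors the two members used by the custom method example: `mimes` and `apply`
interface Method {
  mimes: string[]
  apply: (input: Buffer) => Promise<string>
}

// hypothetical stand-in for `TextExtractor`, for illustration only
class MiniExtractor {
  private methods: Method[] = []

  // registers a method and returns the instance so calls can be chained
  addMethod(method: Method): this {
    this.methods.push(method)
    return this
  }

  // assumption: the mime type is supplied by the caller
  async extract(input: Buffer, mime: string): Promise<string> {
    const method = this.methods.find((m) => m.mimes.includes(mime))
    if (!method) {
      throw new Error(`no extraction method registered for ${mime}`)
    }
    return method.apply(input)
  }
}

// a trivial method for plain text, just to exercise the registry
const plainText: Method = {
  mimes: ['text/plain'],
  apply: async (input) => input.toString('utf8'),
}

const mini = new MiniExtractor().addMethod(plainText)
```

calling `mini.extract(Buffer.from('hello'), 'text/plain')` resolves through the registered method, while an unregistered mime type rejects with an error.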
this project is licensed under the ISC license. please see license.md for more details.
No vulnerabilities found.
no dangerous workflow patterns detected
no binaries found in the repo
license file detected
3 existing vulnerabilities detected
0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Found 0/19 approved changesets -- score normalized to 0
detected GitHub workflow tokens with excessive permissions
dependency not pinned by hash detected -- score normalized to 0
no effort to earn an OpenSSF best practices badge detected
security policy file not detected
project is not fuzzed
branch protection not enabled on development/release branches
SAST tool is not run on all commits -- score normalized to 0
Last Scanned on 2024-12-16
The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.