Gathering detailed insights and metrics for chardet
npm install chardet
283 Stars
199 Commits
73 Forks
8 Watching
17 Branches
11 Contributors
Updated on 25 Nov 2024
TypeScript (99.69%)
JavaScript (0.31%)
Total Downloads
Last day: 4,470,486 (-8.2% compared to previous day)
Last week: 26,572,787 (2.1% compared to previous week)
Last month: 106,611,228 (17.5% compared to previous month)
Last year: 1,033,187,677 (11.2% compared to previous year)
Chardet is a character encoding detection module written in pure JavaScript (TypeScript). The module uses occurrence analysis to determine the most probable encoding.
npm i chardet
To return the encoding with the highest confidence:
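    import chardet from 'chardet';

    // Detect the encoding of an in-memory buffer
    const encoding = chardet.detect(Buffer.from('hello there!'));
    // or detect the encoding of a file asynchronously (top-level await needs an ES module)
    const fileEncoding = await chardet.detectFile('/path/to/file');
    // or detect the encoding of a file synchronously
    const fileEncodingSync = chardet.detectFileSync('/path/to/file');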
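The detection calls return the name of the most likely encoding, which may be null when nothing matches. A common next step is to decode the raw bytes with that name. The sketch below assumes the built-in TextDecoder available in browsers and Node.js; decodeWithFallback is a hypothetical helper, not part of chardet, and its UTF-8 fallback is an illustrative choice.

```typescript
// Hypothetical helper (not part of chardet): decode raw bytes using an
// encoding name such as the one returned by chardet.detect().
function decodeWithFallback(
  bytes: Uint8Array,
  encoding: string | null,
  fallback = 'utf-8',
): string {
  if (encoding !== null) {
    try {
      // TextDecoder throws a RangeError for labels it does not recognize.
      return new TextDecoder(encoding).decode(bytes);
    } catch (err) {
      if (!(err instanceof RangeError)) throw err;
    }
  }
  // No usable encoding name: decode with the fallback encoding instead.
  return new TextDecoder(fallback).decode(bytes);
}

// Usage: "café" encoded as windows-1252 (0xE9 is "é" in that code page).
const bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
console.log(decodeWithFallback(bytes, 'windows-1252')); // café
```

TextDecoder follows the WHATWG Encoding Standard, so some names a detector can report may be rejected (and some, such as ISO-8859-1, are decoded as windows-1252); the fallback branch covers the rejected ones.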
To return the full list of possible encodings, use the analyse() method:
    import chardet from 'chardet';

    chardet.analyse(Buffer.from('hello there!'));
The returned value is an array of objects sorted by confidence in descending order:
    [
      { confidence: 90, name: 'UTF-8' },
      { confidence: 20, name: 'windows-1252', lang: 'fr' },
    ];
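Because the list is already sorted, choosing a result only requires reading the first element. The sketch below additionally rejects low-confidence guesses; the EncodingMatch interface, the pickEncoding helper, and the threshold values are illustrative assumptions, not part of chardet's API, and the input is the sample result above rather than a live chardet call.

```typescript
// Shape of one entry in the analysis result, mirroring the sample above
// (lang is only present for some encodings).
interface EncodingMatch {
  confidence: number;
  name: string;
  lang?: string;
}

// Hypothetical helper: return the best match's name, or null when even
// the best match falls below the confidence threshold.
function pickEncoding(
  matches: EncodingMatch[],
  minConfidence: number,
): string | null {
  // The array is sorted by confidence in descending order,
  // so the first element is the best candidate.
  const best = matches[0];
  return best !== undefined && best.confidence >= minConfidence
    ? best.name
    : null;
}

const matches: EncodingMatch[] = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];

console.log(pickEncoding(matches, 50)); // UTF-8
console.log(pickEncoding(matches, 95)); // null
```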
In the browser, you can use a Uint8Array instead of a Buffer:
    import chardet from 'chardet';

    // The bytes of the ASCII string "hello"
    chardet.analyse(new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f]));
When the data set is large and you want to optimize performance (at the cost of some accuracy), you can sample only the first N bytes of the buffer:
    const encoding = await chardet.detectFile('/path/to/file', { sampleSize: 32 });
You can also specify the offset at which to begin reading:
    const encoding = await chardet.detectFile('/path/to/file', {
      sampleSize: 32,
      offset: 128,
    });
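For intuition, the two options describe a window into the data: reading starts at offset and covers at most sampleSize bytes. The sketch below reproduces that window on an in-memory Uint8Array; sampleWindow is a hypothetical helper reflecting an assumption about the semantics, not chardet's internal implementation.

```typescript
// Hypothetical sketch (not chardet's code) of the byte window selected by
// sampleSize and offset: start at `offset`, keep at most `sampleSize` bytes.
function sampleWindow(
  bytes: Uint8Array,
  sampleSize: number,
  offset = 0,
): Uint8Array {
  // subarray() creates a view without copying and clamps the end index
  // to the buffer length, so a short buffer yields a shorter sample.
  return bytes.subarray(offset, offset + sampleSize);
}

const data = new Uint8Array(256).map((_, i) => i); // bytes 0..255
const sample = sampleWindow(data, 32, 128);
console.log(sample.length, sample[0], sample[31]); // 32 128 159
```

Returning a view rather than a copy keeps the sampling step cheap even when the underlying buffer is very large.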
In both Node.js and browsers, strings in memory are represented as UTF-16. This is a fundamental aspect of the JavaScript language specification. Therefore, you cannot pass plain strings directly to chardet.analyse() or chardet.detect(); instead, you need the original data as a Buffer or Uint8Array.
In other words, if you receive a piece of data over the network and want to detect its encoding, use the original payload, not its string representation. By the time you convert the data to a string, it has already been decoded into UTF-16, and the original byte encoding is lost.
Note on TextEncoder: it always produces UTF-8 encoded bytes, so the resulting buffer will not be in the original encoding of the data.
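To see why the original bytes matter, take a short windows-1252 payload, decode it to a string, and re-encode it with TextEncoder. The bytes you get back differ from the payload, so detection on them would describe UTF-8 rather than the original encoding. This sketch uses only the standard TextDecoder and TextEncoder and assumes a runtime whose TextDecoder supports windows-1252 (browsers and Node.js builds with full ICU do).

```typescript
// Original payload: "café" encoded in windows-1252 (4 bytes; 0xE9 is "é").
const original = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// Decoding produces a JavaScript string (UTF-16 code units in memory).
const text = new TextDecoder('windows-1252').decode(original);

// TextEncoder always encodes to UTF-8, so "é" becomes two bytes (0xC3 0xA9).
const reencoded = new TextEncoder().encode(text);

console.log(text);             // café
console.log(original.length);  // 4
console.log(reencoded.length); // 5
```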
Currently only these encodings are supported.
TypeScript support: yes. Type definitions are included.
No vulnerabilities found.
OpenSSF Scorecard check results:
- no dangerous workflow patterns detected
- no binaries found in the repo
- 0 existing vulnerabilities detected
- license file detected
- packaging workflow detected
- 2 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 1
- Found 2/25 approved changesets -- score normalized to 0
- detected GitHub workflow tokens with excessive permissions
- dependency not pinned by hash detected -- score normalized to 0
- no effort to earn an OpenSSF best practices badge detected
- project is not fuzzed
- security policy file not detected
- SAST tool is not run on all commits -- score normalized to 0
Last Scanned on 2024-11-25
The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.