Gathering detailed insights and metrics for encoding-japanese
Convert and detect character encoding in JavaScript
npm install encoding-japanese
589 Stars
238 Commits
123 Forks
22 Watching
2 Branches
5 Contributors
Updated on 27 Nov 2024
JavaScript (100%)
Downloads
Last day: 248,347 (-4.6% compared to previous day)
Last week: 1,432,767 (3.4% compared to previous week)
Last month: 5,919,758 (7.9% compared to previous month)
Last year: 51,986,137 (65% compared to previous year)
encoding.js is a JavaScript library for converting and detecting character encodings, supporting both Japanese character encodings (Shift_JIS, EUC-JP, ISO-2022-JP) and Unicode formats (UTF-8, UTF-16).
Since JavaScript string values are internally encoded as UTF-16 code units (ref: ECMAScript® 2019 Language Specification - 6.1.4 The String Type), they cannot directly handle other character encodings as strings. However, encoding.js overcomes this limitation by treating these encodings as arrays instead of strings, enabling the conversion between different character sets.
Each character encoding is represented as an array of numbers corresponding to character code values; for example, [130, 160] represents "あ" in Shift_JIS.
The array of character codes used in its methods can also be a TypedArray object, such as Uint8Array, or a Buffer in Node.js.
Numeric arrays of character codes can be converted to strings using methods such as Encoding.codeToString. However, due to the JavaScript specifications mentioned above, some character encodings may not be handled properly when converted directly to strings.
If you prefer to use strings instead of numeric arrays, you can convert them to percent-encoded strings, such as '%82%A0', using Encoding.urlEncode and Encoding.urlDecode for passing to other resources. Similarly, Encoding.base64Encode and Encoding.base64Decode allow for encoding and decoding to and from base64, which can then be passed as strings.
encoding.js is published under the package name encoding-japanese on npm.
    npm install --save encoding-japanese
Using import:
    import Encoding from 'encoding-japanese';
Using require:
    const Encoding = require('encoding-japanese');
TypeScript type definitions for encoding.js are available at @types/encoding-japanese (thanks to @rhysd).
    npm install --save-dev @types/encoding-japanese
To use encoding.js in a browser environment, you can either install it via npm or download it directly from the release list. The package includes both encoding.js and encoding.min.js.
Note: Cloning the repository via git clone might give you access to the master (or main) branch, which could still be in a development state.
    <!-- To include the full version -->
    <script src="encoding.js"></script>

    <!-- Or, to include the minified version for production -->
    <script src="encoding.min.js"></script>
When the script is loaded, the object Encoding is defined in the global scope (i.e., window.Encoding).
You can use encoding.js (package name: encoding-japanese) directly from a CDN via a script tag:
    <script src="https://unpkg.com/encoding-japanese@2.2.0/encoding.min.js"></script>
In this example we use unpkg, but you can use any CDN that provides npm packages, for example cdnjs or jsDelivr.
Value in encoding.js | detect() | convert() | MIME Name (Note) |
---|---|---|---|
ASCII | ✓ | | US-ASCII (Code point range: 0-127) |
BINARY | ✓ | | (Binary string. Code point range: 0-255) |
EUCJP | ✓ | ✓ | EUC-JP |
JIS | ✓ | ✓ | ISO-2022-JP |
SJIS | ✓ | ✓ | Shift_JIS |
UTF8 | ✓ | ✓ | UTF-8 |
UTF16 | ✓ | ✓ | UTF-16 |
UTF16BE | ✓ | ✓ | UTF-16BE (big-endian) |
UTF16LE | ✓ | ✓ | UTF-16LE (little-endian) |
UTF32 | ✓ | | UTF-32 |
UNICODE | ✓ | ✓ | (JavaScript string. See "About UNICODE" below) |
About UNICODE
In encoding.js, UNICODE is defined as the internal character encoding that JavaScript strings (JavaScript string objects) can handle directly.
As mentioned in the Features section, JavaScript strings are internally encoded using UTF-16 code units.
This means that other character encodings cannot be directly handled without conversion.
Therefore, when converting to a character encoding that is properly representable in JavaScript, you should specify UNICODE. (Note: Even if the HTML file's encoding is UTF-8, you should specify UNICODE instead of UTF8 when processing the encoding in JavaScript.)
When using Encoding.convert, if you specify a character encoding other than UNICODE (such as UTF8 or SJIS), the values in the returned character code array will range from 0-255. However, if you specify UNICODE, the values will range from 0-65535, which corresponds to the range of values returned by String.prototype.charCodeAt() (code units).
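The difference between the two value ranges can be seen without encoding.js at all. The following is a minimal sketch in plain JavaScript, using only String.prototype.charCodeAt() and the standard TextEncoder; the variable names are illustrative. It shows that UTF-16 code units can exceed 255 (and that a character outside the Basic Multilingual Plane occupies two code units), while UTF-8 output is always a sequence of bytes.

```javascript
// Plain JavaScript only; no encoding.js calls are used here.
const text = 'あ🍣';

// UTF-16 code units, as returned by String.prototype.charCodeAt().
// '🍣' (U+1F363) is stored as a surrogate pair, so it yields two values.
const codeUnits = Array.from({ length: text.length }, (_, i) => text.charCodeAt(i));
console.log(codeUnits); // [12354, 55356, 57187]

// UTF-8 bytes produced by the standard TextEncoder; every value is 0-255.
const utf8Bytes = Array.from(new TextEncoder().encode(text));
console.log(utf8Bytes); // [227, 129, 130, 240, 159, 141, 163]
```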
Convert character encoding from a JavaScript string (UNICODE) to SJIS.
    const unicodeArray = Encoding.stringToCode('こんにちは'); // Convert string to code array
    const sjisArray = Encoding.convert(unicodeArray, {
      to: 'SJIS',
      from: 'UNICODE'
    });
    console.log(sjisArray);
    // [130, 177, 130, 241, 130, 201, 130, 191, 130, 205] ('こんにちは' array in SJIS)
Convert character encoding from SJIS to UNICODE.
    const sjisArray = [
      130, 177, 130, 241, 130, 201, 130, 191, 130, 205
    ]; // 'こんにちは' array in SJIS

    const unicodeArray = Encoding.convert(sjisArray, {
      to: 'UNICODE',
      from: 'SJIS'
    });
    const str = Encoding.codeToString(unicodeArray); // Convert code array to string
    console.log(str); // 'こんにちは'
Detect character encoding.
    const data = [
      227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175
    ]; // 'こんにちは' array in UTF-8

    const detectedEncoding = Encoding.detect(data);
    console.log(`Character encoding is ${detectedEncoding}`); // 'Character encoding is UTF8'
(Node.js) Example of reading a text file written in SJIS.
    const fs = require('fs');
    const Encoding = require('encoding-japanese');

    const sjisBuffer = fs.readFileSync('./sjis.txt');
    const unicodeArray = Encoding.convert(sjisBuffer, {
      to: 'UNICODE',
      from: 'SJIS'
    });
    console.log(Encoding.codeToString(unicodeArray));
Encoding.detect(data, [encodings]) detects the character encoding of the given data.
The encoding is detected automatically if the encodings argument is omitted or AUTO is specified. Supported encoding values can be found in the "Supported encodings" section.
Return value (string|boolean): Returns a string representing the detected encoding (e.g., SJIS, UTF8) listed in the "Supported encodings" section, or false if the encoding cannot be detected. If the encodings argument is provided, it returns the name of the detected encoding if the data matches any of the specified encodings, or false otherwise.
Example of detecting character encoding.
    const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS
    const detectedEncoding = Encoding.detect(sjisArray);
    console.log(`Encoding is ${detectedEncoding}`); // 'Encoding is SJIS'
Example of using the encodings argument to specify the character encoding to be detected. This returns the detected encoding as a string if the specified encoding matches, or false otherwise:
    const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS
    const detectedEncoding = Encoding.detect(sjisArray, 'SJIS');
    if (detectedEncoding) {
      console.log('Encoding is SJIS');
    } else {
      console.log('Encoding does not match SJIS');
    }
Example of specifying multiple encodings:
    const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS
    const detectedEncoding = Encoding.detect(sjisArray, ['UTF8', 'SJIS']);
    if (detectedEncoding) {
      console.log(`Encoding is ${detectedEncoding}`); // 'Encoding is SJIS'
    } else {
      console.log('Encoding does not match UTF8 and SJIS');
    }
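Because detect returns false rather than throwing when nothing matches, callers that pass the result onward may want to check it in one place. The following is a minimal sketch of such a guard. detectOrThrow is a hypothetical helper, not part of encoding.js, and the encodingLib parameter is assumed to be the Encoding object so that the snippet stays self-contained.

```javascript
// Hypothetical helper (not part of encoding.js): turns a `false` result from
// detect() into an explicit error. `encodingLib` is assumed to be the
// Encoding object exported by encoding-japanese.
function detectOrThrow(encodingLib, data, candidates) {
  const detected = candidates === undefined
    ? encodingLib.detect(data)
    : encodingLib.detect(data, candidates);
  if (detected === false) {
    throw new Error('Character encoding could not be detected');
  }
  return detected;
}

// Usage, assuming the package is installed:
//   const Encoding = require('encoding-japanese');
//   const name = detectOrThrow(Encoding, [130, 168, 130, 205, 130, 230], ['UTF8', 'SJIS']);
```

Passing the library in as an argument is only a convenience for this sketch; in application code the helper could reference Encoding directly.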
Encoding.convert(data, to, [from]) converts the character encoding of the given data.
The source encoding is detected automatically if the from argument is omitted or AUTO is specified. Supported encoding values can be found in the "Supported encodings" section.
Return value (Array<number>|TypedArray|string): Returns a numeric character code array of the converted character encoding if data is an array or a buffer, or returns the converted string if data is a string.
Example of converting a character code array to Shift_JIS from UTF-8:
    const utf8Array = [227, 129, 130]; // 'あ' in UTF-8
    const sjisArray = Encoding.convert(utf8Array, 'SJIS', 'UTF8');
    console.log(sjisArray); // [130, 160] ('あ' in SJIS)
A TypedArray such as Uint8Array, or a Node.js Buffer, can be converted in the same way:
    const utf8Array = new Uint8Array([227, 129, 130]);
    const sjisArray = Encoding.convert(utf8Array, 'SJIS', 'UTF8');
Converts character encoding by auto-detecting the encoding name of the source:
    // The character encoding is automatically detected when the argument `from` is omitted
    const utf8Array = [227, 129, 130];
    let sjisArray = Encoding.convert(utf8Array, 'SJIS');
    // Or explicitly specify 'AUTO' to auto-detect
    sjisArray = Encoding.convert(utf8Array, 'SJIS', 'AUTO');
Specifying the argument to as an object
You can pass the second argument, to, as an object to improve readability. Options such as type, fallback, and bom must also be specified with an object.
    const utf8Array = [227, 129, 130];
    const sjisArray = Encoding.convert(utf8Array, {
      to: 'SJIS',
      from: 'UTF8'
    });
The type option
convert returns an array by default, but you can change the return type by specifying the type option. Also, if the argument data is passed as a string and the type option is not specified, then type: 'string' is assumed (the result is returned as a string).
    const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS
    const unicodeString = Encoding.convert(sjisArray, {
      to: 'UNICODE',
      from: 'SJIS',
      type: 'string' // Specify 'string' to return as string
    });
    console.log(unicodeString); // 'おはよ'
The following type options are supported.
string: Return as a string.
arraybuffer: Return as an ArrayBuffer (actually returns a Uint16Array due to historical reasons).
array: Return as an Array (default).
type: 'string' can be used as a shorthand for converting a code array to a string, as performed by Encoding.codeToString.
Note: Specifying type: 'string' may not handle conversions properly, except when converting to UNICODE.
With the fallback option, you can specify how to handle characters that cannot be represented in the target encoding.
The fallback option supports the following values:
html-entity: Replace with HTML entities (decimal numeric character references).
html-entity-hex: Replace with HTML entities (hexadecimal numeric character references).
ignore: Ignore characters that cannot be represented.
error: Throw an error if any character cannot be represented.
Characters that cannot be represented in the target character set are replaced with '?' (U+003F) by default, but by specifying html-entity as the fallback option, you can replace them with HTML entities (numeric character references), such as &#127843;.
Example of specifying the { fallback: 'html-entity' } option:
    const unicodeArray = Encoding.stringToCode('寿司🍣ビール🍺');
    // No fallback specified
    let sjisArray = Encoding.convert(unicodeArray, {
      to: 'SJIS',
      from: 'UNICODE'
    });
    console.log(sjisArray); // Converted to a code array of '寿司?ビール?'

    // Specify `fallback: html-entity`
    sjisArray = Encoding.convert(unicodeArray, {
      to: 'SJIS',
      from: 'UNICODE',
      fallback: 'html-entity'
    });
    console.log(sjisArray); // Converted to a code array of '寿司&#127843;ビール&#127866;'
Example of specifying the { fallback: 'html-entity-hex' } option:
    const unicodeArray = Encoding.stringToCode('ホッケの漢字は𩸽');
    const sjisArray = Encoding.convert(unicodeArray, {
      to: 'SJIS',
      from: 'UNICODE',
      fallback: 'html-entity-hex'
    });
    console.log(sjisArray); // Converted to a code array of 'ホッケの漢字は&#x29e3d;'
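For reference, a numeric character reference is simply the character's Unicode code point written in decimal or hexadecimal. The following plain JavaScript sketch shows how such a reference is formed for a single character. toNumericReference is a hypothetical name used for illustration and is not how encoding.js is known to implement the fallback.

```javascript
// Illustration only, not encoding.js code: builds the numeric character
// reference for the first character of `char`.
function toNumericReference(char, hex = false) {
  // codePointAt() returns the full code point, even for characters that
  // occupy two UTF-16 code units (a surrogate pair).
  const codePoint = char.codePointAt(0);
  return hex ? `&#x${codePoint.toString(16)};` : `&#${codePoint};`;
}

console.log(toNumericReference('🍣'));       // '&#127843;'
console.log(toNumericReference('𩸽', true)); // '&#x29e3d;'
```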
By specifying ignore as the fallback option, characters that cannot be represented in the target encoding are skipped.
Example of specifying the { fallback: 'ignore' } option:
    const unicodeArray = Encoding.stringToCode('寿司🍣ビール🍺');
    // No fallback specified
    let sjisArray = Encoding.convert(unicodeArray, {
      to: 'SJIS',
      from: 'UNICODE'
    });
    console.log(sjisArray); // Converted to a code array of '寿司?ビール?'

    // Specify `fallback: ignore`
    sjisArray = Encoding.convert(unicodeArray, {
      to: 'SJIS',
      from: 'UNICODE',
      fallback: 'ignore'
    });
    console.log(sjisArray); // Converted to a code array of '寿司ビール'
If you need an exception to be thrown when a character cannot be represented in the target character encoding, specify error as the fallback option.
Example of specifying the { fallback: 'error' } option:
    const unicodeArray = Encoding.stringToCode('おにぎり🍙ラーメン🍜');
    try {
      const sjisArray = Encoding.convert(unicodeArray, {
        to: 'SJIS',
        from: 'UNICODE',
        fallback: 'error' // Specify 'error' to throw an exception
      });
    } catch (e) {
      console.error(e); // Error: Character cannot be represented: [240, 159, 141, 153]
    }
You can add a BOM (byte order mark) by specifying the bom option when converting to UTF16. The default is no BOM.
    const utf8Array = [227, 129, 130]; // 'あ' in UTF-8
    const utf16Array = Encoding.convert(utf8Array, {
      to: 'UTF16',
      from: 'UTF8',
      bom: true // Specify to add the BOM
    });
UTF16 byte order is big-endian by default. If you want to convert as little-endian, specify the { bom: 'LE' } option.
    const utf8Array = [227, 129, 130]; // 'あ' in UTF-8
    const utf16leArray = Encoding.convert(utf8Array, {
      to: 'UTF16',
      from: 'UTF8',
      bom: 'LE' // Specify to add the BOM as little-endian
    });
If you do not need a BOM, use UTF16BE or UTF16LE. UTF16BE is big-endian and UTF16LE is little-endian; neither adds a BOM.
    const utf8Array = [227, 129, 130]; // 'あ' in UTF-8
    const utf16beArray = Encoding.convert(utf8Array, {
      to: 'UTF16BE',
      from: 'UTF8'
    });
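To make the byte layouts concrete, here is a minimal plain JavaScript sketch that serializes a string's UTF-16 code units by hand in either byte order, optionally prefixed with the BOM (U+FEFF). toUtf16Bytes and its option names are hypothetical and exist only for illustration; this is not encoding.js code, and the library's output is not reproduced here.

```javascript
// Illustration only, not encoding.js code: serializes UTF-16 code units to
// bytes in the requested byte order, optionally prefixed with a BOM.
function toUtf16Bytes(text, { littleEndian = false, bom = false } = {}) {
  const bytes = [];
  const pushUnit = (unit) => {
    const high = unit >> 8;
    const low = unit & 0xff;
    if (littleEndian) {
      bytes.push(low, high);
    } else {
      bytes.push(high, low);
    }
  };
  if (bom) pushUnit(0xfeff); // U+FEFF written in the chosen byte order
  for (let i = 0; i < text.length; i++) {
    pushUnit(text.charCodeAt(i));
  }
  return bytes;
}

console.log(toUtf16Bytes('あ', { bom: true }));                     // [254, 255, 48, 66]
console.log(toUtf16Bytes('あ', { bom: true, littleEndian: true })); // [255, 254, 66, 48]
console.log(toUtf16Bytes('あ'));                                    // [48, 66]
```

A reader that sees FE FF at the start of the data can infer big-endian order, and FF FE indicates little-endian, which is why the BOM matters when the byte order is not agreed in advance.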
Encoding.urlEncode(data) encodes a numeric character code array into a percent-encoded string formatted as a URI component in %xx format.
urlEncode escapes all characters except the following, just like encodeURIComponent():
A-Z a-z 0-9 - _ . ! ~ * ' ( )
Return value (string): Returns a percent-encoded string formatted as a URI component in %xx format.
Example of URL encoding a Shift_JIS array:
    const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS
    const encoded = Encoding.urlEncode(sjisArray);
    console.log(encoded); // '%82%A8%82%CD%82%E6'
Encoding.urlDecode(string) decodes a percent-encoded string formatted as a URI component in %xx format to a numeric character code array.
Return value (Array<number>): Returns a numeric character code array.
Example of decoding a percent-encoded Shift_JIS string:
    const encoded = '%82%A8%82%CD%82%E6'; // 'おはよ' encoded as percent-encoded SJIS string
    const sjisArray = Encoding.urlDecode(encoded);
    console.log(sjisArray); // [130, 168, 130, 205, 130, 230]
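The %xx format itself is straightforward: each byte that is not in the unescaped set is written as a percent sign followed by two hexadecimal digits. The following is a minimal sketch of that scheme in plain JavaScript, under the assumption that only the characters listed for urlEncode are left unescaped. percentEncodeBytes and percentDecodeToBytes are hypothetical names, and this is not the library's implementation.

```javascript
// Illustration only, not encoding.js code.
// Characters left unescaped: A-Z a-z 0-9 - _ . ! ~ * ' ( )
const UNESCAPED = /[A-Za-z0-9\-_.!~*'()]/;

function percentEncodeBytes(bytes) {
  // Array.from works for plain arrays, TypedArrays, and Buffers alike.
  return Array.from(bytes, (byte) => {
    const char = String.fromCharCode(byte);
    return byte < 0x80 && UNESCAPED.test(char)
      ? char
      : '%' + byte.toString(16).toUpperCase().padStart(2, '0');
  }).join('');
}

function percentDecodeToBytes(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === '%' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(text.charCodeAt(i));
    }
  }
  return bytes;
}

const sjisBytesForUrl = [130, 168, 130, 205, 130, 230]; // 'おはよ' in SJIS
console.log(percentEncodeBytes(sjisBytesForUrl));          // '%82%A8%82%CD%82%E6'
console.log(percentDecodeToBytes('%82%A8%82%CD%82%E6'));   // [130, 168, 130, 205, 130, 230]
```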
Encoding.base64Encode(data) encodes a numeric character code array into a Base64 encoded string.
Return value (string): Returns a Base64 encoded string.
Example of Base64 encoding a Shift_JIS array:
    const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS
    const encodedStr = Encoding.base64Encode(sjisArray);
    console.log(encodedStr); // 'gqiCzYLm'
Encoding.base64Decode(string) decodes a Base64 encoded string to a numeric character code array.
Return value (Array<number>): Returns a Base64 decoded numeric character code array.
Example of base64Encode and base64Decode:
    const sjisArray = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]; // 'こんにちは' array in SJIS
    const encodedStr = Encoding.base64Encode(sjisArray);
    console.log(encodedStr); // 'grGC8YLJgr+CzQ=='

    const decodedArray = Encoding.base64Decode(encodedStr);
    console.log(decodedArray); // [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]
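In Node.js, the same Base64 text can be produced or consumed with the built-in Buffer, which is handy when the other side of an exchange does not use encoding.js. This sketch uses only Node.js built-ins; the variable names are illustrative.

```javascript
// Node.js only: Buffer is a Node.js built-in, not part of encoding.js.
const sjisBytes = [130, 168, 130, 205, 130, 230]; // 'おはよ' in SJIS

// Bytes -> Base64 text
const base64Text = Buffer.from(sjisBytes).toString('base64');
console.log(base64Text); // 'gqiCzYLm'

// Base64 text -> plain array of byte values
const roundTripped = Array.from(Buffer.from(base64Text, 'base64'));
console.log(roundTripped); // [130, 168, 130, 205, 130, 230]
```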
Encoding.codeToString(code) converts a numeric character code array to a string.
Return value (string): Returns a converted string.
Example of converting a character code array to a string:
    const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS
    const unicodeArray = Encoding.convert(sjisArray, {
      to: 'UNICODE',
      from: 'SJIS'
    });
    const unicodeStr = Encoding.codeToString(unicodeArray);
    console.log(unicodeStr); // 'おはよ'
Encoding.stringToCode(string) converts a string to a numeric character code array.
Return value (Array<number>): Returns a numeric character code array converted from the string.
Example of converting a string to a character code array:
    const unicodeArray = Encoding.stringToCode('おはよ');
    console.log(unicodeArray); // [12362, 12399, 12424]
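The following methods convert Japanese full-width (zenkaku) and half-width (hankaku) characters, suitable for use with UNICODE strings or numeric character code arrays of UNICODE. Each method returns a converted string if the argument data is a string, or a numeric character code array if the argument data is a code array.
Encoding.toHankakuCase(data): Converts full-width (zenkaku) symbols and alphanumeric characters to their half-width (hankaku) equivalents.
Encoding.toZenkakuCase(data): Converts half-width (hankaku) symbols and alphanumeric characters to their full-width (zenkaku) equivalents.
Encoding.toHiraganaCase(data): Converts full-width katakana to full-width hiragana.
Encoding.toKatakanaCase(data): Converts full-width hiragana to full-width katakana.
Encoding.toHankanaCase(data): Converts full-width katakana to half-width katakana.
Encoding.toZenkanaCase(data): Converts half-width katakana to full-width katakana.
Encoding.toHankakuSpace(data): Converts the full-width space (U+3000) to the half-width space (U+0020).
Encoding.toZenkakuSpace(data): Converts the half-width space (U+0020) to the full-width space (U+3000).
Return value (Array<number>|string): Returns a converted string or numeric character code array.
Example of converting zenkaku and hankaku strings:
    console.log(Encoding.toHankakuCase('ａｂｃＤＥＦ１２３＠！＃＊＝')); // 'abcDEF123@!#*='
    console.log(Encoding.toZenkakuCase('abcDEF123@!#*=')); // 'ａｂｃＤＥＦ１２３＠！＃＊＝'
    console.log(Encoding.toHiraganaCase('アイウエオァィゥェォヴボポ')); // 'あいうえおぁぃぅぇぉゔぼぽ'
    console.log(Encoding.toKatakanaCase('あいうえおぁぃぅぇぉゔぼぽ')); // 'アイウエオァィゥェォヴボポ'
    console.log(Encoding.toHankanaCase('アイウエオァィゥェォヴボポ')); // 'ｱｲｳｴｵｧｨｩｪｫｳﾞﾎﾞﾎﾟ'
    console.log(Encoding.toZenkanaCase('ｱｲｳｴｵｧｨｩｪｫｳﾞﾎﾞﾎﾟ')); // 'アイウエオァィゥェォヴボポ'
    console.log(Encoding.toHankakuSpace('あいうえお　abc　123')); // 'あいうえお abc 123'
    console.log(Encoding.toZenkakuSpace('あいうえお abc 123')); // 'あいうえお　abc　123'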
Example of converting zenkaku and hankaku code arrays:
    const unicodeArray = Encoding.stringToCode('ａｂｃ１２３！＃　あいうアイウ　ABCｱｲｳ');
    console.log(Encoding.codeToString(Encoding.toHankakuCase(unicodeArray)));
    // 'abc123!#　あいうアイウ　ABCｱｲｳ'
    console.log(Encoding.codeToString(Encoding.toZenkakuCase(unicodeArray)));
    // 'ａｂｃ１２３！＃　あいうアイウ　ＡＢＣｱｲｳ'
    console.log(Encoding.codeToString(Encoding.toHiraganaCase(unicodeArray)));
    // 'ａｂｃ１２３！＃　あいうあいう　ABCｱｲｳ'
    console.log(Encoding.codeToString(Encoding.toKatakanaCase(unicodeArray)));
    // 'ａｂｃ１２３！＃　アイウアイウ　ABCｱｲｳ'
    console.log(Encoding.codeToString(Encoding.toHankanaCase(unicodeArray)));
    // 'ａｂｃ１２３！＃　あいうｱｲｳ　ABCｱｲｳ'
    console.log(Encoding.codeToString(Encoding.toZenkanaCase(unicodeArray)));
    // 'ａｂｃ１２３！＃　あいうアイウ　ABCアイウ'
    console.log(Encoding.codeToString(Encoding.toHankakuSpace(unicodeArray)));
    // 'ａｂｃ１２３！＃ あいうアイウ ABCｱｲｳ'
    console.log(Encoding.codeToString(Encoding.toZenkakuSpace(unicodeArray)));
    // 'ａｂｃ１２３！＃　あいうアイウ　ABCｱｲｳ'
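The alphanumeric part of these conversions rests on a simple property of Unicode: the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above the ASCII characters U+0021 to U+007E. The following plain JavaScript sketch illustrates that offset. The function names are hypothetical, and this is not the library's implementation, which may treat individual symbols differently.

```javascript
// Illustration only, not encoding.js code.
const FULLWIDTH_OFFSET = 0xfee0; // U+FF01 - U+0021

function fullwidthAsciiToHalfwidth(text) {
  return text.replace(/[\uFF01-\uFF5E]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - FULLWIDTH_OFFSET));
}

function halfwidthAsciiToFullwidth(text) {
  return text.replace(/[!-~]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) + FULLWIDTH_OFFSET));
}

// Escapes are used so the full-width input survives copy and paste:
// '\uFF41\uFF42\uFF43\uFF11\uFF12\uFF13' is the full-width form of 'abc123'.
const fullwidthSample = '\uFF41\uFF42\uFF43\uFF11\uFF12\uFF13';
console.log(fullwidthAsciiToHalfwidth(fullwidthSample)); // 'abc123'
console.log(halfwidthAsciiToFullwidth('abc123') === fullwidthSample); // true
```

Kana and the ideographic space (U+3000) do not follow this offset, so they need separate mapping tables or rules.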
Example using the Fetch API and Typed Arrays (Uint8Array)
This example reads a text file encoded in Shift_JIS as binary data, and displays it as a string after converting it to Unicode using Encoding.convert.
    (async () => {
      try {
        const response = await fetch('shift_jis.txt');
        const buffer = await response.arrayBuffer();

        // Code array with Shift_JIS file contents
        const sjisArray = new Uint8Array(buffer);

        // Convert encoding to UNICODE (JavaScript Code Units) from Shift_JIS
        const unicodeArray = Encoding.convert(sjisArray, {
          to: 'UNICODE',
          from: 'SJIS'
        });

        // Convert to string from code array for display
        const unicodeString = Encoding.codeToString(unicodeArray);
        console.log(unicodeString);
      } catch (error) {
        console.error('Error loading the file:', error);
      }
    })();
The same conversion using XMLHttpRequest with responseType set to arraybuffer:
    const req = new XMLHttpRequest();
    req.open('GET', 'shift_jis.txt', true);
    req.responseType = 'arraybuffer';

    req.onload = (event) => {
      const buffer = req.response;
      if (buffer) {
        // Code array with Shift_JIS file contents
        const sjisArray = new Uint8Array(buffer);

        // Convert encoding to UNICODE (JavaScript Code Units) from Shift_JIS
        const unicodeArray = Encoding.convert(sjisArray, {
          to: 'UNICODE',
          from: 'SJIS'
        });

        // Convert to string from code array for display
        const unicodeString = Encoding.codeToString(unicodeArray);
        console.log(unicodeString);
      }
    };

    req.send(null);
This example uses the File API to read the content of a selected file, detects its character encoding, and converts the file content to UNICODE from any character encoding such as Shift_JIS or EUC-JP. The converted content is then displayed in a textarea.
    <input type="file" id="file">
    <div id="encoding"></div>
    <textarea id="content" rows="5" cols="80"></textarea>

    <script>
    function onFileSelect(event) {
      const file = event.target.files[0];

      const reader = new FileReader();
      reader.onload = function(e) {
        const codes = new Uint8Array(e.target.result);

        const detectedEncoding = Encoding.detect(codes);
        const encoding = document.getElementById('encoding');
        encoding.textContent = `Detected encoding: ${detectedEncoding}`;

        // Convert encoding to UNICODE
        const unicodeString = Encoding.convert(codes, {
          to: 'UNICODE',
          from: detectedEncoding,
          type: 'string'
        });
        document.getElementById('content').value = unicodeString;
      };

      reader.readAsArrayBuffer(file);
    }

    document.getElementById('file').addEventListener('change', onFileSelect);
    </script>
We welcome contributions from everyone. For bug reports and feature requests, please create an issue on GitHub.
Before submitting a pull request, please run npm run test to ensure there are no errors.
We only accept pull requests that pass all tests.
This project is licensed under the terms of the MIT license. See the LICENSE file for details.
No vulnerabilities found.
Security scorecard findings:
no dangerous workflow patterns detected
GitHub workflow tokens follow principle of least privilege
no binaries found in the repo
license file detected
6 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 5
dependency not pinned by hash detected -- score normalized to 3
Found 2/17 approved changesets -- score normalized to 1
no effort to earn an OpenSSF best practices badge detected
security policy file not detected
project is not fuzzed
branch protection not enabled on development/release branches
SAST tool is not run on all commits -- score normalized to 0
10 existing vulnerabilities detected
Last Scanned on 2024-11-18
The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.