Gathering detailed insights and metrics for any-ascii
npm install any-ascii
285 Stars
848 Commits
24 Forks
7 Watching
2 Branches
4 Contributors
Updated on 22 Nov 2024
Kotlin (38.44%)
Java (11.12%)
C (8.16%)
Rust (6.17%)
Elixir (5.83%)
Python (4.52%)
C# (4.32%)
Ruby (4.2%)
Shell (3.44%)
PHP (3.36%)
Julia (3.34%)
Go (3.23%)
JavaScript (3.14%)
CMake (0.44%)
Perl (0.31%)
Total Downloads
Period | Downloads | Change vs. previous period |
---|---|---|
Last day | 518 | -24% |
Last week | 3,245 | -9.4% |
Last month | 14,900 | 73.2% |
Last year | 78,828 | -5.3% |
No dependencies detected.
Unicode to ASCII transliteration
Converts Unicode characters to their best ASCII representation
AnyAscii provides ASCII-only replacement strings for practically all Unicode characters. Text is converted character-by-character without considering the context. The mappings for each script are based on popular existing romanization systems. Symbolic characters are converted based on their meaning or appearance. All ASCII characters in the input are left unchanged; every other character is replaced with printable ASCII characters. Unknown characters and some known characters are replaced with an empty string, which removes them from the output.
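The behavior described above can be illustrated with a minimal Python sketch. It is not AnyAscii's implementation: `TOY_TABLE` and `toy_transliterate` are hypothetical names, and the table holds only a handful of entries picked from the example tables in this document. It only shows the three rules: ASCII passes through, known characters map to a fixed string regardless of context, and unknown characters are dropped.

```python
# Hypothetical toy mapping for illustration only; the real AnyAscii
# tables cover roughly 100k characters and are not reproduced here.
TOY_TABLE = {
    "\u00e9": "e",    # é
    "\u00f6": "o",    # ö
    "\u00df": "ss",   # ß
    "\u6df1": "Shen", # 深
    "\u5733": "Zhen", # 圳
    "\u2606": "*",    # ☆
}


def toy_transliterate(text: str) -> str:
    """Convert text one character at a time, with no look at neighbors."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII input is left unchanged.
            out.append(ch)
        else:
            # Known characters get their replacement string;
            # anything missing from the toy table is removed.
            out.append(TOY_TABLE.get(ch, ""))
    return "".join(out)


if __name__ == "__main__":
    print(toy_transliterate("Bl\u00f6\u00dfe"))   # German example from the table
    print(toy_transliterate("\u6df1\u5733"))      # Mandarin example from the table
```

Because each character is looked up independently, the same input character always yields the same output, which is why 深 becomes `Shen` in both the Mandarin and Cantonese rows of the table.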
Representative examples for different languages comparing the AnyAscii output to the conventional romanization:
Language (Script) | Input | Output | Conventional |
---|---|---|---|
French (Latin) | René François Lacôte | Rene Francois Lacote | Rene Francois Lacote |
German (Latin) | Blöße | Blosse | Bloesse |
Vietnamese (Latin) | Trần Hưng Đạo | Tran Hung Dao | Tran Hung Dao |
Norwegian (Latin) | Nærøy | Naeroy | Naroy |
Ancient Greek (Greek) | Φειδιππίδης | Feidippidis | Pheidippides |
Modern Greek (Greek) | Δημήτρης Φωτόπουλος | Dimitris Fotopoylos | Dimitris Fotopoulos |
Russian (Cyrillic) | Борис Николаевич Ельцин | Boris Nikolaevich El'tsin | Boris Nikolayevich Yeltsin |
Ukrainian (Cyrillic) | Володимир Горбулін | Volodimir Gorbulin | Volodymyr Horbulin |
Bulgarian (Cyrillic) | Търговище | T'rgovishche | Targovishte |
Mandarin Chinese (Han) | 深圳 | ShenZhen | Shenzhen |
Cantonese Chinese (Han) | 深水埗 | ShenShuiBu | Sham Shui Po |
Korean (Hangul) | 화성시 | HwaSeongSi | Hwaseong-si |
Korean (Han) | 華城市 | HuaChengShi | Hwaseong-si |
Japanese (Hiragana) | さいたま | saitama | Saitama |
Japanese (Han) | 埼玉県 | QiYuXian | Saitama-ken |
Amharic (Ethiopic) | ደብረ ዘይት | debre zeyt | Debre Zeyit |
Tigrinya (Ethiopic) | ደቀምሓረ | dek'emhare | Dekemhare |
Arabic | دمنهور | dmnhwr | Damanhur |
Armenian | Աբովյան | Abovyan | Abovyan |
Georgian | სამტრედია | samt'redia | Samtredia |
Hebrew | אברהם הלוי פרנקל | 'vrhm hlvy frnkl | Abraham Halevi Fraenkel |
Unified English Braille (Braille) | ⠠⠎⠁⠽⠀⠭⠀⠁⠛ | +say x ag | Say it again |
Bengali | ময়মনসিংহ | mymnsimh | Mymensingh |
Burmese (Myanmar) | ထန်တလန် | thntln | Thantlang |
Gujarati | પોરબંદર | porbmdr | Porbandar |
Hindi (Devanagari) | महासमुंद | mhasmumd | Mahasamund |
Kannada | ಬೆಂಗಳೂರು | bemgluru | Bengaluru |
Khmer | សៀមរាប | siemrab | Siem Reap |
Lao | ສະຫວັນນະເຂດ | sahvannaekhd | Savannakhet |
Malayalam | കളമശ്ശേരി | klmsseri | Kalamassery |
Odia | ଗଜପତି | gjpti | Gajapati |
Punjabi (Gurmukhi) | ਜਲੰਧਰ | jlmdhr | Jalandhar |
Sinhala | රත්නපුර | rtnpur | Ratnapura |
Tamil | கன்னியாகுமரி | knniyakumri | Kanniyakumari |
Telugu | శ్రీకాకుళం | srikakulm | Srikakulam |
Thai | สงขลา | sngkhla | Songkhla |
Symbols | Input | Output |
---|---|---|
Emojis | 👑 🌴 | :crown: :palm_tree: |
Misc. | ☆ ♯ ♰ ⚄ ⛌ | * # + 5 X |
Letterlike | № ℳ ⅋ ⅍ | No M & A/S |
AnyAscii is implemented across multiple programming languages with the same behavior and versioning
https://raw.githubusercontent.com/anyascii/anyascii/master/impl/c/anyascii.h https://raw.githubusercontent.com/anyascii/anyascii/master/impl/c/anyascii.c
https://hex.pm/packages/any_ascii
```elixir
iex> AnyAscii.transliterate("άνθρωποι") |> IO.iodata_to_binary()
"anthropoi"
```
https://pkg.go.dev/github.com/anyascii/go
```go
import "github.com/anyascii/go"

s := anyascii.Transliterate("άνθρωποι")
// anthropoi
```
Go 1.10+ compatible
https://mvnrepository.com/artifact/com.anyascii/anyascii
```java
String s = AnyAscii.transliterate("άνθρωποι");
assert s.equals("anthropoi");
```
Java 6+ compatible
```xml
<dependency>
  <groupId>com.anyascii</groupId>
  <artifactId>anyascii</artifactId>
  <version>LATEST</version>
</dependency>
```
https://npmjs.com/package/any-ascii
```javascript
import anyAscii from 'any-ascii';

const s = anyAscii('άνθρωποι');
// anthropoi
```
npm install any-ascii
https://juliahub.com/ui/Packages/AnyAscii/wYZIV
```julia
julia> using AnyAscii
julia> anyascii("άνθρωποι")
"anthropoi"
```
Julia 1.0+ compatible
pkg> add AnyAscii
https://packagist.org/packages/anyascii/anyascii
```php
$s = AnyAscii::transliterate('άνθρωποι');
// anthropoi
```
PHP 5.3+ compatible
composer require anyascii/anyascii
https://pypi.org/project/anyascii
```python
from anyascii import anyascii

s = anyascii('άνθρωποι')
assert s == 'anthropoi'
```
Python 3.3+ compatible
pip install anyascii
https://rubygems.org/gems/any_ascii
```ruby
require 'any_ascii'

s = AnyAscii.transliterate('άνθρωποι')
# anthropoi
```
Ruby 2.0+ compatible
gem install any_ascii
https://crates.io/crates/any_ascii
```rust
use any_ascii::any_ascii;

let s = any_ascii("άνθρωποι");
// anthropoi
```
Rust 1.42+ compatible
cargo add any_ascii
Install executable: cargo install any_ascii
```console
$ anyascii άνθρωποι
anthropoi

$ echo άνθρωποι | anyascii
anthropoi
```
https://raw.githubusercontent.com/anyascii/anyascii/master/impl/sh/anyascii
```console
$ anyascii άνθρωποι
anthropoi

$ echo άνθρωποι | anyascii
anthropoi
```
POSIX-compliant
https://nuget.org/packages/AnyAscii
```csharp
// C#
using AnyAscii;

string s = "άνθρωποι".Transliterate();
// anthropoi
```
.NET Core 3.0+ and .NET 5.0+ compatible
Unicode is the universal character encoding. This encoding standard provides the basis for processing, storage and interchange of text data in any language in all modern software and information technology protocols. [Unicode's scope] covers all the characters for all the writing systems of the world, modern and ancient. It also includes technical symbols, punctuations, and many other characters used in writing text. *
Unicode provides a unique numeric value for each character and uses UTF-8 to encode sequences of characters into bytes. UTF-8 uses a variable number of bytes for each character and is backwards compatible with ASCII. UTF-16 and UTF-32 are also specified but not common. There is a name and various properties for each character along with algorithms for casing, collation, equivalence, line breaking, segmentation, text direction, and more.
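The variable-width property and the ASCII compatibility of UTF-8 can be checked directly with Python's built-in `str.encode`. This is only a small illustration; the helper name `utf8_length` and the sample characters are chosen here, not taken from the original text.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


# Sample characters from different Unicode ranges.
samples = {
    "A": "U+0041, ASCII letter",
    "\u00e9": "U+00E9, Latin letter with diacritic",
    "\u6df1": "U+6DF1, Han character",
    "\U0001F451": "U+1F451, emoji outside the Basic Multilingual Plane",
}

if __name__ == "__main__":
    for ch, description in samples.items():
        print(f"{description}: {utf8_length(ch)} byte(s)")

    # Pure ASCII text has identical bytes in ASCII and UTF-8.
    text = "anthropoi"
    print(text.encode("ascii") == text.encode("utf-8"))
```

The last comparison is what backwards compatibility means in practice: any ASCII-only output, such as the result of a transliteration, is already valid UTF-8 with no conversion step.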
ASCII is the lowest common denominator character encoding, established in 1967 and using 7 bits for 128 characters. The printable characters are English letters, digits, and punctuation, with the remaining being control characters. The characters found on a standard US keyboard are from ASCII. Most legacy 8-bit encodings were backwards compatible with ASCII.
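The 7-bit layout can be worked through in a few lines of Python. The names below are chosen for this sketch, and it follows the common convention of counting the space character (0x20) as printable and DEL (0x7F) as a control character.

```python
ASCII_SIZE = 2 ** 7  # 7 bits give 128 code points, 0x00 through 0x7F

# Printable range: space (0x20) through tilde (0x7E).
PRINTABLE = [chr(code) for code in range(0x20, 0x7F)]

# Control characters: 0x00 through 0x1F, plus DEL (0x7F).
CONTROL = [code for code in range(ASCII_SIZE) if code < 0x20 or code == 0x7F]


def is_printable_ascii(text: str) -> bool:
    """True if every character falls in the printable ASCII range."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


if __name__ == "__main__":
    print(ASCII_SIZE, len(PRINTABLE), len(CONTROL))
    print(is_printable_ascii("anthropoi"))
    print(is_printable_ascii("άνθρωποι"))
```

A check like `is_printable_ascii` is one way to confirm that text has been fully reduced to the non-control ASCII range before handing it to tools that expect it.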
... expressed only in the original non-control ASCII range so as to be as widely compatible with as many existing tools, languages, and serialization formats as possible and avoid display issues in text editors and source control *
A language is written using characters from a script. Some languages use multiple scripts and some scripts are used by multiple languages. English uses the Latin script which is based on the alphabet the Romans used for writing Latin. Other languages using the Latin script may require additional letters and diacritics.
The Unicode Standard encodes scripts rather than languages. When writing systems for more than one language share sets of graphical symbols that have historically related derivations, the union of all of those graphical symbols ... is identified as a single script. *
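One way to see that characters belong to scripts rather than languages is to look at their Unicode names. Python's standard library does not expose the Script property, so the sketch below uses the first word of `unicodedata.name` as a rough stand-in. This is an assumption that holds for the common letters shown but is not a reliable script lookup in general; `rough_script` is a hypothetical helper.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """First word of the character's Unicode name, as a rough script hint.

    This is a heuristic only: the real Script property is defined in the
    Unicode Character Database and is not available in the stdlib.
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


if __name__ == "__main__":
    # Letters used by French, Vietnamese, and Norwegian share one script.
    for ch in ["\u00e9", "\u0110", "\u00e6"]:
        print(ch, rough_script(ch))
    # Letters used by Russian and Bulgarian share another.
    for ch in ["\u0411", "\u042a"]:
        print(ch, rough_script(ch))
```

The output groups é, Đ, and æ together and Б and Ъ together, even though each letter is associated with a different language.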
When converting text between languages there are multiple properties that can be preserved:
Original | Transliteration (Spelling) | Transcription (Sound) | Translation (Meaning) |
---|---|---|---|
ευαγγέλιο | euaggelio | evangelio | gospel |
Romanization is the conversion into the Latin script using transliteration and transcription. It is most commonly used when representing the names of people and places. Some nations have an official romanization standard for their language. Several organizations publish romanization standards for multiple languages.
Geographical names are Romanized to help foreigners find the place they intend to go to and help them remember cities, villages and mountains they visited and climbed. But it is Koreans who make up the Roman transcription of their proper names to print on their business cards and draw up maps for international tourists. Sometimes, they write the lyrics of a Korean song in Roman letters to help foreigners join in a singing session or write part of a public address (in Korean) in Roman letters for a visiting foreign VIP. In this sense, it is for both foreigners and the local public. The Romanization system must not be a code only for the native English-speaking community here but an important tool for international communication between Korean society, foreign residents in the country and the entire external world. *
Supports Unicode 15.0 (2022). Covers 100k of the 149k total Unicode characters, missing 47k very rare CJK characters and 2k other rare characters.
Bundled data files total 200-500 KB depending on the implementation
AnyAscii is an alternative to (and inspired by) Unidecode and its many ports. Unidecode only supports a subset of the Basic Multilingual Plane. AnyAscii gives better results, supports more than twice as many characters, and often has a smaller file size. To compare the mappings, see table.tsv and unidecode/unidecode.tsv.
ALA-LC, BGN/PCGN, Discord, ISO, KNAB, NFKD, UNGEGN, Unihan, national standards, and more
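Of the sources listed above, NFKD (Unicode compatibility decomposition) is the one that can be tried directly from Python's standard library. The sketch below shows what NFKD alone yields when non-ASCII leftovers are discarded. How AnyAscii actually combines NFKD with its other sources is not described here, so `nfkd_ascii` is a hypothetical helper and not the library's method.

```python
import unicodedata


def nfkd_ascii(text: str) -> str:
    """Decompose with NFKD, then keep only the ASCII characters.

    Accented letters split into a base letter plus combining marks,
    and the marks are discarded. Characters with no decomposition
    into ASCII are lost entirely.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ord(ch) < 128)


if __name__ == "__main__":
    print(nfkd_ascii("Ren\u00e9 Fran\u00e7ois Lac\u00f4te"))  # French row
    print(nfkd_ascii("\u2116"))                               # № from the symbols table
    print(nfkd_ascii("\u0110\u1ea1o"))                        # Đạo from the Vietnamese row
```

The first two calls agree with the tables earlier in this document, but the third loses the Đ because that letter has no decomposition. Gaps like this are why a normalization form by itself is not enough and dedicated mapping tables are needed.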
No vulnerabilities found.
no dangerous workflow patterns detected
license file detected
no binaries found in the repo
0 existing vulnerabilities detected
0 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 1
Found 2/30 approved changesets -- score normalized to 0
no effort to earn an OpenSSF best practices badge detected
detected GitHub workflow tokens with excessive permissions
security policy file not detected
project is not fuzzed
branch protection not enabled on development/release branches
dependency not pinned by hash detected -- score normalized to 0
SAST tool is not run on all commits -- score normalized to 0
Last Scanned on 2024-11-25
The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.