- Baraha Fonts Documents
See the listing of "Glyph codes" under "Baraha Fonts Documents", i.e. encoding tables for various non-Unicode encodings of Kannada, Devanagari, etc.
in Languages > Indic Languages with encoding indic
- Char encoding mappings
Unicode Consortium's list of public mappings between old encodings and Unicode
in Computational Linguistics > Character Encoding > Unicode with character encoding unicode
- Character encoding tools
Encoding validator: used for cleansing corpora that include unexpected characters. Character encoding converter: can optionally emulate non-ASCII characters with ASCII strings.
in Computational Linguistics > Character Encoding with encoding
- CSets: Supplemental Unicode Mapping Tables
The CSets collection is a set of mapping tables between various character sets and Unicode, intended to provide mappings not included in most character set conversion tools available today. The distribution originated in several projects that involved text encoded in many obscure character encodings. Many of these encodings are not supported by the most frequently used character set conversion tools (e.g. iconv), so this package was put together to provide the encoding information in a simple, consistent format. No program is provided to do the actual conversion between character sets, because of the wide variety of text file formats they appear in. It is up to the developer or user to write their own conversion programs using this data.
in Computational Linguistics > Character Encoding with conversion encoding
- Font converters
Font converters, mostly for Kannada. See also http://www.baraha.com/documents.htm for "Glyph codes", i.e. tables of non-Unicode encodings.
in Languages > Indic Languages > Dravidian > Kannada with encoding kannada
- Free foreign fonts
Links to websites where you can download fonts for many different alphabets and writing systems.
in Computational Linguistics > Character Encoding with computational conversion encoding fonts foreign free linguistics
- ICU
ICU user guide to character set (and language) detection.
The International Components for Unicode (ICU) library user guide.
code components for g11n globalization i18n icu icu4c icu4j international internationalization l10n localization nls unicode
in Computational Linguistics > Character Encoding with character encoding identification language
- Indictrans
Indictrans: Online Conversion Tool for Legacy to Unicode
in Languages > Indic Languages > Indo-Aryan > Hindi with converter devanagari encoding hindi
- International Register of Coded Character Sets
ISO/IEC International Register of Coded Character Sets To Be Used With Escape Sequences
in Computational Linguistics > Character Encoding with character encoding iso
- Malayalam encoding converter
A free library for Malayalam-English transliteration, in development since 1998, intended to facilitate communication in Malayalam over the internet.
in Computational Linguistics > Character Encoding with computational encoding linguistics transliteration
- mozdev encoding converters
in Computational Linguistics > Character Encoding with computational conversion converters encoding linguistics mozdev
- Penreader (encoding info)
in Languages with encoding info languages penreader
- Simons--Electronic Encoding of Lexical Resources
in Computational Linguistics > Lexicography with computational encoding lexicography linguistics
- Transliteration of Indic Scripts
Draft transliteration schemes and pages on transliteration of Indic scripts in theory and practice
15919 assamese bengali devanagari gujarati gurmukhi hindi india indic iso iso15919 kannada lat panjabi scripts transliteration
in Computational Linguistics > Character Encoding with computational conversion draft encoding linguistics pages schemes transliteration
- Unicodify
in Computational Linguistics > Character Encoding > Unicode with computational conversion encoding linguistics unicodify
- Utrac
A command-line tool and library that recognizes the encoding of an input file (e.g. UTF-8, ISO-8859-1, CP437...) and its end-of-line type (CR, LF, CRLF).
Universal Text Recognizer And Converter. Detect encoding and end of line type.
charset code codepage converter convertion detection encoding page recognition recognizer utf-8
in Computational Linguistics > Language ID with character encoding identification
- XCES (Corpus Encoding Std for XML)
in Computational Linguistics > Corpora with computational corpora corpus encoding linguistics std xces xml
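
Several entries above (the encoding validator and Utrac in particular) describe checking whether text is valid in a given encoding and recognizing its end-of-line type. As a rough illustration of the idea only, here is a minimal Python sketch. It is not the algorithm used by Utrac or any other listed tool; the function names `detect_eol` and `guess_encoding` and the candidate list are assumptions made up for this example.

```python
def detect_eol(data: bytes) -> str:
    """Return the most frequent end-of-line style: 'CRLF', 'LF', 'CR', or 'none'."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    counts = {"CRLF": crlf, "LF": lf, "CR": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "none"


def guess_encoding(data: bytes, candidates=("ascii", "utf-8", "iso-8859-1")) -> str:
    """Return the first candidate encoding that decodes the data without error.

    This is a naive trial-decoding heuristic (an assumption for illustration).
    ISO-8859-1 accepts every byte sequence, so it acts as a catch-all and
    should stay last in the candidate list.
    """
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "unknown"


if __name__ == "__main__":
    sample = "naïve\r\ntext\r\n".encode("utf-8")
    print(guess_encoding(sample), detect_eol(sample))
```

Real detectors generally rely on statistical models of byte frequencies rather than trial decoding, which is why a dedicated tool is preferable for legacy or ambiguous encodings.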