Netvouz - encoding bookmarks by mcswell

Baraha Fonts Documents
See the "Glyph codes" listing under "Baraha Fonts Documents", i.e. encoding tables for various non-Unicode encodings of Kannada, Devanagari, etc.
in Languages > Indic Languages with encoding indic
Char encoding mappings
Unicode Consortium's list of public mappings between old encodings and Unicode
in Computational Linguistics > Character Encoding > Unicode with character encoding unicode
Character encoding tools
Encoding validator: used for cleansing corpora that contain unexpected characters. Character encoding converter: can optionally emulate non-ASCII characters with ASCII strings.
in Computational Linguistics > Character Encoding with encoding
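The two tools above are described only briefly, so here is a minimal sketch of the general ideas, not of those tools' actual behavior. It assumes a hypothetical "expected" character set (printable ASCII, tab, newline, and the Devanagari block U+0900 to U+097F), reports every character outside it, and emulates non-ASCII characters with ASCII strings by NFKD-decomposing, dropping combining marks, and writing anything left over as a `<U+XXXX>` placeholder. The names `is_expected`, `find_unexpected` and `ascii_fallback` are illustrative only.

```python
import unicodedata
from typing import Iterable, Iterator, Tuple


def is_expected(ch: str) -> bool:
    """Hypothetical allowed set: printable ASCII, tab/newline, Devanagari block."""
    cp = ord(ch)
    return (0x20 <= cp <= 0x7E) or ch in "\t\n" or (0x0900 <= cp <= 0x097F)


def find_unexpected(lines: Iterable[str]) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (line, column, code point, Unicode name) for each unexpected character."""
    for lineno, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if not is_expected(ch):
                yield lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


def ascii_fallback(text: str) -> str:
    """Emulate non-ASCII characters with ASCII strings (one possible scheme)."""
    out = []
    for ch in unicodedata.normalize("NFKD", text):
        if ord(ch) < 128:
            out.append(ch)              # plain ASCII survives as-is
        elif unicodedata.combining(ch):
            continue                    # drop accents split off by NFKD
        else:
            out.append(f"<U+{ord(ch):04X}>")  # no ASCII base: spell out code point
    return "".join(out)
```

For example, `ascii_fallback("café ß")` gives `"cafe <U+00DF>"`. A real cleansing pass would take its allowed set from the corpus's own specification rather than the hard-coded ranges assumed here.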
CSets: Supplemental Unicode Mapping Tables
The CSets collection is a set of mapping tables between various character sets and Unicode, intended to provide mappings not included in most character set conversion tools available today. The distribution originated in several projects that involved text in many obscure character encodings. Many of these encodings are not supported by the most frequently used character set conversion tools (e.g. iconv), so this package was put together to provide the encoding information in a simple, consistent format. No program is provided to actually convert between character sets, because of the wide variety of text file formats they appear in. It is up to the developer/user to write their own conversion programs using this data.
in Computational Linguistics > Character Encoding with conversion encoding
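Since CSets leaves the conversion program to the user, this is a minimal sketch of one for single-byte legacy encodings. It assumes a table format like the Unicode Consortium's public mapping files: one mapping per line, source code in hex, whitespace, Unicode code point(s) in hex (several joined with `+`), optional `#` comment. The actual CSets file layout may differ, so check it before relying on this parser. The function names are hypothetical.

```python
from typing import Dict, Iterable


def load_mapping(lines: Iterable[str]) -> Dict[int, str]:
    """Parse '0xSRC<ws>0xUNI[+0xUNI...]  # comment' lines (assumed format)."""
    table: Dict[int, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue                          # source code with no Unicode mapping
        source = int(fields[0], 16)           # int() accepts a "0x" prefix with base 16
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split("+"))
        table[source] = target
    return table


def decode_single_byte(data: bytes, table: Dict[int, str],
                       replacement: str = "\ufffd") -> str:
    """Convert legacy single-byte text to Unicode; unmapped bytes become U+FFFD."""
    return "".join(table.get(b, replacement) for b in data)
```

Multi-byte legacy encodings would additionally need a byte-sequence tokenizer before the table lookup, which this sketch deliberately omits.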
Font converters
Font converters, mostly for Kannada. See also http://www.baraha.com/documents.htm for "Glyph codes", i.e. tables of non-Unicode encodings.
in Languages > Indic Languages > Dravidian > Kannada with encoding kannada
Free foreign fonts
Links to websites where you can download fonts for many different alphabets and writing systems.
in Computational Linguistics > Character Encoding with computational conversion encoding fonts foreign free linguistics
ICU
ICU user guide to character set (and language) detection.
The International Components for Unicode (ICU) library user guide.
code components for g11n globalization i18n icu icu4c icu4j international internationalization l10n localization nls unicode
in Computational Linguistics > Character Encoding with character encoding identification language
Indictrans
Indictrans: Online Conversion Tool for Legacy to Unicode
in Languages > Indic Languages > Indo-Aryan > Hindi with converter devanagari encoding hindi
International Register of Coded Character Sets
ISO/IEC International Register of Coded Character Sets To Be Used With Escape Sequences
in Computational Linguistics > Character Encoding with character encoding iso
Malayalam encoding converter
Since 1998, developing free library support for Malayalam-English transliteration, thus facilitating communication in Malayalam over the Internet.
in Computational Linguistics > Character Encoding with computational encoding linguistics transliteration
mozdev encoding converters
in Computational Linguistics > Character Encoding with computational conversion converters encoding linguistics mozdev
Penreader (encoding info)
in Languages with encoding info languages penreader
Simons--Electronic Encoding of Lexical Resources
in Computational Linguistics > Lexicography with computational encoding lexicography linguistics
Transliteration of Indic Scripts
Draft transliteration schemes and pages on transliteration of Indic scripts in theory and practice
15919 assamese bengali devanagari gujarati gurmukhi hindi india indic iso iso15919 kannada lat panjabi scripts transliteration
in Computational Linguistics > Character Encoding with computational conversion draft encoding linguistics pages schemes transliteration
Unicodify
in Computational Linguistics > Character Encoding > Unicode with computational conversion encoding linguistics unicodify
Utrac
A command-line tool and library that recognizes the encoding of an input file (e.g. UTF-8, ISO-8859-1, CP437) and its end-of-line type (CR, LF, CRLF).
Universal Text Recognizer And Converter. Detects encoding and end-of-line type.
charset code codepage converter convertion detection encoding page recognition recognizer utf-8
in Computational Linguistics > Language ID with character encoding identification
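The two checks Utrac performs can be illustrated with a minimal sketch; this is not Utrac's algorithm, only a simple stand-in. End-of-line detection counts CRLF pairs and then the lone CR and LF bytes left over. Encoding detection here is a naive cascade (ASCII, then UTF-8, then an assumed ISO-8859-1 fallback, since every byte sequence is valid Latin-1); distinguishing among legacy 8-bit encodings such as CP437 would need statistical analysis that this sketch does not attempt.

```python
def detect_eol(data: bytes) -> str:
    """Return 'CRLF', 'CR', 'LF', 'mixed', or 'none' for a byte string."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf   # CRs not followed by LF
    lf = data.count(b"\n") - crlf   # LFs not preceded by CR
    present = [name for name, n in (("CRLF", crlf), ("CR", cr), ("LF", lf)) if n]
    if not present:
        return "none"
    return present[0] if len(present) == 1 else "mixed"


def guess_encoding(data: bytes) -> str:
    """Naive cascade: strictest encoding that decodes without error wins."""
    for name, codec in (("ASCII", "ascii"), ("UTF-8", "utf-8")):
        try:
            data.decode(codec)
            return name
        except UnicodeDecodeError:
            pass
    return "ISO-8859-1"  # assumed fallback: decodes any byte sequence
```

Trying the strictest encoding first matters: pure ASCII text is also valid UTF-8 and Latin-1, so the order decides which label a file receives.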
XCES (Corpus Encoding Std for XML)
in Computational Linguistics > Corpora with computational corpora corpus encoding linguistics std xces xml
