- 1984 corpus
Multext-East D2.1 F
mte-d21f
in Languages with bulgarian corpus czech estonian hungarian languages romanian slovene
- Chula Corpus Project
Chulalongkorn University: Chula to create 80m-word Thai 'corpus'
in Languages > Thai with corpus thai
- Corpus building for minority languages
Home page for corpus building web crawler
api corpora corpus crawler google languages minority web
in Computational Linguistics > Corpora with building computational corpora corpus languages linguistics minority
- Corpus Pattern Analysis
Corpus Pattern Analysis (CPA) is a new technique for mapping meaning onto words in text. It is currently being used to build a 'Pattern Dictionary of English Verbs'.
in Computational Linguistics > Corpora with corpus linguistics
- EMILLE Corpus Full Release
in Languages > Indic Languages > Indo-Aryan > Hindi with corpus emille hindi indic indo-aryan languages release
- ERDC Corpus
PDF describing an Indic corpus collection, but it is old, and ERDC doesn't seem to do any corpus work anymore. The link to ERDC NOIDA's web site at http://www.ipu.ac.in/affiliates/institutes/webpage/AIERDCI.HTM is broken.
in Languages > Indic Languages > Indo-Aryan > Hindi with corpus erdc hindi indic indo-aryan languages
- Grammatical incompleteness
Discussion on corpus syntax (and how we can use it to code meaning), originally from Corpus List.
in Computational Linguistics > Corpora with chomsky corpus syntax
- Helsinki corpus of Turkic
Formerly at http://www.ling.helsinki.fi/uhlcs/data/turkic-lgs. The URL above points to their index of corpora, which does not mention Turkish (although it lists many other Turkic languages).
in Languages > Turkic with corpus helsinki languages turkic
- Historical Spanish Corpus
[BYU/NEH] 100 million words, 1200s-1900s. Search by word, phrase, PoS, lemma, synonym, collocates, historical period, dialect, and register.
corpora corpus espanol español frecuencia frequency historical histórico listado lists palabras spanish spoken word wordlists
in Languages > Romance Languages > Spanish with corpus historical languages romance spanish
- Hungarian corpus
Hungarian
mte-d21f
in Languages > Hungarian with corpus hungarian languages
- Manuel Barbera, Corpus resources
Language-specific links to corpora, e-texts and NLP resources in general.
Go to CLR Guide
goto
in Languages > General Resources with barbera computational corpora corpus linguistics manuel resources
- OPUS parallel corpus
OPUS is an attempt to collect translated texts from the web, to convert and align the entire collection, to add linguistic data, and to provide the community with a publicly available parallel corpus. OPUS is based on open source products and is also delivered as an open source package. --- In early July 2007, the server was timing out, but it seems to be fixed now.
in Languages with corpus languages opus parallel
- OSU Linguistics Corpus Resources
in Languages with corpus languages linguistics osu resources
- Sketch Engine
The Sketch Engine (SkE, also known as Word Sketch Engine) is a Corpus Query System incorporating word sketches, grammatical relations, and a distributional thesaurus. A word sketch is a one-page, automatic, corpus-derived summary of a word’s grammatical and collocational behaviour.
in Computational Linguistics > Corpora with corpus linguistics
- Thai National Corpus
(in Thai)
in Languages > Thai with corpus thai
- The Corpus Project--Hebrew
in Languages > Semitic > Hebrew with corpus hebrew languages project--hebrew semitic
- Tibetan corpus
in Languages > Tibetan with corpus languages tibetan
- Turkish Corpus
The METU Turkish Corpus is a collection of 2 million words of post-1990 written Turkish samples; the METU-Sabanci Turkish Treebank is a morphologically and syntactically annotated treebank corpus of sentences.
in Public bookmarks with corpus turkish
- XCES (Corpus Encoding Std for XML)
in Computational Linguistics > Corpora with computational corpora corpus encoding linguistics std xces xml