- Corpus building for minority languages
Home page for a corpus-building web crawler.
api corpora corpus crawler google languages minority web
- Corpus Pattern Analysis
Corpus Pattern Analysis (CPA) is a new technique for mapping meaning onto words in text. It is currently being used to build a 'Pattern Dictionary of English Verbs'.
- David Lee's Corpus-based Linguistics LINKS
These annotated links (c. 1,000 of them) are meant mainly for linguists and language teachers who work with corpora, not computational linguists/NLP (natural language processing) people, so although the language-engineering-type links here are fairly extensive, they are not exhaustive (for such info, you'll have to look elsewhere).
- Grammatical incompleteness
Discussion on corpus syntax (and how we can use it to code meaning), originally from Corpus List.
JBootCat is a Java front end to Marco Baroni's BootCat, and is used for generating corpora from the Internet.
The website of Andrew Roberts, School of Computing, University of Leeds. Research interests: Natural Language Learning (NLL), Grammar Inference, Natural Language Processing (NLP), Machine Learning, Clustering, Arabic concordance.
grammarinfer language languageprocessing learning leeds machine natural nlp ofcomputing research school university unsupervised
- Sketch Engine
The Sketch Engine (SkE, also known as Word Sketch Engine) is a Corpus Query System incorporating word sketches, grammatical relations, and a distributional thesaurus. A word sketch is a one-page, automatic, corpus-derived summary of a word’s grammatical and collocational behaviour.
- The End of the Irrelevant Text: Electronic Texts, Linguistics, and Literary Theory
I argue that the marginalization of textual analysis and other text-centered approaches owes something to the dominance of Chomskyan linguistics and the popularity of high theory... I argue for a return to the text, specifically the electronic, computable text, to see what corpora, text-analysis, statistical stylistics, and authorship attribution can reveal about meanings and style. The recent resurgence of interest in scholarly editions, corpora, text-analysis, stylistics, and authorship suggests that the electronic text may finally reach its full potential.
- Transcription-annotation projects
- UCREL, Lancaster UK.
UCREL home page, Lancaster UK.
annotation bnc computational corpora corpus grammatical linguistics part-of-speech pos tagger tagging
- Wortschatz - International Portal
Search in 136 Corpus-Based Monolingual Dictionaries ("dictionaries" means wordform lists, with left and right co-occurrences, etc.)
- XCES (Corpus Encoding Std for XML)