- Corpora in CTS
Large corpora used in CTS (University of Leeds Centre for Translation Studies)
in Languages > General Resources > Corpora with chinese corpora english german japanese portuguese russian spanish
- Corpus building for minority languages
Home page for a corpus-building web crawler
api corpora corpus crawler google languages minority web
in Computational Linguistics > Corpora with building computational corpora corpus languages linguistics minority
- CorpusBuilder
in Computational Linguistics > Corpora with computational corpora corpusbuilder linguistics
- David Lee's Corpus-based Linguistics LINKS
These annotated links (c. 1,000 of them) are meant mainly for linguists and language teachers who work with corpora, not computational linguists/NLP (natural language processing) people, so although the language-engineering-type links here are fairly extensive, they are not exhaustive (for such info, you'll have to look elsewhere).
in Computational Linguistics > Corpora with corpora
- Digital Text Resources for the Humanities
The session "Digital Text Resources for the Humanities – Legal Issues", from the Digital Humanities 2007 Conference Abstracts book (although these appear to be very extended abstracts!), consists of three papers that address the legal aspects of several crucial phases of handling text resources. Collecting, compiling, curating, analysing, distributing, and archiving text resources such as corpora are tasks carried out on a day-to-day basis by people in fields such as humanities computing, computational and corpus linguistics, information retrieval, and text mining. Unfortunately, this conference report seems to have disappeared from the site.
in Computational Linguistics > Archiving with archiving corpora
- Digital Text Resources for the Humanities – Legal Issues
The session "Digital Text Resources for the Humanities – Legal Issues", from the Digital Humanities 2007 Conference Abstracts book (although these appear to be very extended abstracts!), consists of three papers that address the legal aspects of several crucial phases of handling text resources. Collecting, compiling, curating, analysing, distributing, and archiving text resources such as corpora are tasks carried out on a day-to-day basis by people in fields such as humanities computing, computational and corpus linguistics, information retrieval, and text mining.
in Computational Linguistics > Archiving with archiving corpora
- EMILLE
in Computational Linguistics > Corpora with computational corpora emille linguistics
- JBootCat
JBootCat is a Java front end to Marco Baroni's BootCaT and is used for generating corpora from the Internet.
The website of Andrew Roberts, School of Computing, University of Leeds. Research interests: Natural Language Learning (NLL), Grammar Inference, Natural Language Processing (NLP), Machine Learning, Clustering, Arabic concordance.
grammarinfer language languageprocessing learning leeds machine natural nlp ofcomputing research school university unsupervised
in Computational Linguistics > Corpora with corpora
- Kevin Scannell Corpora
api corpora corpus crawler density google languages low minority nlp pi-languages spell spellchecker spider under-resourced web
in Computational Linguistics > Language ID with computational corpora kevin language linguistics scannel
- Language Resources and Evaluation
Language Resources and Evaluation is the first publication devoted to the acquisition, creation, annotation, and use of language resources, together with methods for evaluation of resources, technologies, and applications.
in Computational Linguistics > Journals with computational corpora language linguistics resources
- Language@Internet
Language@Internet is an open-access, peer-reviewed, scholarly electronic journal that publishes original research on language and language use mediated by the Internet, the World Wide Web, and mobile technologies. Manuscripts are solicited on all aspects of language and language use in digital media. Submissions that use analytical methods from linguistics and other language-related disciplines are welcome, as are language-focused studies of digitally mediated communication from other disciplines. Research methods may be qualitative or quantitative, and corpus studies making use of computational tools are encouraged.
in Linguistics with corpora internet linguistics
- Manuel Barbera, Corpus resources
Language-specific links to corpora, e-texts, and NLP resources in general.
Go to CLR Guide
in Languages > General Resources with barbera computational corpora corpus linguistics manuel resources
- OLAC
Worldwide network of language archives
archive archives controlledvocabulary core data dublin language linguistic linguistics metadata open resource resources
in Computational Linguistics > Archiving > OLAC with archiving corpora olac
- The End of the Irrelevant Text: Electronic Texts, Linguistics, and Literary Theory
I argue that the marginalization of textual analysis and other text-centered approaches owes something to the dominance of Chomskyan linguistics and the popularity of high theory... I argue for a return to the text, specifically the electronic, computable text, to see what corpora, text-analysis, statistical stylistics, and authorship attribution can reveal about meanings and style. The recent resurgence of interest in scholarly editions, corpora, text-analysis, stylistics, and authorship suggests that the electronic text may finally reach its full potential.
in Computational Linguistics > Corpora with chomsky corpora
- UCREL, Lancaster, UK
UCREL home page, Lancaster, UK.
annotation bnc computational corpora corpus grammatical linguistics part-of-speech pos tagger tagging
in Computational Linguistics > Corpora with computational corpora lancaster linguistics ucrel
- Workshop on Frontiers in Linguistically Annotated Corpora
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006, including the low-density languages working group paper
in Computational Linguistics > Conferences with corpora lodls
- Wortschatz - International Portal
Search in 136 corpus-based monolingual dictionaries ("dictionaries" here means wordform lists with left and right cooccurrences, etc.)
in Computational Linguistics > Corpora with corpora
- XCES (Corpus Encoding Standard for XML)
in Computational Linguistics > Corpora with computational corpora corpus encoding linguistics std xces xml
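The Wortschatz entry describes its "dictionaries" as wordform lists with left and right cooccurrences. As a rough illustration of that kind of data structure only (not Wortschatz's actual pipeline, about which nothing is assumed here), a minimal Python sketch might count, for each wordform, its frequency and its immediate left and right neighbours. The function name `neighbour_cooccurrences` and the naive whitespace tokenization are hypothetical choices made for the example:

```python
from collections import Counter, defaultdict


def neighbour_cooccurrences(tokens):
    """Return wordform frequencies plus counts of each wordform's
    immediate left and right neighbours (hypothetical helper)."""
    freq = Counter(tokens)
    left = defaultdict(Counter)   # word -> Counter of words seen just before it
    right = defaultdict(Counter)  # word -> Counter of words seen just after it
    for i, word in enumerate(tokens):
        if i > 0:
            left[word][tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            right[word][tokens[i + 1]] += 1
    return freq, left, right


if __name__ == "__main__":
    # Naive whitespace tokenization, for illustration only.
    tokens = "the cat sat on the mat and the cat slept".split()
    freq, left, right = neighbour_cooccurrences(tokens)
    print(freq["the"])                   # 3
    print(right["the"].most_common(1))   # [('cat', 2)]
    print(left["cat"])                   # Counter({'the': 2})
```

This sketch stops at raw neighbour counts; a real corpus-based dictionary would likely rank neighbours with a statistical association measure and use proper tokenization.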