NLP Resources

NLP systems often require access to large amounts of multilingual training data, lexicons, ontologies, knowledge bases, etc. On this page we’ve made a start at listing useful resources, many developed here.

Associated Publications

LiLT

Linguistic Issues in Language Technology (LiLT) is a new open-access journal that focusses on relationships between linguistic insights, which can prove valuable to language technology, and language technology, which can enrich linguistic research. The Editorial Board of LiLT believes that, in conjunction with machine learning and statistical techniques, deeper and more sophisticated models of language and speech are needed to make significant progress in newly emerging areas of computational language analysis. LiLT provides a forum for such work. LiLT takes an eclectic view on methodology.

ACL Anthology

The ACL Anthology contains open-access links to all of the ACL-related journals and conferences for our field, including the Computational Linguistics Journal, Transactions of the ACL and all of the ACL and LRE related conferences. Several other relevant publications are hosted there as well.

NLP Corpora

Sketch Engine

Sketch Engine is corpus management and text analysis software that was developed by Lexical Computing Limited in 2003 and is regularly updated. Its purpose is to enable people studying language behavior to search large text collections using complex, linguistically motivated queries.

LDC-Corpora

Colorado has been a member of the Linguistic Data Consortium for decades, and we have accumulated a fair amount of data from that source. Here are links to frequently requested datasets that are available on a local server here at CU. You will need a ‘verbs’ account to access this data. To access the discs in the LDC library, contact Ghazaleh Kazeminejad.

Computational Lexical Resources

PropBank

The Proposition Bank (PropBank) is, first, a valency lexicon consisting of sense-specific argument structures for well over 6000 verb lemmas. Second, it comprises millions of words of annotated text that associate those predicate-argument structures with the syntactic trees of sentences in context (Kingsbury & Palmer, 2002; Palmer et al., 2005; Palmer et al., 2010).
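
As a rough illustration of the two parts described above, the sketch below models a sense-specific argument structure and one annotated sentence. It is not the PropBank file format or any official API: the class names `Roleset` and `Annotation` are hypothetical, the role descriptions are illustrative, and token spans stand in for the syntactic tree nodes that the real annotation points to.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Roleset:
    """One sense of a verb lemma with its numbered arguments (hypothetical type)."""
    roleset_id: str          # lemma plus sense number, e.g. "give.01"
    roles: Dict[str, str]    # argument label -> human-readable description


@dataclass
class Annotation:
    """Ties a roleset to one sentence; spans are a stand-in for tree nodes."""
    tokens: List[str]
    roleset: Roleset
    predicate: int                       # index of the verb token
    args: Dict[str, Tuple[int, int]]     # label -> (start, end), end exclusive

    def labeled_args(self) -> Dict[str, str]:
        """Return the text covered by each argument label."""
        return {label: " ".join(self.tokens[start:end])
                for label, (start, end) in self.args.items()}


# Illustrative lexicon entry; check descriptions against the actual frame files.
GIVE_01 = Roleset(
    roleset_id="give.01",
    roles={"ARG0": "giver", "ARG1": "thing given", "ARG2": "entity given to"},
)

# Illustrative annotation of a made-up sentence.
example = Annotation(
    tokens="John gave Mary a book".split(),
    roleset=GIVE_01,
    predicate=1,
    args={"ARG0": (0, 1), "ARG2": (2, 3), "ARG1": (3, 5)},
)

print(example.labeled_args())
# {'ARG0': 'John', 'ARG2': 'Mary', 'ARG1': 'a book'}
```

Keeping the lexicon entry separate from the annotation mirrors the split in the description: the argument structure is defined once per verb sense, and each annotated sentence only refers to it.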

VerbNet

VerbNet (VN) is a large, hierarchical, domain-independent, broad-coverage verb lexicon that is intended for use in Natural Language Processing applications (Dang et al., 1998; Kipper et al., 2000; Kipper Schuler, 2006). It groups semantically similar verbs into classes and, for every class, provides syntactic realizations, thematic roles, and semantic representations in the form of pre-conditions and post-conditions in first-order logic.
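
The class structure described above can be pictured with a small data model. This is only a sketch under stated assumptions: `VerbClass` and `Frame` are hypothetical types, not VerbNet's own schema, and the toy entry is loosely modeled on a "give" class, with member lists and predicates that are illustrative rather than copied from a VerbNet release.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    """One syntactic realization with its semantic representation."""
    syntax: str              # surface pattern over thematic roles
    example: str
    semantics: List[str]     # logic-style predicates, kept as plain strings


@dataclass
class VerbClass:
    """A node in the class hierarchy (hypothetical type)."""
    class_id: str
    members: List[str]
    thematic_roles: List[str]
    frames: List[Frame]
    subclasses: List["VerbClass"] = field(default_factory=list)

    def find_class(self, verb: str) -> Optional["VerbClass"]:
        """Return the most specific class in this subtree that lists `verb`."""
        for sub in self.subclasses:
            hit = sub.find_class(verb)
            if hit is not None:
                return hit
        return self if verb in self.members else None


# Toy data: identifiers, members, and predicates are illustrative assumptions.
toy = VerbClass(
    class_id="give-13.1",
    members=["lend", "loan", "pass"],
    thematic_roles=["Agent", "Theme", "Recipient"],
    frames=[Frame(
        syntax="Agent V Theme to Recipient",
        example="They lent a bicycle to me.",
        semantics=[
            "has_possession(start(E), Agent, Theme)",   # pre-condition
            "has_possession(end(E), Recipient, Theme)",  # post-condition
        ],
    )],
    subclasses=[VerbClass(
        class_id="give-13.1-1",
        members=["give", "sell"],
        thematic_roles=["Agent", "Theme", "Recipient"],
        frames=[],
    )],
)

found = toy.find_class("sell")
print(found.class_id if found else None)
```

Searching subclasses before the parent returns the most specific class for a verb, which is one simple way to respect the hierarchy; a fuller model would also let subclasses inherit frames and roles from their parents.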

Unified Verb Index

There are several computational lexical resources that are frequently incorporated into Natural Language Processing systems. Several of these are hosted here at CU, including [...] and [...]. Additional popular resources are [...], [...], [...] and [...]. The more coarse-grained groupings of WordNet verb senses known as the [...] were also developed at CU. The CU Unified Verb Index web site facilitates searching through all of these resources for individual lexical items.

CompSem Wiki

This wiki page has information about the Computational Semantics lab meetings that are held every Wednesday in Fleming 279 at 10:30am. It also contains a link to the [...].