Corpus-based lexicon building: an overview across projects, problems, approaches

Calzolari, N

Abstract - This paper gives an overview of the current interaction between two distinct but connected language resources: lexicons and corpora. In recent years the connection has become increasingly evident in natural language processing, owing to several converging factors: the availability of ever larger on-line corpora, the trend towards using corpus evidence in printed lexicography, the presence of more robust automatic tools for corpus analysis, annotation, and information extraction, and the need for computational lexicons to reflect real language usage as attested in corpora. I illustrate some issues concerning the corpus-lexicon relation as it emerges, in particular, from several representative European projects. These projects address both the construction of large-scale harmonised resources for a variety of applications, including multilingual ones, and the acquisition of lexical information from corpora to enhance and tune existing lexicons. I conclude by pointing to a few issues in corpus-lexicon interaction that should be treated as priorities in the near future.