Abstract - The paper provides an overview of the current interaction between two distinct but connected language resources: lexicons and corpora. The connection has become increasingly evident in natural language processing in recent years owing to a series of converging factors: the availability of ever larger on-line corpora, the trend towards using corpus evidence in “printed” lexicography, the availability of more robust automatic tools for corpus analysis, annotation, and information extraction, and the need for computational lexicons to reflect real language usage as attested in corpora. I shall illustrate some issues concerning the corpus-lexicon relation as it emerges, in particular, from several representative European projects concerned both with the construction of large-scale harmonised resources for a range of applications, including multilingual ones, and with the acquisition of lexical information from corpora to enhance and tune existing lexicons. I conclude by pointing to a few issues in corpus-lexicon interaction that should be treated as priorities in the near future.

Corpus-based lexicon building: an overview across projects, problems, approaches

2003

Istituto di linguistica computazionale "Antonio Zampolli" - ILC

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/37660