Presentation, 2015, ENG

A Comparison of Distributional Semantics Models for Polylingual Text Classification

Esuli A.; Moreo Fernández A.; Sebastiani F.

CNR-ISTI, Pisa, Italy; CNR-ISTI, Pisa, Italy; CNR-ISTI, Pisa, Italy

Polylingual Text Classification (PLTC) is a supervised learning task that consists of assigning class labels to documents written in different languages, assuming that a representative set of training documents is available for each language. This scenario is increasingly frequent, given the large number of multilingual platforms and communities emerging on the Internet. The task is receiving increased attention in the text classification community, also because of the new challenge it poses, i.e., how to effectively leverage polylingual resources in order to infer a multilingual classifier and to improve the performance of a monolingual one. In response, the use of machine translation tools or multilingual dictionaries has been proposed. However, these resources are not always available, or not always free to use. In this work we analyse some important methods proposed in the literature that are machine translation-free and dictionary-free, including Random Indexing, a method that, to the best of our knowledge, no one had previously tested on PLTC. We offer an analysis in terms of space and time efficiency, and propose a particular configuration of the Random Indexing method, which we dub Lightweight Random Indexing, that outperforms all the other compared algorithms while also showing a significantly reduced computational cost.
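
For readers unfamiliar with Random Indexing, the following is a minimal sketch of the general technique, not the configuration evaluated in this work. It assumes raw term-frequency weighting, a single shared reduced space for the terms of all languages, and illustrative values for the hypothetical parameters `dim` (reduced dimensionality) and `nonzeros` (number of non-zero entries per index vector); none of these choices is taken from the abstract. Each term receives a sparse random vector with a few +1/-1 entries, and a document is represented as the frequency-weighted sum of the vectors of its terms.

```python
import random
from collections import Counter


def make_index_vector(dim, nonzeros, rng):
    """Build a sparse ternary index vector, stored as {position: +1 or -1}."""
    positions = rng.sample(range(dim), nonzeros)
    return {pos: rng.choice((-1, 1)) for pos in positions}


def random_indexing(docs, dim=16, nonzeros=2, seed=0):
    """Project tokenised documents into a dim-dimensional space.

    docs: list of token lists; documents may be in different languages.
    dim, nonzeros, seed: illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    index_vectors = {}  # one random index vector per distinct term
    projected = []
    for tokens in docs:
        vec = [0.0] * dim
        for term, freq in Counter(tokens).items():
            if term not in index_vectors:
                index_vectors[term] = make_index_vector(dim, nonzeros, rng)
            # Add the term's index vector, weighted by its frequency.
            for pos, sign in index_vectors[term].items():
                vec[pos] += sign * freq
        projected.append(vec)
    return projected, index_vectors


# Toy polylingual collection (English, Italian, Spanish).
docs = [
    ["the", "cat", "sleeps"],
    ["il", "gatto", "dorme"],
    ["el", "gato", "duerme", "el"],
]
vectors, index_vectors = random_indexing(docs)
```

The resulting `vectors` can be passed to any standard supervised learner. Because index vectors are generated on the fly, no translation tool or bilingual dictionary is involved, and the memory cost per term is bounded by `nonzeros` rather than by the vocabulary size.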

6th Italian Information Retrieval Workshop, Cagliari, Italy, 25-26/05/2015

Keywords

Polylingual text classification, Distributional semantic models, Random Indexing

CNR authors

Moreo Fernández Alejandro, Esuli Andrea, Sebastiani Fabrizio

CNR institutes

ISTI – Istituto di scienza e tecnologie dell'informazione "Alessandro Faedo"

ID: 344534

Year: 2015

Type: Presentation

Creation: 2016-01-14 11:52:32.000

Last update: 2021-02-12 18:59:40.000

External IDs

CNR OAI-PMH: oai:it.cnr:prodotti:344534