Technical report, 2016, ENG

JaTeCS, a Java library focused on automatic text categorization

Esuli A.; Fagni T.; Moreo Fernández A.

CNR-ISTI, Pisa, Italy; CNR-ISTI, Pisa, Italy; CNR-ISTI, Pisa, Italy

JaTeCS is an open-source Java library focused on automatic text categorization. It covers all the steps of an experimental activity, from reading the corpus to evaluating the results. JaTeCS treats text as its central input, and its code is optimized for this type of data. Like many other machine learning (ML) frameworks, JaTeCS provides data readers for many formats and well-known corpora, NLP tools, feature selection and weighting methods, and implementations of many ML algorithms, as well as wrappers for well-known external software (e.g., libSVM, SVMlight). JaTeCS also implements methods related to text classification that are rarely, if ever, provided by other ML frameworks (e.g., active learning, quantification, transfer learning).

Keywords

Machine learning, Text categorization, Text mining, Natural language processing, Artificial intelligence, Pattern recognition, Clustering, Learning

CNR authors

Moreo Fernández Alejandro, Esuli Andrea, Fagni Tiziano

CNR institutes

ISTI – Istituto di scienza e tecnologie dell'informazione "Alessandro Faedo"

ID: 354510

Year: 2016

Type: Technical report

Creation: 2016-04-29 19:08:04.000

Last update: 2021-01-04 10:03:04.000

External links

OAI-PMH: Dublin Core

OAI-PMH: Mods

OAI-PMH: RDF

External IDs

CNR OAI-PMH: oai:it.cnr:prodotti:354510