Journal article, 2022, ENG, 10.3233/IDA-205720

HELD: Hierarchical entity-label disambiguation in named entity recognition task using deep learning

Neves Oliveira B.S.; Fernandes De Oliveira A.; Monteiro De Lira V.; Linhares Coelho Da Silva T.; Fernandes De Macedo J.A.

Insight Data Science Lab, Federal University of Ceará, Ceará, Brazil; Insight Data Science Lab, Federal University of Ceará, Ceará, Brazil; CNR-ISTI, Pisa, Italy; Insight Data Science Lab, Federal University of Ceará, Ceará, Brazil; Insight Data Science Lab, Federal University of Ceará, Ceará, Brazil

Named Entity Recognition (NER) is a challenging learning task that consists of identifying entity mentions in text and classifying them into predefined categories. In recent years, deep learning (DL) methods empowered by distributed representations, such as word- and character-level embeddings, have been employed in NER systems. However, for information extraction from police narrative reports, the performance of DL-based NER approaches is limited by the presence of fine-grained, ambiguous entities. For example, given the narrative report 'Anna stole Ada's car', suppose we want to identify the VICTIM and the ROBBER, two sub-labels of PERSON. Traditional NER systems perform poorly when entity labels are arranged in a hierarchical structure. Furthermore, it is infeasible to rely on knowledge bases to disambiguate which actual label an entity mention refers to; this information must be extracted directly from contextual dependencies in the text. In this paper, we address the Hierarchical Entity-Label Disambiguation problem in police reports without the use of knowledge bases. To tackle this problem, we present HELD, an ensemble model that combines two NER components: a BLSTM-CRF architecture and a NER tool. Experiments conducted on a real police report dataset show that HELD significantly outperforms baseline approaches.
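To illustrate what a two-level entity-label hierarchy means for the example sentence above, the following Python sketch merges coarse tags, of the kind an off-the-shelf NER tool might produce, with fine-grained sub-labels, of the kind a BLSTM-CRF tagger might produce. The label mapping, the function name `combine`, and the merge rule (keep a sub-label only when its parent matches the coarse tag, otherwise fall back to the coarse tag) are hypothetical assumptions for illustration only. This is not the ensemble procedure used in HELD, which the abstract does not specify.

```python
# Hypothetical two-level hierarchy: each fine-grained sub-label maps to its
# coarse parent label. Only VICTIM, ROBBER and PERSON come from the abstract;
# the mapping itself is an illustrative assumption.
LABEL_PARENT = {"VICTIM": "PERSON", "ROBBER": "PERSON"}


def combine(tokens, coarse_tags, fine_tags):
    """Merge per-token coarse and fine-grained tags (hypothetical rule).

    A fine-grained tag is kept only if its parent in LABEL_PARENT equals the
    coarse tag for the same token; otherwise the coarse tag is used.
    """
    merged = []
    for token, coarse, fine in zip(tokens, coarse_tags, fine_tags):
        if fine != "O" and LABEL_PARENT.get(fine) == coarse:
            merged.append((token, fine))
        else:
            merged.append((token, coarse))
    return merged


# Tags are hard-coded here in place of real model outputs.
tokens = ["Anna", "stole", "Ada", "'s", "car"]
coarse = ["PERSON", "O", "PERSON", "O", "O"]
fine = ["ROBBER", "O", "VICTIM", "O", "O"]
print(combine(tokens, coarse, fine))
```

The fallback to the coarse tag keeps the merged output consistent with the hierarchy: a sub-label is never emitted for a token that the coarse component did not assign to the sub-label's parent.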

Intelligent Data Analysis 26 (3), pp. 637–657

Keywords

Deep Learning, Fine-grained entity labels, Hierarchical entity-label disambiguation using context, Named entity recognition, Police reports domain

CNR authors

Monteiro De Lira Vinicius Cezar

CNR institutes

ISTI – Istituto di scienza e tecnologie dell'informazione "Alessandro Faedo"

ID: 469225

Year: 2022

Type: Journal article

Creation: 2022-07-18 13:09:41.000

Last update: 2022-07-18 13:09:41.000

External IDs

CNR OAI-PMH: oai:it.cnr:prodotti:469225

DOI: 10.3233/IDA-205720

Scopus: 2-s2.0-85129260802

ISI Web of Science (WOS): 000789145100006