Journal article, 2022, ENG, 10.3389/fpsyg.2022.707630
Dominique Brunato, Felice Dell'Orletta, Giulia Venturi
Istituto di Linguistica Computazionale "A. Zampolli" (ILC-CNR)
In this paper, we present an overview of existing parallel corpora for Automatic Text Simplification (ATS) in different languages, focusing on the approach adopted for their construction. We draw a main distinction between manual and (semi-)automatic approaches in order to investigate in what respects complex and simple texts differ, and whether and how the observed modifications depend on the underlying approach. To this end, we perform a two-level comparison on Italian corpora, since Italian is the only language other than English for which large parallel resources derived through both approaches are available. The first level of comparison accounts for the main types of sentence transformations that occur in the simplification process; the second examines the results of a linguistic profiling analysis, based on Natural Language Processing techniques, carried out on the original and simplified versions of the same texts. At both levels of analysis, we focus our discussion mainly on sentence transformations and linguistic characteristics that pertain to the morpho-syntactic and syntactic structure of the sentence.
Frontiers in Psychology 13, pp. 1–19
linguistic complexity, Italian language, corpus construction, text simplification, aligned corpora
Dell'Orletta Felice, Venturi Giulia, Brunato Dominique Pierina
ILC – Istituto di linguistica computazionale "Antonio Zampolli"
ID: 464954
Year: 2022
Type: Journal article
Creation: 2022-03-09 13:02:57.000
Last update: 2023-11-06 19:31:16.000
External links
DOI: 10.3389/fpsyg.2022.707630
URL: https://www.frontiersin.org/articles/10.3389/fpsyg.2022.707630/full
External IDs
CNR OAI-PMH: oai:it.cnr:prodotti:464954