Journal article, 2023, ENG, 10.1007/s10579-021-09574-0

The ParlaMint corpora of parliamentary proceedings

Erjavec T.; Ogrodniczuk M.; Osenova P.; Ljubesic N.; Simov K.; Pancur A.; Rudolf M.; Kopp M.; Barkarson S.; Steingrimsson S.; Coltekin C.; de Does J.; Depuydt K.; Agnoloni T.; Venturi G.; Perez M.C.; de Macedo L.D.; Navarretta C.; Luxardo G.; Coole M.; Rayson P.; Morkevicius V.; Krilavicius T.; Dargis R.; Ring O.; van Heusden R.; Marx M.; Fiser D.

Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia; Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland; Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, and Sofia University "St. Kl. Ohridski", Sofia, Bulgaria; Department of Knowledge Technologies, Jožef Stefan Institute and Faculty of Computer Science and Informatics, University of Ljubljana, Ljubljana, Slovenia; Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria; Institute for Contemporary History, Ljubljana, Slovenia; Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic; The Árni Magnússon Institute for Icelandic Studies, Reykjavík, Iceland; University of Tübingen, Tübingen, Germany; Dutch Language Institute, The Hague, The Netherlands; Institute of Legal Informatics and Judicial Systems CNR-IGSG, Florence, Italy; Institute of Computational Linguistics CNR-ILC, Pisa, Italy; Universitat Jaume I, Castellón de la Plana, Spain; Univ. Federal de Minas Gerais, Belo Horizonte, Brazil; University of Copenhagen, Copenhagen, Denmark; Univ. Paul Valéry Montpellier 3, Montpellier, France; Lancaster University, Lancaster, UK; Kaunas University of Technology, Kaunas, Lithuania; Vytautas Magnus University, Kaunas, Lithuania; University of Latvia, Riga, Latvia; Centre for Social Sciences, Budapest, Hungary; Universiteit van Amsterdam, Amsterdam, The Netherlands; Arts Faculty, University of Ljubljana, and Institute of Contemporary History, Ljubljana, Slovenia

This paper presents the ParlaMint corpora, which contain transcriptions of the sessions of 17 European national parliaments and total half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project's GitHub repository, and the complete corpora are openly available for download via the CLARIN.SI repository, as well as for online exploration and analysis through the NoSketch Engine and KonText concordancers and the Parlameter interface.

Language Resources and Evaluation (Print), pp. 1–34

Keywords

Parliamentary proceedings, Linguistic annotation, Universal Dependencies

CNR authors

Agnoloni Tommaso, Venturi Giulia

CNR institutes

ILC – Istituto di linguistica computazionale "Antonio Zampolli", IGSG – Istituto di Informatica Giuridica e Sistemi Giudiziari

ID: 470080

Year: 2023

Type: Journal article

Creation: 2022-08-23 16:30:38.000

Last update: 2023-07-08 17:42:56.000

External IDs

CNR OAI-PMH: oai:it.cnr:prodotti:470080

DOI: 10.1007/s10579-021-09574-0

Scopus: 2-s2.0-85124105199