Conference proceedings contribution, 2019, ENG, 10.1145/3289600.3290962

Fast dictionary-based compression for inverted indexes

Pibiri G. E.; Petri M.; Moffat A.

University of Pisa, Pisa, Italy; CNR-ISTI, Pisa, Italy; The University of Melbourne, Melbourne, Australia; The University of Melbourne, Melbourne, Australia

Dictionary-based compression schemes provide fast decoding, typically at the expense of reduced compression effectiveness compared to statistical or probability-based approaches. In this work, we apply dictionary-based techniques to the compression of inverted lists, showing that the high degree of regularity that these integer sequences exhibit is a good match for certain types of dictionary methods, and that an important new trade-off between compression effectiveness and compression efficiency can be achieved. Our observations are supported by experiments using the document-level inverted index data for two large text collections, and a wide range of other index compression implementations as reference points. Those experiments demonstrate that the gap between efficiency and effectiveness can be substantially narrowed.

International Conference on Web Search and Data Mining, pp. 6–14, 11/02/2019 to 15/02/2019

Keywords

Compression, Decoding, Efficiency, Inverted index

CNR authors

Pibiri Giulio Ermanno

CNR institutes

ISTI – Istituto di scienza e tecnologie dell'informazione "Alessandro Faedo"

ID: 402784

Year: 2019

Type: Conference proceedings contribution

Creation: 2019-05-14 14:19:29.000

Last update: 2021-04-02 17:28:41.000

External IDs

CNR OAI-PMH: oai:it.cnr:prodotti:402784

DOI: 10.1145/3289600.3290962

Scopus: 2-s2.0-85061708232

ISI Web of Science (WOS): 000482120400006