CNR ExploRA

Research products Authors Stats

RESULTS FROM 1 TO 20 OF 27

2023, Articolo in rivista, ENG

Induced permutations for approximate metric search

Vadicamo L.; Amato G.; Gennaro C.

Permutation-based Indexing (PBI) approaches have been proven to be particularly effective for conducting large-scale approximate metric searching. These methods rely on the idea of transforming the original metric objects into permutation representations, which can be efficiently indexed using data structures such as inverted files. The standard conceptualization of permutation associated with a metric object involves only the use of object distances and their relative orders from a set of anchors called pivots. In this paper, we generalized this definition in order to enlarge the class of permutation representations that can be used by PBI approaches. In particular, we introduced the concept of permutation induced by a space transformation and a sorting function, and we investigated which properties these transformations should possess to produce permutations that are effective for metric search. Furthermore, as a practical outcome, we defined a new type of permutation representation that is calculated using distances from pairs of pivots. This proposed technique allowed us to produce longer permutations than traditional ones for the same number of object-pivot distance calculations. The advantage lies in the fact that when longer permutations are employed, the use of inverted files built on permutation prefixes leads to greater efficiency in the search phase.

Information systems (Oxf.) 119

DOI: 10.1016/j.is.2023.102286

2021, Contributo in atti di convegno, ENG

On generalizing permutation-based representations for approximate search

Vadicamo L.; Gennaro C.; Amato G.

In the domain of approximate metric search, the Permutation-based Indexing (PBI) approaches have been proved to be particularly suitable for dealing with large data collections. These methods employ a permutation-based representation of the data, which can be efficiently indexed using data structures such as inverted files. In the literature, the definition of the permutation of a metric object was derived by reordering the distances of the object to a set of pivots. In this paper, we aim at generalizing this definition in order to enlarge the class of permutations that can be used by PBI approaches. As a practical outcome, we defined a new type of permutation that is calculated using distances from pairs of pivots. The proposed technique permits us to produce longer permutations than traditional ones for the same number of object-pivot distance calculations. The advantage is that the use of inverted files built on permutation prefixes leads to greater efficiency in the search phase when longer permutations are used.

SISAP 2021 - 14th International Conference on Similarity Search and Applications, Dortmund, Germany, 29/09/2021 - 1/10/2021

DOI: 10.1007/978-3-030-89657-7_6

2019, Articolo in rivista, ENG

Supermetric search

Connor R.; Vadicamo L.; Cardillo F.A.; Rabitti F.

Metric search is concerned with the efficient evaluation of queries in metric spaces. In general, a large space of objects is arranged in such a way that, when a further object is presented as a query, those objects most similar to the query can be efficiently found. Most mechanisms rely upon the triangle inequality property of the metric governing the space. The triangle inequality property is equivalent to a finite embedding property, which states that any three points of the space can be isometrically embedded in two-dimensional Euclidean space. In this paper, we examine a class of semimetric space which is finitely four-embeddable in three-dimensional Euclidean space. In mathematics this property has been extensively studied and is generally known as the four-point property. All spaces with the four-point property are metric spaces, but they also have some stronger geometric guarantees. We coin the term supermetric(1) space as, in terms of metric search, they are significantly more tractable. Supermetric spaces include all those governed by Euclidean, Cosine,(2) Jensen-Shannon and Triangular distances, and are thus commonly used within many domains. In previous work we have given a generic mathematical basis for the supermetric property and shown how it can improve indexing performance for a given exact search structure. Here we present a full investigation into its use within a variety of different hyperplane partition indexing structures, and go on to show some more of its flexibility by examining a search structure whose partition and exclusion conditions are tailored, at each node, to suit the individual reference points and data set present there. Among the results given, we show a new best performance for exact search using a well-known benchmark. (C) 2018 Elsevier Ltd. All rights reserved.

Information systems (Oxf.) 80, pp. 108–123

DOI: 10.1016/j.is.2018.01.002

2017, Articolo in rivista, ENG

Hilbert exclusion: improved metric search through finite isometric embeddings

Connor R.; Cardillo F. A.; Vadicamo L.; Rabitti F.

Most research into similarity search in metric spaces relies on the triangle inequality property. This property allows the space to be arranged according to relative distances to avoid searching some subspaces. We show that many common metric spaces, notably including those using Euclidean and Jensen-Shannon distances, also have a stronger property, sometimes called the four-point property: In essence, these spaces allow an isometric embedding of any four points in three-dimensional Euclidean space, as well as any three points in two-dimensional Euclidean space. In fact, we show that any space that is isometrically embeddable in Hilbert space has the stronger property. This property gives stronger geometric guarantees, and one in particular, which we name the Hilbert Exclusion property, allows any indexing mechanism which uses hyperplane partitioning to perform better. One outcome of this observation is that a number of state-of-the-art indexing mechanisms over high-dimensional spaces can be easily refined to give a significant increase in performance; furthermore, the improvement given is greater in higher dimensions. This therefore leads to a significant improvement in the cost of metric search in these spaces.

ACM transactions on information systems 35 (3), pp. 17–27

DOI: 10.1145/3001583

2015, Articolo in rivista, ENG

A comparison of pivot selection techniques for permutation-based indexing

Amato G.; Esuli A.; Falchi F.

Abstract Recently, permutation based indexes have attracted interest in the area of similarity search. The basic idea of permutation based indexes is that data objects are represented as appropriately generated permutations of a set of pivots (or reference objects). Similarity queries are executed by searching for data objects whose permutation representation is similar to that of the query, following the assumption that similar objects are represented by similar permutations of the pivots. In the context of permutation-based indexing, most authors propose to select pivots randomly from the data set, given that traditional pivot selection techniques do not reveal better performance. However, to the best of our knowledge, no rigorous comparison has been performed yet. In this paper we compare five pivot selection techniques on three permutation-based similarity access methods. Among those, we propose a novel technique specifically designed for permutations. Two significant observations emerge from our tests. First, random selection is always outperformed by at least one of the tested techniques. Second, there is no technique that is universally the best for all permutation-based access methods; rather different techniques are optimal for different methods. This indicates that the pivot selection technique should be considered as an integrating and relevant part of any permutation-based access method.

Information systems (Oxf.) 52, pp. 176–188

DOI: 10.1016/j.is.2015.01.010

2013, Contributo in atti di convegno, ENG

Pivot selection strategies for permutation-based similarity search

Amato G., Esuli A., Falchi F.

Recently, permutation based indexes have attracted interest in the area of similarity search. The basic idea of permutation based indexes is that data objects are represented as appropriately generated permutations of a set of pivots (or reference objects). Similarity queries are executed by searching for data objects whose permutation representation is similar to that of the query. This, of course assumes that similar objects are represented by similar permutations of the pivots. In the context of permutation-based indexing, most authors propose to select pivots randomly from the data set, given that traditional pivot selection strategies do not reveal better performance. However, to the best of our knowledge, no rigorous comparison has been performed yet. In this paper we compare five pivots selection strategies on three permutation-based similarity access methods. Among those, we propose a novel strategy specifically designed for permutations. Two significant observations emerge from our tests. First, random selection is always outperformed by at least one of the tested strategies. Second, there is not a strategy that is universally the best for all permutation-based access methods; rather different strategies are optimal for different methods.

SISAP 2013 - Similarity Search and Applications. 6th International Conference, A Coruña, Spain, 2-4 October 2013

DOI: 10.1007/978-3-642-41062-8_10

2012, Software, ENG

Metric inverted file.

Amato G.

MI-File (Metric Inverted File) allows you to perform approximate similarity search on huge datasets. As an example give a look at the Image Similarity Search Engine, which allows you searching in a dataset of more than 100 millions images, that was built using the MI-File library. The technique is based on the use of a space transformation where data objects are represented by ordered sequences of reference objects. The sequence of reference objects that represent a data object is ordered according to the distance of the reference objects from the data object being represented. Distance between two data objects is measured by computing the spearmann footrule distance between the two sequence of reference objects that represent them. The closer the two data objects the most similar the two sequence of reference objects. The index is based on the use of inverted files.

2011, Rapporto di progetto (Project report), ENG

Sviluppo componente per la ricerca efficiente e scalabile di immagini

Falchi F.; Amato G.; Bolettieri P.; Gennaro C.

In this document, the developing of the software component for approximate image matching is reported. This document is an output of the Activity A4.3 "Indici efficienti per il matching approssimato e la classificazione delle immagini" of the VISITO Tuscany project. Planned from month 8 to month 17, the activity started and ended as planned. The output of Activity 4.3 is this document and the software component made available inside on the VISITO Tuscany project wiki. The software component is optimized for working on images related to cultural heritage objects.

2010, Rapporto di progetto (Project report), ITA

VISITO Tuscany - Rapporto con la specifica dettagliata delle funzionalità della piattaforma VISITO Tuscany

Bolettieri P.; Benedetti L.; La Torre F.; Loschiavo D.; Lucchese C.; Lungarotti F.; Salvadori S.; Scopigno R.; Venturini R.

Questo documento è il rapporto con la specifica dettagliata delle funzionalità svolta all'interno del Progetto VISITO Tuscany a partire dal mese 4 fino al mese 5 nell'ambito dell'Obbiettivo Operativo 2, Attività A2.1 "Rapporto con la specifica dettagliata delle funzionalità della piattaforma Visito Tuscany". Il rapporto descriverà dettagliatamente le funzionalità offerte dal sistema, ponendo particolare attenzione nell'identificare funzionalità, di notevole importanza per i potenziali utenti, che non sono tuttora fornite da altri sistemi e che potranno essere realizzate capitalizzando sulla sinergia dei membri del consorzio del progetto e sulle loro attività pregresse.

2010, Rapporto di progetto (Project report), ITA

G2 - Piano di controllo della qualità

Amato G.

Lo scopo di questo deliverable è quello di stabilire le procedure e metodologie da adottare durante l'esecuzione del progetto VISITO Tuscany. L'obbiettivo è quello di garantire un'efficace ed efficiente gestione del progetto, il raggiungimento degli obbiettivi , ed assicurare il consenso sulle attività da parte dei membri del progetto. Il PQ definisce un insieme di regole per la gestione del lavoro cooperativo nel progetto, includendo anche le procedure da adottare, i meccanismi di reportistica, l'organizzazione degli incontri, il controllo del flusso di informazione, l'affidabilità dei risultati, la preparazione della documentazione per la sottomissione alla Regione Toscana (RT).

2010, Rapporto di progetto (Project report), ITA

VISITO - G3.1 Relazione avanzamento progetto VISITO Tuscany

Amato G.; Falchi F.; Bolettieri P.; Lucchese C.; Scopigno R.; La Torre F.; Minelli S.; Tavanti F.; Scartoni R.; Salvadori S.; Zanetti N.; Loschiavo D.

Questo documento descrive i risultati ottenuti dal progetto VISITO-Tuscany nei primi otto mesi di lavoro. Dopo una breve panoramica degli obbiettivi generali del progetto, si evidenzieranno i risultati previsti alla fine dell'ottavo mese e si esporrà quale è stato il lavoro effettivamente sostenuto e i risultati raggiunti, in maniera dettagliata per le varie attività previste.

2010, Articolo in rivista, ENG

Building a web-scale image similarity search system

Batko M.; Falchi F.; Lucchese C.; Novak D.; Perego R.; Rabitti F.; Sedmidubsky J.; Zezula P.

As the number of digital images is growing fast and Content-based Image Retrieval (CBIR) is gaining in popularity, CBIR systems should leap towards Web- scale datasets. In this paper, we report on our experience in building an experimental similarity search system on a test collection of more than 50 million images. The first big challenge we have been facing was obtaining a collection of images of this scale with the corresponding descriptive features. We have tackled the non-trivial process of image crawling and extraction of several MPEG-7 descriptors. The result of this effort is a test collection, the first of such scale, opened to the research community for experiments and comparisons. The second challenge was to develop indexing and searching mechanisms able to scale to the target size and to answer similarity queries in real-time. We have achieved this goal by creating sophisticated centralized and distributed structures based purely on the metric space model of data. We have joined them together which has resulted in an extremely flexible and scalable solution. In this paper, we study in detail the performance of this technology and its evolvement as the data volume grows by three orders of magnitude. The results of the experiments are very encouraging and promising for future applications.

Multimedia tools and applications 47 (3), pp. 599–629

DOI: 10.1007/s11042-009-0339-z

2009, Contributo in atti di convegno, ENG

Caching content-based queries for robust and efficient image retrieval

Falchi F.; Lucchese C.; Orlando S.; Perego R.; Rabitti F.

In order to become an effective complement to traditional Web-scale text-based image retrieval solutions, content-based image retrieval must address scalability and effciency issues. In this paper we investigate the possibility of caching the answers to content-based image retrieval queries in metric space, with the aim of reducing the average cost of query processing, and boosting the overall system throughput. Our proposal exploits the similarity between the query object and the cache content, and allows the cache to return approximate answers with acceptable quality guarantee even if the query processed has never been encountered in the past. Moreover, since popular images that are likely to be used as query have several near-duplicate versions, we show that our caching algorithm is robust, and does not suffer of cache pollution problems due to near-duplicate query objects. We report on very promising results obtained with a collection of one million high-quality digital photos. We show that it is worth pursuing caching strategies also in similarity search systems, since the proposed caching techniques can have a signicant impact on performance, like caching on text queries has been proven effective for traditional Web search engines.

12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT'09, Saint-Petersburg, Russia, 24-26 Marzo 2009

DOI: 10.1145/1516360.1516450

2008, Contributo in atti di convegno, ENG

Crawling, indexing, and similarity searching images on the Web

Batko M.; Falchi F.; Lucchese C.; Novak D.; Perego R.; Rabitti F.; Sedmidubsky J.; Zezula P.

In this paper, we report on our experience in building an experimental similarity search system on a test collection of more than 50 million images, to show the possibility to scale Content-based Image Retrieval (CBIR) systems towards the Web size. First, we had to tackle the non-trivial process of image crawling and descriptive feature extraction, performed by using the European EGEE computer GRID, building a test collection, the first of such scale, that will be opened to the research community for experiments and comparisons. Then, we had to develop indexing and searching mechanisms which can scale up to these volumes and answer similarity queries in real-time. The results of our experiments are very encouraging for future applications.

Sixteenth Italian Symposium on Advanced Database Systems, Mondello (PA), Italy, 22-25 June 2008

2008, Articolo in rivista, ENG

Efficient video-stream filtering

Falchi F.; Gennaro C.; Savino P.; Stanchev P.

A new approach for video-stream filtering that makes use of the features representing video content and exploits the properties of metric spaces can help reduce the filtering receiver's computational load.

IEEE multimedia 15 (1), pp. 52–62

DOI: 10.1109/MMUL.2008.6

2008, Articolo in rivista, ENG

Erratum to "Nearest neighbor search in metric spaces through Content-Addressable Networks" [Information Processing and Management 43 (2007) 665-683]

Falchi F.; Gennaro C.; Zezula P.

Most of the peer-to-peer search techniques proposed in the recent years have focused on the single-key retrieval. However, similarity search in metric spaces represents an important paradigm for content-based retrieval in many applications. In this paper we introduce an extension of the well-known Content-Addressable Network paradigm to support storage and retrieval of more generic metric space objects. In particular we address the problem of executing the nearest neighbors queries, and propose three different algorithms of query propagation. An extensive experimental study on real-life data sets explores the performance characteristics of the proposed algorithms by showing their advantages and disadvantages.

Information processing & management 44 (1), pp. 411–429

DOI: 10.1016/j.ipm.2007.03.002

2004, Presentazione, ENG

A Scalable Nearest Neighbor Search in P2P Systems

Batko M.; Gennaro C.; Zezula P.

Similarity search in metric spaces represents an important paradigm for content-based retrieval of many applications. Existing centralized search structures can speed-up retrieval, but they do not scale up to large volume of data because the response time is linearly increasing with the size of the searched file. In this article, we study the problem of executing the nearest neighbor(s) queries in a distributed metric structure, which is based on the P2P communication paradigm and the generalized hyperplane partitioning. By exploiting parallelism in a dynamic network of computers, the query execution scales up very well considering both the number of distance computations and the hop count between the peers. Results are verified by experiments on real-life data sets.

International Workshop on Databases, Information Systems and Peer-to-Peer Computing, Toronto, Canada, 29-30 August 2004

2004, Rapporto tecnico, ENG

Efficient Video Filtering of MPEG-7 Streams

Falchi F.; Gennaro C.; Savino P.

The production of digital multimedia content is continuously increasing. With the advent of the digital cable, satellite and terrestrial televisions, thousands of channels are practically available for users. Moreover, multimedia is widely used in many professional applications, such as video program production, video surveillance, and e-learning. Consequently, the use of techniques for video retrieval as well as video filtering is becoming of crucial importance. The adoption of the MPEG-7 standard is a significant step forward in simplifying the video retrieval and filtering. However, the performance issue can be relevant if the retrieval must be accomplished in real time, as in some applications such as the video surveillance or video filtering in general. Moreover, the number of streams to search, the number of queries, and the computational complexity of the feature similarity measure can heavily affect the effectiveness of such real time filtering applications. In this paper we present the Pivoted Stream, a novel approach for efficient filtering of a video stream by using the MPEG-7 descriptors. Our proposal exploits the properties of the metric spaces, in order to reduce the computational load of the filtering receiver.

2003, Contributo in atti di convegno, ENG

Access structures for advanced similarity search in metric spaces

Dohnal V.; Gennaro C.; Savino P.; Zezula P.

Similarity retrieval is an important paradigm for searching in environments where exact match has little meaning. Moreover, in order to enlarge the set of data types for which the similarity search can efficiently be performed, the notion of mathematical metric space provides a useful abstraction for similarity. In this paper we consider the problem of organizing and searching large data-sets from arbitrary metric spaces, and a novel access structure for similarity search in metric data, called D-Index, is discussed. D-Index combines a novel clustering technique and the pivot-based distance searching strategy to speed up execution of similarity range and nearest neighbor queries for large files with objects stored in disk memories. Moreover, we propose an extension of this access structure (eD-Index) which is able to deal with the problem of similarity self join. Though this approach is not able to eliminate the intrinsic quadratic complexity of similarity joins, significant performance improvements are confirmed by experiments.

11th Symposium on Advanced Database Systems, Cetraro, Italy, 24-27 June

2003, Presentazione, ENG

Scalable and Distributed Similarity Search in Metric Spaces

Batko M.; Gennaro C.; Zezula P.

In this paper we propose a new access structure, called GHT*, based on generalized hyperplane tree (GHT) and distributed dynamic hashing (DDH) techniques. GHT* is a distributed structure which allows to perform range search in a metric space according to a distance function d. The structure does not require a central directory and it is able to gracefully scale through splits of one bucket at a time.

Workshop on Distributed Data and Structures, n.5, Thessaloniki, Greece, 13-14 June 2003

RESULTS FROM 1 TO 20 OF 27

About CNR ExploRA

Resources

oapublications@cnr.it

Licensed under CC BY 4.0