RESULTS FROM 1 TO 10 OF 10

2016, Conference proceedings contribution, ENG

Supermetric search with the four-point property

Connor R.; Vadicamo L.; Cardillo F.A.; Rabitti F.

Metric indexing research is concerned with the efficient evaluation of queries in metric spaces. In general, a large space of objects is arranged in such a way that, when a further object is presented as a query, those objects most similar to the query can be efficiently found. Most such mechanisms rely upon the triangle inequality property of the metric governing the space. The triangle inequality property is equivalent to a finite embedding property, which states that any three points of the space can be isometrically embedded in two-dimensional Euclidean space. In this paper, we examine a class of semimetric space which is finitely 4-embeddable in three-dimensional Euclidean space. In mathematics this property has been extensively studied and is generally known as the four-point property. All spaces with the four-point property are metric spaces, but they also have some stronger geometric guarantees. We coin the term supermetric space as, in terms of metric search, they are significantly more tractable. We show some stronger geometric guarantees deriving from the four-point property which can be used in indexing to great effect, and show results for two of the SISAP benchmark searches that are substantially better than any previously published.

Similarity Search and Applications. 9th International Conference, Tokyo, Japan, 24-26 October 2016

DOI: 10.1007/978-3-319-46759-7_4
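
As an illustration of the embedding property described in the abstract above, the following minimal sketch tests whether one quadruple of points, given only its pairwise distances, can be embedded isometrically in three-dimensional Euclidean space. It is not taken from the paper: the function name `four_points_embed_in_3d` and the tolerance `tol` are assumptions, and the test used is the classical Gram-matrix (Schoenberg) criterion. A space has the four-point property only if every quadruple passes such a test.

```python
from itertools import combinations


def four_points_embed_in_3d(d, tol=1e-9):
    """Return True if the 4 points with symmetric distance matrix d
    (d[i][j] = distance between point i and point j) can be embedded
    isometrically in 3D Euclidean space.

    Hypothetical helper: builds the Gram matrix relative to point 0 and
    checks that it is positive semidefinite (all principal minors >= 0).
    """
    idx = (1, 2, 3)
    g = [[(d[0][i] ** 2 + d[0][j] ** 2 - d[i][j] ** 2) / 2.0 for j in idx]
         for i in idx]

    # 1x1 principal minors: the diagonal entries.
    diag_ok = all(g[i][i] >= -tol for i in range(3))

    # 2x2 principal minors.
    pairs_ok = all(
        g[i][i] * g[j][j] - g[i][j] * g[j][i] >= -tol
        for i, j in combinations(range(3), 2)
    )

    # 3x3 principal minor: the determinant of the whole Gram matrix.
    det = (
        g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
        - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
        + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0])
    )
    return diag_ok and pairs_ok and det >= -tol


# Corners of a right-angled tetrahedron: a genuine Euclidean quadruple.
r2 = 2 ** 0.5
euclidean = [
    [0, 1, 1, 1],
    [1, 0, r2, r2],
    [1, r2, 0, r2],
    [1, r2, r2, 0],
]

# Shortest-path metric on a star graph (centre plus three leaves):
# it satisfies the triangle inequality but cannot be embedded.
star = [
    [0, 1, 1, 1],
    [1, 0, 2, 2],
    [1, 2, 0, 2],
    [1, 2, 2, 0],
]

euclidean_ok = four_points_embed_in_3d(euclidean)
star_ok = four_points_embed_in_3d(star)
```

The star-graph example shows why the four-point property is strictly stronger than the triangle inequality: its distances form a valid metric, yet no four points in Euclidean space realise them.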

2010, Conference proceedings contribution, ENG

Scalability issues for self similarity join in distributed systems

Gennaro C.; Rabitti F.

Efficient processing of similarity joins is important for a large class of data analysis and data-mining applications. This primitive finds all pairs of records within a predefined distance threshold of each other. However, most of the existing approaches have been based on spatial join techniques designed primarily for data in a vector space. Treating data collections as metric objects brings a great advantage in generality, because a single metric technique can be applied to many specific search problems quite different in nature. In this paper, we concentrate our attention on a special form of join, the Self Similarity Join, which retrieves pairs from the same dataset. In particular, we consider the case in which the dataset is split into subsets that are searched for self similarity join independently (e.g., in a distributed computing environment). To this end, we formalize the abstract concept of epsilon-Cover, prove its correctness, and demonstrate its effectiveness by applying it to two real implementations on a large real-life dataset.

The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Pisa, 17-19 February 2010

DOI: 10.1109/PDP.2010.73
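
The self similarity join primitive described in the abstract above can be sketched as a naive nested loop. This is only a baseline that makes the definition and its quadratic cost concrete, not the epsilon-Cover technique of the paper; the names `self_similarity_join`, `dist`, and `eps` are assumptions chosen for illustration.

```python
def self_similarity_join(objects, dist, eps):
    """Return all index pairs (i, j), i < j, whose objects lie within
    distance eps of each other.

    Naive baseline: compares every pair once, so it performs
    n * (n - 1) / 2 distance computations for n objects.
    """
    pairs = []
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            if dist(objects[i], objects[j]) <= eps:
                pairs.append((i, j))
    return pairs


# Toy dataset: integers under the absolute-difference metric.
values = [0, 1, 5, 6, 20]
close_pairs = self_similarity_join(values, lambda a, b: abs(a - b), eps=1)
```

Any metric can be plugged in as `dist`, which is the generality argument the abstract makes for treating data as metric objects.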

2009, Conference proceedings contribution, ENG

Scalable similarity self join in a metric DHT system

Gennaro C.

Efficient processing of similarity joins is important for a large class of data analysis and data-mining applications. This primitive finds all pairs of records within a predefined distance threshold of each other. We present MCAN+, an extension of MCAN (a Content-Addressable Network for metric objects) to support similarity self join queries. The challenge of the proposed approach is to address the intrinsic quadratic complexity of similarity joins, with the aim of bounding the processing time by involving an increasing number of computational nodes as the dataset size grows. To test the scalability of MCAN+, we used a real-life dataset of color features extracted from one million images of the Flickr photo sharing website.

17th Italian Symposium on Advanced Database Systems, Camogli, Genova, 21-24 June 2009

2009, Technical report, ENG

A theoretical approach to the self similarity join in a distributed environment

Gennaro C.

Efficient processing of similarity joins is important for a large class of data analysis and data-mining applications. This primitive finds all pairs of records within a predefined distance threshold of each other. However, most of the existing approaches have been based on spatial join techniques designed primarily for data in a vector space. Treating data collections as metric objects brings a great advantage in generality, because a single metric technique can be applied to many specific search problems quite different in nature. In this paper, we concentrate our attention on a special form of join, the Self Similarity Join, which retrieves pairs from the same dataset. In particular, we consider the case in which the dataset is split into subsets that are searched for self similarity join independently (e.g., in a distributed computing environment). To this end, we formalize the abstract concept of epsilon-Cover, prove its correctness, and demonstrate its effectiveness by applying it to two real implementations on a large real-life dataset.

2008, Conference proceedings contribution, ENG

A content-addressable network for similarity join in metric spaces

Gennaro C.

Similarity join is an interesting complement to the well-established similarity range and nearest neighbor search primitives in metric spaces. However, the quadratic computational complexity of similarity join prevents its application to large data collections. We present MCAN+, an extension of MCAN (a Content-Addressable Network for metric objects) to support similarity self join queries. The challenge of the proposed approach is to address the intrinsic quadratic complexity of similarity joins, with the aim of limiting the processing time by involving an increasing number of computational nodes as the dataset size grows. To test the scalability of MCAN+, we used a real-life dataset of color features extracted from one million images of the Flickr photo sharing website.

Third International ICST Conference on Scalable Information Systems. Infoscale'08, Vico Equense, Italy, 4-6 June 2008

2007, Journal article, ENG

The SAPIR Project: Executing A/V Complex Queries in Peer-to-Peer Systems

Gennaro C.; Perego R.; Rabitti F.

Searching for non-text data (e.g., images) is mostly done by means of metadata annotations or by extracting the text close to the data. However, supporting real content-based audio-visual search, based on similarity search on features, is significantly more expensive than searching for text. Moreover, the search exhibits linear scalability with respect to the data set size. The European project SAPIR is currently addressing this problem.

ERCIM News 70, pp. 62–63

2007, Book chapter, ENG

A content-addressable network for similarity search in metric spaces

Falchi F.; Gennaro C.; Zezula P.

In this paper we present a scalable and distributed access structure for similarity search in metric spaces. The approach is based on the Content-addressable Network (CAN) paradigm, which provides a Distributed Hash Table (DHT) abstraction over a Cartesian space. We have extended the CAN structure to support storage and retrieval of generic metric space objects. We use pivots for projecting objects of the metric space into an N-dimensional vector space, and exploit the CAN organization for distributing the objects among the computing nodes of the structure. We obtain a Peer-to-Peer network, called the MCAN, which is able to search metric space objects by means of similarity range queries. Experiments conducted on our prototype system confirm full scalability of the approach.

DOI: 10.1007/978-3-540-71661-7_9
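
The pivot-based projection mentioned in the abstract above can be sketched as follows. This is a generic illustration of pivot mapping rather than the MCAN implementation: the function names `project`, `lower_bound`, and `can_discard`, the choice of pivots, and the use of 2D Euclidean points as the metric space are all assumptions. The sketch relies only on the triangle inequality, which guarantees that the largest coordinate-wise difference between two projected vectors never exceeds the true distance between the original objects.

```python
import math


def project(obj, pivots, dist):
    """Map a metric-space object to an N-dimensional vector whose
    coordinates are its distances to the N pivots."""
    return [dist(obj, p) for p in pivots]


def lower_bound(u, v):
    """Largest coordinate-wise difference between two projected vectors.
    By the triangle inequality this never exceeds the true distance."""
    return max(abs(a - b) for a, b in zip(u, v))


def can_discard(query_vec, obj_vec, radius):
    """For a range query with the given radius, an object can be skipped
    without computing its true distance if the lower bound already
    exceeds the radius."""
    return lower_bound(query_vec, obj_vec) > radius


# Toy metric space: 2D points under the Euclidean distance.
pivots = [(0.0, 0.0), (10.0, 0.0)]
x, y = (3.0, 4.0), (6.0, 8.0)

px = project(x, pivots, math.dist)
py = project(y, pivots, math.dist)
bound = lower_bound(px, py)
true_distance = math.dist(x, y)
```

Because the bound is computed purely in the projected vector space, a structure that partitions that space among nodes can route a range query to the relevant nodes before any expensive metric distance is evaluated.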

2005, Presentation, ENG

A content-addressable network for similarity search in metric spaces

Falchi F.; Gennaro C.; Zezula P.

In this paper we present a scalable and distributed access structure for similarity search in metric spaces. The approach is based on the Content-addressable Network (CAN) paradigm, which provides a Distributed Hash Table (DHT) abstraction over a Cartesian space. We have extended the CAN structure to support storage and retrieval of more generic metric space objects. We use pivots for projecting objects of the metric space into an N-dimensional vector space, and exploit the CAN organization for distributing the objects among the computing nodes of the structure. We obtain a Peer-to-Peer network, called the MCAN, which is able to search metric space objects by means of similarity range queries. Experiments conducted on our prototype system confirm full scalability of the approach.

International Workshop on Databases, Information Systems and Peer-to-Peer Computing, Trondheim, Norway, August 2005

2004, Conference proceedings contribution, ENG

A P2P-based system for searching in metric spaces

Batko M.; Gennaro C.; Zezula P.

In this paper, we elaborate on a scalable and distributed similarity search structure, a problem that has previously been studied only for single computers. Our structure is scalable in that it distributes the data over more and more independent peer computers. It has no hot spot: all peers use as precise an addressing scheme as possible, and they all incrementally learn from misaddressing. Updates are performed locally, and a node split never requires sending multiple messages to many peers. Experiments conducted on a prototype system are also reported.

12th Convegno Nazionale su Sistemi Evoluti per Basi di Dati, Margherita di Pula, Cagliari, 21-23 June 2004

2004, Presentation, ENG

Scalable Similarity Search in Metric Spaces

Batko M.; Gennaro C.; Savino P.; Zezula P.

Similarity search in metric spaces represents an important paradigm for content-based retrieval in many applications. Existing centralized search structures can speed up retrieval, but they do not scale up to large volumes of data because the response time increases linearly with the size of the searched file. The proposed GHT* index is a scalable and distributed structure. By exploiting parallelism in a dynamic network of computers, the GHT* achieves practically constant search time for similarity range queries in data-sets of arbitrary size. The amount of replicated routing information on each server increases logarithmically. At the same time, the potential for interquery parallelism increases as the data-sets grow, because the relative number of servers utilized by individual queries decreases. All these properties are verified by experiments on a prototype system using real-life data-sets.

DELOS Workshop on Digital Library Architectures: Peer-to-Peer, Grid, and Service-Orientation, S. Margherita di Pula, Cagliari, Italy, 24-25 June 2004
Keyword

Metric Space
