RESULTS FROM 1 TO 14 OF 14

2022, Journal article, ENG

Distilled neural networks for efficient learning to rank

Nardini F.M.; Rulli C.; Trani S.; Venturini R.

Recent studies in Learning to Rank have shown the possibility to effectively distill a neural network from an ensemble of regression trees. This result leads neural networks to become a natural competitor of tree-based ensembles on the ranking task. Nevertheless, ensembles of regression trees outperform neural models both in terms of efficiency and effectiveness, particularly when scoring on CPU. In this paper, we propose an approach for speeding up neural scoring time by applying a combination of Distillation, Pruning and Fast Matrix multiplication. We employ knowledge distillation to learn shallow neural networks from an ensemble of regression trees. Then, we exploit an efficiency-oriented pruning technique that performs a sparsification of the most computationally-intensive layers of the neural network that is then scored with optimized sparse matrix multiplication. Moreover, by studying both dense and sparse high performance matrix multiplication, we develop a scoring time prediction model which helps in devising neural network architectures that match the desired efficiency requirements. Comprehensive experiments on two public learning-to-rank datasets show that neural networks produced with our novel approach are competitive at any point of the effectiveness-efficiency trade-off when compared with tree-based ensembles, providing up to 4x scoring time speed-up without affecting the ranking quality.

IEEE transactions on knowledge and data engineering (Online) 35 (5), pp. 4695–4712

DOI: 10.1109/TKDE.2022.3152585

2019, Book chapter, ENG

Retrieval of Educational Resources from the Web: A Comparison Between Google and Online Educational Repositories

De Medio C.; Limongelli C.; Marani A.; Taibi D.

The retrieval and composition of educational material are topics that attract many studies from the field of Information Retrieval and Artificial Intelligence. The Web is gradually gaining popularity among teachers and students as a source of learning resources. This transition is, however, facing skepticism from some scholars in the field of education. The main concern is about the quality and reliability of the teaching on the Web. While online educational repositories are explicitly built for educational purposes by competent teachers, web pages are designed and created for offering different services, not only education. In this study, we analyse if the Internet is a good source of teaching material compared to the currently available repositories in education. Using a collection of 50 queries related to educational topics, we compare how many useful learning resources a teacher can retrieve in Google and three popular learning object repositories. The results are very insightful and in favour of Google, as supported by the t-tests. For most of the queries, Google retrieves a larger number of useful web pages than the repositories (p < .01), and no queries resulted in zero useful items. Instead, the repositories struggle to find even one relevant material for many queries. This study is clear evidence that even though the repositories offer a richer description of the learning resources through metadata, it is time to undertake more research towards the retrieval of web pages for educational applications.

DOI: 10.1007/978-3-030-35758-0_3

2018, Journal article, ENG

Efficient query processing for scalable web search

Tonellotto N.; MacDonald C.; Ounis I.

Search engines are exceptionally important tools for accessing information in today's world. In satisfying the information needs of millions of users, the effectiveness (the quality of the search results) and the efficiency (the speed at which the results are returned to the users) of a search engine are two goals that form a natural trade-off, as techniques that improve the effectiveness of the search engine can also make it less efficient. Meanwhile, search engines continue to rapidly evolve, with larger indexes, more complex retrieval strategies and growing query volumes. Hence, there is a need for the development of efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to make gains in efficiency. This survey comprehensively reviews the foundations of search engines, from index layouts to basic term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing strategies, while also providing the latest trends in the literature in efficient query processing, including the coherent and systematic reviews of techniques such as dynamic pruning and impact-sorted posting lists as well as their variants and optimisations. Our explanations of query processing strategies, for instance the WAND and BMW dynamic pruning algorithms, are presented with illustrative figures showing how the processing state changes as the algorithms progress. Moreover, acknowledging the recent trends in applying a cascading infrastructure within search systems, this survey describes techniques for efficiently integrating effective learned models, such as those obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques, often achieved by predicting the response times of the search engine (known as query efficiency prediction), and making per-query tradeoffs between efficiency and effectiveness to ensure that the required retrieval speed targets can be met. 
Finally, the survey concludes with a summary of open directions in efficient search infrastructures, namely the use of signatures, real-time, energy-efficient and modern hardware and software architectures.

Foundations and trends in information retrieval 12, pp. 319–500

DOI: 10.1561/1500000057

2015, Conference paper, ENG

LSDS-IR'15: 2015 Workshop on large-scale and distributed systems for information retrieval

Altingovde I.S.; Barla Cambazoglu B.; Tonellotto N.

The growth of the Web and other Big Data sources leads to important performance problems for large-scale and distributed information retrieval systems. The scalability and efficiency of such information retrieval systems have an impact on their effectiveness, eventually affecting the experience of their users and monetization as well. The LSDS-IR'15 workshop will provide space for researchers to discuss the existing performance problems in the context of large-scale and distributed information retrieval systems and define new research directions in the modern Big Data era. The workshop expects to bring together information retrieval practitioners from the industry, as well as academic researchers concerned with any aspect of large-scale and distributed information retrieval systems.

24th ACM International Conference on Information and Knowledge Management, CIKM 2015, Melbourne, Australia, 19-23 October, 2015

DOI: 10.1145/2806416.2806877

2015, Conference paper, ENG

Fast and space-efficient entity linking in queries

Blanco R.; Ottaviano G.; Meij E.

Entity linking deals with identifying entities from a knowledge base in a given piece of text and has become a fundamental building block for web search engines, enabling numerous downstream improvements from better document ranking to enhanced search results pages. A key problem in the context of web search queries is that this process needs to run under severe time constraints as it has to be performed before any actual retrieval takes place, typically within milliseconds. In this paper we propose a probabilistic model that leverages user-generated information on the web to link queries to entities in a knowledge base. There are three key ingredients that make the algorithm fast and space-efficient. First, the linking process ignores any dependencies between the different entity candidates, which allows for an O(k^2) implementation in the number of query terms. Second, we leverage hashing and compression techniques to reduce the memory footprint. Finally, to equip the algorithm with contextual knowledge without sacrificing speed, we factor the distance between distributional semantics of the query words and entities into the model. We show that our solution significantly outperforms several state-of-the-art baselines by more than 14% while being able to process queries in sub-millisecond times, at least two orders of magnitude faster than existing systems.

WSDM'15 - Eighth ACM International Conference on Web Search and Data Mining, Shanghai, People's Republic of China, 31 January - 6 February 2015

DOI: 10.1145/2684822.2685317

2015, Conference paper, ENG

Asia: An investigation platform for exploiting open source information in the fight against tax evasion

Bacciu C.; Valsecchi F.; Abrate M.; Tesconi M.; Marchetti A.

Tax evasion is a widespread phenomenon confirmed by numerous European and American reports. To counter it, governments already adopt software solutions that support tax inspectors in their investigations. However, the currently existing systems do not normally take advantage of the constant stream of data published on the Web. Instead, the ASIA project aims to prove the effectiveness of combining this kind of open source information with official data contained in Public Administration archives to fight tax evasion. Our prototype platform deals with two cases of investigation, people and businesses. Public officers have been involved throughout the project, and took part in a preliminary test phase which showed very promising results.

International Conference on Web Information System and Technologies (WEBIST 2015), Lisbon, Portugal, 20-22 May 2015

2013, Conference paper, ENG

Modeling and predicting the task-by-task behavior of search engine users

Lucchese C.; Orlando S.; Perego R.; Tolomei G.; Silvestri F.

Web search engines answer user needs in a query-by-query fashion, namely they retrieve the set of the most relevant results to each issued query, independently. However, users often submit queries to perform multiple, related tasks. In this paper, we first discuss a methodology to discover from query logs the latent tasks performed by users. Furthermore, we introduce the Task Relation Graph (TRG) as a representation of users' search behaviors from a task-by-task perspective. The task-by-task behavior is captured by weighting the edges of TRG with a relatedness score computed between pairs of tasks, as mined from the query log. We validate our approach on a concrete application, namely a task recommender system, which suggests related tasks to users on the basis of the task predictions derived from the TRG. Finally, we show that the task recommendations generated by our solution are beyond the reach of existing query suggestion schemes, and that our method recommends tasks that users will likely perform in the near future.

OAIR 2013 - 10th Conference on Open Research Areas in Information Retrieval, Lisbon, Portugal, 15-17 May 2013

2013, Book chapter, ENG

Web search

Ferragina P.; Venturini R.

Faced with the massive amount of information on the Web, which includes not only texts but nowadays any kind of file (audio, video, images, etc.), Web users tend to lose their way when browsing the Web, falling into what psychologists call "getting lost in hyperspace". Search engines alleviate this by presenting the most relevant pages that better match the user's information needs. Collecting a large part of the pages in the Web, extrapolating a user information need expressed by means of often ambiguous queries, establishing the importance of Web pages and their relevance for a query, are just a few examples of the difficult problems that search engines address every day to achieve their ambitious goal. In this chapter, we introduce the concepts and the algorithms that lie at the core of modern search engines by providing running examples that simplify understanding, and we comment on some recent and powerful tools and functionalities that should increase the ability of users to match in the Web their information needs.

DOI: 10.1007/978-3-642-39652-6_5

2012, Journal article, ENG

Ontology-based semantic search on the Web and its combination with the power of inductive reasoning

d'Amato, Claudia; Fanizzi, Nicola; Fazzinga, Bettina; Gottlob, Georg; Lukasiewicz, Thomas

Semantic Web search is currently one of the hottest research topics in both Web search and the Semantic Web. In previous work, we have presented a novel approach to Semantic Web search, which allows for evaluating ontology-based complex queries that involve reasoning over the Web relative to an underlying background ontology. We have developed the formal model behind this approach, and provided a technique for processing Semantic Web search queries, which consists of an offline ontological inference step and an online reduction to standard Web search. In this paper, we continue this line of research. We further enhance the above approach by the use of inductive rather than deductive reasoning in the offline inference step. This increases the robustness of Semantic Web search, as it adds the important ability to handle inconsistencies, noise, and incompleteness, which are all very likely to occur in distributed and heterogeneous environments such as the Web. The inductive variant also allows to infer new (not logically deducible) knowledge (from training individuals). We report on a prototype implementation of (both the deductive and) the inductive variant of our approach in desktop search, and we provide extensive new experimental results, especially on the running time and the precision and the recall of our new approach.

Annals of mathematics and artificial intelligence 65 (2-3), pp. 83–121

DOI: 10.1007/s10472-012-9309-7

2011, Journal article, ENG

Semantic Web search based on ontological conjunctive queries

Fazzinga, Bettina; Gianforme, Giorgio; Gottlob, Georg; Lukasiewicz, Thomas

Many experts predict that the next huge step forward in Web information technology will be achieved by adding semantics to Web data, and will possibly consist of (some form of) the Semantic Web. In this paper, we present a novel approach to Semantic Web search, called Serene, which allows for a semantic processing of Web search queries, and for evaluating complex Web search queries that involve reasoning over the Web. More specifically, we first add ontological structure and semantics to Web pages, which then allows for both attaching a meaning to Web search queries and Web pages, and for formulating and processing ontology-based complex Web search queries (i.e., conjunctive queries) that involve reasoning over the Web. Here, we assume the existence of an underlying ontology (in a lightweight ontology language) relative to which Web pages are annotated and Web search queries are formulated. Depending on whether we use a general or a specialized ontology, we thus obtain a general or a vertical Semantic Web search interface, respectively. That is, we are actually mapping the Web into an ontological knowledge base, which then allows for Semantic Web search relative to the underlying ontology. The latter is then realized by reduction to standard Web search on standard Web pages and logically completed ontological annotations. That is, standard Web search engines are used as the main inference motor for ontology-based Semantic Web search. We develop the formal model behind this approach and also provide an implementation in desktop search. Furthermore, we report on extensive experiments, including an implemented Semantic Web search on the Internet Movie Database.

Journal of web semantics 9 (4), pp. 453–473

DOI: 10.1016/j.websem.2011.08.003

2010, Journal article, ENG

Semantic search on the Web

Fazzinga, Bettina; Lukasiewicz, Thomas

Web search is a key technology of the Web, since it is the primary way to access content on the Web. Current standard Web search is essentially based on a combination of textual keyword search with an importance ranking of the documents depending on the link structure of the Web. For this reason, it has many limitations, and there is a plethora of research activities towards more intelligent forms of search on the Web, called semantic search on the Web, or also Semantic Web search. In this paper, we give a brief overview of existing approaches of this kind, including our own, and sketch some possible future directions of research.

Semantic web (Print) 1 (1-2), pp. 89–96

DOI: 10.3233/SW-2010-0023

2008, Journal article, ENG

Design trade offs for search engine caching

Baeza-Yates R.; Gionis A.; Junqueira F.; Murdock V.; Plachouras V.; Silvestri F.

In this article we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year, we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log influence the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.

ACM transactions on the web 2 (4), pp. 20–28

DOI: 10.1145/1409220.1409223

2007, Conference paper, ENG

Mining query logs to optimize index partitioning in parallel Web search engines

Perego R.; Lucchese C.; Orlando S.; Silvestri F.

Large-scale Parallel Web Search Engines (WSEs) need to adopt a strategy for partitioning the inverted index among a set of parallel server nodes. In this paper we are interested in devising an effective term-partitioning strategy, according to which the global vocabulary of terms and the associated inverted lists are split into disjoint subsets, and assigned to distinct servers. Due to the workload imbalance caused by the skewed distribution of terms in user queries, finding an effective partitioning strategy is considered a very complex task. In this paper we first formally introduce Term Partitioning as a new optimization problem. Then we show how the knowledge mined from past WSE query logs can be profitably used to discover good solutions of this problem. Finally, we report many results to show that we are able to effectively reduce both the average number of servers activated per query and the workload imbalance. Experiments are conducted on large query logs of real WSEs.

INFOSCALE 2007, Suzhou, China, June 6-8 2007

2007, Conference paper, ENG

The impact of caching on search engines

Baeza-Yates R.; Gionis A.; Junqueira F.; Murdock V.; Plachouras V.; Silvestri F.

In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log affect the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.

30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, Netherlands, 23-27 July 2007

DOI: 10.1145/1277741.1277775
