2023, Conference proceedings contribution, ENG
Naretto F.; Pellungrini R.; Rinzivillo S.; Fadda D.
Human mobility data play a crucial role in understanding mobility patterns and developing analytical services across various domains such as urban planning, transportation, and public health. However, due to the sensitive nature of these data, accurately identifying privacy risks is essential before deciding to release them to the public. Recent work has proposed the use of machine learning models for predicting privacy risk on raw mobility trajectories and the use of SHAP for risk explanation. However, applying SHAP to mobility data results in explanations that are of limited use both for privacy experts and end-users. In this work, we present a novel version of the Expert privacy risk prediction and explanation framework specifically tailored for human mobility data. We leverage state-of-the-art algorithms in time series classification, such as Rocket and InceptionTime, to improve risk prediction while reducing computation time. Additionally, we address two key issues with SHAP explanation on mobility data: first, we devise an entropy-based mask to efficiently compute SHAP values for privacy risk in mobility data; second, we develop a module for interactive analysis and visualization of SHAP values over a map, empowering users with an intuitive understanding of SHAP values and privacy risk.
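The abstract does not spell out how the entropy-based mask works; as a purely illustrative reading (the functions point_information and entropy_mask and the keep_ratio parameter below are hypothetical, not taken from the paper), one could rank the points of a trajectory by the self-information of the visited location and compute SHAP values only for the most distinctive subset:

    import numpy as np

    def point_information(trajectory):
        """Self-information of each visited location within one trajectory:
        rarely visited places are more identifying than routine ones."""
        locs, counts = np.unique(trajectory, return_counts=True)
        freq = dict(zip(locs, counts / counts.sum()))
        return np.array([-np.log2(freq[loc]) for loc in trajectory])

    def entropy_mask(trajectory, keep_ratio=0.25):
        """Boolean mask selecting the most distinctive points; SHAP values
        would then be computed only for this subset, cutting computation."""
        info = point_information(trajectory)
        k = max(1, int(np.ceil(keep_ratio * len(trajectory))))
        mask = np.zeros(len(trajectory), dtype=bool)
        mask[np.argsort(info)[-k:]] = True
        return mask

    traj = np.array([10, 10, 10, 10, 42, 10, 7])  # location ids; 42 and 7 are rare
    print(entropy_mask(traj))                     # -> [False False False False  True False  True]

Routine locations receive a low score and are masked out, which is one plausible source of the reduced SHAP computation claimed in the abstract.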
2023, Conference proceedings contribution, ENG
Marchiori Manerba M.
This research proposal is framed in the interdisciplinary exploration of the socio-cultural implications that AI exerts on individuals and groups. The focus concerns contexts where models can amplify discrimination through algorithmic biases, e.g., in recommendation and ranking systems or abusive language detection classifiers, and the debiasing of their automated decisions so that they become beneficial and just for everyone. To address these issues, the main objective of the proposed research project is to develop a framework to perform fairness auditing and debiasing of both classifiers and datasets, starting with, but not limited to, abusive language detection, thus broadening the approach toward other NLP tasks. Ultimately, by questioning the effectiveness of adjusting and debiasing existing resources, the project aims at developing truly inclusive, fair, and explainable models by design.
2023, Conference proceedings contribution, ENG
Giannotti F.; Guidotti R.; Monreale A.; Pappalardo L.; Pedreschi D.; Pellungrini R.; Pratesi F.; Rinzivillo S.; Ruggieri S.; Setzu M.; Deluca R.
This document summarizes the activities regarding the development of Responsible AI (Responsible Artificial Intelligence) conducted by the Knowledge Discovery and Data Mining group (KDD-Lab), a joint research group of the Institute of Information Science and Technologies "Alessandro Faedo" (ISTI) of the National Research Council of Italy (CNR), the Department of Computer Science of the University of Pisa, and the Scuola Normale Superiore of Pisa.
2023, Conference proceedings contribution, ENG
Rizzo M.; Veneri A.; Albarelli A.; Lucchese C.; Nobile M.; Conati C.
EXplainable Artificial Intelligence (XAI) is a vibrant research topic in the artificial intelligence community. It is raising growing interest across methods and domains, especially those involving high-stakes decision-making, such as the biomedical sector. Much has been written about the subject, yet XAI still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that synthesizes what can be found in the literature. We recognize that explanations are not atomic but the combination of evidence stemming from the model and its input-output mapping, and the human interpretation of this evidence. Furthermore, we fit explanations into the properties of faithfulness (i.e., the explanation is an accurate description of the model's inner workings and decision-making process) and plausibility (i.e., how much the explanation seems convincing to the user). Our theoretical framework simplifies how these properties are operationalized, and it provides new insights into common explanation methods that we analyze as case studies. We also discuss the impact that our framework could have in biomedicine, a very sensitive application domain where XAI can have a central role in generating trust.
2023, Conference proceedings contribution, ENG
State L., Bringas Colmenarejo A., Beretta A., Ruggieri S., Turini F., Law S.
The Explanation Dialogues project is an expert focus study that aims to uncover expectations, reasoning, and rules of legal experts and practitioners towards explainable artificial intelligence (XAI). We examine legal perceptions and disputes that arise in a fictional scenario that resembles a daily-life situation: a bank's use of an automated decision-making (ADM) system to decide on credit allocation to individuals. Through this simulation, the study aims to provide insights into the legal value and validity of explanations of ADMs, identify potential gaps and issues that may arise in the context of compliance with European legislation, and provide guidance on how to address these shortcomings.
2023, Conference proceedings contribution, ENG
Manerba M.M.; Morini V.
Biases can arise and be introduced during each phase of a supervised learning pipeline, eventually leading to harm. Within the task of automatic abusive language detection, this matter becomes particularly severe since unintended bias towards sensitive topics such as gender, sexual orientation, or ethnicity can harm underrepresented groups. The role of the datasets used to train these models is crucial to address these challenges. In this contribution, we investigate whether explainability methods can expose racial dialect bias attested within a popular dataset for abusive language detection. Through preliminary experiments, we found that pure explainability techniques cannot effectively uncover biases within the dataset under analysis: the rooted stereotypes are often more implicit and complex to retrieve.
2023, Conference proceedings contribution, ENG
Lucchese C.; Orlando S.; Perego R.; Veneri A.
Most accurate machine learning models unfortunately produce black-box predictions, for which it is impossible to grasp the internal logic that leads to a specific decision. Unfolding the logic of such black-box models is of increasing importance, especially when they are used in sensitive decision-making processes. In this work we focus on forests of decision trees, which may include hundreds to thousands of decision trees to produce accurate predictions. Such complexity raises the need of developing explanations for the predictions generated by large forests. We propose a post hoc explanation method of large forests, named GAM-based Explanation of Forests (GEF), which builds a Generalized Additive Model (GAM) able to explain, both locally and globally, the impact on the predictions of a limited set of features and feature interactions. We evaluate GEF over both synthetic and real-world datasets and show that GEF can create a GAM model with high fidelity by analyzing the given forest only and without using any further information, not even the initial training dataset.
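GEF's actual construction is not detailed in the abstract; the sketch below only illustrates the general idea of distilling a forest into an additive surrogate and measuring its fidelity, using boosted depth-1 stumps from scikit-learn as a stand-in GAM (the synthetic dataset, the hyperparameters, and the choice of GradientBoostingRegressor are illustrative assumptions, not GEF itself):

    # Illustrative only: distil a random forest into an additive surrogate made of
    # boosted depth-1 stumps (a simple GAM with one shape function per feature),
    # fitted on the forest's own predictions rather than the original labels.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

    X, y = make_regression(n_samples=2000, n_features=8, noise=0.1, random_state=0)
    forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

    y_forest = forest.predict(X)                  # the black box to be explained
    gam_surrogate = GradientBoostingRegressor(
        max_depth=1,                              # depth-1 trees -> no interactions
        n_estimators=500,
        learning_rate=0.05,
        random_state=0,
    ).fit(X, y_forest)

    # Fidelity: how well the additive surrogate reproduces the forest's output.
    print("fidelity (R^2 vs forest):", round(gam_surrogate.score(X, y_forest), 3))

Admitting a few depth-2 trees restricted to whitelisted feature pairs would extend such a surrogate to the limited set of feature interactions mentioned in the abstract.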
2022, Book chapter, ENG
Pratesi F.; Trasarti R.; Giannotti F.
This chapter analyses some of the ethical implications of recent developments in artificial intelligence (AI), data mining, machine learning and robotics. In particular, we start by summarising the more consolidated issues and solutions related to privacy in data management systems, moving towards the novel concept of explainability. The chapter reviews the development of the right to privacy and the right to explanation, culminating in the General Data Protection Regulation. However, the new kinds of big data (such as internet logs or GPS tracking) require a different approach to managing privacy requirements. Several solutions have been developed and will be reviewed here. Our view is that data protection must generally be considered from the beginning, as novel AI solutions are developed using the Privacy-by-Design paradigm. This involves a shift in perspective away from remedying problems towards preventing them. We conclude by covering the main requirements necessary to achieve a trustworthy scenario, as advised also by the European Commission. A step in the direction of Trustworthy AI was achieved with the Ethics Guidelines for Trustworthy Artificial Intelligence produced by an expert group for the European Commission. The key elements of these guidelines will be reviewed in this chapter. To ensure European independence and leadership, we must invest wisely by bundling, connecting and opening our AI resources, while also keeping in mind ethical priorities such as transparency and fairness.
2022, Conference proceedings contribution, ENG
Spinnato F.; Guidotti R.; Nanni M.; Maccagnola D.; Paciello G.; Farina A.B.
In Assicurazioni Generali, an automatic decision-making model is used to check real-time multivariate time series and raise an alert if a car crash has happened. In such a way, a Generali operator can call the customer to provide first assistance. The high sensitivity of the model used, combined with the fact that the model is not interpretable, might cause the operator to call customers even though no car crash happened, but only a harsh deviation occurred or the road was bumpy. Our goal is to tackle the problem of interpretability for car crash prediction and propose an eXplainable Artificial Intelligence (XAI) workflow that allows gaining insights into the logic behind the deep learning predictive model adopted by Generali. We reach our goal by building an interpretable alternative to the current obscure model that also reduces the training data usage and the prediction time.
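Generali's model and data are of course not reproduced here; the toy sketch below (synthetic accelerometer traces and hypothetical feature names) only illustrates the general pattern of replacing an opaque sequence model with an interpretable one by summarizing each multivariate time series into a few readable features and fitting a shallow decision tree:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Synthetic 3-axis accelerometer traces: "crash" traces get a few large spikes.
    rng = np.random.default_rng(0)
    n, t = 400, 50
    acc = rng.normal(0.0, 1.0, size=(n, t, 3))
    crash = rng.integers(0, 2, size=n)
    spikes = rng.normal(4.0, 1.0, size=(crash.sum(), t, 3))
    acc[crash == 1] += spikes * (rng.random((crash.sum(), t, 3)) > 0.95)

    # Summarize every trace into a handful of human-readable features.
    feats = np.column_stack([
        np.abs(acc).max(axis=(1, 2)),                   # peak acceleration
        acc.std(axis=(1, 2)),                           # overall variability
        np.abs(np.diff(acc, axis=1)).max(axis=(1, 2)),  # maximum jerk
    ])
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(feats, crash)
    print(export_text(tree, feature_names=["peak_acc", "std_acc", "max_jerk"]))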
2022, Conference proceedings contribution, ENG
Lucchese C.; Nardini F.M.; Orlando S.; Perego R.; Veneri A.
Interpretable Learning to Rank (LtR) is an emerging field within the research area of explainable AI, aiming at developing intelligible and accurate predictive models. While most of the previous research efforts focus on creating post-hoc explanations, in this paper we investigate how to train effective and intrinsically-interpretable ranking models. Developing these models is particularly challenging and it also requires finding a trade-off between ranking quality and model complexity. State-of-the-art rankers, made of either large ensembles of trees or several neural layers, in fact exploit an unlimited number of feature interactions, making them black boxes. Previous approaches to intrinsically-interpretable ranking models address this issue by avoiding interactions between features, thus paying a significant performance drop with respect to full-complexity models. Conversely, ILMART, our novel and interpretable LtR solution based on LambdaMART, is able to train effective and intelligible models by exploiting a limited and controlled number of pairwise feature interactions. Exhaustive and reproducible experiments conducted on three publicly-available LtR datasets show that ILMART outperforms the current state-of-the-art solution for interpretable ranking by a large margin, with a gain in nDCG of up to 8%.
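ILMART's training procedure is not reproduced here; the sketch below only shows the underlying mechanism of restricting a LambdaMART-style ranker to a whitelist of feature interactions, here via LightGBM's interaction_constraints parameter on synthetic data (the constraint list, hyperparameters, and data are illustrative assumptions, not the authors' setup):

    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(0)
    n_queries, docs_per_query, n_features = 100, 10, 6
    X = rng.normal(size=(n_queries * docs_per_query, n_features))
    y = rng.integers(0, 5, size=n_queries * docs_per_query)   # graded relevance
    group = [docs_per_query] * n_queries                      # docs per query

    # Main effects for every feature plus two whitelisted pairwise interactions.
    constraints = [[i] for i in range(n_features)] + [[0, 1], [2, 3]]

    ranker = lgb.LGBMRanker(
        objective="lambdarank",
        n_estimators=200,
        num_leaves=8,
        interaction_constraints=constraints,   # each tree stays inside one group
    )
    ranker.fit(X, y, group=group)
    print(ranker.predict(X[:5]))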
2022, Project report, ENG
Virginia Morini, Lorenzo Bellomo, Paolo Ferragina, Dino Pedreschi, Giulio Rossetti
The tangible objective of this micro-project was to develop a massive dataset of European news with political-leaning labels. This was needed to tackle the next step of the project, which was to build a bias-minimizing recommender system for European news. The dataset comprises millions of European news articles and has been enriched with metadata coming from Eurotopics.net. Each entry in the dataset contains the main text, title, detected topic, publication date, language, and news source, together with news source metadata. This metadata includes the political leaning of the news source and its country. We then built an article bias classifier, in an attempt to predict the political label of single articles using the labels obtained through distant supervision. We then applied explainable AI to our classifier and concluded that the classifier is effectively predicting the news source rather than the political leaning.
2021, Conference proceedings contribution, ENG
Cinquini M.; Giannotti F.; Guidotti R.
Synthetic data generation has been widely adopted in software testing, data privacy, imbalanced learning, artificial intelligence explanation, etc. In all such contexts, it is important to generate plausible data samples. A common assumption of approaches widely used for data generation is the independence of the features. However, typically, the variables of a dataset depend on one another, and these dependencies are not considered in data generation, leading to the creation of implausible records. The main problem is that dependencies among variables are typically unknown. In this paper, we design a synthetic dataset generator for tabular data that is able to discover nonlinear causalities among the variables and use them at generation time. State-of-the-art methods for nonlinear causal discovery are typically inefficient. We boost them by restricting the causal discovery to the features appearing in the frequent patterns efficiently retrieved by a pattern mining algorithm. To validate our proposal, we design a framework for generating synthetic datasets with known causalities. Wide experimentation on many synthetic datasets and real datasets with known causalities shows the effectiveness of the proposed method.
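The causal-discovery step itself is abstracted away below; the sketch only illustrates the pruning idea of testing candidate causal links exclusively between features that co-occur in a frequent pattern, with mlxtend's apriori used as a stand-in pattern miner (the candidate_pairs helper and the toy data are hypothetical, not the paper's pipeline):

    from itertools import combinations
    import pandas as pd
    from mlxtend.frequent_patterns import apriori

    def candidate_pairs(binary_df, min_support=0.3):
        """Feature pairs that co-occur in at least one frequent itemset."""
        itemsets = apriori(binary_df, min_support=min_support, use_colnames=True)
        pairs = set()
        for items in itemsets["itemsets"]:
            pairs.update(combinations(sorted(items), 2))
        return pairs

    df = pd.DataFrame({
        "smoker":  [1, 1, 0, 1, 0, 1, 1, 0],
        "coughs":  [1, 1, 0, 1, 0, 1, 0, 0],
        "glasses": [0, 1, 0, 0, 1, 0, 1, 1],
    }).astype(bool)

    for a, b in candidate_pairs(df):
        # placeholder: run the expensive nonlinear causal test only on these pairs
        print(f"test causal link between {a!r} and {b!r}")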
2021, Journal article, ENG
Iadarola G.; Martinelli F.; Mercaldo F.; Santone A.
Mobile devices are pervading the everyday activities of our life. Each day we store a plethora of sensitive and private information in smart devices such as smartphones or tablets, which are typically equipped with an always-on internet connection. This information is of interest to malware writers, who are developing increasingly aggressive harmful code to steal sensitive and private information from mobile devices. Considering the weaknesses exhibited by current signature-based antimalware detection, in this paper we propose a method relying on an application representation in terms of images, used as input to an explainable deep learning model designed by the authors for Android malware detection and family identification. Moreover, we show how the explainability can be used by the analyst to assess different models. Experimental results demonstrated the effectiveness of the proposed method, obtaining an average accuracy ranging from 0.96 to 0.97; we evaluated 8,446 Android samples belonging to six different malware families plus one additional family of trusted samples, also providing interpretability of the predictions performed by the model.
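The authors' network architecture and explanation method are not reproduced here; the minimal sketch below only shows the byte-to-image representation step that such an approach typically relies on (the file name, image width, and the saliency-heatmap follow-up mentioned in the comments are assumptions, not details from the paper):

    import numpy as np

    def bytes_to_image(path, width=256):
        """Read a binary file and reshape its raw bytes into a grayscale image."""
        data = np.fromfile(path, dtype=np.uint8)
        rows = len(data) // width
        return data[: rows * width].reshape(rows, width)   # drop the ragged tail

    # img = bytes_to_image("sample.apk")   # hypothetical input file
    # The image can then be resized and fed to a CNN; saliency heatmaps over it
    # would give an analyst a visual cue of which byte regions drove the prediction.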