CNR ExploRA


RESULTS FROM 1 TO 20 OF 116

2023, Conference proceedings paper, ENG

Fighting Misinformation, Radicalization and Bias in Social Media

Erica Coppolillo, Carmela Comito, Marco Minici, Ettore Ritacco, Gianluigi Folino, Francesco Sergio Pisani, Massimo Guarascio, Giuseppe Manco

Social media have become the ideal place for black hats and malicious individuals to target susceptible users through different attack vectors and then manipulate their opinions and interests. Fake news, radicalization, and pushing bias into the data are some of the popular means noxious users adopt to pursue their criminal intents. In this evolving scenario, Artificial Intelligence techniques represent a valuable tool for the early detection and mitigation of the risk posed by the spread of these emerging attacks. In this work, we describe the Machine Learning based solutions developed to address the problems mentioned above and our current research.

Workshop AI per Cybersecurity, held jointly with the Terzo Convegno Nazionale CINI sull'Intelligenza Artificiale (Ital-IA23), Pisa, 29-30/05/2023

2023, Journal article, ENG

Learning ensembles of deep neural networks for extreme rainfall event detection

Folino, Gianluigi; Guarascio, Massimo; Chiaravalloti, Francesco

Accurate rainfall estimation is crucial to adequately assess the risk associated with extreme events capable of triggering floods and landslides. Data gathered from Rain Gauges (RGs), sensors devoted to measuring the intensity of the rain at individual points, are commonly used to feed interpolation methods (e.g., the Kriging geostatistical approach) and estimate the precipitation field over an area of interest. However, the information provided by RGs could be insufficient to model complex phenomena, and computationally expensive interpolation methods cannot be used in real-time environments. Integrating additional data sources (e.g., radar and geostationary satellites) is an effective solution for improving the quality of the estimate, but it needs to cope with Big Data issues. To overcome all these issues, we propose a Rainfall Estimation Model (REM) based on an Ensemble of Deep Neural Networks (DeepEns-REM) that can automatically fuse heterogeneous data sources. The usage of Residual Blocks in the base models and the adoption of a Snapshot procedure to build the ensemble guarantee fast convergence and scalability. Experimental results, conducted on a real dataset concerning a southern region in Italy, demonstrate the quality of the proposal in comparison with the Kriging interpolation technique and other machine learning techniques, especially in the case of exceptional rainfall events.

Neural computing & applications (Print) 35, pp. 10347–10360

DOI: 10.1007/s00521-023-08238-0

2022, Journal article, ENG

A Scalable Ensemble-based Framework to Analyse Users' Digital Footprints for Cybersecurity

Gianluigi Folino and Carla Otranto Godano and Francesco Sergio Pisani

In the field of cybersecurity, it is of great interest to analyse user logs in order to prevent data breach issues caused by user behaviour (human factor). A scalable framework based on the Elastic Stack (ELK) to process and store log data coming from digital footprints of different users and from applications is proposed. The system exploits the scalable architecture of ELK by running on top of a Kubernetes platform, and adopts ensemble-based machine learning algorithms to classify user behaviour and to eventually detect anomalies in behaviour.

ERCIM news 2022 (129)

2022, Conference proceedings paper, ENG

A Scalable Architecture Exploiting Elastic Stack and Meta Ensemble of Classifiers for Profiling User Behaviour

Gianluigi Folino and Carla Otranto Godano and Francesco Sergio Pisani

Large user and application logs are generated and stored by many organisations at a rate that makes them hard to analyse, especially in real-time. In particular, in the field of cybersecurity, it is of great interest to quickly analyse user logs, coming from different and heterogeneous sources, in order to prevent data breach issues caused by user behaviour. In addition to these problems, part of the data or some entire sources are often missing. To overcome these issues, we propose a framework based on the Elastic Stack (ELK) to process and store log data coming from different users and applications to generate an ensemble of classifiers, in order to classify the user behaviour, and eventually to detect anomalies. The system exploits the scalable architecture of ELK by running on top of a Kubernetes platform and adopts a distributed evolutionary algorithm for classifying the users, on the basis of their digital footprints, derived from many sources of data. Preliminary experiments show that the system is effective in classifying the behaviour of the different users and that this can be considered as an auxiliary task for detecting anomalies in their behaviour, by helping to reduce the number of false alarms.

30th PDP 2022, 9-11/03/2022

DOI: 10.1109/PDP55904.2022.00037

2022, Conference proceedings paper, ENG

Combining Active Learning and Fast DNN Ensembles for Process Deviance Discovery

Francesco Folino, Gianluigi Folino, Massimo Guarascio, Luigi Pontieri

Detecting deviant traces in business process logs is a crucial task in modern organizations due to the detrimental effect of certain deviant behaviors (e.g., attacks, frauds, faults). Training a Deviance Detection Model (DDM) only over labeled traces with supervised learning methods is ill-suited to real-life contexts where a small fraction of the traces are labeled. Thus, we here propose an Active-Learning-based approach to discovering a deep DDM ensemble that exploits a temporal ensembling method to train and fuse multiple DDMs sharing the same DNN architecture, devised in a way ensuring rapid convergence in relatively few training epochs. Experts' supervision is required only on small numbers of unlabelled traces exhibiting high values of (epistemic) prediction uncertainty, estimated in an ensemble-driven fashion. Tests on real data confirmed the approach's effectiveness, even compared to the results obtained by state-of-the-art supervised methods in the ideal case where all the data are labeled.

International Symposium on Methodologies for Intelligent Systems (ISMIS 2022), Cosenza, Italy, 3-5 October 2022

2022, Journal article, ENG

Combining deep ensemble learning and explanation for intelligent ticket management

Zicari, P.; Folino, G.; Guarascio, M.; Pontieri, L.

Intelligent Ticket Management Systems, equipped with automated ticket classification tools, are an advanced solution for handling customer-support activities. Some recent approaches to ticket classification leverage Deep Learning (DL) methods, in place of traditional ones using standard Machine Learning and feature engineering techniques. However, two challenging objectives should be addressed when applying DL methods to real-life contexts: (i) curbing the risk of having an overfitting model that hinges on spurious ticket features, and (ii) trying to explain the ticket classifications returned by such black-box models. In this work, we propose a comprehensive ticket classification framework, which relies on training a novel kind of ensemble of deep classifiers, and on providing AI-based interpretation methods to help both the operator in recognizing misclassification errors and the analyst in improving and fine-tuning the model. Tests on real data confirmed the accuracy of the classifications returned by the framework, and the practical value of their associated explanations.

Expert systems with applications 206

DOI: 10.1016/j.eswa.2022.117815

2022, Journal article, ENG

Semi-Supervised Discovery of DNN-Based Outcome Predictors from Scarcely-Labeled Process Logs

Francesco Folino; Gianluigi Folino; Massimo Guarascio; Luigi Pontieri

Predicting the final outcome of an ongoing process instance is a key problem in many real-life contexts. This problem has been addressed mainly by discovering a prediction model by using traditional machine learning methods and, more recently, deep learning methods, exploiting the supervision coming from outcome-class labels associated with historical log traces. However, a supervised learning strategy is unsuitable for important application scenarios where the outcome labels are known only for a small fraction of log traces. In order to address these challenging scenarios, a semi-supervised learning approach is proposed here, which leverages a multi-target DNN model supporting both outcome prediction and the additional auxiliary task of next-activity prediction. The latter task helps the DNN model avoid spurious trace embeddings and overfitting behaviors. In extensive experimentation, this approach is shown to outperform both fully-supervised and semi-supervised discovery methods using similar DNN architectures across different real-life datasets and label-scarce settings.

Business & information systems engineering (Print) 64, pp. 729–749

DOI: 10.1007/s12599-022-00749-9

2022, Presentation, ITA

Strumenti Intelligenti per Threat Detection e Response

Francesco Sergio Pisani, Silvia Biasotti, Nunziato Cassavia, Luca Caviglione, Gianluigi Folino, Massimo Guarascio, Giuseppe Manco, Marco Zuppelli

The timely identification of attacks or malicious software, risk mitigation, and the sharing of information for "threat intelligence" are topics of great interest in both academia and industry. To date, there are no solutions able to address all the problems tied to the use of non-consolidated standards, to guarantee privacy, and to integrate different Machine Learning tools. The aim of this document is to present some of the most interesting emerging research topics in this area and then to focus on the solutions and methodologies defined by the joint ICAR-IMATI research group.

Ital-IA: Secondo Convegno Nazionale CINI sull'Intelligenza Artificiale, Torino, online, 9-11/02/2022

2021, Conference proceedings paper, ENG

Discovering accurate deep learning based predictive models for automatic customer support ticket classification

Paolo Zicari; Gianluigi Folino; Massimo Guarascio; Luigi Pontieri

Ticket Management Systems are widespread in disparate kinds of companies and organizations, as they represent a fundamental tool for handling customer requests and issues in an efficient and effective manner. In particular, accurately categorizing incoming tickets is a key task in real-life application settings (e.g., helpdesk/CRM systems and bug tracking systems), in order to improve ticket processing efficiency and effectiveness (e.g., in terms of customer satisfaction). In this work, we propose a comprehensive ticket-categorization analysis that relies on inducing and exploiting a heterogeneous ensemble of deep learning architectures, in addition to a range of functionalities for acquiring, integrating and pre-processing ticket-related information coming from different channels (e.g. mail, chat, web form, etc.). Experimental results conducted on the specific application scenario concerning the data of a publicly available ticket-mining dataset have proven the effectiveness of the framework in different ticket categorization tasks.

36th Annual ACM Symposium on Applied Computing (SAC '21), Virtual Event (Republic of Korea), March 22 - 26, 2021

DOI: 10.1145/3412841.3442109

2021, Journal article, ENG

SDCOR: Scalable density-based clustering for local outlier detection in massive-scale datasets

Sayyed-Ahmad Naghavi-Nozad and Maryam Amir Haeri and Gianluigi Folino

This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets. Unlike the well-known traditional algorithms, which assume that all the data is memory-resident, our proposed method is scalable and processes the input data chunk-by-chunk within the confines of a limited memory buffer. A temporary clustering model is built at the first phase; then, it is gradually updated by analyzing consecutive memory loads of points. Subsequently, at the end of scalable clustering, the approximate structure of the original clusters is obtained. Finally, by another scan of the entire dataset and using a suitable criterion, an outlying score, called SDCOR (Scalable Density-based Clustering Outlierness Ratio), is assigned to each object. Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low linear time complexity and is more effective and efficient than the best-known conventional density-based methods, which need to load all data into memory, and also than some fast distance-based methods, which can operate on disk-resident data.

Knowledge-based systems 228, 107256

DOI: 10.1016/j.knosys.2021.107256

2021, Journal article, ENG

On learning effective ensembles of deep neural networks for intrusion detection

Francesco Folino and Gianluigi Folino and Massimo Guarascio and Francesco Sergio Pisani and Luigi Pontieri

Classification-oriented Machine Learning methods are a precious tool, in modern Intrusion Detection Systems (IDSs), for discriminating between suspected intrusion attacks and normal behaviors. Many recent proposals in this field leveraged Deep Neural Network (DNN) methods, capable of learning effective hierarchical data representations automatically. However, many of these solutions were validated on data featuring stationary distributions and/or large amounts of training examples. By contrast, in real IDS applications, different kinds of attack tend to occur over time, and only a small fraction of the data instances is labeled (usually with far fewer examples of attacks than of normal behavior). A novel ensemble-based Deep Learning framework is proposed here that tries to face the challenging issues above. Basically, the non-stationary nature of IDS log data is faced by maintaining an ensemble consisting of a number of specialized base DNN classifiers, trained on disjoint chunks of the data instances' stream, plus a combiner model (reasoning on both the base classifiers' predictions and original instance features). In order to learn deep base classifiers effectively from small training samples, an ad-hoc shared DNN architecture is adopted, featuring a combination of dropout capabilities and skip-connections, along with a cost-sensitive loss (for dealing with unbalanced data). Test results, conducted on two benchmark IDS datasets and involving several competitors, confirmed the effectiveness of our proposal (in terms of both classification accuracy and robustness to data scarcity), and allowed us to evaluate different ensemble combination schemes.

Information fusion (Print) 72, pp. 48–69

DOI: 10.1016/j.inffus.2021.02.007

2020, Journal article, ENG

A Machine Learning Approach for Rainfall Estimation Integrating Heterogeneous Data Sources

M. Guarascio and G. Folino and F. Chiaravalloti and S. Gabriele and A. Procopio and P. Sabatino

Providing an accurate rainfall estimate at individual points is a challenging problem in order to mitigate risks derived from severe rainfall events, such as floods and landslides. Dense networks of sensors, named rain gauges (RGs), are typically used to obtain direct measurements of precipitation intensity in these points. These measurements are usually interpolated by using spatial interpolation methods for estimating the precipitation field over the entire area of interest. However, these methods are computationally expensive, and to improve the estimation of the variable of interest in unknown points, it is necessary to integrate further information. To overcome these issues, this work proposes a machine learning-based methodology that exploits a classifier based on ensemble methods for rainfall estimation and is able to integrate information from different remote sensing measurements. The proposed approach supplies an accurate estimate of the rainfall where RGs are not available, permits the integration of heterogeneous data sources exploiting both the high quantitative precision of RGs and the spatial pattern recognition ensured by radars and satellites, and is computationally less expensive than the interpolation methods. Experimental results, conducted on real data concerning an Italian region, Calabria, show a significant improvement in comparison with Kriging with external drift (KED), a well-recognized method in the field of rainfall estimation, both in terms of the probability of detection (0.58 versus 0.48) and mean-square error (0.11 versus 0.15).

IEEE transactions on geoscience and remote sensing (Online) 60, pp. 1–11

DOI: 10.1109/TGRS.2020.3037776

2020, Journal article, ENG

Using Deep Learning and Data Integration for Accurate Rainfall Estimates

Gianluigi Folino and Massimo Guarascio and Francesco Chiaravalloti and Salvatore Gabriele

Accurate rainfall estimates are critical for areas presenting high hydrological risks. We have devised a general machine learning framework based on a deep learning architecture, which also integrates information derived from remote sensing measurements, such as weather radars and satellites. Experimental results conducted on real data from a southern region in Italy, provided by the Department of Civil Protection (DCP), show significant improvements compared to current state-of-the-art methods.

ERCIM news 2020 (122)

2020, Journal article, ENG

A GP-based ensemble classification framework for time-changing streams of intrusion detection data

Gianluigi Folino and Francesco Sergio Pisani and Luigi Pontieri

Intrusion detection tools have largely benefitted from the usage of supervised classification methods developed in the field of data mining. However, the data produced by modern system/network logs pose many problems, such as the streaming and non-stationary nature of such data, their volume and velocity, and the presence of imbalanced classes. Classifier ensembles look like a valid solution for this scenario, owing to their flexibility and scalability. In particular, data-driven schemes for combining the predictions of multiple classifiers have been shown to be superior to traditional fixed aggregation criteria (e.g., predictions' averaging and weighted voting). In intrusion detection settings, however, such schemes must be devised in an efficient way, since (part of) the ensemble may need to be re-trained frequently. A novel ensemble-based framework is proposed here for online intrusion detection, where the ensemble is updated through an incremental stream-oriented learning scheme, in response to the detection of concept drifts. Differently from mainstream ensemble-based approaches in the field, our proposal relies on deriving, through an efficient genetic programming (GP) method, an expressive kind of combiner function defined in terms of (non-trainable) aggregation functions. This approach is supported by a system architecture, which integrates different kinds of functionalities, ranging from the drift detection, to the induction and replacement of base classifiers, up to the distributed computation of GP-based combiners. Experiments on both artificial and real-life datasets confirmed the validity of the approach.

Soft computing (Berl., Print) 24 (23), pp. 17541–17560

DOI: 10.1007/s00500-020-05200-3

2020, Conference proceedings paper, ENG

A Multi-View Ensemble of Deep Models for the Detection of Deviant Process Instances

Francesco Folino, Gianluigi Folino, Massimo Guarascio, Luigi Pontieri

Mining deviances from expected behaviors in process logs is a relevant problem in modern organizations, owing to their negative impact in terms of monetary/reputation losses. Most proposals to deviance mining combine the extraction of behavioral features from log traces with the induction of standard classifiers. Difficulties in capturing the multi-faceted nature of deviances with a single pattern family led to explore the possibility to mix up heterogeneous data views, obtained each with a different pattern family. Unfortunately, combining many pattern families tends to produce sparse and redundant representations that likely lead to the discovery of poor deviance-oriented classifiers. Using a multi-view ensemble learning approach to combine alternative trace representations was recently proven effective for this induction task. On the other hand, Deep Learning methods have been gaining momentum in prediction/classification tasks on process log data, owing to their flexibility and expressiveness. We here propose a novel multi-view ensemble-based framework for the discovery of deviance-oriented classifiers that profitably combines different single-view deep classifiers, sharing an ad hoc residual-like architecture (simulating fine-grain ensemble-like capabilities over each single data view). The approach, tested over real-life process log data, significantly improves previous solutions.

9th International Workshop on New Frontiers in Mining Complex Patterns (NFMCP@ECML-PKDD 2020), Virtual (due to COVID19 pandemic), 14/09/2020

2020, Conference proceedings paper, ENG

A p2p environment to validate ensemble-based approaches in the cybersecurity domain

Francesco Folino, Gianluigi Folino, Luigi Pontieri

The main techniques for preventing cybersecurity attacks are based on the analysis of application/system logs stored in a host or in another device and on the analysis of network traffic data. In order to improve the accuracy and the stability of many classical approaches, the ensemble paradigm is successfully used to combine different techniques. However, as these problems are hard to cope with and, usually, they have to analyze large and fast streams of data, different types of ensemble (and of base algorithms composing the ensemble) should be experimented with and, in addition, a distributed architecture should be employed to reduce the high execution times necessary to run them. In order to handle all these issues, a p2p environment to validate ensemble-based approaches in the cybersecurity domain is proposed in this paper. Two case studies are analyzed by using this framework and the preliminary scalability results demonstrate that the approach is apt to this aim.

28th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Västerås, Sweden, 11-13/03/2020

2019, Conference proceedings paper, ENG

A Deep Learning based architecture for rainfall estimation integrating heterogeneous data sources

Gianluigi Folino and Massimo Guarascio and Francesco Chiaravalloti and Salvatore Gabriele

Rain gauges are sensors providing direct measurement of precipitation intensity at individual point sites, and, usually, spatial interpolation methods are used to obtain an estimate of the precipitation field over the entire area of interest. Among them, Kriging with External Drift (KED) is a largely used and well-recognized method in this field. However, interpolation methods need to work with real-time data, and therefore can hardly be used in real-time scenarios. To overcome this issue, we propose a general machine learning framework, which can be trained offline, based on a deep learning architecture, also integrating information derived from remote sensing measurements such as weather radars and satellites. The framework makes it possible to provide accurate estimations of the rainfall in the areas where no rain gauge data is available. Experimental results, conducted on real data concerning a southern region in Italy, provided by the Department of Civil Protection (DCP), show significant improvement in comparison with KED and other machine learning techniques.

International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14-19/07/2019

DOI: 10.1109/IJCNN.2019.8852229

2019, Conference proceedings paper, ENG

Learning Effective Neural Nets for Outcome Prediction from Partially Labelled Log Data

Francesco Folino, Gianluigi Folino, Massimo Guarascio, and Luigi Pontieri

The problem of inducing a model for forecasting the outcome of an ongoing process instance from historical log traces has attracted notable attention in the field of Process Mining. Approaches based on deep neural networks have become popular in this context, as a more effective alternative to previous feature-based outcome-prediction methods. However, these approaches rely on a pure supervised learning scheme, and are ill-suited to many real-life scenarios where the outcome of (fully unfolded) training traces must be provided by experts. Indeed, since in such a scenario only a small amount of labeled traces are usually given, there is a risk that an inaccurate or overfitting model is discovered. To overcome these issues, a novel outcome-discovery approach is proposed here, which leverages a fine-tuning strategy that learns general-enough trace representations from unlabelled log traces, which are then reused (and adapted) in the discovery of the outcome predictor. Results on real-life data confirmed that our proposal makes a more effective and robust solution for label-scarcity scenarios than current outcome-prediction methods.

2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), Portland, Oregon, USA, 4-6/11/2019

2019, Conference proceedings paper, ENG

Using Genetic Programming for Combining an Ensemble of Local and Global Outlier Algorithms to Detect New Attacks

Gianluigi Folino, Francesco Sergio Pisani, Luigi Pontieri, Pietro Sabatino, Maryam Amir Haeri Amirkabir

Modern intrusion detection systems must be able to discover new types of attacks in real-time. To this aim, automatic or semi-automatic techniques can be used; outlier detection algorithms are particularly apt to this task, as they can work in an unsupervised way. However, due to the different nature and behavior of the attacks, the performance of different outlier detection algorithms varies largely. In this ongoing work, we describe an approach aimed at understanding whether an ensemble of outlier algorithms can be used to effectively detect new types of attacks in intrusion detection systems. In particular, Genetic Programming (GP) is adopted to build the combining function of an ensemble of local and global outlier detection algorithms, which are used to detect different types of attack. Preliminary experiments, conducted on the well-known NSL-KDD dataset, are encouraging and confirm that, depending on the type of attacks, it would be better to use only local or only global detection algorithms and that the GP-based ensemble improves the performance in comparison with commonly used combining functions.

Genetic and Evolutionary Computation Conference (GECCO-2019), Prague, Czech Republic, July 13th-17th 2019

2019, Abstract in conference proceedings, ENG

A cybersecurity framework for classifying non stationary data streams exploiting genetic programming and ensemble learning

Gianluigi Folino, Francesco Sergio Pisani, Luigi Pontieri

Numerical Computations: Theory and Algorithms, The 3rd International Conference (NUMTA 2019), Le Castella, Isola Capo Rizzuto (KR), Italy, June 15 - 21, 2019


Licensed under CC BY 4.0