2022, Conference paper, ENG
Sara Narteni, Vanessa Orani, Enrico Ferrari, Damiano Verda, Enrico Cambiaso, Maurizio Mongelli
Data augmentation is a widespread technique in Artificial Intelligence: it creates new synthetic data from an existing real baseline, helping to overcome the lack of labelled data needed to properly train classification algorithms. Our paper focuses on how a common data augmentation methodology, Generative Adversarial Networks (GANs), which is widespread for image and time-series data, can also be applied to generate multivariate data. We propose a novel scheme for GAN evaluation, based on the performance of an explainable AI (XAI) algorithm and an innovative definition of rule similarity. In particular, we consider an application dealing with the augmentation of Inertial Movement Units (IMU) data for physical fatigue monitoring in two age subgroups (under and over 40 years old) of the original data. We show how our rule similarity metric can drive the selection of the best fake dataset among a set of candidates corresponding to different GAN training runs.
2022, Conference paper, ENG
Degani, Luca; Bergadano, Francesco; Mirheidari, Seyed Ali; Martinelli, Fabio; Crispo, Bruno
Subdomain enumeration is a fundamental step of many security processes (e.g., vulnerability discovery, OSINT, and host enumeration). Up to now, this has been achieved with deterministic procedures that have shown some limitations. For instance, the process typically requires the generation of a candidate, which is subsequently checked for validity. While the validation is a straightforward procedure, the definition of an optimal candidate generation strategy is still an open problem. This paper presents a novel subdomain enumeration tool that allows the generation of high-quality subdomain candidates. We employ a Generative Adversarial Network (GAN) to sample unseen candidates from the distribution of valid subdomain names. The model learns this distribution from publicly available datasets. Moreover, by sampling from the trained model, we address the limitations of traditional algorithms. Our experiments were carried out against 15 domains and a ground truth of 1164 other targets. The 15 domains were carefully selected from bug bounty platforms to avoid terms of use violations. Several factors influenced the choices, including the popularity, the expected number of subdomains, and the available services. Our experiments aim to validate our approach by testing the performance increase in subdomain enumeration processes against the state of the art. We benchmark our proposal in terms of candidates' validity and sample uniqueness. The results showed that, with our GAN, the performance of a traditional subdomain enumeration workflow increased by up to 61%. In addition, according to our ground truth experiments, the GAN was able to guess, on average, 32% of subdomains.
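The generate-then-validate workflow described in this abstract can be sketched as follows. This is only a minimal illustration, not the authors' tool: the candidate labels are supplied as a plain list instead of being sampled from a trained GAN, the validator is an injectable function (a membership check against a made-up ground-truth set here, where a real run would use a DNS lookup), and the definitions of validity (valid names over unique candidates) and uniqueness (unique candidates over all samples) are assumptions, since the abstract does not define them. The function name `enumerate_subdomains` is hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple


def enumerate_subdomains(
    domain: str,
    candidates: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Tuple[Set[str], float, float]:
    """Check each unique candidate label once and report two assumed metrics."""
    total = 0           # number of sampled candidates, duplicates included
    seen: Set[str] = set()
    found: Set[str] = set()
    for label in candidates:
        total += 1
        if label in seen:
            continue    # duplicates cost a sample but are not re-validated
        seen.add(label)
        fqdn = f"{label}.{domain}"
        if is_valid(fqdn):
            found.add(fqdn)
    uniqueness = len(seen) / total if total else 0.0
    validity = len(found) / len(seen) if seen else 0.0
    return found, validity, uniqueness


# Made-up ground truth standing in for a real DNS check.
ground_truth = {"www.example.com", "mail.example.com"}
# Made-up samples standing in for the output of a trained generator.
samples = ["www", "api", "www", "mail"]

found, validity, uniqueness = enumerate_subdomains(
    "example.com", samples, lambda name: name in ground_truth
)
```

Keeping the validator injectable separates the straightforward validation step from the candidate generation strategy, which is the part the abstract identifies as the open problem.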
2022, Journal article, ENG
Bianco, Vittorio; Priscoli, Mattia Delli; Pirone, Daniele; Zanfardino, Gennaro; Memmolo, Pasquale; Bardozzo, Francesco; Miccio, Lisa; Ciaparrone, Gioele; Ferraro, Pietro; Tagliaferri, Roberto
Fourier ptychographic microscopy probes label-free samples from multiple angles and achieves super-resolution phase-contrast imaging according to a synthetic aperture principle. Thus, it is particularly suitable for high-resolution imaging of tissue slides over a wide field of view. Recently, in order to make the optical setup robust against misalignment-induced artefacts, numerical multi-look has been added to the conventional phase retrieval process, thus allowing the elimination of related phase errors but at the cost of a long computational time. Here we train a generative adversarial network to emulate the process of complex amplitude estimation. Once trained, the network can accurately reconstruct, in real time, Fourier ptychographic images acquired using a severely misaligned setup. We benchmarked the network by reconstructing images of animal neural tissue slides. Above all, we show that important morphometric information, relevant for diagnosis on neural tissues, is retrieved using the network output. The retrieved values are in very good agreement with the parameters calculated from the ground truth, thus significantly speeding up the quantitative phase-contrast analysis of tissue samples.
2022, Journal article, ENG
Manco, Giuseppe; Ritacco, Ettore; Rullo, Antonino; Sacca, Domenico; Serra, Edoardo
The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z; then, these patterns are used to reconstruct a new dataset X' that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse mining (IFM) techniques, and consists of generating a dataset satisfying given support constraints on the itemsets of an input set, which are typically the frequent ones. By contrast, for the latter approach, recent developments in probabilistic generative modeling (PGM) are explored that model the generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and discussing their pros and cons. This article is categorized under: Fundamental Concepts of Data and Knowledge > Big Data Mining; Technologies > Machine Learning; Algorithmic Development > Structure Discovery
DOI: 10.1002/widm.1450
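To make the notion of support constraints in the abstract above concrete, the sketch below counts the support of an itemset in a transaction dataset and checks whether a candidate synthetic dataset X' meets constraints derived from a real dataset X. It is an illustration under assumptions, not a method taken from the survey: constraints are modelled as (lower, upper) bounds on absolute support counts, the toy datasets are made up, and the names `support` and `satisfies` are hypothetical. It only verifies a given X'; it does not solve the inverse mining problem of constructing one.

```python
from typing import Dict, FrozenSet, List, Tuple

Transaction = FrozenSet[str]
Constraints = Dict[FrozenSet[str], Tuple[int, int]]


def support(dataset: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for transaction in dataset if itemset <= transaction)


def satisfies(dataset: List[Transaction], constraints: Constraints) -> bool:
    """True if every itemset's support lies within its (lower, upper) bounds."""
    return all(
        lower <= support(dataset, itemset) <= upper
        for itemset, (lower, upper) in constraints.items()
    )


# Step 1: derive patterns Z (itemset supports) from a toy real dataset X.
X = [frozenset("ab"), frozenset("abc"), frozenset("a"), frozenset("bc")]
itemsets = [frozenset("ab"), frozenset("bc"), frozenset("a")]
Z: Constraints = {s: (support(X, s), support(X, s)) for s in itemsets}

# Step 2: check that a different toy dataset X' reproduces the same supports.
X_prime = [
    frozenset("ab"), frozenset("ab"), frozenset("ac"),
    frozenset("bc"), frozenset("bc"),
]
ok = satisfies(X_prime, Z)
```

Here X' contains different transactions from X but matches the supports of the three chosen itemsets, which is the sense in which a synthetic dataset can preserve the patterns of the real one.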
2021, Conference paper, ENG
Liguori, Angelica; Manco, Giuseppe; Pisani, Francesco Sergio; Ritacco, Ettore
We propose ARN, a semi-supervised anomaly detection and generation method based on adversarial reconstruction. ARN exploits a regularized autoencoder to optimize the reconstruction of variants of normal examples with minimal differences, which are recognized as outliers. The combination of regularization and adversarial reconstruction helps to stabilize the learning process, which results in both realistic outlier generation and substantial detection capability. Experiments on several benchmark datasets show that our model improves on the current state of the art by notable margins because of its ability to model the true boundaries of the data manifold.