Journal article, 2022, ENG, 10.1002/widm.1450
Manco, Giuseppe; Ritacco, Ettore; Rullo, Antonino; Sacca, Domenico; Serra, Edoardo
ICAR CNR; Univ Calabria; Boise State Univ
The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect the patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z; then, these patterns are used to reconstruct a new dataset X′ that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse frequent itemset mining (IFM) techniques and consists of generating a dataset that satisfies given support constraints on the itemsets of an input set, typically the frequent ones. By contrast, for the latter approach, the survey explores recent developments in probabilistic generative modeling (PGM), which model generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and by discussing their pros and cons. This article is categorized under: Fundamental Concepts of Data and Knowledge > Big Data Mining; Technologies > Machine Learning; Algorithmic Development > Structure Discovery.
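As a toy illustration of the two-step pipeline X → Z → X′ described in the abstract, the following Python sketch mines frequent itemsets with their supports (the patterns Z) and then builds X′ in two ways. All function names are hypothetical, and neither generator is a method from the survey. The constraint-based generator is a simple local-search heuristic that tries to match the support constraints in Z (not an exact IFM algorithm). The probabilistic generator fits independent per-item Bernoulli parameters, a deliberately minimal stand-in for the neural-network distributions (e.g., VAEs, GANs) discussed in the survey.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(X, min_support, max_len=2):
    """Step 1: derive the patterns Z = {itemset: support} from a dataset X.

    X is a list of transactions (Python sets of items); the support of an
    itemset is the fraction of transactions that contain it. Brute-force
    enumeration up to max_len items, adequate only for toy data.
    """
    counts = Counter()
    for t in X:
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
    n = len(X)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def support(X, itemset):
    """Fraction of transactions in X that contain every item of itemset."""
    return sum(1 for t in X if set(itemset) <= t) / len(X)


def constraint_error(X, Z):
    """Total absolute deviation of the supports in X from the targets in Z."""
    return sum(abs(support(X, s) - target) for s, target in Z.items())


def constraint_based_generate(Z, items, n, steps=2000, seed=0):
    """Step 2, approach (1): search for n transactions matching the support
    constraints in Z. Hypothetical local-search heuristic: toggle one item in
    one transaction and keep the change only if the error does not grow.
    It is not an exact IFM solver and may stop at a local optimum."""
    rng = random.Random(seed)
    X_new = [set() for _ in range(n)]
    err = constraint_error(X_new, Z)
    for _ in range(steps):
        if err == 0:
            break
        t = rng.randrange(n)
        item = rng.choice(items)
        X_new[t] ^= {item}  # toggle the item in transaction t
        new_err = constraint_error(X_new, Z)
        if new_err <= err:
            err = new_err
        else:
            X_new[t] ^= {item}  # undo the toggle
    return X_new, err


def fit_independent_bernoulli(X, items):
    """Step 2, approach (2), simplified: one parameter per item,
    theta[i] = P(i appears in a transaction), assuming independent items.
    Neural PGMs would replace this with a learned network."""
    n = len(X)
    return {i: sum(1 for t in X if i in t) / n for i in items}


def sample_dataset(theta, n, seed=0):
    """Generate X' by sampling n transactions from the fitted distribution."""
    rng = random.Random(seed)
    return [{i for i, p in theta.items() if rng.random() < p} for _ in range(n)]


# Toy run of the X -> Z -> X' pipeline.
X = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
items = sorted(set().union(*X))
Z = frequent_itemsets(X, min_support=0.5)

X_constrained, err = constraint_based_generate(Z, items, n=len(X))
theta = fit_independent_bernoulli(X, items)
X_sampled = sample_dataset(theta, n=100, seed=1)

print("constraint-based X':", X_constrained, "error:", err)
print("Bernoulli X' error on Z:", constraint_error(X_sampled, Z))
```

The contrast mirrors the trade-off the abstract describes. The constraint-based generator targets the supports in Z directly. The independent-Bernoulli model matches single-item frequencies only in expectation and ignores correlations: here the expected support of {a, b} under independence is 0.75 × 0.75 = 0.5625, whereas its observed support in X is 0.5. Richer parametric models are meant to close exactly this gap.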
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12 (2)
constraints-based models, data generation, generative adversarial networks, generative models, inverse frequent itemset mining, synthetic dataset, variational autoencoder
CNR authors: Manco Giuseppe, Ritacco Ettore
ID: 465111
Year: 2022
Type: Journal article
Creation: 2022-03-14 18:34:20.000
Last update: 2022-04-27 15:19:29.000
External links
DOI: 10.1002/widm.1450
URL: https://wires.onlinelibrary.wiley.com/doi/10.1002/widm.1450
External IDs
CNR OAI-PMH: oai:it.cnr:prodotti:465111
ISI Web of Science (WOS): 000744989000001
Scopus: 2-s2.0-85122851101