Conference proceedings paper, 2010, ENG
Lucchese C.; Orlando S.; Perego R.
CNR-ISTI, Pisa, Italy; Dipartimento di Informatica, Università Ca' Foscari di Venezia, Venezia, Italy; CNR-ISTI, Pisa, Italy
The discovery of patterns in binary datasets has many applications, e.g. in electronic commerce, TCP/IP networking, and Web usage logging. Still, this is a very challenging task in many respects: overlapping vs. non-overlapping patterns, the presence of noise, and the extraction of only the most important patterns. In this paper we formalize the problem of discovering the Top-K patterns from binary datasets in the presence of noise as the minimization of a novel cost function. According to the Minimum Description Length principle, the proposed cost function favors succinct pattern sets that may approximately describe the input data. We propose a greedy algorithm for the discovery of Patterns in Noisy Datasets, named PaNDa, and show that it outperforms related techniques on both synthetic and real-world data.
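The trade-off described in the abstract, between a succinct pattern set and an approximate description of the data, can be illustrated with a minimal Python sketch. It assumes a cost equal to the size of the pattern descriptions plus the number of cells where the patterns and the data disagree. This form of the cost, the names `cover` and `description_cost`, and the toy matrix are assumptions made for illustration; they are not taken from the paper, and the greedy search itself is not shown.

```python
from typing import List, Set, Tuple

# A pattern is a pair (item ids, transaction ids): a rectangle of 1s
# in the binary matrix. This representation is an illustrative assumption.
Pattern = Tuple[Set[int], Set[int]]


def cover(patterns: List[Pattern], n_rows: int, n_cols: int) -> List[List[int]]:
    """Return the binary matrix obtained by OR-ing all pattern rectangles."""
    covered = [[0] * n_cols for _ in range(n_rows)]
    for items, rows in patterns:
        for r in rows:
            for c in items:
                covered[r][c] = 1
    return covered


def description_cost(data: List[List[int]], patterns: List[Pattern]) -> int:
    """Assumed MDL-style cost: pattern description size plus mismatched cells."""
    n_rows = len(data)
    n_cols = len(data[0]) if data else 0
    covered = cover(patterns, n_rows, n_cols)
    # Cost of writing down each pattern: its items plus its transactions.
    pattern_cost = sum(len(items) + len(rows) for items, rows in patterns)
    # Cost of the noise: cells where the patterns disagree with the data.
    noise_cost = sum(
        1
        for r in range(n_rows)
        for c in range(n_cols)
        if data[r][c] != covered[r][c]
    )
    return pattern_cost + noise_cost


# Toy dataset: a 3x3 block of 1s with one missing cell, plus one isolated 1.
data = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
block: Pattern = ({0, 1, 2}, {0, 1, 2})

cost_without_patterns = description_cost(data, [])
cost_with_block = description_cost(data, [block])
```

Under this assumed cost, describing the toy data with the single noisy block is cheaper than listing every 1 as noise, which is the sense in which such a function favors succinct, approximate pattern sets.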
Tenth SIAM International Conference on Data Mining, pp. 165–176, Columbus, Ohio, US, April 29 - May 1 2010
Database Management. Data mining, Pattern mining
Lucchese Claudio, Perego Raffaele
ISTI – Istituto di scienza e tecnologie dell'informazione "Alessandro Faedo"
CNR authors
External links
URL: http://www.siam.org/proceedings/datamining/2010/dm10_015_lucchesec.pdf
External IDs
CNR OAI-PMH: oai:it.cnr:prodotti:92091
Google Scholar: http://scholar.google.com/citations?view_op=view_citation&hl=en&user=bdoG6ScAAAAJ&sortby=pubdate&citation_for_view=bdoG6ScAAAAJ:4JMBOYKVnBMC