2024, Journal article, ENG
Chiara Marzi, Marco Giannelli, Andrea Barucci, Carlo Tessa, Mario Mascalchi, Stefano Diciotti
Pooling publicly available MRI data from multiple sites makes it possible to assemble extensive groups of subjects, increase statistical power, and promote data reuse with machine learning techniques. The harmonization of multicenter data is necessary to reduce the confounding effect associated with non-biological sources of variability in the data. However, when applied to the entire dataset before machine learning, harmonization leads to data leakage, because information outside the training set may affect model building and potentially result in falsely overestimated performance. We propose (1) a measure of the efficacy of data harmonization and (2) a harmonizer transformer, i.e., an implementation of the ComBat harmonization that can be encapsulated among the preprocessing steps of a machine learning pipeline, avoiding data leakage by design. We tested these tools using brain T1-weighted MRI data from 1740 healthy subjects acquired at 36 sites. After harmonization, the site effect was removed or reduced, and we showed the data leakage effect in predicting individual age from MRI data, highlighting that introducing the harmonizer transformer into a machine learning pipeline avoids data leakage by design.
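A minimal sketch of the leakage-free pipeline idea, assuming scikit-learn and using a simple per-site location/scale adjustment as a stand-in for the actual ComBat harmonization (class and variable names below are illustrative, not the authors' implementation):

```python
# Simplified stand-in for a ComBat-style harmonizer, wrapped as a scikit-learn
# transformer so that site statistics are estimated on training folds only.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

class SiteHarmonizer(BaseEstimator, TransformerMixin):
    def __init__(self, site_column=0):
        self.site_column = site_column          # column of X holding the site code

    def fit(self, X, y=None):
        sites = X[:, self.site_column]
        feats = np.delete(X, self.site_column, axis=1).astype(float)
        self.grand_mean_ = feats.mean(axis=0)
        self.grand_std_ = feats.std(axis=0) + 1e-8
        self.site_stats_ = {s: (feats[sites == s].mean(axis=0),
                                feats[sites == s].std(axis=0) + 1e-8)
                            for s in np.unique(sites)}
        return self

    def transform(self, X):
        sites = X[:, self.site_column]
        feats = np.delete(X, self.site_column, axis=1).astype(float)
        for s, (mu, sd) in self.site_stats_.items():
            mask = sites == s                   # sites unseen at fit time are left as-is
            feats[mask] = (feats[mask] - mu) / sd * self.grand_std_ + self.grand_mean_
        return feats

# Encapsulating the harmonizer in the pipeline means cross-validation refits it
# on each training fold, so no statistics from held-out data leak into training.
pipe = Pipeline([("harmonize", SiteHarmonizer()), ("regress", Ridge())])
# scores = cross_val_score(pipe, X, age, cv=5)  # X includes the site column
```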
2024, Journal article, ENG
Mohammed Achite, Paraskevas Tsangaratos, Gaetano Pellicone, Babak Mohammadi, Tommaso Caloiero
This study addresses the challenging problem of predicting mean annual precipitation across arid and semi-arid areas of northern Algeria, using deterministic, geostatistical (GS), and machine learning (ML) models. Through the analysis of data spanning nearly five decades and encompassing 150 monitoring stations, the Random Forest model showed the highest training performance, with an R-squared value of 0.9524 and a root mean square error (RMSE) of 24.98. Elevation emerges as a critical factor, enhancing prediction accuracy in mountainous and complex terrains when used as an auxiliary variable. Cluster analysis further refines our understanding of station distribution and precipitation characteristics, identifying four distinct clusters, each exhibiting unique precipitation patterns and elevation zones. This study contributes to a better understanding of precipitation prediction, encouraging the integration of additional variables and the exploration of climate change impacts, thereby supporting informed environmental management and adaptation strategies across diverse climatic and terrain scenarios.
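As a rough illustration of the machine learning side of this setup, the sketch below fits a Random Forest regressor to station coordinates with elevation as an auxiliary predictor; the data are synthetic placeholders, not the study's observations.

```python
# Random Forest regression of mean annual precipitation with elevation as an
# auxiliary predictor (synthetic toy data standing in for the 150 stations).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 8, 150),      # longitude (deg)
                     rng.uniform(32, 37, 150),    # latitude (deg)
                     rng.uniform(0, 1800, 150)])  # elevation (m)
y = 200 + 0.25 * X[:, 2] + rng.normal(0, 30, 150) # toy precipitation (mm/yr)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print("R2:", r2_score(y_te, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
```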
2023, Journal article, ENG
D. Minici (1); G. Cola (2); G. Perfetti (3,4); S. Espinoza Tofalos (3,4); M. Di Bari (3,4); M. Avvenuti (1)
The COVID-19 pandemic has considerably shifted the focus of scientific research, speeding up the process of digitizing medical monitoring. Wearable technology is already widely used in medical research, as it has the potential to monitor the user's physical activity in daily life. This study aims to explore in-home collected wearable-derived signals for frailty status assessment. A sample of 35 subjects aged 70+, autonomous in basic activities of daily living and cognitively intact, was recruited. After being clinically assessed for frailty according to Fried's phenotype, participants wore a wrist device equipped with inertial motion sensors for 24 h, during which they led their usual life in their homes. Signal-derived traces were split into 10-s segments and classified as gait, other motor activities, or rest. Gait and other motor activity segments were used to calculate the Subject Activity Level (SAL), an index quantifying how active users were throughout the day. The SAL index was then combined with gait-derived features to design a novel frailty status assessment algorithm. In particular, subjects were classified as robust or non-robust, a category that includes both Fried's frail and pre-frail phenotypes. For some users, activity levels alone enabled accurate frailty assessment, whereas, for others, a Gaussian Naive Bayes classifier based on the gait-derived features was required to assess frailty status. Overall, the proposed method showed extremely promising results, discriminating robust from non-robust subjects with an overall 91% accuracy, stemming from 95% sensitivity and 88% specificity. This study demonstrates the potential of unobtrusive, wearable devices for objectively assessing frailty through unsupervised monitoring in real-world settings.
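A minimal sketch of the two-stage decision logic described above, assuming scikit-learn; the activity-level thresholds and feature handling are placeholders, not the values calibrated in the study.

```python
# Two-stage robust vs non-robust classification: the activity index decides
# clear-cut cases, a Gaussian Naive Bayes on gait features handles the rest.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def classify_subject(sal, gait_features, nb_model, low=0.2, high=0.6):
    """sal: Subject Activity Level in [0, 1]; gait_features: 1-D feature array."""
    if sal >= high:
        return "robust"          # clearly active subject
    if sal <= low:
        return "non-robust"      # clearly inactive subject
    # Ambiguous activity level: fall back to the gait-based classifier.
    return nb_model.predict(gait_features.reshape(1, -1))[0]

# nb_model = GaussianNB().fit(train_gait_features, train_labels)
# classify_subject(0.45, subject_gait_features, nb_model)
```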
2023, Journal article, ENG
Thölke, Philipp; Mantilla-Ramos, Yorguin Jose; Abdelhedi, Hamza; Maschke, Charlotte; Dehgan, Arthur; Harel, Yann; Kemtur, Anirudha; Mekki Berrada, Loubna; Sahraoui, Myriam; Young, Tammy; Bellemare Pépin, Antoine; El Khantour, Clara; Landry, Mathieu; Pascarella, Annalisa; Hadid, Vanessa; Combrisson, Etienne; O'Byrne, Jordan; Jerbi, Karim
Machine learning (ML) is increasingly used in cognitive, computational and clinical neuroscience. The reliable and efficient application of ML requires a sound understanding of its subtleties and limitations. Training ML models on datasets with imbalanced classes is a particularly common problem, and it can have severe consequences if not adequately addressed. With the neuroscience ML user in mind, this paper provides a didactic assessment of the class imbalance problem and illustrates its impact through systematic manipulation of data imbalance ratios in (i) simulated data and (ii) brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI). Our results illustrate how the widely used Accuracy (Acc) metric, which measures the overall proportion of successful predictions, yields misleadingly high performance as class imbalance increases. Because Acc weights the per-class ratios of correct predictions proportionally to class size, it largely disregards performance on the minority class. A binary classification model that learns to systematically vote for the majority class will yield an artificially high decoding accuracy that directly reflects the imbalance between the two classes, rather than any genuine generalizable ability to discriminate between them. We show that other evaluation metrics, such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the less common Balanced Accuracy (BAcc) metric - defined as the arithmetic mean of sensitivity and specificity - provide more reliable performance evaluations for imbalanced data. Our findings also highlight the robustness of Random Forest (RF), and the benefits of using stratified cross-validation and hyperparameter optimization to tackle data imbalance. Critically, for neuroscience ML applications that seek to minimize overall classification error, we recommend the routine use of BAcc, which in the specific case of balanced data is equivalent to standard Acc and readily extends to multi-class settings. Importantly, we present a list of recommendations for dealing with imbalanced data, as well as open-source code to allow the neuroscience community to replicate and extend our observations and explore alternative approaches to coping with imbalanced data.
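A small sketch of the core point about the Accuracy metric, assuming scikit-learn: a classifier that always votes for the majority class scores high Accuracy on imbalanced data but only chance-level Balanced Accuracy.

```python
# Accuracy vs Balanced Accuracy for a majority-class "classifier"
# on an imbalanced binary problem (9:1 class ratio).
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

rng = np.random.default_rng(0)
y_true = rng.choice([0, 1], size=1000, p=[0.9, 0.1])  # class 1 is the minority
y_pred = np.zeros_like(y_true)                        # always vote the majority class

print(accuracy_score(y_true, y_pred))           # ~0.90, misleadingly high
print(balanced_accuracy_score(y_true, y_pred))  # 0.50, i.e. chance level
```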
2023, Conference paper, ENG
Bucarelli M.S.; Cassano L.; Siciliano F.; Mantrach A.; Silvestri F.
In practical settings, classification datasets are obtained through a labelling process that is usually done by humans. Labels can be noisy as they are obtained by aggregating the different individual labels assigned to the same sample by multiple, and possibly disagreeing, annotators. The interrater agreement on these datasets can be measured while the underlying noise distribution to which the labels are subject is assumed to be unknown. In this work, we: (i) show how to leverage the inter-annotator statistics to estimate the noise distribution to which labels are subject; (ii) introduce methods that use the estimate of the noise distribution to learn from the noisy dataset; and (iii) establish generalization bounds in the empirical risk minimization framework that depend on the estimated quantities. We conclude the paper by providing experiments that illustrate our findings.
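As a simplified illustration of point (i), and not the paper's actual estimator: under symmetric binary label noise with independent annotators, the flip probability can be recovered from the observed pairwise disagreement rate, since two annotators disagree with probability 2p(1 - p).

```python
# Estimate a symmetric flip probability p from inter-annotator disagreement:
# disagreement = 2*p*(1-p)  =>  p = (1 - sqrt(1 - 2*disagreement)) / 2.
import numpy as np

def estimate_flip_probability(labels):
    """labels: (n_samples, n_annotators) array of 0/1 labels."""
    n, m = labels.shape
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    disagreement = np.mean([np.mean(labels[:, i] != labels[:, j]) for i, j in pairs])
    disagreement = min(disagreement, 0.4999)          # keep the square root real
    return (1.0 - np.sqrt(1.0 - 2.0 * disagreement)) / 2.0

# Toy check: three annotators, each flipping the clean label with probability 0.2.
rng = np.random.default_rng(0)
clean = rng.integers(0, 2, size=(5000, 1))
noisy = np.hstack([np.where(rng.random((5000, 1)) < 0.2, 1 - clean, clean)
                   for _ in range(3)])
print(estimate_flip_probability(noisy))   # ~0.2
```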
2023, Journal article, ENG
Conti F.; Banchelli M.; Bessi V.; Cecchi C.; Chiti F.; Colantonio S.; D'Andrea C.; de Angelis M.; Moroni D.; Nacmias B.; Pascali M.A.; Sorbi S.; Matteini P.
The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer's disease (AD), as well as of 5 pathological controls, was collected and analyzed by Raman spectroscopy (RS). We investigated whether the raw and preprocessed Raman spectra could be used to distinguish AD from controls. First, we applied standard Machine Learning (ML) methods, obtaining unsatisfactory results. Then, we applied ML to a set of topological descriptors extracted from the raw spectra, achieving a very good classification accuracy (>87%). Although our results are preliminary, they indicate that RS and topological analysis may provide an effective combination to confirm or disprove a clinical diagnosis of AD. The next steps include enlarging the dataset of CSF samples to better validate the proposed method and, possibly, investigating whether topological data analysis could support the characterization of AD subtypes.
2023, Conference paper, ENG
Rizzo M.; Veneri A.; Albarelli A.; Lucchese C.; Nobile M.; Conati C.
EXplainable Artificial Intelligence (XAI) is a vibrant research topic in the artificial intelligence community. It is attracting growing interest across methods and domains, especially those involving high-stakes decision-making, such as the biomedical sector. Much has been written about the subject, yet XAI still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that synthesizes what can be found in the literature. We recognize that explanations are not atomic but the combination of evidence stemming from the model and its input-output mapping, and the human interpretation of this evidence. Furthermore, we frame explanations in terms of the properties of faithfulness (i.e., the explanation is an accurate description of the model's inner workings and decision-making process) and plausibility (i.e., how convincing the explanation appears to the user). Our theoretical framework simplifies how these properties are operationalized, and it provides new insights into common explanation methods that we analyze as case studies. We also discuss the impact that our framework could have in biomedicine, a very sensitive application domain where XAI can play a central role in generating trust.
2023, Journal editorial, ENG
L. Biferale; M. Buzzicotti; M. Cencini
2023, Journal article, ENG
Trevisani, S.; Cavalli, M.; Tosti, F.
Understanding the interactions between the anthroposphere and the geosphere, such as natural hazards, land degradation, quantitative and qualitative impacts on ground and surface waters, is a challenging task. The monitoring and modelling of these interactions can be characterized by high uncertainties in data and models, especially when considering urban areas or locations near engineering infrastructures. Technological and scientific advancements, including remote sensing, geophysical prospecting, drilling equipment, and information technology, have contributed to enhancing our current understanding of these interconnected dynamics. The availability of increasingly large datasets provides better insights into the mechanisms that govern these interactions, but it also adds complexity to monitoring, modeling, and forecasting procedures. From this viewpoint, the utilization of advanced geocomputational methodologies, such as machine learning, geostatistics, pattern recognition, geomorphometry, and other computational-based approaches, plays a pivotal role.
2023, Journal article, ENG
Gennaro Tartarisco; Giovanni Cicceri; Roberta Bruschetta; Alessandro Tonacci; Simona Campisi; Salvatore Vitabile; Antonio Cerasa; Salvatore Distefano; Alessio Pellegrino; Pietro Amedeo Modesti; Giovanni Pioggia;
Cardiovascular diseases are currently the major causes of death globally. Among the strategies to prevent cardiovascular issues, the automated classification of heart sound abnormalities is an efficient way to detect early signs of cardiac conditions leading to heart failure or other, even asymptomatic, complications, enabling timely interventions. Despite significant improvements in this field, limitations remain due to the lack of solutions and available datasets, and to the poor (mainly binary, normal vs abnormal) classification models and algorithms. This paper presents a Medical Cyber-Physical System (MCPS) for the automatic, timely, on-site classification of heart valve diseases. The proposed MCPS can indeed be deployed on personal and mobile devices, addressing the limitations of existing solutions for patients, healthcare practitioners, and researchers through an efficient and easily accessible tool. It combines different neural network models trained on a new Italian dataset of 132 adult patients covering 9 heart sound categories (1 normal and 8 abnormal), also validated against two main open-access datasets (PhysioNet/CinC Challenge 2016 and Korean). The overall MCPS performance (time, processing and energy resource utilization) and the high accuracy of the models (up to 98%) demonstrate the feasibility of the proposed solution, even with few data. The dataset supporting the findings of this paper is available upon request to the authors.
2023, Journal article, ENG
Vivone, Gemine
The fusion of multispectral (MS) and hyperspectral (HS) images has recently been put in the spotlight. These techniques aim to combine high-spatial-resolution MS images with HS data that have lower spatial resolution but finer spectral resolution. This survey presents a deep review of the literature designed for students and professionals who want to know more about the topic. The basic aspects of MS and HS image fusion are presented, and the related approaches are classified into three different classes (pansharpening-based, decomposition-based, and machine learning-based). The final part of this survey is devoted to the description of widely used datasets for this task and to the performance assessment problem, also describing open issues and drawing guidelines for future research.
2023, Software, ENG
Papini O.
This software consists of a Python 3 implementation of the Mesoscale Events Classifier (MEC) algorithm, which has been developed as part of the activities of Task 8.5 of the NAUTILOS project. The algorithm uses Sea Surface Temperature data coming from satellite missions to detect and classify patterns associated with "mesoscale events" in an upwelling ecosystem.
2023, Conference paper, CPE
Monteiro de Lira V.; Pallonetto F.; Gabrielli L.; Renso C.
Global electric car sales have continued to exceed expectations, climbing to over 3 million and reaching a market share of over 4%. However, the uncertainty of generation caused by a higher penetration of renewable energies, together with the advent of Electric Vehicles (EVs) and their additional electricity demand, could strain the power system at both distribution and transmission levels. The present work fits this context by supporting charging optimization for EVs in parking premises, assuming an impending high penetration of EVs in the system. We propose a methodology to estimate parking duration in shared parking premises. The final objective is to estimate the energy requirement of a specific parking lot, evaluate an optimal EV charging schedule, and integrate the scheduling into a smart controller. We formalize the prediction problem as a supervised machine learning task that predicts the duration of a parking event before the car leaves the slot. We test the proposed approach on a combination of datasets from two different campus facilities, in Italy and Brazil. The overall results show higher accuracy than a frequency-based statistical analysis, indicating a viable route for the development of accurate predictors for the energy management systems of shared parking premises.
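A minimal sketch of how such a parking-duration task could be framed as supervised learning, assuming scikit-learn; the features, regressor choice, and synthetic data are placeholders, not the paper's setup.

```python
# Supervised prediction of parking duration from features known at arrival time.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for a log of parking events: arrival hour, weekday,
# and the duration (known only after the car leaves, i.e. the target).
rng = np.random.default_rng(0)
n = 2000
events = pd.DataFrame({"hour": rng.integers(6, 20, n),
                       "weekday": rng.integers(0, 7, n)})
events["duration_minutes"] = (480 - 20 * (events["hour"] - 8).abs()
                              + rng.normal(0, 45, n)).clip(lower=15)

X = events[["hour", "weekday"]]
y = events["duration_minutes"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print("MAE (minutes):", mean_absolute_error(y_te, model.predict(X_te)))
```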
2023, Journal article, ENG
Luca M.; Pappalardo L.; Lepri B.; Barlacchi G.
Next-location prediction, which consists of forecasting a user's location given their historical trajectories, has important implications in several fields, such as urban planning, geo-marketing, and disease spread. Several predictors have been proposed in recent years to address it, including last-generation ones based on deep learning. This paper tests the generalization capability of these predictors on public mobility datasets, stratifying the datasets by whether the trajectories in the test set also appear, fully or partially, in the training set. We consistently find a severe trajectory-overlap problem in all analyzed datasets, highlighting that predictors memorize trajectories while having limited generalization capacity. We thus propose a methodology to rerank the outputs of next-location predictors based on spatial mobility patterns. With these techniques, we significantly improve the predictors' generalization capability, with a relative improvement in accuracy of up to 96.15% on the trajectories that cannot be memorized (i.e., those with low overlap with the training set).
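An illustrative way to stratify test trajectories by their overlap with the training set, here using shared location bigrams as the overlap measure; this is a simplified definition, not necessarily the one adopted in the paper.

```python
# Stratify test trajectories into low/high overlap with the training set.
from collections import defaultdict

def ngrams(traj, n=2):
    """Set of consecutive location n-grams in a trajectory."""
    return {tuple(traj[i:i + n]) for i in range(len(traj) - n + 1)}

def overlap(test_traj, train_ngrams, n=2):
    grams = ngrams(test_traj, n)
    return len(grams & train_ngrams) / len(grams) if grams else 0.0

train_trajs = [["A", "B", "C", "D"], ["B", "C", "E"]]
train_ngrams = set().union(*(ngrams(t) for t in train_trajs))

test_trajs = [["A", "B", "C"], ["F", "G", "H"]]
strata = defaultdict(list)
for t in test_trajs:
    key = "high" if overlap(t, train_ngrams) >= 0.5 else "low"
    strata[key].append(t)
print(dict(strata))   # {'high': [['A','B','C']], 'low': [['F','G','H']]}
```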
2023, Journal article, ENG
De Rango, Floriano; Guerrieri, Antonio; Raimondo, Pierfrancesco; Spezzano, Giandomenico
The increasing amount of data produced by IoT devices and the need to harness intelligence in our environments impose a shift of computing and intelligence to the edge, leading to a novel computing paradigm called Edge Intelligence/Edge AI. This paradigm combines Artificial Intelligence and Edge Computing, enables the deployment of machine learning algorithms at the edge, where data is generated, and can overcome the drawbacks of a centralized cloud-based approach (e.g., performance bottlenecks, poor scalability, and a single point of failure). Edge AI supports the distributed Federated Learning (FL) model, which keeps local training data on the end devices and shares only the globally learned model parameters in the cloud. This paper proposes a novel, energy-efficient, and dynamic FL-based approach built on a hierarchical edge FL architecture, called HED-FL, which supports a sustainable learning paradigm by aggregating model parameters at different layers and adopting adaptive learning rounds at the edge to save energy while preserving the learning model's accuracy. Performance evaluations of the proposed approach have also been carried out considering model accuracy, loss, and energy consumption.
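A toy sketch of the hierarchical, sample-weighted parameter aggregation underlying this kind of architecture: device models are averaged at their edge node first, then edge models are averaged in the cloud (FedAvg-style averaging; all values are illustrative, not the HED-FL implementation).

```python
# Two-level FedAvg-style aggregation: devices -> edge nodes -> cloud.
import numpy as np

def aggregate(models, weights):
    """Sample-weighted average of flattened parameter vectors."""
    w = np.asarray(weights, dtype=float)
    return np.average(np.stack(models), axis=0, weights=w / w.sum())

# Toy example: two edge nodes, each aggregating two devices
# (hypothetical parameter vectors and per-device sample counts).
edge_groups = [
    ([np.array([1.0, 2.0]), np.array([2.0, 3.0])], [100, 300]),
    ([np.array([0.0, 1.0]), np.array([4.0, 5.0])], [200, 200]),
]
edge_models = [aggregate(models, samples) for models, samples in edge_groups]
edge_samples = [sum(samples) for _, samples in edge_groups]

global_model = aggregate(edge_models, edge_samples)   # cloud-level aggregation
print(global_model)
```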
2023, Journal article, CPE
Delre, Pietro; Contino, Marialessandra; Alberga, Domenico; Saviano, Michele; Corriero, Nicola; Mangiatordi, Giuseppe Felice
The development of small molecules that selectively target the cannabinoid receptor subtype 2 (CB2R) is emerging as an intriguing therapeutic strategy to treat neurodegeneration, as well as to contrast the onset and progression of cancer. In this context, in-silico tools able to predict CB2R affinity and selectivity with respect to the subtype 1 (CB1R), whose modulation is responsible for undesired psychotropic effects, are highly desirable. In this work, we developed a series of machine learning classifiers trained on high-quality bioactivity data of small molecules acting on CB2R and/or CB1R extracted from ChEMBL v30. Our classifiers showed strong predictive power in accurately determining CB2R affinity, CB1R affinity, and CB2R/CB1R selectivity. Among the built models, those obtained using random forest as algorithm proved to be the top-performing ones (AUC in validation >=0.96) and were made freely accessible through a user-friendly web platform developed ad hoc and called ALPACA (https://www.ba.ic.cnr.it/softwareic/alpaca/). Due to its user-friendly interface and robust predictive power, ALPACA can be a valuable tool in saving both time and resources involved in the design of selective CB2R modulators.
2023, Journal article, ENG
Narteni, Sara; Muselli, Marco; Dabbene, Fabrizio; Mongelli, Maurizio
Nowadays, machine learning (ML) is a viable solution for the allocation of equivalent bandwidth (EqB) in telecommunication networks, i.e., the minimum service rate required by a traffic buffer to guarantee a satisfactory Quality of Service (QoS). Moreover, trustworthy artificial intelligence (AI) is gaining importance in regulating the implementation of ML models, requiring explainable AI (XAI) and uncertainty handling. The paper extends prior works on the combined usage of control and rule-based classification for EqB allocation by adding the perspective of trustworthy AI. Simulation-based data collection is performed under a wide range of traffic conditions. The Clopper-Pearson generalization bound is used as an efficient tool to select a rule-based model with adequate performance, also determining the minimum amount of data required for model training, which resulted in 3000 samples (~3.3 h of simulation). Robustness, in terms of the model's capability to recognize out-of-distribution samples, is also studied by comparing the rates of rule satisfaction in the presence of training or operational data, quantified via mutual information and the l1 and l2 norms. Results show that, while the norms are more likely to capture the difference between training and operational data distributions regardless of its magnitude, mutual information appears sensitive to the magnitude of the separation between the training and operational domains.
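A short sketch of the Clopper-Pearson (exact binomial) bound used here for model selection: given the number of misclassified samples out of n, it returns a confidence interval on the true error rate (assuming SciPy; the numbers below are illustrative).

```python
# Exact (Clopper-Pearson) confidence bounds on a classifier's error probability.
from scipy.stats import beta

def clopper_pearson(errors, n, alpha=0.05):
    """Two-sided (1 - alpha) interval for the true error rate."""
    lower = beta.ppf(alpha / 2, errors, n - errors + 1) if errors > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, errors + 1, n - errors) if errors < n else 1.0
    return lower, upper

# e.g., 60 misclassified samples out of n = 3000 training samples
print(clopper_pearson(60, 3000))   # roughly (0.015, 0.026)
```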
2023, Journal article, ENG
Albano D.; Gitto S.; Messina C.; Serpi F.; Salvatore C.; Castiglioni I.; Zagra L.; De Vecchi E.; Sconfienza L.M.
Purpose: To investigate whether artificial intelligence (AI) can differentiate septic from non-septic total hip arthroplasty (THA) failure based on preoperative MRI features. Materials and methods: We included 173 patients (98 females, age: 67 +/- 12 years) subjected to first-time THA revision surgery after preoperative pelvis MRI. We divided the patients into a training/validation/internal testing cohort (n = 117) and a temporally independent external-testing cohort (n = 56). MRI features were used to train, validate and test a machine learning algorithm based on support vector machine (SVM) to predict THA infection on the training-internal validation cohort with a nested fivefold validation approach. Machine learning performance was evaluated on independent data from the external-testing cohort. Results: MRI features were significantly more frequently observed in THA infection (P < 0.001), except bone destruction, periarticular soft-tissue mass, and fibrous membrane (P > 0.005). Considering all MRI features in the training/validation/internal-testing cohort, the SVM classifier reached 92% sensitivity, 62% specificity, 79% PPV, 83% NPV, 82% accuracy, and 81% AUC in predicting THA infection, with bone edema, extracapsular edema, and synovitis being the best predictors. After being tested on the external-testing cohort, the classifier showed 92% sensitivity, 79% specificity, 89% PPV, 83% NPV, 88% accuracy, and 89% AUC in predicting THA infection. Based only on the presence of periprosthetic bone marrow edema on MRI, the SVM classifier showed 81% sensitivity, 76% specificity, 66% PPV, 88% NPV, 80% accuracy, and 74% AUC in predicting THA infection in the training/validation/internal-testing cohort, while it showed 68% sensitivity, 89% specificity, 93% PPV, 60% NPV, 75% accuracy, and 79% AUC in the external-testing cohort. Conclusion: AI using an SVM classifier showed promising results in predicting THA infection based on MRI features. This model might support radiologists in identifying THA infection.
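A minimal sketch of the nested fivefold validation scheme mentioned above, assuming scikit-learn, a tabular feature matrix X, and binary infection labels y; the hyperparameter grid is illustrative, not the study's configuration.

```python
# Nested five-fold cross-validation of an SVM classifier: the inner loop tunes
# hyperparameters, the outer loop estimates generalization performance.
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = GridSearchCV(pipe,
                    {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
                    cv=inner, scoring="roc_auc")

# scores = cross_val_score(grid, X, y, cv=outer, scoring="roc_auc")
# print(scores.mean())   # outer-loop AUC estimate
```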
2023, Journal article, ENG
Gitto S.; Interlenghi M.; Cuocolo R.; Salvatore C.; Giannetta V.; Badalyan J.; Gallazzi E.; Spinelli M.S.; Gallazzi M.; Serpi F.; Messina C.; Albano D.; Annovazzi A.; Anelli V.; Baldi J.; Aliprandi A.; Armiraglio E.; Parafioriti A.; Daolio P.A.; Luzzati A.; Biagini R.; Castiglioni I.; Sconfienza L.M.
Purpose: To determine diagnostic performance of MRI radiomics-based machine learning for classification of deep-seated lipoma and atypical lipomatous tumor (ALT) of the extremities. Material and methods: This retrospective study was performed at three tertiary sarcoma centers and included 150 patients with surgically treated and histology-proven lesions. The training-validation cohort consisted of 114 patients from centers 1 and 2 (n = 64 lipoma, n = 50 ALT). The external test cohort consisted of 36 patients from center 3 (n = 24 lipoma, n = 12 ALT). 3D segmentation was manually performed on T1- and T2-weighted MRI. After extraction and selection of radiomic features, three machine learning classifiers were trained and validated using nested fivefold cross-validation. The best-performing classifier according to previous analysis was evaluated and compared to an experienced musculoskeletal radiologist in the external test cohort. Results: Eight features passed feature selection and were incorporated into the machine learning models. After training and validation (74% ROC-AUC), the best-performing classifier (Random Forest) showed 92% sensitivity and 33% specificity in the external test cohort with no statistical difference compared to the radiologist (p = 0.474). Conclusion: MRI radiomics-based machine learning may classify deep-seated lipoma and ALT of the extremities with high sensitivity and negative predictive value, thus potentially serving as a non-invasive screening tool to reduce unnecessary referral to tertiary tumor centers.
2023, Journal article, ENG
Conforti, Massimo; Borrelli, Luigi; Cofone, Gino; Gulla, Giovanni
This study aimed to examine the influence of the random selection of landslide training and testing sets on the predictive performance of shallow landslide susceptibility modelling at the regional scale. The performance of the frequency ratio (FR), information value (IV), logistic regression (LR), and maximum entropy (ME) methods was tested and compared for modelling shallow landslide susceptibility in the Calabria region (southern Italy). A landslide database of 22,028 shallow landslides, randomly split into training (70%) and testing (30%) sets, was combined with 15 predisposing factors (lithology, soil texture, soil bulk density, soil erodibility, drainage density, land use, elevation, local relief, slope gradient, slope aspect, plan curvature, topographic wetness index, stream power index, topographic ruggedness index, and topographic position index) to calibrate and validate the models. The robustness of the models in response to changes in the landslide dataset was explored through ten replicates of the training and testing sets. Model performance was evaluated using several statistical indices and the ROC curve method. The results showed that all four methods achieved promising performance in predicting shallow landslide susceptibility at the regional scale. The comparison among the four methods showed that ME performed best (AUC = 0.866), followed by LR (AUC = 0.845), FR (AUC = 0.813), and finally IV (AUC = 0.800). In addition, the findings showed that the accuracy of the four methods for modelling shallow landslide susceptibility was quite robust when the training and testing sets were changed (i.e., a very low sensitivity to varying training/testing sets).
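A small sketch of the frequency ratio (FR) computation for a single predisposing factor, where FR > 1 indicates classes more prone to landslides than the regional average (toy values, assuming pandas).

```python
# Frequency ratio per factor class:
# FR(class) = (share of landslide cells in the class) / (share of all cells in the class).
import numpy as np
import pandas as pd

def frequency_ratio(factor_class, landslide):
    """factor_class: class label per cell; landslide: 1 if the cell hosts a landslide."""
    df = pd.DataFrame({"cls": factor_class, "ls": landslide})
    share_landslides = df.groupby("cls")["ls"].sum() / df["ls"].sum()
    share_area = df.groupby("cls")["ls"].count() / len(df)
    return share_landslides / share_area

# Toy example with three lithology classes (hypothetical cell counts).
cls = np.array(["A"] * 50 + ["B"] * 30 + ["C"] * 20)
ls  = np.array([1] * 10 + [0] * 40 + [1] * 15 + [0] * 15 + [1] * 5 + [0] * 15)
print(frequency_ratio(cls, ls))   # FR > 1 marks classes prone to landslides
```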