Artificial Intelligence and Pattern Recognition
Knowledge representation, machine learning, clustering, classification, feature selection
Robotics
Perception, control, learning, aerial robots, bio-inspired robots, household robots
Bioinformatics
Sequence analysis, algorithms for bioinformatics, cDNA microarray data analysis
Graphics
Interactive rendering, global illumination, measurement, simulation, sound, perception
Scientific Computing
Numerical analysis, computational geometry, physically based animation
Neuroscience


Recent Publications


Multi-View Ensemble Classification of Brain Connectivity Images for Neurodegeneration Type Discrimination
Michele Fratello, Giuseppina Caiazzo, Francesca Trojsi, Antonio Russo, Gioacchino Tedeschi, Roberto Tagliaferri, Fabrizio Esposito
Brain connectivity analyses using voxels as features are not robust enough for single-patient classification because of the inter-subject anatomical and functional variability. To construct more robust features, voxels can be aggregated into clusters that are maximally coherent across subjects. Moreover, combining multi-modal neuroimaging and multi-view data integration techniques allows generating multiple independent connectivity features for the same patient. Structural and functional connectivity features were extracted from multi-modal MRI images with a clustering technique, and used for the multi-view classification of different phenotypes of neurodegeneration by an ensemble learning method (random forest). Two different multi-view models (intermediate and late data integration) were trained on, and tested for the classification of, individual whole-brain default-mode network (DMN) and fractional anisotropy (FA) maps, from 41 amyotrophic lateral sclerosis (ALS) patients, 37 Parkinson’s disease (PD) patients and 43 healthy control (HC) subjects. Both multi-view data models exhibited ensemble classification accuracies significantly above chance. In ALS patients, multi-view models exhibited the best performances (intermediate: 82.9%, late: 80.5% correct classification) and were more discriminative than each single-view model. In PD patients and controls, multi-view models’ performances were lower (PD: 59.5%, 62.2%; HC: 56.8%, 59.1%) but higher than at least one single-view model. Training the models only on patients resulted in more than 85% of patients being correctly discriminated as ALS or PD type, with maximal performance for the multi-view models. These results highlight the potential of mining complementary information from the integration of multiple data views in the classification of connectivity patterns from multi-modal brain images in the study of neurodegenerative diseases.
Read more…



Rotation Clustering: a Consensus Clustering Approach to Cluster Gene Expression Data
Paola Galdi, Angela Serra, Roberto Tagliaferri
In this work we present Rotation clustering, a novel method for consensus clustering inspired by the classifier ensemble model Rotation Forest. We demonstrate the effectiveness of our method in a real-world application, the identification of enriched gene sets in a TCGA dataset derived from a clinical study on Glioblastoma multiforme. The proposed approach is compared with a classical clustering algorithm and with two other consensus methods. Our results show that this method is effective in finding significant gene groups that show a common behaviour in terms of expression patterns.
Read more…


Effectiveness of Projection Techniques in Genomic Data Analysis
Paola Galdi, Angela Serra, Dario Greco, Roberto Tagliaferri
Real-world datasets, such as genomic data, are noisy and high-dimensional, and are therefore difficult to analyse without a preliminary step aimed at reducing data dimensionality and selecting relevant features. Projection techniques are a useful tool for pre-processing high-dimensional data, since they yield a simpler representation of the original data that still preserves intrinsic information. In this work, we assess the effectiveness of these methods when applied to two common tasks in bioinformatics: patient classification and gene clustering. We compared the performance of different learning models in the original space and in several projected spaces obtained with different techniques, both in a supervised and in an unsupervised setting. Our results show that projection techniques can lead to a significant improvement in the learning ability of models.
Read more…


Data integration in genomics and systems biology

Angela Serra, Michele Fratello, Dario Greco, Roberto Tagliaferri
Multi-view learning is the branch of machine learning that deals with multi-modal data, i.e. with patterns represented by different sets of features. The fast spread of this learning technique is motivated by the continuing increase of real applications based on multi-view data. For example, in bioinformatics multiple experiments can be available (mRNA, miRNA and protein expression, genome-wide association studies (GWAS) and others) for a set of samples. In bioinformatics, multi-view approaches are useful since heterogeneous genome-wide data sources capture information on different aspects of complex biological systems. Each view provides a distinct facet of the same domain, encoding different biologically relevant patterns. The integration of such views can provide a richer model of the underlying system than those produced by a single view alone. This paper provides a review of the literature with respect to bioinformatics, with the purpose of understanding the principles and operation modes of the existing methods and their possible applications. To organize the methods proposed in the literature and to find similarities between them, these approaches are grouped according to three categories: the type of data used in the papers, the statistical problem and the stage of integration.
Read more…


A Radial Search Method for Fast Nearest Neighbor Search on Range Images
Federico Tombari, Samuele Salti, Luca Puglia, Giancarlo Raiconi and Luigi Di Stefano
In this paper, we propose an efficient method for the problem of Nearest Neighbor Search (NNS) on 3D data provided in the form of range images. The proposed method exploits the organized structure of range images to speed up the neighborhood exploration by operating radially from the query point and terminating the search by evaluating adaptive stop conditions.
Read more…


Passive Dense Stereo Vision on the Myriad2 VPU
Luca Puglia, Mircea Ionica, Giancarlo Raiconi, David Moloney
In this work, an Adaptive Semi-Global Matching approach to passive dense stereo vision is considered. Passive dense stereo is commercially interesting because it is potentially lower in cost and power, offers higher resolution, can sustain higher frame rates and works over a wider range of lighting conditions than active stereo systems such as structured light and time of flight; it also presents some difficulties, which are outlined. As detailed, the method is capable of delivering a sustained throughput of 8 fps in the worst case, with a peak of 50 fps in the best case, on a low-cost embedded vision processor.
Read more…



Consensus Clustering in Gene Expression
Paola Galdi, Francesco Napolitano, Roberto Tagliaferri
In data analysis, clustering is the process of finding groups in unlabelled data according to similarities among them, in such a way that data items belonging to the same group are more similar to each other than to items in different groups. Consensus clustering is a methodology for combining different clustering solutions from the same data set into a new clustering, in order to obtain a more accurate and stable solution. In this work we compared different consensus approaches in combination with different clustering algorithms and ran several experiments on gene expression data sets. We show that consensus techniques lead to an improvement in clustering accuracy and give evidence of the stability of the solutions obtained with these methods.
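As a rough illustration of the idea of combining several clustering solutions into one, the sketch below uses a co-association matrix: for each pair of items it records the fraction of base clusterings that put them together, then links pairs above a threshold and takes connected components. This is one common consensus scheme, not necessarily one of those compared in the paper; the function names, the threshold value and the toy labelings are assumptions made for the example.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / m for c in row] for row in counts]


def consensus_labels(labelings, threshold=0.5):
    """Link items co-clustered in more than `threshold` of the base
    clusterings, then label the connected components of that graph.
    The threshold is a hypothetical tuning parameter."""
    coassoc = co_association(labelings)
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first traversal over the thresholded co-association graph.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three toy base clusterings of six items; the second disagrees on item 2.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
print(consensus_labels(base))
```

Because cluster labels are arbitrary, the co-association matrix is a convenient common representation: it only asks whether two items were grouped together, not what the group was called.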
Read more…


Impact of different metrics on multi-view clustering
Angela Serra, Dario Greco and Roberto Tagliaferri
Clustering of patients makes it possible to find groups of subjects with similar characteristics. This categorization can facilitate diagnosis, treatment decisions and prognosis prediction. Heterogeneous genome-wide data sources capture different biological aspects that can be integrated in order to better categorize the patients. Clustering methods work by comparing how similar or dissimilar patients are in a suitable similarity space. While several clustering methods have been proposed, there is no systematic comparative study concerning the impact of similarity metrics on cluster quality. We compared seven popular similarity measures (Pearson, Spearman and Kendall correlations; Euclidean, Canberra, Minkowski and Manhattan distances) in conjunction with two classical single-view clustering algorithms and a late integration approach (partitioning around medoids, hierarchical clustering and matrix factorization approaches), on high-dimensional multi-view cancer data coming from the TCGA repository. Performance was measured against the classification of tumour subcategories. Only Euclidean and Minkowski distances showed similar results in terms of clustering similarity indexes. In terms of misclassification, on the other hand, no single best similarity measure emerged: the best choice strongly depends on the data.
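For readers unfamiliar with the measures named above, the following minimal sketch gives the textbook definitions of five of them for two equal-length numeric profiles. Spearman and Kendall correlations are omitted for brevity. The Minkowski order `p` and the toy vectors are assumptions for the example; the abstract does not state which order was used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def minkowski(x, y, p):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean(x, y):
    return minkowski(x, y, 2)


def manhattan(x, y):
    return minkowski(x, y, 1)


def canberra(x, y):
    """Canberra distance; terms whose denominator is zero are skipped."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


# Toy profiles (hypothetical values, not taken from the TCGA data).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(pearson(x, y), euclidean(x, y), manhattan(x, y), canberra(x, y))
```

Note that correlations are similarities (larger means closer) while the others are distances, so a clustering routine typically needs a conversion such as `1 - pearson(x, y)` before the two kinds can be used interchangeably.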
Read more…


A multi-view genomic data integration methodology
Angela Serra, Michele Fratello, Vittorio Fortino, Giancarlo Raiconi, Roberto Tagliaferri and Dario Greco
MVDA is a multi-view genomic data integration framework. MVDA combines different types of measurements (such as mRNA expression, miRNA expression, DNA methylation, clinical data, etc.) for a given set of samples (e.g. patients). The aim is to combine dimension reduction, variable selection, clustering (for each available data type) and data integration methods to find patient subtypes.
Read more…


A multi-view genomic data simulator
Michele Fratello, Angela Serra, Vittorio Fortino, Giancarlo Raiconi, Roberto Tagliaferri and Dario Greco
OMICs technologies allow measuring the state of a large number of different features (e.g. mRNA expression, miRNA expression, CNV, DNA methylation, etc.) from the same samples. The objective of these experiments is usually to find a reduced set of significant features, which can be used to differentiate the conditions assayed. For the development of novel computational feature selection methods, this task is challenging because there is a lack of fully annotated biological datasets to be used as benchmarks. A possible way to tackle this problem is the generation of appropriate synthetic datasets, whose composition and behaviour are fully controlled and known a priori. On the other hand, the reliability of algorithms tested with synthetic benchmarks is limited by the quality of the dataset, so it is extremely important to synthesize data that are as plausible as possible. Here we propose a novel method centred on the generation of networks of interactions among different biological molecules, especially those related to the regulation of gene expression. Our method generates synthetic datasets from ordinary differential equation (ODE)-based models with known parameters. Our results show that the generated datasets mimic the behaviour of real data well, since popular network reconstruction methods are able to selectively identify existing interactions.
Read more…



A comparison between Affinity Propagation and assessment based methods in finding the best number of clusters
P. Galdi, F. Napolitano, R. Tagliaferri
Clustering is an unsupervised learning technique used in data analysis to discover the underlying natural structure of data, without using prior knowledge. A fundamental issue with unsupervised clustering problems is how to find the optimal number of clusters. A few algorithms have been developed that try to find it automatically. Other approaches generate multiple solutions varying the number of clusters and select the best one according to a fitting criterion. In this paper we show how a combination of the two approaches leads to better results. In particular, we focus our study mostly on Affinity Propagation (AP), a clustering algorithm based on message passing proposed by Frey and Dueck in 2007, which, although it does not require the number of clusters as input, can still be wrapped in an iterative framework. Additionally, we include Dynamic Tree Cut and K-medoids as examples of automatic-only and iterative-only methods respectively.
Read more…


SASCr3: A Real Time Hardware Coprocessor for Stereo Correspondence
L. Puglia, M. Vigliar, G. Raiconi
The main focus of this paper is to show the relevant improvements to a real-time hardware co-processor for stereo matching. The approach follows the well-known scheme for string alignment proposed by Needleman & Wunsch, commonly used in bioinformatics. The principal improvement concerns the parallelization of the algorithm in the FPGA design: in a hardware architecture, many resources can work at the same time without reducing system performance. The highly modular architecture was designed using the Bluespec SystemVerilog development tool and is described in detail. Synthesis and performance results are shown for several degrees of parallelism, with a Lattice ECP3-70 as the target device. The aim of this project is to build a stereo vision system for embedded applications, characterized by low power usage and low device cost. The current circuit is an updated version of the SASCr2 design. Performance is benchmarked against the former implementation.
Read more…


SASCr2: Enhanced Hardware String alignment Coprocessor for Stereo Correspondence
M. Vigliar, M. Fratello, L. Puglia, G. Raiconi

In this paper, new and significant improvements to a recently proposed hardware co-processor for stereo matching are introduced. The main focus is on small memory requirements while preserving the needed accuracy. Starting from a pair of stereo images, the co-processor computes the “disparity map” used to define corresponding points on the two images. The approach follows the well-known scheme for string alignment by Needleman & Wunsch, commonly used in bioinformatics. The highly modular architecture was designed using the Bluespec SystemVerilog development tool and is described in detail. Synthesis results for several FPGA platforms are shown. The current circuit is an updated version of the SASC design. Performance is benchmarked against the former implementation as well as against two reference software versions.
Read more…

Transcriptome dynamics-based operon prediction in prokaryotes
V. Fortino, O. Smolander, P. Auvinen, R. Tagliaferri, D. Greco
Background: Inferring operon maps is crucial to understanding the regulatory networks of prokaryotic genomes. Recently, RNA-seq based transcriptome studies revealed that in many bacterial species the operon structure varies with changing environmental conditions. Therefore, new computational solutions that use both static and dynamic data are necessary to create condition-specific operon predictions.
Results: In this work, we propose a novel classification method that integrates RNA-seq based transcriptome profiles with genomic sequence features to accurately identify the operons that are expressed under a measured condition. The classifiers are trained on a small set of confirmed operons and then used to classify the remaining gene pairs of the organism studied. Finally, by linking consecutive gene pairs classified as operons, our computational approach produces condition-dependent operon maps. We evaluated our approach on various RNA-seq expression profiles of the bacteria Haemophilus somni, Porphyromonas gingivalis, Escherichia coli and Salmonella enterica. Our results demonstrate that, using features depending on both transcriptome dynamics and genome sequence characteristics, we can identify operon pairs with high accuracy. Moreover, the combination of DNA sequence and expression data results in more accurate predictions than each one alone.
Conclusions: We present a computational strategy for the comprehensive analysis of condition-dependent operon maps in prokaryotes. Our method can be used to generate condition-specific operon maps of many bacterial organisms for which high-resolution transcriptome data is available.
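The final step described in the Results, linking consecutive gene pairs classified as operons into an operon map, can be sketched as a single pass over the genes in genomic order. This is only an illustration under simplifying assumptions: the function name and gene identifiers are hypothetical, the pair predictions are taken as given booleans, strand information is ignored, and genes that end up unlinked are simply left out of the map.

```python
def link_operons(genes, pair_is_operon):
    """Group genes into operons from adjacent-pair predictions.

    genes: gene identifiers in genomic order.
    pair_is_operon: pair_is_operon[i] is True when genes[i] and genes[i + 1]
        were classified as belonging to the same operon.
    Returns a list of operons, each a list of two or more gene identifiers.
    """
    if not genes:
        return []
    if len(pair_is_operon) != len(genes) - 1:
        raise ValueError("expected one prediction per adjacent gene pair")

    operons = []
    current = [genes[0]]
    for gene, linked in zip(genes[1:], pair_is_operon):
        if linked:
            # Extend the run of consecutively linked genes.
            current.append(gene)
        else:
            # The run ends here; keep it only if it spans at least two genes.
            if len(current) > 1:
                operons.append(current)
            current = [gene]
    if len(current) > 1:
        operons.append(current)
    return operons


# Hypothetical gene identifiers and pair predictions.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
predictions = [True, True, False, False, True]
print(link_operons(genes, predictions))
```

Running the same function on predictions obtained under different conditions would give one map per condition, which is the sense in which the resulting maps are condition-dependent.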
Read more…



Bioinformatic pipelines in Python with Leaf
F. Napolitano, R. Mariani-Costantini, R. Tagliaferri
Background: An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of a rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce the pipeline formality on top of a dynamical development process with minimum overhead for the programmer, thus providing a simple layer of software structuring.
Results: Leaf includes a formal language for the definition of pipelines with code that can be transparently inserted into the user’s Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext.
Conclusions: Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools.
Read more…


Drug repositioning: a machine-learning approach through data integration
F. Napolitano, Y. Zhao, V. Moreira, R. Tagliaferri, J. Kere, M. D’Amato, D. Greco
Existing computational methods for drug repositioning either rely only on the gene expression response of cell lines after treatment, or on drug-to-disease relationships, merging several information levels. However, the noisy nature of gene expression and the scarcity of genomic data for many diseases are important limitations of such approaches. Here we focused on a drug-centered approach by predicting the therapeutic class of FDA-approved compounds, not considering data concerning the diseases. We propose a novel computational approach to predict drug repositioning based on state-of-the-art machine-learning algorithms. We have integrated multiple layers of information: i) the distances between drugs based on how similar their chemical structures are, ii) how close their targets are within the protein-protein interaction network, and iii) how correlated their gene expression patterns after treatment are. Our classifier reaches high accuracy levels (78%), allowing us to re-interpret the top misclassifications as re-classifications, after rigorous statistical evaluation. Efficient drug repurposing has the potential to significantly impact the whole field of drug development. The results presented here can significantly accelerate the translation into the clinic of known compounds for novel therapeutic uses.
Read more…


Integrative genetic, epigenetic and pathological analysis of paraganglioma reveals complex dysregulation of NOTCH signaling
A. Cama, F. Verginelli, L.V. Lotti, F. Napolitano, A. Morgano, A. D’Orazio, M. Vacca, S. Perconti, F. Pepe, F. Romani, F. Vitullo, F. di Lella, R. Visone, M. Mannelli, H.P.H. Neumann, G. Raiconi, C. Paties, A. Moschetta, R. Tagliaferri, A. Veronese, M. Sanna, R. Mariani-Costantini
Head and neck paragangliomas, rare neoplasms of the paraganglia composed of nests of neurosecretory and glial cells embedded in vascular stroma, provide a remarkable example of organoid tumor architecture. To identify genes and pathways commonly deregulated in head and neck paraganglioma, we integrated high-density genome-wide copy number variation (CNV) analysis with micro-RNA and immunomorphological studies. Gene-centric CNV analysis of 24 cases identified a list of 104 genes most significantly targeted by tumor-associated alterations. The “NOTCH signaling pathway” was the most significantly enriched term in the list (P = 0.002 after Bonferroni or Benjamini correction). Expression of the relevant NOTCH pathway proteins in sustentacular (glial), chief (neuroendocrine) and endothelial cells was confirmed by immunohistochemistry in 47 head and neck paraganglioma cases. There were no relationships between level and pattern of NOTCH1/JAG2 protein expression and germline mutation status in the SDH genes, implicated in paraganglioma predisposition, or the presence/absence of immunostaining for SDHB, a surrogate marker of SDH mutations. Interestingly, NOTCH upregulation was observed also in cases with no evidence of CNVs at NOTCH signaling genes, suggesting altered epigenetic modulation of this pathway. To address this issue we performed microarray-based microRNA expression analyses. Notably, 5 microRNAs (miR-200a,b,c and miR-34b,c), including those most downregulated in the tumors, correlated to NOTCH signaling and directly targeted NOTCH1 in in vitro experiments using SH-SY5Y neuroblastoma cells. Furthermore, lentiviral transduction of miR-200s and miR-34s in patient-derived primary tympano-jugular paraganglioma cell cultures was associated with NOTCH1 downregulation and increased levels of markers of cell toxicity and cell death. Taken together, our results provide an integrated view of common molecular alterations associated with head and neck paraganglioma and reveal an essential role of NOTCH pathway deregulation in this tumor type.
Read more…



SASC: A Hardware String Alignment Coprocessor for Stereo Correspondence
M. Vigliar, M. Fratello, L. Puglia, G. Raiconi

In this paper, a design scheme is proposed for a hardware co-processor that, starting from a pair of stereo images, computes the “disparity map” between them, used to define corresponding points on the two images. The approach followed, based on dynamic programming, is the one proposed in a recent paper and exploits the well-known Needleman & Wunsch string-alignment algorithm used in bioinformatics. The highly modular architecture was designed using the Bluespec SystemVerilog development tool and is described in detail. Synthesis results are shown for several FPGA platforms and demonstrate that the processor can be sufficiently small to be embedded in a fully hardware stereo image processing chain. The performance results reported at the end of the paper show that the processor can run fast enough to be employed in real-time applications.
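For reference, a minimal software sketch of Needleman & Wunsch global alignment by dynamic programming follows. It aligns two plain strings with an illustrative scoring scheme (match +1, mismatch -1, gap -1). These scores, the function name and the example inputs are assumptions for the example and do not reflect the cost function or the hardware design of the co-processor; in the stereo setting the two sequences would presumably be corresponding rows of the left and right images.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i - 1] with b[j - 1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two symbols
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ABC", "AC"))
```

The table has (n + 1) x (m + 1) entries, so a direct software implementation needs time and memory proportional to the product of the two sequence lengths.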
Read more…