Wouldn't it be great if we had data mining tools that could consider everything there is and build models with which we could have an interesting, even enlightening discussion? Just like talking to a friend or a colleague? I can't say that we are anywhere close. But in recent years my lab has developed some exciting algorithms that help us digest large volumes of data, visualize them in a number of fancy ways, and even fuse them into a single predictive model. For large-scale data fusion, we have lately tested some predictions in the wet lab and achieved quite astonishing accuracy.
I also enjoy building things. My Bioinformatics Lab develops Orange, a data mining suite with a cool visual programming interface. We are also the authors of dictyExpress, a simple gene expression analytics tool that has found much use within the Dictyostelium research community. Our first popular web application was GenePath: it is over ten years old but still runs! And we are teaming up with Genialis, a spin-off, to build data mining pipelines with simple web interfaces.
- Overcoming the curse of dimensionality with the use of background knowledge, Basic Research and Application Project, J2-5480, 2013−2016
- Post-transcriptional regulatory networks in neurodegenerative diseases, Basic Research and Application Project, J7-5460, 2013−2016
- Epidemiology and Biodiversity Studies of Plant Pathogens, Basic Research and Application Project, L4-5525, 2013−2016
- CARE-MI - Cardio Repair European Multidisciplinary Initiative, European Project (Framework Programmes), 242038, 2010−2015
- Computational approaches for identification of bacterial resistance pathways in Dictyostelium, Bilateral Collaboration Project, BI-US/13-14-016, 2013−2014
- Artificial intelligence and intelligent systems, Research Programme, P2-0209, 2009−2014
- Data Fusion by Matrix Factorization
IEEE Transactions on Pattern Analysis & Machine Intelligence, 37(1):41-53, 2015.
- Computational models reveal genotype-phenotype associations in Saccharomyces cerevisiae
Yeast, 31:265-277, 2014.
- Gene network inference by probabilistic scoring of relationships from a factorized model of interactions
Bioinformatics, 30(12):i246-i254, 2014.
- Matrix factorization-based data fusion for drug-induced liver injury prediction
Systems Biomedicine, 2:e28527, 2014.
- Assessment of machine learning reliability methods for quantifying the applicability domain of QSAR regression models
J Chem Inf Model, 54(2):431-441, 2014.
- Imputation of quantitative genetic interactions in epistatic MAPs by interaction propagation matrix completion
In: RECOMB, Pittsburgh, 2014.
- Matrix factorization-based data fusion for gene function prediction in baker's yeast and slime mold
In: PSB, Jan 2014, The Big Island of Hawaii.
- Heterogeneous computing architecture for fast detection of SNP-SNP interactions
BMC Bioinformatics, 15:216, 2014.
- Discovering disease-disease associations by fusing systems-level molecular data
Scientific Reports, 3:3202, 2013.
- Orange: data mining toolbox in Python
Journal of Machine Learning Research, 14:2349-2353, 2013.
- Bacterial discrimination by dictyostelid amoebae reveals the complexity of ancient interspecies interactions
Current Biology, 23(10):862-872, 2013.
- ABC transporters in Dictyostelium discoideum development
PLoS One, 8(8):e70040, 2013.
- Computational models for prediction of yeast strain potential for winemaking from phenotypic profiles
PLoS One, 8(7):e66523, 2013.
- NIMFA: A Python Library for Nonnegative Matrix Factorization
Journal of Machine Learning Research, 13:849-853, 2012.
- Stage prediction of embryonic stem cell differentiation from genome-wide expression data
Bioinformatics, 27(18):2546-2553, 2011.
- Characterizing the RNA targets and position-dependent splicing regulation by TDP-43
Nature Neuroscience, 14(4):452-458, 2011.
- iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution
Nature Structural & Molecular Biology, 17:909-915, 2010.
- Conserved developmental transcriptomes in evolutionarily divergent species
Genome Biology, 11:R35, 2010.
- Polymorphic members of the lag gene family mediate kin discrimination in Dictyostelium
Current Biology, 19(7):567-572, 2009.
- dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface
BMC Bioinformatics, 10:265, 2009.
- Predictive data mining in clinical medicine: Current issues and guidelines
International Journal of Medical Informatics, 77(2):81-97, 2008.
- Open-source tools for data mining
Clinics in Laboratory Medicine, 28(1):37-54, 2008.
- Towards knowledge-based gene expression data mining
Journal of Biomedical Informatics, 40(6):787-802, 2007.
- Visualization-based cancer microarray data classification analysis
Bioinformatics, 23(16):2147-2154, 2007.
- VizRank: Data Visualization Guided by Machine Learning
Data Mining and Knowledge Discovery, 13(2):119-136, 2006.
- Epistasis analysis with global transcriptional phenotypes
Nature Genetics, 37(5):471-477, 2005.
- Microarray data mining with visual programming
Bioinformatics, 21(3):396-398, 2005.
- Attribute Interactions in Medical Data Analysis
In: 9th Conference on Artificial Intelligence in Medicine in Europe (AIME 2003), October 18-22, 2003, Protaras, Cyprus.
- GenePath: a System for Automated Construction of Genetic Networks from Mutant Data
Bioinformatics, 19(3):383, 2003.