Jilles Vreeken
Research Group Leader
Cluster of Excellence MMCI at Saarland University
Saarland Informatics Campus
Building E 1.7 Room 3.22
66123 Saarbrücken, Germany
jv@cispa.saarland
+49 681 302 71 925

I lead the research group on Exploratory Data Analysis at the Helmholtz Center for Information Security. In addition, I'm affiliated as Senior Researcher with the Database and Information Systems (D5) group of the Max Planck Institute for Informatics, and as Independent Research Group Leader with the DFG cluster-of-excellence on Multimodal Computing and Interaction at Saarland University.

My research is mainly concerned with exploratory data mining. That is, I develop theory and algorithms for answering the question "this is my data, tell me what I need to know". To identify what you need to know, i.e., what the most interesting structure in the data is, I often employ well-founded statistical methods. In particular, tools from Information Theory, namely the principles of Minimum Description Length (MDL) and Maximum Entropy, have proven to be highly valuable. Next, I develop highly efficient algorithms for extracting these interesting structures, i.e., models, from very large and complex data, and I investigate how we can use these structures in a wide range of applications, including identifying rare diseases, e-health, bio-informatics, market analysis, and product recommendation.


I'm always looking for talented and motivated PhD candidates, postdocs, and HiWis with a strong background in data mining, machine learning, statistics, and/or mathematics.


Currently I'm investigating techniques for identifying informative local structures in large collections of complex data; how to efficiently mine good data descriptions directly from such data; how to determine and discover causal dependencies from observational data; the theoretical and practical foundations of interactive exploration of very large data, discovering things by serendipity; how to mine large relational databases; how to mine very large graphs, including characterising influence propagation in social networks; and well-founded approaches for meaningfully comparing and validating explorative results.

Below, you'll find an overview of my activities, as well as a selection of my recent publications. You might further be interested in my publications, implementations, our workshop on Learning and Mining for Cybersecurity (LEMINCS) at KDD'19, our tutorial on Modern MDL meets Data Mining at KDD'19, or our tutorial on Summarizing Graphs at Multiple Scales at ICDM'18.


Or, in case you're looking for a bit of procrastination, consider Research in Progress: the secret life of research, through the medium of animated GIFs.


Activities

Teaching and Advising
  • Researchers and Assistants
    • Kailash Budhathoki
    • Sebastian Dalleiger
    • Jonas Fischer
    • Janis Kalofolias
    • David Kaltenpoth
    • Panagiotis Mandros
    • Alexander Marx
    • Osman Ali Mian
    • Corinna Coupette
    • Joscha Cueppers
    • Edith Heiter
    • Frauke Hinrichs
    • Anna Olah
    • Divyam Saran
    • Sandra Sukarieh
    • Khánh Hiep Tran
  • Former MSc Thesis Students
    • Divyam Saran (2019)
    • Osman Ali Mian (2019)
    • Simina Ana Cotop (2019)
    • Magnus Halbe (2018)
    • Maha Aburahma (2018)
    • Iva Farag-Baykova (2018)
    • Yuliia Brendel (2018)
    • Maike Eissfeller (2018)
    • Boris Wiegand (2018)
    • Tatiana Dembelova (2018)
    • Robin Burghartz (2017)
    • Henrik Jilke (2017)
    • Benjamin Hättasch (2017)
    • Amirhossein Baradaranshahroudi (2016)
    • Apratim Bhattacharyya (2016)
    • Beata Wójciak (2016)
    • Margarita Salyaeva (2016)
    • Manan Gandhi (2016)
    • Kathrin Grosse (2016)
    • Kailash Budhathoki (2015)
    • Panagiotis Mandros (2015)
    • Thomas Van Brussel (2012)
    • Tanja Van den Eede (2011)
    • Sandy Moens (2010)
    • Andie Similon (2010)
    • Sander Schuckmann (2008)
  • Former BSc Students
    • Frauke Hinrichs (2017)
    • Magnus Halbe (2016)
    • Stefan Bier (2014)
  • Former Research Assistants
    • Grégoire Pacreau
    • Michael Hedderich
    • Patrick Ferber
    • Shweta Mahajan
    • Tobias Heinen
    • Cristian Caloian
    • David Ziegler
    • Stefan Neumann
    • Andrea Fuksova
    • Eustace Ebhotemhen
    • Shilpa Garg
    • Sinan Bozca
    • Michael Wessely

Selected Recent Publications (go here for the complete list)
2019
Kalofolias, J, Boley, M & Vreeken, J Discovering Robustly Connected Subgraphs with Simple Descriptions. In: Proceedings of the IEEE International Conference on Data Mining (ICDM), IEEE, 2019 (18.5% acceptance rate).
Mandros, P, Boley, M & Vreeken, J Discovering Reliable Correlations in Categorical Data. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'19), IEEE, 2019 (18.5% acceptance rate).
Fischer, J & Vreeken, J Sets of Robust Rules, and How to Find Them. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data (ECMLPKDD), Springer, 2019 (17.7% acceptance rate).
Marx, A & Vreeken, J Identifiability of Cause and Effect using Regularized Regression. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'19), ACM, 2019 (oral presentation 9.2% acceptance rate; overall 14.2%).
Mandros, P, Boley, M & Vreeken, J Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms (Extended Abstract). In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), IJCAI, 2019. (Invited contribution to the IJCAI Sister Conference Best Paper Track)
Kaltenpoth, D & Vreeken, J We Are Not Your Real Parents: Telling Causal From Confounded by MDL. In: Proceedings of the SIAM International Conference on Data Mining (SDM), SIAM, 2019 (22.9% acceptance rate).
Marx, A & Vreeken, J Testing Conditional Independence on Discrete Data using Stochastic Complexity. In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 2019 (31% acceptance rate).
Marx, A & Vreeken, J Telling Cause from Effect by Local and Global Regression. Knowledge and Information Systems vol.60(3), pp 1277-1305, Springer, 2019. (IF 2.397)
2018
Budhathoki, K & Vreeken, J Accurate Causal Inference on Discrete Data. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'18), IEEE, 2018 (19.9% acceptance rate).
Mandros, P, Boley, M & Vreeken, J Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'18), IEEE, 2018 (full paper, 8.9% acceptance rate; overall 19.9%). (Best Paper Award)
Marx, A & Vreeken, J Causal Inference on Multivariate and Mixed Type Data. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data (ECMLPKDD), Springer, 2018 (25% acceptance rate).
Budhathoki, K & Vreeken, J Causal Inference on Event Sequences. In: Proceedings of the SIAM Conference on Data Mining (SDM), pp 55-63, SIAM, 2018 (23.2% acceptance rate).
Wu, H, Ning, Y, Chakraborty, P, Vreeken, J, Tatti, N & Ramakrishnan, N Generating Realistic Synthetic Population Datasets. Transactions on Knowledge Discovery from Data vol.12(4), pp 1-45, ACM, 2018. (IF 1.68)
List, M, Hornakova, A, Vreeken, J & Schulz, MH JAMI — Fast computation of Conditional Mutual Information for ceRNA network analysis. Bioinformatics vol.34(17), pp 3050-3051, Oxford University Press, 2018. (IF 7.307)
Budhathoki, K & Vreeken, J Origo: Causal Inference by Compression. Knowledge and Information Systems vol.56(2), pp 285-307, Springer, 2018. (IF 2.247)
2017
Budhathoki, K & Vreeken, J MDL for Causal Inference on Discrete Data. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'17), pp 751-756, IEEE, 2017 (19.9% acceptance rate).
Marx, A & Vreeken, J Telling Cause from Effect by MDL-based Local and Global Regression. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'17), pp 307-316, IEEE, 2017 (full paper, 9.3% acceptance rate; overall 19.9%). (invited for the KAIS Special Issue on the Best of IEEE ICDM 2017)
Kalofolias, J, Boley, M & Vreeken, J Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'17), IEEE, 2017 (full paper, 9.3% acceptance rate; overall 19.9%).
Mandros, P, Boley, M & Vreeken, J Discovering Reliable Approximate Functional Dependencies. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp 355-363, ACM, 2017 (oral presentation, 8.6% acceptance rate; overall 17.5%).
Budhathoki, K & Vreeken, J Correlation by Compression. In: Proceedings of the SIAM Conference on Data Mining (SDM), SIAM, 2017 (25% acceptance rate).
Bertens, R, Vreeken, J & Siebes, A Efficiently Discovering Unexpected Pattern-Co-Occurrences. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 126-134, SIAM, 2017 (25% acceptance rate).
Bhattacharyya, A & Vreeken, J Efficiently Summarising Event Sequences with Rich Interleaving Patterns. In: Proceedings of the SIAM Conference on Data Mining (SDM), pp 795-803, SIAM, 2017 (selected in the top 10 papers of SDM'17, 2.7% acceptance rate; overall 25%).
Pienta, R, Kahng, M, Lin, Z, Vreeken, J, Talukdar, P, Abello, J, Parameswaran, G & Chau, DH Adaptive Local Exploration of Large Graphs. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 597-605, SIAM, 2017 (25% acceptance rate).
Boley, M, Goldsmith, BR, Ghiringhelli, LM & Vreeken, J Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery. Data Mining and Knowledge Discovery vol.31(5), pp 1391-1418, Springer, 2017. (IF 3.160) (ECML PKDD'17 Journal Track)
Fischer, AK, Vreeken, J & Klakow, D Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by MDL. Computación y Sistemas vol.21(4), 2017. (Special Issue for the 18th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing'17)