Jilles Vreeken
Professor of Computer Science
Saarland University
Building E 1.7 Room 3.22
66123 Saarbrücken, Germany
jv@cispa.de
+49 681 302 71 925

I lead the research group on Exploratory Data Analysis at the CISPA Helmholtz Center for Information Security. In addition, I'm affiliated as Senior Researcher with the Database and Information Systems (D5) group of the Max Planck Institute for Informatics, and as Professor with the Department of Computer Science of Saarland University.

My research is mainly concerned with causality, unsupervised learning, and data mining. In particular, I enjoy developing theory and algorithms for answering exploratory questions about data, such as 'what is going on in this data?', 'what are the key dependencies?', 'are they causal or confounded?', and so on, without having to make unnecessary or unjustified assumptions about the data generating process. To identify what is worthwhile structure, i.e. what is worth knowing, I often employ well-founded statistical methods based on information theory, and then proceed to develop efficient algorithms that can extract useful and insightful results from large and complex data. I like all data types equally.


I'm always looking for talented and motivated PhD candidates, postdocs, and HiWis with a strong background in data mining, machine learning, statistics, and/or mathematics.


Currently I'm investigating techniques for identifying informative and ideally causal structures in large collections of complex data; how to efficiently mine easily interpretable summaries from data; how to determine and discover causal dependencies from observational data; the theoretical and practical foundations of interactive exploration of very large data, including discovering things by serendipity; how to mine large relational databases; how to mine very large graphs, including characterising influence propagation in social networks; and well-founded approaches for meaningfully comparing and validating exploratory results.

Below, you'll find an overview of my activities, as well as a selection of my recent publications. You might further be interested in our recent workshop on Learning and Mining for Cybersecurity (LEMINCS) at KDD'19, our tutorial on Modern MDL meets Data Mining at KDD'19, or our tutorial on Summarizing Graphs at Multiple Scales at ICDM'18.


Or, in case you're looking for a bit of procrastination, consider Research in Progress: the secret life of research, through the medium of animated GIFs.


Activities

Teaching and Advising
  • Former MSc Thesis Students
    • Frauke Hinrichs
    • Sarah Mameche
    • Jana Hess (2021)
    • Anna Oláh (2020)
    • Edith Heiter (2020)
    • Sandra Sukarieh (2020)
    • Joscha Cueppers (2019)
    • Divyam Saran (2019)
    • Osman Ali Mian (2019)
    • Simina Ana Cotop (2019)
    • Magnus Halbe (2018)
    • Maha Aburahma (2018)
    • Iva Farag-Baykova (2018)
    • Yuliia Brendel (2018)
    • Maike Eissfeller (2018)
    • Boris Wiegand (2018)
    • Tatiana Dembelova (2018)
    • Robin Burghartz (2017)
    • Henrik Jilke (2017)
    • Benjamin Hättasch (2017)
    • Amirhossein Baradaranshahroudi (2016)
    • Apratim Bhattacharyya (2016)
    • Beata Wójciak (2016)
    • Margarita Salyaeva (2016)
    • Manan Gandhi (2016)
    • Kathrin Grosse (2016)
    • Kailash Budhathoki (2015)
    • Panagiotis Mandros (2015)
    • Thomas Van Brussel (2012)
    • Tanja Van den Eede (2011)
    • Sandy Moens (2010)
    • Andie Similon (2010)
    • Sander Schuckmann (2008)
  • Former BSc Thesis Students
    • Matthias Wilms (2021)
    • Daniel Kindler (2021)
    • Frauke Hinrichs (2017)
    • Magnus Halbe (2016)
    • Stefan Bier (2014)
  • Former Research Assistants
    • Khánh Hiep Tran
    • Grégoire Pacreau
    • Michael Hedderich
    • Patrick Ferber
    • Shweta Mahajan
    • Tobias Heinen
    • Cristian Caloian
    • David Ziegler
    • Stefan Neumann
    • Andrea Fuksova
    • Eustace Ebhotemhen
    • Shilpa Garg
    • Sinan Bozca
    • Michael Wessely

Selected Recent Publications (go here for the complete list)
2021
Fischer, J & Vreeken, J Differentiable Pattern Set Mining. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'21), ACM, 2021 (15.4% acceptance rate).
Coupette, C & Vreeken, J Graph Similarity Description: How Are These Graphs Similar?. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'21), ACM, 2021 (15.4% acceptance rate).
Fischer, J, Oláh, A & Vreeken, J What's in the Box? Explaining Neural Networks with Robust Rules. In: Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2021 (21.4% acceptance rate).
Budhathoki, K, Boley, M & Vreeken, J Discovering Reliable Causal Rules. In: Proceedings of the SIAM International Conference on Data Mining (SDM), SIAM, 2021 (21.2% acceptance rate).
Wiegand, B, Klakow, D & Vreeken, J Mining Easily Understandable Models from Complex Event Data. In: Proceedings of the SIAM International Conference on Data Mining (SDM), SIAM, 2021 (21.2% acceptance rate).
Kalofolias, J, Welke, P & Vreeken, J SUSAN: The Structural Similarity Random Walk Kernel. In: Proceedings of the SIAM International Conference on Data Mining (SDM), SIAM, 2021 (21.2% acceptance rate).
Mian, OA, Marx, A & Vreeken, J Discovering Fully Oriented Causal Networks. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2021 (21.3% acceptance rate).
Schmidt, F, Marx, A, Baumgarten, N, Hebel, M, Wegner, M, Kaulich, M, Leisegang, M, Brandes, R, Göke, J, Vreeken, J & Schulz, MH Integrative Analysis of Epigenetics Data Identifies Gene-Specific Regulatory Elements. Nucleic Acids Research, Oxford University Press, 2021. (IF 16.97)
Dutta, A, Vreeken, J, Ghiringhelli, L & Bereau, T Data-driven Equation for Drug-Membrane Permeability across Drugs and Membranes. Journal of Chemical Physics vol.154(24), AIP, 2021. (IF 2.991)
2020
Dalleiger, S & Vreeken, J The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'20), IEEE, 2020 (19.7% acceptance rate).
Cueppers, J & Vreeken, J Just Wait For It... Mining Sequential Patterns with Reliable Prediction Delays. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'20), IEEE, 2020 (full paper, 9.8% acceptance rate; overall 19.7%). (invited for the KAIS Special Issue on the Best of IEEE ICDM 2020)
Fischer, J & Vreeken, J Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'20), ACM, 2020 (16.8% acceptance rate).
Mandros, P, Kaltenpoth, D, Boley, M & Vreeken, J Discovering Functional Dependencies from Mixed-Type Data. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'20), ACM, 2020 (16.8% acceptance rate).
Pennerath, F, Mandros, P & Vreeken, J Discovering Approximate Functional Dependencies using Smoothed Mutual Information. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'20), ACM, 2020 (16.8% acceptance rate).
Zhang, Y, Humbert, M, Surma, B, Manoharan, P, Vreeken, J & Backes, M Towards Plausible Graph Anonymization. In: Proceedings of the Network and Distributed System Security Symposium (NDSS), The Internet Society, 2020 (17.4% acceptance rate).
Dalleiger, S & Vreeken, J Explainable Data Decompositions. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI'20), AAAI, 2020 (oral presentation 4.5% acceptance rate; overall 20.6%).
Belth, C, Zheng, X, Vreeken, J & Koutra, D What is Normal, What is Strange, and What is Missing in a Knowledge Graph. In: Proceedings of the Web Conference (WWW), ACM, 2020 (oral presentation; overall acceptance rate 19.2%).
Mandros, P, Boley, M & Vreeken, J Discovering Dependencies with Reliable Mutual Information. Knowledge and Information Systems vol.62, pp 4223-4253, Springer, 2020. (IF 2.936)
Sutton, C, Boley, M, Ghiringhelli, L, Rupp, M, Vreeken, J & Scheffler, M Identifying Domains of Applicability of Machine Learning Models for Materials Science. Nature Communications vol.11(4428), pp 1-9, Nature Research, 2020. (IF 12.12)
2019
Kalofolias, J, Boley, M & Vreeken, J Discovering Robustly Connected Subgraphs with Simple Descriptions. In: Proceedings of the IEEE International Conference on Data Mining (ICDM), IEEE, 2019 (18.5% acceptance rate).
Mandros, P, Boley, M & Vreeken, J Discovering Reliable Correlations in Categorical Data. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'19), IEEE, 2019 (18.5% acceptance rate).
Fischer, J & Vreeken, J Sets of Robust Rules, and How to Find Them. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data (ECMLPKDD), Springer, 2019 (17.7% acceptance rate).
Marx, A & Vreeken, J Identifiability of Cause and Effect using Regularized Regression. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'19), ACM, 2019 (oral presentation 9.2% acceptance rate; overall 14.2%).
Mandros, P, Boley, M & Vreeken, J Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms (Extended Abstract). In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), IJCAI, 2019. (Invited contribution to the IJCAI Sister Conference Best Paper Track)
Kaltenpoth, D & Vreeken, J We Are Not Your Real Parents: Telling Causal From Confounded by MDL. In: Proceedings of the SIAM International Conference on Data Mining (SDM), SIAM, 2019 (22.9% acceptance rate).