Jilles Vreeken
Independent Research Group Leader (W2)
Exploratory Data Analysis
Cluster of Excellence MMCI
Saarland University
Senior Researcher
Databases and Information Systems
Max Planck Institute for Informatics
Saarland Informatics Campus
Building E 1.7 Room 3.22
66123 Saarbrücken, Germany
jilles@mpi-inf.mpg.de
+49 681 302 71 925

Since October 2013, I have led the independent research group on Exploratory Data Analysis at the DFG Cluster of Excellence on Multimodal Computing and Interaction at Saarland University. In addition, I'm affiliated as Senior Researcher with the Databases and Information Systems (D5) group of the Max Planck Institute for Informatics.

My research is mainly concerned with exploratory data mining. That is, I develop theory and algorithms for answering the question 'this is my data, tell me what I need to know'. To identify what you need to know, i.e., what the most interesting structure in the data is, I often employ well-founded statistical methods. In particular, information-theoretic principles, namely Minimum Description Length (MDL) and Maximum Entropy, have proven to be highly valuable tools. Next, I develop highly efficient algorithms for extracting these interesting structures, i.e., models, from very large and complex data, and I investigate how these structures can be used in a wide range of applications, including identifying rare diseases, e-health, bioinformatics, market analysis, and product recommendation.


I'm always looking for talented and motivated PhD candidates, postdocs, and HiWis with a strong background in data mining, machine learning, statistics, and/or mathematics.


Currently, I'm investigating techniques for identifying informative local structures in large collections of complex data; how to efficiently mine good data descriptions directly from such data; the theoretical and practical foundations of interactive exploration of very large data, including discovery by serendipity; how to mine large relational databases; how to mine very large graphs, including characterising influence propagation in social networks; and well-founded approaches for meaningfully comparing and validating exploratory results.

Below, you'll find an overview of my activities, as well as a selection of my recent publications. You might further be interested in my publications, implementations, our workshop on Interactive Data Exploration and Analytics (IDEA) at KDD'16, or our tutorials on Information Theoretic Methods in Data Mining at ECML PKDD'14 and SIAM SDM'15.


Or, in case you're looking for a bit of procrastination, consider Research in Progress: the secret life of research, through the medium of animated GIFs.


Activities

Teaching and Advising
  • Researchers and Assistants
    • Dr. Mario Boley
    • Kailash Budhathoki
    • Janis Kalofolias
    • Panagiotis Mandros
    • Alexander Marx
    • Roel Bertens
    • Amirhossein Baradaranshahroudi
    • Iva Baykova
    • Robin Burghartz
    • Jonas Fischer
    • Patrick Ferber
    • Xingaong Gao
    • Magnus Halbe
    • Michael A. Hedderich
    • Frauke Hinrichs
  • Former MSc Thesis Students
    • Amirhossein Baradaranshahroudi (2016)
    • Apratim Bhattacharyya (2016)
    • Beata Wójciak (2016)
    • Margarita Salyaeva (2016)
    • Manan Gandhi (2016)
    • Kathrin Grosse (2016)
    • Kailash Budhathoki (2015)
    • Panagiotis Mandros (2015)
    • Thomas Van Brussel (2012)
    • Tanja Van den Eede (2011)
    • Sandy Moens (2010)
    • Andie Similon (2010)
    • Sander Schuckmann (2008)
  • Former BSc Students
    • Magnus Halbe (2016)
    • Stefan Bier (2014)
  • Former Research Assistants
    • Shweta Mahajan
    • Sebastian Brust
    • Sinan Bozca
    • Cristian Caloian
    • Eustace Ebhotemhen
    • Andrea Fuksova
    • Shilpa Garg
    • Tobias Heinen
    • Stefan Neumann
    • Michael Wessely
    • David Ziegler

Selected Recent Publications (see my publications page for the complete list)
2016
Budhathoki, K & Vreeken, J Causal Inference by Compression. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'16), IEEE, 2016 (regular paper, 8.5% acceptance rate; overall 19.6%).
Bertens, R, Vreeken, J & Siebes, A Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'16), pp 735-744, ACM, 2016 (oral presentation, 8.9% acceptance rate; overall 18.1%).
Rozenshtein, P, Gionis, A, Prakash, BA & Vreeken, J Reconstructing an Epidemic over Time. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp 1835-1844, ACM, 2016 (overall 18.1% acceptance rate).
Nguyen, H-V & Vreeken, J Flexibly Mining Better Subgroups. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 585-593, SIAM, 2016 (overall 25% acceptance rate).
Nguyen, H-V, Mandros, P & Vreeken, J Universal Dependency Analysis. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 792-800, SIAM, 2016 (overall 25% acceptance rate).
Nguyen, H-V & Vreeken, J Linear-time Detection of Non-Linear Changes in Massively High Dimensional Time Series. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 828-836, SIAM, 2016 (overall 25% acceptance rate).
Athukorala, K, Glowacka, D, Jacucci, G, Oulasvirta, A & Vreeken, J Is Exploratory Search Different? A Comparison of Information Search Behavior for Exploratory and Lookup Tasks. Journal of the Association for Information Science and Technology (JASIST) vol.67(11), pp 2635-2651, Wiley, 2016. (IF 2.26)
2015
Pienta, R, Lin, Z, Kahng, M, Vreeken, J, Talukdar, PP, Abello, J, Parameswaran, G & Chau, DH AdaptiveNav: Adaptive Discovery of Interesting and Surprising Nodes in Large Graphs. In: Proceedings of the IEEE Conference on Visualization (VIS), IEEE, 2015.
Budhathoki, K & Vreeken, J The Difference and the Norm – Characterising Similarities and Differences between Databases. In: Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), pp 206-223, Springer, 2015.
Nguyen, H-V & Vreeken, J Non-Parametric Jensen-Shannon Divergence. In: Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), pp 173-189, Springer, 2015.
Vreeken, J Causal Inference by Direction of Information. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 909-917, SIAM, 2015.
Karaev, S, Miettinen, P & Vreeken, J Getting to Know the Unknown Unknowns: Destructive-Noise Resistant Boolean Matrix Factorization. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 325-333, SIAM, 2015.
Sundareisan, S, Vreeken, J & Prakash, BA Hidden Hazards: Finding Missing Nodes in Large Graph Epidemics. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 415-423, SIAM, 2015.
Koutra, D, Kang, U, Vreeken, J & Faloutsos, C Summarizing and Understanding Large Graphs. Statistical Analysis and Data Mining vol.8(3), pp 183-202, Wiley, 2015.
Zimek, A & Vreeken, J The Blind Men and the Elephant: About Meeting the Problem of Multiple Truths in Data from Clustering and Pattern Mining Perspectives. Machine Learning vol.98(1), pp 121-155, Springer, 2015. (IF 1.587)
2014
Athukorala, K, Oulasvirta, A, Glowacka, D, Vreeken, J & Jacucci, G Narrow or Broad? Estimating Subjective Specificity in Exploratory Search. In: Proceedings of ACM Conference on Information and Knowledge Management (CIKM), pp 819-828, ACM, 2014 (IR track full paper, overall 21% acceptance rate).
Kuzey, E, Vreeken, J & Weikum, G A Fresh Look on Knowledge Bases: Distilling Named Events from News. In: Proceedings of ACM Conference on Information and Knowledge Management (CIKM), pp 1689-1698, ACM, 2014 (KM track full paper, overall 21% acceptance rate).
Nguyen, H-V, Müller, E, Vreeken, J & Böhm, K Multivariate Maximal Correlation Analysis. In: Proceedings of the International Conference on Machine Learning (ICML), pp 775-783, JMLR: W&CP vol.32, 2014 (25.0% acceptance rate).
Koutra, D, Kang, U, Vreeken, J & Faloutsos, C VoG: Summarizing and Understanding Large Graphs. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 91-99, SIAM, 2014. (fast track journal invitation, as one of the best of SDM'14; full paper with presentation, 15.4% acceptance rate)
Vreeken, J & Tatti, N Interesting Patterns. In: Aggarwal, CC & Han, J (eds) Frequent Pattern Mining, pp 105-134, Springer, 2014.
van Leeuwen, M & Vreeken, J Mining and Using Sets of Patterns through Compression. In: Aggarwal, CC & Han, J (eds) Frequent Pattern Mining, pp 165-198, Springer, 2014.
Zimek, A, Assent, I & Vreeken, J Frequent Pattern Mining Algorithms for Data Clustering. In: Aggarwal, CC & Han, J (eds) Frequent Pattern Mining, pp 403-424, Springer, 2014.
Miettinen, P & Vreeken, J mdl4bmf: Minimum Description Length for Boolean Matrix Factorization. Transactions on Knowledge Discovery from Data vol.8(4), pp 1-30, ACM, 2014. (IF 1.68)
Wu, H, Vreeken, J, Tatti, N & Ramakrishnan, N Uncovering the Plot: Detecting Surprising Coalitions of Entities in Multi-Relational Schemas. Data Mining and Knowledge Discovery vol.28(5), pp 1398-1428, Springer, 2014. (IF 2.877) (ECML PKDD'14 Journal Track)
Nguyen, H-V, Müller, E, Vreeken, J & Böhm, K Unsupervised Interaction-Preserving Discretization of Multivariate Data. Data Mining and Knowledge Discovery vol.28(5), pp 1366-1397, Springer, 2014. (IF 2.877) (ECML PKDD'14 Journal Track)