Jilles Vreeken
Independent Research Group Leader (W2)
Exploratory Data Analysis group
Cluster of Excellence MMCI
Saarland University
Senior Researcher
Databases and Information Systems
Max Planck Institute for Informatics
Campus E 1.7 Room 2.04
66123 Saarbrücken, Germany
jilles@mpi-inf.mpg.de
+49 681 302 71 925
+49 681 302 70 155
Jilles at work at Zion National Park

Since October 2013, I have led the independent research group on Exploratory Data Analysis at the DFG Cluster of Excellence on Multimodal Computing and Interaction at Saarland University. In addition, I'm affiliated as a Senior Researcher with the Databases and Information Systems (D5) group of the Max Planck Institute for Informatics.

My research is mainly concerned with exploratory data mining. That is, I develop theory and algorithms for answering the question 'this is my data, tell me what I need to know'. To identify what you need to know, i.e., what the most interesting structure in the data is, I often employ well-founded statistical methods. In particular, principles from Information Theory, namely Minimum Description Length (MDL) and Maximum Entropy, have proven to be highly valuable tools. Next, I develop highly efficient algorithms for extracting these interesting structures, i.e., models, from very large and complex data, and I investigate how we can use these structures in a wide range of applications, including identifying rare diseases, e-health, bio-informatics, market analysis, and product recommendation.


I'm always looking for talented and motivated PhD candidates, postdocs, and HiWis with a strong background in data mining, machine learning, statistics, and/or mathematics.


Currently I'm investigating techniques for identifying informative local structures in large collections of complex data; how to efficiently mine good data descriptions directly from such data; the theoretical and practical foundations of interactive exploration of very large data, discovering things by serendipity; how to mine large relational databases; how to mine very large graphs, including characterising influence propagation in social networks; and well-founded approaches for meaningfully comparing and validating explorative results.

Below, you'll find an overview of my activities, as well as a selection of my recent publications. You might further be interested in my publications, implementations, our upcoming tutorial on Information Theoretic Methods in Data Mining at ECML PKDD'14, or our workshop on Interactive Data Exploration and Analytics (IDEA) at KDD'14.


Or, in case you're looking for a bit of procrastination, consider Research in Progress: the secret life of research, through the medium of animated GIFs.


Activities
  • Organisation & Invited Talks
  • Awards & Grants
    • KDD'11 Best Student Paper Award for 'Tell Me What I Need to Know'
    • ACM SIGKDD Doctoral Dissertation Award 2010 Runner-Up
    • ECML PKDD'09 Best Student Paper Award for 'Identifying the Components'
    • Young Researcher at the Heidelberg Laureate Forum 2014, Heidelberg, Germany.
    • Independent Research Group 'Exploratory Data Analysis' at the Cluster of Excellence MMCI at U.Saarland ('13–'18)
    • Research Project 'Instant, Interactive & Adaptive Data Mining' of the Research Foundation – Flanders (FWO) ('12–'15)
    • Post-Doctoral Fellowship of the Research Foundation – Flanders (FWO) ('10–'13)
    • UA-BOF-KP Small Project (2010)
    • UA-BOF-IWS Postdoctoral Researcher ('09–'10)
  • Journal Reviewing
    • Member of the Guest Editorial Board for the ECML PKDD Journal Track '13–'15
    • Data Mining and Knowledge Discovery (DAMI)
    • Transactions on Knowledge Discovery from Data (TKDD)
    • Transactions on Knowledge and Data Engineering (TKDE)
    • Machine Learning journal (MLj)
    • Information Systems (IS)
    • Knowledge and Information Systems (KAIS)
    • Social Network Analysis and Mining (SNAM)
    • Statistical Analysis and Data Mining (SAM)
    • Transactions on Intelligent Systems and Technology (TIST)
  • Program Committees
    • ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) '10–'14
    • ACM International Conference on Information and Knowledge Management (CIKM) '12–'13
    • IEEE International Conference on Data Mining (ICDM) '12, '14
    • IEEE International Conference on Data Engineering (ICDE) '13
    • SIAM Conference on Data Mining (SDM) '10,'11,'15
    • European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) '08–'14, area chair '14
    • European Conference on Artificial Intelligence (ECAI) '14
    • Intelligent User Interfaces (IUI) senior PC '15
    • International Conference on Advances in Social Network Analysis and Mining (ASONAM) '12
    • International Conference on Pattern Recognition Applications and Methods (ICPRAM) '12
    • Belgian-Dutch Conference on Machine Learning (BENELEARN) '13
    • Workshop on Big Graph Mining (BGM) '14
    • Workshop on Optimization Methods for Anomaly Detection (OMAD) '14
    • Workshop on Practical Theories for Exploratory Data Mining (PTDM) '12
    • Workshop on Discovering, Summarizing and Using Multiple Clusterings (MultiClust) '11–'13
    • Workshop From Local Patterns to Global Models (LeGo) '08–'09

Teaching
  • Graduate Courses
    • Topics in Information Theory and its Applications (WS'15)
    • Topics in Algorithmic Data Analysis (SS'14)
    • Advanced Data Mining ('09–'13)
    • Project Databases ('09–'10)
    • Database Security ('09–'10)
  • Undergraduate Courses
    • Artificial Intelligence ('12–'13)
    • Introduction to Artificial Intelligence ('09–'12)
    • Introduction to Data Mining ('09–'11)
    • Internet Programming ('06–'08)
    • Databases ('05–'06)
  • Graduate Students
    • Sinan Bozca
    • Kailash Budhathoki
    • Shilpa Garg
    • Manan Ghandi
    • Stefan Neumann
    • Dr. Koen Smets, PhD (16 May 2012)
    • Dr. Michael Mampaey, PhD (21 Oct 2011)
    • Thomas Van Brussel, MSc (2012)
    • Tanja Van den Eede, MSc (2011)
    • Sandy Moens, MSc (2010)
    • Andie Similon, MSc (2010)
    • Sander Schuckmann, MSc (2009)

Selected Recent Publications
In Press
Miettinen, P & Vreeken, J mdl4bmf: Minimum Description Length for Boolean Matrix Factorization. Transactions on Knowledge Discovery from Data, pp 1-30, ACM (IF 1.68) (In press) [implementation]
2014
Atukorala, K, Oulasvirta, A, Glowacka, D, Vreeken, J & Jacucci, G Narrow or Broad? Estimating Subjective Specificity in Exploratory Search. In: Proceedings of ACM Conference on Information and Knowledge Management (CIKM'14), ACM, 2014. (IR track full paper, overall 21% acceptance rate)
Kuzey, E, Vreeken, J & Weikum, G A Fresh Look on Knowledge Bases: Distilling Named Events from News. In: Proceedings of ACM Conference on Information and Knowledge Management (CIKM'14), ACM, 2014. (KM track full paper, overall 21% acceptance rate)
Nguyen, H-V, Müller, E, Vreeken, J & Böhm, K Multivariate Maximal Correlation Analysis. In: Proceedings of the International Conference on Machine Learning (ICML'14), JMLR: W&CP vol.32, 2014. (25.0% acceptance rate) [implementation] [website]
Koutra, D, Kang, U, Vreeken, J & Faloutsos, C VoG: Summarizing and Understanding Large Graphs. In: Proceedings of the SIAM International Conference on Data Mining (SDM'14), SIAM, 2014. (fast track journal invitation, as one of the best of SDM'14; full paper with presentation, 15.4% acceptance rate) [implementation] [website]
Vreeken, J & Tatti, N Interesting Patterns. In: Aggarwal, CC & Han, J (eds) Frequent Pattern Mining, pp 105-134, Springer, 2014.
van Leeuwen, M & Vreeken, J Frequent Pattern Mining and Compression - Mining Useful Patterns by MDL. In: Aggarwal, CC & Han, J (eds) Frequent Pattern Mining, pp 165-198, Springer, 2014.
Zimek, A, Assent, I & Vreeken, J Frequent Pattern Mining Algorithms for Data Clustering. In: Aggarwal, CC & Han, J (eds) Frequent Pattern Mining, pp 403-424, Springer, 2014.
Wu, H, Vreeken, J, Tatti, N & Ramakrishnan, N Uncovering the Plot: Detecting Surprising Coalitions of Entities in Multi-Relational Schemas. Data Mining and Knowledge Discovery vol.28(5), pp 1398-1428, Springer, 2014. (IF 2.877) (ECML PKDD'14 Journal Track)
Nguyen, H-V, Müller, E, Vreeken, J & Böhm, K Unsupervised Interaction-Preserving Discretization of Multivariate Data. Data Mining and Knowledge Discovery vol.28(5), pp 1366-1397, Springer, 2014. (IF 2.877) (ECML PKDD'14 Journal Track) [implementation] [website]
Prakash, BA, Vreeken, J & Faloutsos, C Efficiently Spotting the Starting Points of an Epidemic in a Large Graph. Knowledge and Information Systems vol.38(1), pp 35-59, Springer, 2014. (IF 2.225) [implementation] [website]
Webb, G & Vreeken, J Efficient Discovery of the Most Interesting Associations. Transactions on Knowledge Discovery from Data vol.8(3), pp 1-31, ACM, 2014. (IF 1.68) [implementation]
2013
Akşehirli, E, Goethals, B, Müller, E & Vreeken, J Cartification: A Neighborhood Preserving Transformation for Mining High Dimensional Data. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'13), pp 937-942, IEEE, 2013. (19.6% acceptance rate) [website]
Ramon, J, Miettinen, P & Vreeken, J Detecting Bicliques in GF[q]. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD'13), pp 509-524, Springer, 2013. [implementation]
Kontonasios, K-N, Vreeken, J & De Bie, T Maximum Entropy Models for Iteratively Identifying Subjectively Interesting Structure in Real-Valued Data. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD'13), pp 256-271, Springer, 2013. [implementation]
Nguyen, H-V, Müller, E, Vreeken, J, Keller, F & Böhm, K CMI: An Information-Theoretic Contrast Measure for Enhancing Subspace Cluster and Outlier Detection. In: Proceedings of the SIAM International Conference on Data Mining (SDM'13), pp 198-206, SIAM, 2013. (oral presentation, 14.4% acceptance rate; overall 25%) [website]
Akoglu, L, Vreeken, J, Tong, H, Chau, DH, Tatti, N & Faloutsos, C Mining Connection Pathways for Marked Nodes in Large Graphs. In: Proceedings of the SIAM International Conference on Data Mining (SDM'13), pp 37-45, SIAM, 2013. (oral presentation, 14.4% acceptance rate; overall 25%) [implementation]
Mampaey, M & Vreeken, J Summarizing Categorical Data by Clustering Attributes. Data Mining and Knowledge Discovery vol.26(1), pp 130-173, Springer, 2013. (IF 2.877) [implementation]