Homepage of Peter Boncz
Since March 2024 I have been group leader of the Database Architectures research group of CWI. Before that, I served in the CWI management team (from 2017 on), where I was responsible for the
Machine Learning,
Database Architectures and
Human-centered Data Analytics research groups.
Professor at the Vrije Universiteit Amsterdam in the special chair of Large-Scale Analytical Data Management (about my research).
Architect of the database systems MonetDB, VectorWise (aka Actian Vector) and VectorH (VectorWise-on-Hadoop). Have been involved in 6 spin-off or startup companies in the area of data management.
My online CV is also available.
Hiring: I tend to have PhD and post-doc positions open for cutting-edge research in the topics outlined below. Approach me (email boncz at cwi dot nl) to apply.
Teaching:
For Amsterdam Data Science I have been developing new courses on Big Data technologies, teaching the architectural principles of frameworks like Spark and how to use such systems to analyze large datasets on clusters in the cloud
(together with Hannes Muehleisen):
Prior to this, I used to teach the Databases bachelor course at VU, and even earlier the Database II course at UvA.
Awards:
Industry Work:
- MotherDuck (2022-). I have been spending sabbatical time at MotherDuck, a startup co-founded by DuckDB Labs with the goal of putting DuckDB in the cloud for analytical applications. It does so with a twist: it exploits the fact that DuckDB is always present on the client, even if that client is a web browser (through DuckDB-WASM in that case), so cloud and client can work together. This enables very low-latency queries on local data (see my HPTS 2024 presentation "60 frames per second cloud databases") and reduces cloud compute cost by doing more work on clients. See also the MotherDuck Lecture and the CIDR 2024 Hybrid Query Execution presentation.
- DuckDB Labs (2021-). DuckDB pays off a debt that the database research community accumulated over many years: providing a quality (usable and competent) analytical and portable database system in open source. DuckDB Labs is not my spin-off: it was founded by the original DuckDB creators Hannes Muehleisen and Mark Raasveldt, with me this time playing the role of technical and business advisor. The mission of DuckDB Labs is to enable a core team to maximize the impact of DuckDB, starting in data science but extending into cloud warehousing, in-browser analytics, secure and private analytics, mobile and edge analytics, as well as hardware-embedded uses such as smart storage (and use cases we have not thought of yet).
Try the [DuckDB] system.
- Databricks (2016-). I briefly hosted the Vectorwise team at CWI in Amsterdam while it was looking for a new home; at the end of 2016 it became the Databricks Amsterdam office. I have worked as an advisor at Databricks Amsterdam since then, focusing on talent development and acquisition to grow the office, as well as providing technical advice to the local team, which has worked on projects such as Delta Lake, the Photon vectorized query processing engine, and BI workloads in Databricks.
- Celerata AG (2015-2016). As TUM Fellow and member of the database research group of TU Munich, I advised, helped and co-founded a small company around the HyPer JIT-compiling database system by Thomas Neumann. Within a year, this company was bought by Tableau, and the HyPer technology is part of every Tableau software installation, with Celerata morphing into the Munich R&D center of Tableau. With the acquisition of Tableau by Salesforce, it is exciting to see HyPer going into new services and workloads.
Try the [Hyper API] system.
- Vectorwise (2008-2010). In its time, the fastest-per-core analytical database system on earth, developed at CWI in the MonetDB/X100 project with Marcin Zukowski, Sandor Heman and Niels Nes. The company was funded and bought by Actian (the company that also sells the Ingres database).
Try the [Actian Vector] system.
- MonetDB BV (2008-2021). A holding company, directed by Martin Kersten, that holds the IP of open-source MonetDB, participates in EU projects, and serves commercial users via its daughter company MonetDB Solutions BV. Since 2020 MonetDB BV has its own office, outside (but near) CWI.
Try the [MonetDB] system.
- Data Distilleries (1996-2002). Though I was technically not a co-founder, it came very close. My PhD topic MonetDB was the backend in the Data Distilleries architecture, and I was responsible for its integration and also for research. Later I became chief architect, a role shared with Tim Ruhl, in which we were responsible for the development of the entire product line. Data Distilleries grew to 100 people, but retreated to Europe when its aggressive US expansion ran aground in the aftermath of 9/11. In Europe it remained reasonably successful and had an average headcount of 40 over its existence. In 2003 it was acquired by SPSS, which two years later moved development to the US. Parts of it still survive in what are now IBM's SPSS Modeler and Deployment products.
Noteworthy:
- In summer 2024, the management team of CWI decided to award the Dijkstra Fellowship to Marcin Zukowski. Marcin was my first PhD student. He was instrumental in bringing vectorized query execution to fruition during his PhD, led the Vectorwise CWI spin-off company that proved the value of this technique in practice, and subsequently became one of the three co-founders of Snowflake, co-creating the first analytical database service architected for the cloud, where he led the engineering team in many of its first years. I am very proud of him!
- Gave a SIGMOD 2024 keynote on "Making Data Management Better (with vectorized execution)". I hope it inspires researchers to look beyond just publishing more papers and to do research with impact, research that makes data management better for others.
- Due to COVID, my keynote talk at AMW 2020 in Cusco, Peru was postponed, but I eventually delivered it at AMW 2023 in Santiago de Chile: SQL/PGQ: a systems perspective (Google slides with speaker notes)
- In Memoriam the unforgettable Martin Kersten, my mentor (1953-2022).
- Presented the Past, Present and Future of the Database Architectures research group at CWI at the SIKS 2022 day, an in-person event again!
- keynote at EDBT2022 on The (sorry) State of Graph Database Systems.
- For the Dutch Health Ministry, I was a member of the commission Digital Support for Combating COVID-19. One of its activities was overseeing the development and deployment of the Dutch CoronaMelder app (June 2020-February 2022).
- Proud of my first PhD student Marcin Zukowski, who co-founded Snowflake, which built the first cloud-native data warehouse. It of course owes some of its speed to compressed columnar storage and vectorized execution.
- Proud to have drawn Databricks to Amsterdam, where they announced an investment of 100 million euros.
- Keynote on fine-grained co-clustering of data (BDCC) given at the HardBD International Workshop on Big Data Management on Emerging Hardware (co-located with ICDE 2018, Paris).
- On October 17, 2014, I held my Inaugural Lecture at Vrije Universiteit Amsterdam, where I became professor in the special chair of Large-Scale Analytical Data Management.
- Thomas Neumann and I started a blog for database architects. The idea is to publish news announcements about the database systems that the bloggers are involved with (HyPer, MonetDB, Vectorwise...), technical articles that are not really suited as scientific publications but still interesting, or teasers for papers or talks. See: http://databasearchitects.blogspot.com.
- keynote on the many common lessons and technologies applicable to very different data models at Graph-TA 2016 in Barcelona.
- keynote on the LDBC project at EDBT 2014 in Athens.
- keynote on the LDBC project at IDEAS 2013 in Barcelona.
- keynote on the interaction between Computer and Database architecture at the GI-Workshop on Foundations of Databases in Ilmenau (May 2013).
- keynote at the BDA 2011 (French database conference).
- Was invited in March 2010 to give a three-day PhD course on Column-Store Technologies at University of Warsaw, see
PhD Open. Student exercises and assignments here.
- At VLDB 2009 I presented a tutorial on Column oriented Database Systems (PDF) together with Daniel Abadi (Yale) and Stavros Harizopoulos (HP Labs).
- Together with prof. Divy Agrawal (UCSB), taught at the VLDB Summer School, held Aug 10-14 2009 in Shanghai. Speaking for 2.5 days non-stop to 100 Chinese DB PhD students turned out to be.. quite intense.
- Published an invited "research highlight" article in the Communications of the ACM journal, on the architecture-conscious research performed in the past decade in the context of the MonetDB project (co-authors Martin Kersten and Stefan Manegold -- both from CWI).
- On invitation from prof. Hector Garcia-Molina and dr. Meichun Hsu, I spoke at the 2008 InfoSeminar at Stanford University.
Projects:
Projects handled as the PI, from the project/resource perspective:
- MotherDuck CWI Research Collaboration 2022-. A series of grants from MotherDuck is funding CWI research on database systems.
- SQIREL (NWO Commit2Data) 2018-2023. The project focuses on querying and information retrieval on graph data that is continuously changing/growing. In conjunction with Radboud University Nijmegen (Arjen de Vries). Various companies are also involved: Databricks, neo4j, Spinque, Wizenoze.
- Databricks CWI Research Collaboration 2017-. A series of grants from Databricks is funding CWI research on cloud-based data science systems.
- relational.ai CWI Research Collaboration 2019-. A series of grants from relational.ai is funding research at CWI in the context of DuckDB and graph query processing, as well as vectorized Worst-Case Optimal Joins (WCOJ) and efficient storage.
- Actian CWI Research Collaboration 2010-2016, 2019-2021. A series of grants from Actian funded research at CWI and (later) CWI's advisory work in the context of Actian Vector (formerly Vectorwise).
- LDBC (EU-FP7-STREP) 2012-2015. This project formed a benchmark council for RDF and graph database vendors to agree on RDF/graph benchmarks, benchmark practices and results, and also developed an initial set of such benchmarks. Through my affiliation at VU University Amsterdam, I acted as scientific director. The project coordinator was Universitat Politècnica de Catalunya in Barcelona (Josep Larriba Pey).
- LOD2 (EU-FP7-IP) 2010-2014. This project, led by Leipzig University (Soeren Auer), funded the activities of Pham Minh Duc's PhD in large-scale graph data management, and the sabbatical stay in 2012/2013 of Irini Fundulaki (FORTH) at CWI. The CWI net income was EUR 500K. In this project, CWI aided RDF store vendor Openlink in using columnar storage and vectorized execution in its Virtuoso product, which significantly increased its performance.
- Querying while Transforming Large Graph Databases (NWO Open Competition) 2010-2014. This project funded Lefteris Sidirourgos' PhD project. The CWI net income was EUR 188K.
- MultimediaN (ICES-KIS) 2004-2008. Throughout this project I acted as leader of the multimedia databases work package (N3), which comprised 14 FTE at CWI, Philips Research, and Technische Universiteit Twente. At CWI, the project funded two students, Sandor Heman and Marcin Zukowski, and partly funded Sjoerd Mullender (scientific programmer). Of a total budget of 1.8M EUR for MN-N3, the CWI net income was EUR 600K. The VectorWise spin-off came from the insights gained in this project.
- XIRAF (FES) 2007-2009. This project funded Lefteris Sidirourgos as junior researcher, helping to mature the MonetDB/XQuery system, and Nan Tang as post-doctoral researcher. The CWI net income was EUR 320K. One end result of this project was the XIRAF software that NFI deploys internally and for Dutch police services to conduct digital forensic investigations (e.g. child porn cases).
- Evidence-based effective monitoring and control of Covid-19 after the initial outbreak (ZonMw 10430022010001) 2020-2021. In this project I worked with epidemiologists on a quantitative evaluation of the Dutch CoronaMelder GAEN app. Disclaimer: Hans Heesterbeek and Mirjam Kretzschmar were the PIs on this short but wide-ranging project.
Systems:
Work on data systems:
- MotherDuck (2022-): The huge adoption of DuckDB meant that Hannes and Mark got a lot of interest from VCs wanting to invest in DuckDB. Having learned, e.g., from the VectorWise experience, which initially headed towards an open-source model but never made it there, my advice was not to have VCs invest directly in DuckDB, but rather to have DuckDB Labs co-found startups in which VCs would invest. This became the working model for MotherDuck, when we met Jordan Tigani, of BigQuery fame, who became its founder. Later, I was looking to go back to systems coding during a sabbatical and especially wanted to dive deep into DuckDB. Hence I decided to join MotherDuck as a developer ("intern") and to bootstrap its Amsterdam office from CWI. Since then, it has been a blast and I have learned so much!
- DuckDB (2019-): Hannes Muehleisen and Mark Raasveldt are the driving forces behind this new CWI database system, which reunites the best known techniques from both VectorWise and HyPer in a no-nonsense, embedded database system that targets data science users (e.g., those now using Python, NumPy and pandas) with proper database technology. It is a project that puts usability first, even before research (if required), is hassle-free, and fully delivers what users expect from an advanced analytical SQL system. It stands on the shoulders of giants, because all our experiences in MonetDB, VectorWise, HyPer, Spark and more, regarding not only database architecture but also software development techniques and testing methodology, are being used to make DuckDB go where the other systems never went. Within three years, DuckDB went beyond the 1M downloads/year mark, and we hope this is just the beginning. Besides conquering the data science world, DuckDB is also the database architecture laboratory for CWI research in the Database Architectures group (and increasingly also elsewhere), for instance into new execution, compression, storage and data models.
- Databricks Spark (2016-): Databricks hired the orphaned VectorWise team in late 2016, forming the core of its first R&D center outside San Francisco, in Amsterdam. I act as an advisor on technical matters, local relationships and talent acquisition under a CWI-Databricks research agreement. Within a few years, the Amsterdam location grew to over 200 people, and it specializes in query performance and data storage. Some of the projects I have been involved in are the DBIO Parquet caching layer, the new Photon execution engine developed from scratch in C++ (Delta Engine), and performance measurement collection and analysis.
- VectorWise (2004-2016): In 2003, after rejoining CWI following my Data Distilleries period, Stefan Manegold, Niels Nes and I compared the performance of TPC-H Q1 hard-coded as a standalone program against database systems, and the results shattered our image of MonetDB as the pinnacle of analytical database performance. This led to an architecture that preserved the bulk processing of MonetDB within a pipelined query processing architecture (aka "vectorized processing"). Initially intended to serve as the new MonetDB kernel, it was named MonetDB/X100. Later, when a spin-off was founded to commercialize it (later acquired by Actian), it was renamed VectorWise. Apart from vectorized processing, VectorWise led to many interesting sub-projects in
- lightweight compression methods
- cuckoo-hashing and best-effort hash-partitioning
- query execution using SIMD instructions
- rethinking I/O for modern storage (e.g. Flash)
- adaptive tuple layouts during query processing
- multi-core aware parallelization
- just-in-time query compilation
- cooperative scans for sharing concurrent I/O
- query execution on compressed data
- recycling of intermediates in non-materializing database systems and
- column-oriented multi-dimensional clustering.
and there is still some Actian-CWI collaboration ongoing.
- HyPer (2014-2018): winning the Humboldt Research Award comes with spending a year at a hosting German academic institution, in this case TU Munich, which had nominated me. I decided to spend the prize money on travel, and for two years worked Thursdays and Fridays in Munich. As I remained an appointed Fellow at TUM, this collaboration continued for many more years, though the frequency was reduced to one day a month. In Munich I spent a lot of time with the extremely talented Thomas Neumann and Viktor Leis and their very smart PhD students, leading to quite a few scientific results, e.g. in (morsel-driven) query execution, storage (data blocks), compression (FSST), query optimization (JOB, MSCN) and indexing (Cache-Sectorized Bloom Filters, Tree-Encoded Bitmaps). The HyPer JIT compiler directly generates assembly-like LLVM IR, which has a steep learning curve, so my only deep coding project implemented partial buffering and prefetching in hash joins. I never published it because I was unconvinced about the robustness of the performance improvement and its software complexity tradeoff (a very similar technique was independently published later by Andy Pavlo as Relaxed Operator Fusion). So my biggest HyPer contribution may have been to stimulate a spin-off around it (Celerata) and to make the match with Tableau to acquire it. Since 2018 TUM has been working on the awesome Umbra project (now called CompDB), a HyPer rewrite in which everything is ever so slightly better.
- MonetDB (1994-2018): without much explanation, Martin Kersten handed me two large source code files (gdk.mx and monet.mx) to start working from, which I did. Slowly I learned about database architecture and why the code he gave me was unique. It subsequently led to quite a bit of re-architecting work in the interpreter, type system, algorithms and storage, and to scientific results, of which the cache-conscious query processing algorithms are best known. I interrupted my PhD job in 1999 to work at Data Distilleries, which used MonetDB. In 2002, I completed my PhD thesis on MonetDB (Monet: a Next-Generation DBMS Kernel For Query-Intensive Applications). Since then, under the enthusiastic lead of Martin Kersten, the whole database group of CWI has become involved and carries the MonetDB open-source project forward.
- RDF data management (2012-2016): while graph data management usually is very different from RDF data management, there is an overlap, as RDF is in essence a graph data model. When traversing large graphs using many (self-)join steps over the edge table, intermediate results can easily explode. We are interested in finding indexing methods and query optimization strategies that can make executing complex large-scale graph queries affordable and manageable. Additionally, there is interest in the performance and benchmarking of RDF systems (including the nascent RDF support in MonetDB).
- SCIBORQ (2010-2012): the SCIBORQ project (Scientific data management with Bounds On Runtime and Quality) aims to trade accuracy in query processing for extra speed. While query processing on samples has been studied in the past, such work is typically not acceptable for scientific purposes, as it fails to provide guarantees on the accuracy of results. We are developing new weighted sampling methods that help focus sampling on those data areas relevant for a workload, making it possible to provide high-quality answers while operating on only a minority of the data.
- XRPC (2006-2010): with Jennie Zhang I worked on P2P and distributed XML query processing. The scope of this work started out broader, as ambient P2P query processing (cf. AmbientDB). It then focused on querying XML data on the web by automatically splitting XQuery queries into single SOAP message exchanges that retain the semantics of the original XQuery (this is hard, due to node identity and document order). The results here are really nice, producing minimal message cost, both in terms of the number of interactions (latency) and data volume (bandwidth), thanks to a new dynamic version of XML projection that makes sure only the needed parts of an XML document are sent over the wire.
- MonetDB/XQuery (2004-2009): related to the above, and with the help of Teggy Grust's team (Konstanz, Munich, Tuebingen), who developed the Pathfinder XQuery-to-relational-algebra compiler, we created the fastest system so far for processing huge XML documents with complex queries. This system was used as the backbone of the XIRAF system for digital forensics at NFI (the Dutch Forensic Institute). The secrets of its speed were the high-quality compilation techniques applied and the extremely fast (linear) staircase join algorithms.
- AmbientDB (2003-2006): a vision of P2P distributed query processing that was developed jointly with Willem Fontijn of Philips Research, who envisioned using this as middleware for ambient intelligence consumer electronics applications. For two years, Caspar Treijtel worked on this topic as a PhD student, but he did not finish.
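To illustrate the vectorized processing idea from the VectorWise entry above (keeping MonetDB-style bulk processing, but inside a pipelined engine), here is a minimal Python sketch. It is not VectorWise code: the names VECTOR_SIZE, select_le, sum_selected and scan_filter_sum are hypothetical, the query is a much-simplified filter-then-sum loosely shaped like TPC-H Q1 (no grouping), and Python itself gains no speed from this style. The sketch only shows the structure: each primitive runs a tight loop over one vector of values, and a selection vector passes the qualifying positions from the filter to the aggregation.

```python
from typing import List

# Hypothetical vector length: large enough to amortize per-call overhead,
# small enough that one vector of values stays in the CPU cache.
VECTOR_SIZE = 1024


def select_le(col: List[int], n: int, bound: int, sel_out: List[int]) -> int:
    """Selection primitive: store positions i < n with col[i] <= bound in sel_out.

    Branch-free style: the position is always written, and the output cursor
    only advances when the predicate holds. Returns the number of matches.
    """
    k = 0
    for i in range(n):
        sel_out[k] = i
        k += col[i] <= bound  # a bool counts as 0 or 1
    return k


def sum_selected(col: List[int], sel: List[int], k: int) -> int:
    """Aggregation primitive: sum col at the first k positions listed in sel."""
    total = 0
    for j in range(k):
        total += col[sel[j]]
    return total


def scan_filter_sum(shipdate: List[int], quantity: List[int], cutoff: int) -> int:
    """Pipeline: SUM(quantity) WHERE shipdate <= cutoff, one vector at a time."""
    sel = [0] * VECTOR_SIZE  # selection vector, reused for every chunk
    total = 0
    for start in range(0, len(shipdate), VECTOR_SIZE):
        ship_vec = shipdate[start:start + VECTOR_SIZE]
        qty_vec = quantity[start:start + VECTOR_SIZE]
        k = select_le(ship_vec, len(ship_vec), cutoff, sel)
        total += sum_selected(qty_vec, sel, k)
    return total


if __name__ == "__main__":
    dates = list(range(3000))
    qty = [1] * 3000
    print(scan_filter_sum(dates, qty, 1999))
```

The design point this sketch tries to convey is that interpretation overhead (function calls, type dispatch) is paid once per vector rather than once per tuple, while intermediate results stay vector-sized instead of being materialized as full columns.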
Current research interests:
- database architecture: the art of crafting DBMS software, the core stuff. A special interest in column stores.
- computer architecture: the interaction between a DBMS and modern hardware. I am also interested in large-scale hardware, be it many-core CPUs, TB-sized memories or MapReduce clusters.
- adaptive storage and query processing: algorithms and storage that adapt automatically to a workload and are able to exploit value- and structure-correlations in the data.
- graph databases and RDF: managing and querying huge graph-shaped data. This may include such diverse topics as data integration, complex traversals, top-K over graph aggregates and inferencing.
Alumni:
Post-Docs:
- Nan Tang (post-doc in XIRAF 2008-2010). Moved to Edinburgh University, and then to QCRI.
- Renzo Angles (post-doc in LDBC 2013-2014). Now back at Talca University.
- Eyal Rozenberg (post-doc in 2015-2016). Now working independently on GPU data processing in Haifa.
- Gábor Szárnyas (post-doc in SQIREL 2020-2023). Now at DuckDB Labs.
PhD Students:
MSc Students:
- Menzo Windhouwer, Universiteit van Amsterdam, Distributed Query Execution on Monet (1997).
- Marcin Zukowski, MIMUW+Vrije Universiteit Amsterdam (@CWI), Parallel Query Execution in Monet on SMP machines (2002).
- Brahmananda Sapkota, Technische Universiteit Twente (@CWI), Design of Peer-to-Peer Protocol for AmbientDB (2003).
- Anna Jancarikova, Vrije Universiteit Amsterdam (@CWI), Distributed query processing over a peer-to-peer network (2003).
- Jan Rittinger, Konstanz University (@CWI), Pathfinder/MonetDB: A Relational Runtime for XQuery (2004).
- Arjan Scherpenisse, Universiteit van Amsterdam (@CWI), Giving Music more Brains: A study in music-metadata management (2005).
- Wouter Alink, Technische Universiteit Twente (@CWI+NFI), XIRAF: an XML-IR Approach to Digital Forensics (2005).
- Marco Antonelli, University Roma Tre (@CWI), A SPARQL front-end for MonetDB (2008).
- Fabian Nagel, Tuebingen University (@CWI+VectorWise), Recycling Intermediate Results in Pipelined Query Evaluation (2010).
- Kamil Anikiej, MIMUW+Vrije Universiteit Amsterdam (@Actian), Multi-core parallelization of vectorized query execution (2010).
- Alicja Luszczak, MIMUW+Vrije Universiteit Amsterdam (@Actian), Simple Solutions for Compressed Execution in Vectorized Database System (2011).
- Juliusz Sompolski, MIMUW+Vrije Universiteit Amsterdam (@Actian), Just-in-time Compilation in Vectorized Query Execution (2011).
- Michal Switakowski, MIMUW+Vrije Universiteit Amsterdam (@Actian), Integrating Cooperative Scans in a column-oriented DBMS (2011).
- Andrei Costea + Adrian Ionescu, Vrije Universiteit Amsterdam (@Actian), Query Optimization and Execution in Vectorwise MPP (2012).
- Bogdan Raducanu, Universitatea Politehnica Bucuresti+Vrije Universiteit Amsterdam (@Actian), Micro Adaptivity in a Vectorized Database System (2012).
- Linnea Passing, Augsburg University+TUM University Munich (@CWI), Recognizing, Naming and Exploring Structure in RDF Data (2014).
- Cristian Mihai Bârcă, Vrije Universiteit Amsterdam (@Actian), Dynamic Resource Management in Vectorwise on Hadoop (2014).
- Harald Lang, TUM University Munich, Adapting Main-Memory Databases to Modern Hardware Architectures (2014).
- Tim Gubner, Ilmenau University of Technology (@Actian), Achieving many-core scalability in Vectorwise (2014).
- Sebastian Woehrl, Ludwig Maximilian University Munich+Augsburg University+TUM University Munich (partly @CWI), Efficient relational main-memory query processing for Hadoop Parquet Nested Columnar storage with HyPer and Vectorwise (2014).
- Sinziana Filip, Vrije Universiteit Amsterdam, A scalable graph pattern matching engine on top of Apache Giraph (2014).
- Peter Rutgers, Vrije Universiteit Amsterdam, Extending the Lighthouse graph engine for shortest path queries (2015).
- Yordi Verkroost, Vrije Universiteit Amsterdam, Evaluation of Graph Management Systems for Monitoring and Analyzing Social Media Content with OBI4wan (2015).
- Per Olav Hoydahl Ohme, Vrije Universiteit Amsterdam, Reducing Memory Requirements for Distributed Graph Query Executions in Lighthouse (2016).
- Till Doehmen, Vrije Universiteit Amsterdam, Multi-Hypothesis Parsing of Tabular Data in Comma-Separated Values (CSV) Files (2016).
- Georgiana Ciocirdel, Vrije Universiteit Amsterdam, A G-CORE Graph Query Language Interpreter (2018).
- Mihai Varga, Vrije Universiteit Amsterdam, Just-in-time Compilation in MonetDB with Weld (2018).
- Boudewijn Braams, Vrije Universiteit Amsterdam, Predicate Pushdown in Parquet and Apache Spark (2018).
- Richard Gankema, Vrije Universiteit Amsterdam, Loop-Adaptive Execution in Weld (2018).
- Bogdan Ghita, Universitatea Politehnica Bucuresti+Vrije Universiteit Amsterdam, Self-learning Whitebox Compression (2019).
- Giorgi Kikolashvili, Vrije Universiteit Amsterdam, A JVM-based Vectorized Spark Query Engine (2019).
- Ionut Boicu, Universitatea Politehnica Bucuresti+Vrije Universiteit Amsterdam, Adaptive on-the-fly Compressed Execution in Spark (2019).
- Adriana Tufa, Universitatea Politehnica Bucuresti+Vrije Universiteit Amsterdam, Self-Organizing Data Layouts for Databricks Delta (2019).
- Per Fuchs, Vrije Universiteit Amsterdam, Fast, scalable worst-case optimal joins for graph-pattern matching on in-memory graphs in Spark (2019).
- Christian Stuart, Universiteit van Amsterdam, Profiling Compiled SQL Query Pipelines in Apache Spark (2020).
- Azim Afroozeh, Vrije Universiteit Amsterdam & Universiteit van Amsterdam, Towards a New File Format for Big Data: SIMD-Friendly Composable Compression (2020).
- Long Tran, Vrije Universiteit Amsterdam & Universiteit van Amsterdam, Chaos Engineering for Databases (2020).
- Sam Ansmink, Vrije Universiteit Amsterdam & Universiteit van Amsterdam, Encrypted Query Processing in DuckDB (2021).
- Jim Stam, Vrije Universiteit Amsterdam & Universiteit van Amsterdam, Low overhead self-optimizing storage for compression in DuckDB (2022).
- Tavneet Singh, Vrije Universiteit Amsterdam & Universiteit van Amsterdam, Architecting SQL/PGQ support in DuckDB (2022).
- Daniel ten Wolde, Vrije Universiteit Amsterdam & Universiteit van Amsterdam, Integrating SQL/PGQ into DuckDB (2022).
- Tom Ebergen, Vrije Universiteit Amsterdam, Join Order Optimization with (Almost) No Statistics (2022).
- David Puroja, Universiteit van Amsterdam + Vrije Universiteit Amsterdam, LDBC Social Network Benchmark Interactive v2.0 (2023).
- Lotte Felius, Universiteit van Amsterdam + Vrije Universiteit Amsterdam, Assessing the performance of distributed PostgreSQL (2023).
- Leonardo Kuffo Rivero, Universiteit van Amsterdam + Vrije Universiteit Amsterdam, ALP: Adaptive Lossless floating-Point compression (2023).
- Yiming Wu, Universiteit van Amsterdam + Vrije Universiteit Amsterdam, Seamlessly and Efficiently Integrating DuckDB with GNN Libraries (2023).
- Florian Gerlinghoff, Universiteit van Amsterdam + Vrije Universiteit Amsterdam, A Testing Strategy for Hybrid Query Execution in Database Management Systems (2023).
- Thomas Glas, TU Munich, Exploiting Column Correlations for Compression (2023).
- Pingan Ren, Vrije Universiteit Amsterdam & Universiteit van Amsterdam, Parallelized Path-finding in DuckPGQ (2024).
- Paul Gross, Vrije Universiteit Amsterdam & Universiteit van Amsterdam, Dynamically Exploiting Factorized Representations (2024).
- Ziya Mukhtarov, TU Munich, Nested Data-Type Encodings in FastLanes (2024).
- Elena Krippner, LMU, Augsburg University & TUM, Rethinking Vector Embeddings Search for Analytical Database Systems (2024).
- Niclas Haderer, Vrije Universiteit Amsterdam & Universiteit van Amsterdam, Declarative Caching in MotherDuck (2024).
- Jeewon Heo, Vrije Universiteit Amsterdam & Universiteit van Amsterdam, Cost-Based Hybrid Query Optimization in MotherDuck (2024).
- Raufs Dunamalijevs, Universiteit van Amsterdam, Predicate Pushdown in FastLanes (2024).
I co-advised many of these students. In the case of Marcin and Menzo, this involved Martin Kersten. For the other students, it varies (check the thesis contents).
Other Students:
Service:
Scientific Governance:
- Member of the Board of Trustees of the VLDB Endowment 2014-2019.
- Member of the Board of Directors of the CIDR organization 2020-2025.
- Member of the DaMoN Steering Committee (2007-), with Anastasia Ailamaki and Stefan Manegold
- Member of the PVLDB Advisory Committee (2018-2021).
- PC Chair of ICDE 2024 (Industrial & Applications track), with Fatma Ozcan and Ashraf Aboulnaga.
- PC Chair of CIDR 2022, with Fatma Ozcan and Thomas Neumann.
- PC Chair of CIDR 2021 (lead), with Fatma Ozcan and Jignesh Patel.
- PC Chair of CIDR 2020, with Jignesh Patel and Thomas Neumann.
- PC Chair of VLDB 2017, with Kenneth Salem.
Journal Editorships:
- associate editor of ACM SIGMOD (2026).
- associate editor of IEEE Data Engineering Bulletin (2010-2012).
- associate editor of The VLDB Journal (2011-2017).
- editorial board member of PVLDB volumes 2, 3, 4, 6, 7, 8 and 14.
- associate editor of PVLDB (volumes 9 and 15).
- editor-in-chief of PVLDB (volume 10).
PC memberships of Major database conferences:
- SIGMOD 2004, 2008 (Demo), 2010, 2013, 2014 (Industrial), 2016, 2018 (Industrial), 2020, 2023
- VLDB 2001, 2005, 2007, 2008 (+editorial board member of PVLDB in most later years), 2025 (Industrial)
- ICDE 2005, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 (Demo)
- EDBT 2002, 2008, 2013, 2014 (Industrial)
- WWW 2009
- CIDR 2015
- CIKM 2008 (area chair)
other PC memberships:
- GRADES 2018, 2019, 2020
- ESWC 2013 (track chair)
- SMDB 2010, 2012, 2013
- W3C Workshop on RDF Access to Relational Databases
- XIME-P 2007
- MDM 2007
- ExpDB 2006
- DBISP2P 2004, 2005, 2006, 2007
- VLDB PhD 2005, 2006, 2011
- DaMoN 2007, 2008, 2011
- PDMST 2006, 2007
- WOD 2013
Additionally, I have served as referee for journal publications:
- ACM TODS
- The VLDB Journal
- IEEE TKDE
- IEEE Internet Computing
I serve as secretary to Data Science Platform Netherlands (DSPN), representing my employer CWI there.
DSPN was launched in 2016 as part of the ICT Research Platform Netherlands (IPN) to give a
voice to the Data Science initiatives of the Dutch ICT research organizations.
I am the co-founder of the Linked Data Benchmark Council (LDBC), a UK company limited by guarantee,
a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software.
Its members are companies involved in the creation of graph data management systems (e.g. Oracle, neo4j, Tigergraph, Ontotext, Openlink).
Since 2018, LDBC has also become active, beyond benchmarking, in studying and proposing query and schema languages for property graph systems.
Since its inception I have been either the chairman or vice-chairman of the board of directors of LDBC, of which I am a personal member.
Organization:
- Co-organizer of CIDR 2025 Amsterdam (together with Stefan Manegold).
- Co-organizer of the
DBDBD 2024, the Dutch-Belgian Database Day.
- With George Fletcher, Asterios Katsifodimos and Sebastian Schelter, I am the initiator and co-founder of DSDSD, the Dutch Seminar on Data Systems Design, a bi-weekly hybrid seminar on Friday afternoons at 16:00 CET.
- Co-organizer of CIDR 2023 Amsterdam (together with Stefan Manegold). Also organized the Martin Kersten memorial there.
- Organizer of CIDR 2021 (exceptionally, in the cloud). Special props to Gábor Szárnyas for handling everything digital there.
- Co-organizer of CIDR 2020 Amsterdam (together with Stefan Manegold); from that year on, CIDR is held biennially in Amsterdam.
- General Chair of ACM SIGMOD 2019 Amsterdam (together with Stefan Manegold).
- Co-organizer of
LDBC Technical User Community (TUC) meetings, since 2012. These
are typically co-located with ACM SIGMOD/PODS (i.e., yearly), and coincide with an LDBC Board of Directors meeting.
- Co-organizer of the workshop on
Database Architectures for Modern Hardware,
held at Schloss Dagstuhl, June 17-22, 2018.
- Co-organizer of the fifth international
GRADES
workshop on Graph Data Management Experiences and Systems, co-located with and sponsored by ACM SIGMOD 2017.
- Co-organizer of the fourth international
GRADES
workshop on Graph Data Management Experiences and Systems
- Co-organizer of the third international
GRADES
workshop on Graph Data Management Experiences and Systems,
co-located with and sponsored by ACM SIGMOD 2015.
- Co-organizer of the second international
GRADES
workshop on Graph Data Management Experiences and Systems,
co-located with and sponsored by ACM SIGMOD 2014.
- Founder and co-organizer of the first international
GRADES
workshop on Graph Data Management Experiences and Systems,
co-located with and sponsored by ACM SIGMOD 2013.
- Co-organizer of the sixth
DaMoN,
the international workshop on Data Management on New Hardware,
co-located with and sponsored by ACM SIGMOD 2010.
- Co-organizer of the fifth
DaMoN,
the international workshop on Data Management on New Hardware,
co-located with and sponsored by ACM SIGMOD 2009.
- Co-organizer of the workshop on
XQuery Implementation Paradigms,
held at Schloss Dagstuhl, November 19-22, 2006.
- Co-organizer of the second
DaMoN,
the international workshop on Data Management on New Hardware,
co-located with and sponsored by ACM SIGMOD 2006.
- Co-organizer of the
DBDBD 2005,
the Dutch-Belgian Database Day.
- Founder and co-organizer of the first
DaMoN,
the international workshop on Data Management on New Hardware,
co-located with and sponsored by ACM SIGMOD 2005.