Homepage of Peter Boncz
Senior researcher at the INS1 research group of CWI (0.8FTE) and senior lecturer in the Knowledge Representation and Reasoning group at the Vrije Universiteit Amsterdam (0.2FTE).
I teach the database course at Vrije Universiteit Amsterdam.
My CV is available online.
Awards
Spin-Offs
- VectorWise (2008-2010). Simply the fastest-per-core analytical database system on earth, developed at CWI in the MonetDB/X100 project with Marcin Zukowski, Sandor Heman and Niels Nes. The company was funded and bought by Actian (the company that also sells the Ingres database). The VectorWise development team is across the street from CWI and there is still close research collaboration.
- Data Distilleries (1996-2002). Though technically not a co-founder, it was very close. My PhD topic MonetDB was the backend in the Data Distilleries Architecture, and I was responsible for its integration and also for research. Later I became chief architect, a role shared with Tim Ruhl, in which we were responsible for the development of the entire product line. Data Distilleries grew to 100 people, but retreated to Europe when its aggressive US expansion ran aground in the 9-11 aftermath. In Europe it remained reasonably successful and had an average headcount of 40 over all of its existence. In 2003 it was acquired by SPSS, who two years later moved development to the US. Parts of it still survive in what is now IBM's SPSS Modeler and Deployment products.
- MonetDB BV (2008-). A small company that is assisting carefully selected open-source MonetDB users who need commercial support, leveraging the MonetDB core developer team at CWI.
New News and Old News
- I feel very honored by professors Thomas Neumann and Alfons Kemper (TUM) nominating me for the Humboldt Research Award and flabbergasted by actually receiving this award.
- Accepted the position of associate editor at The VLDB Journal.
- I gave a keynote at the BDA (French database conference).
Did not talk French, rather talked VectorWise.
- Was invited in March 2010 to give a three-day PhD course on Column-Store Technologies at University of Warsaw, see
PhD Open. Student exercises and assignments here.
- At VLDB 2009 I presented a tutorial on Column-Oriented Database Systems (PDF) together with Daniel Abadi (Yale) and Stavros Harizopoulos (HP Labs).
- Together with prof. Divy Agrawal (UCSB), taught at the VLDB Summer School, held Aug 10-14 2009 in Shanghai. Speaking for 2.5 days non-stop to 100 Chinese DB PhD students turned out to be.. quite intense (sheets: Wed Thu 1 Thu 2 Thu 3 Fri).
- Published an invited "research highlight" article in the Communications of the ACM journal, on the architecture-conscious research performed in the past decade in the context of the MonetDB project (co-authors Martin Kersten and Stefan Manegold -- both from CWI).
- On invitation from prof. Hector Garcia-Molina and dr. Meichun Hsu, I spoke at the 2008 InfoSeminar at Stanford University.
Projects
Projects handled as the PI, from the project/resource perspective:
- LDBC (EU-FP7-STREP) 2012-2015. This project will form a benchmark council for RDF and Graph database vendors to agree on RDF/graph benchmarks, benchmark practices and results; and will also develop an initial set of such benchmarks. Through my affiliation at VU University Amsterdam, I act as scientific director. Project coordinator is Universidad Politecnica Barcelona (Josep Larriba Pey).
- VectorWise CWI Research Collaboration. A grant from Actian Corp. is paying for 2 PhD positions at CWI in the period 2012-2017. One PhD candidate is already hired; in the first half of 2013 we will be looking to hire another.
- LOD2 (EU-FP7-IP) 2010-2014. This project, led by Leipzig University (Soeren Auer), funds the activities of Pham Minh Duc doing a PhD in large-scale graph data management, and the sabbatical stay in 2012/2013 of Irini Fundulaki (FORTH) at CWI. The CWI net income is EUR 500K. In this project, CWI has been aiding RDF Store vendor Openlink on using columnar storage and vectorized execution in its Virtuoso product, which has significantly increased its performance.
- Querying while Transforming Large Graph Databases (NWO Open Competition) 2010-2014. This project is funding Lefteris Sidirourgos' PhD project. The CWI net income is EUR 188K.
- MultimediaN (ICES-KIS) 2004-2008. Throughout this project I acted as leader of the multimedia databases work-package (N3), which comprised 14 FTE at CWI, Philips Research, and Technische Universiteit Twente. At CWI, the project funded two students, Sandor Heman and Marcin Zukowski, and partly funded Sjoerd Mullender (scientific programmer). On a total budget of 1.8M EUR for MN-N3, the CWI net income was EUR 600K. The VectorWise spin-off came from the insights gained in this project.
- XIRAF (FES) 2007-2009. This project funded Lefteris Sidirourgos as junior researcher, helping to mature the MonetDB/XQuery system, and Nan Tang as post-doctoral researcher. The CWI net income was EUR 320K. One end result of this project was the XIRAF software that NFI deploys internally and for Dutch police services to conduct digital forensic investigation (e.g. child porn cases).
The topical perspective, ongoing projects:
- VectorWise (2004-today): In 2003, having joined CWI after my Data Distilleries period, I compared the performance of TPC-H Q1 hard-coded as a standalone program against database systems, together with Stefan Manegold and Niels Nes. The result shattered our image of MonetDB as the pinnacle of analytical database performance. It led to an architecture that conserved the bulk-processing of MonetDB in a pipelined query processing architecture (aka "vectorized processing"). Initially intended to serve as the new MonetDB kernel, it was named MonetDB/X100. Later, when a spin-off was founded to commercialise it (later acquired by Actian), it was renamed VectorWise. Apart from the vectorized processing, VectorWise led to many interesting sub-projects in:
- lightweight compression methods
- cuckoo-hashing and best-effort hash-partitioning
- query execution using SIMD instructions
- rethinking I/O for modern storage (e.g. Flash)
- adaptive tuple layouts during query processing
- multi-core aware parallelization
- just-in-time query compilation
- cooperative scans for sharing concurrent I/O
- query execution on compressed data
- recycling of intermediates in non-materializing database systems and
- column-oriented multi-dimensional clustering.
- MonetDB (1994-today): without too much explanation, Martin Kersten handed me two large source code files (gdk.mx and monet.mx) to start working from, which I did. Slowly I learnt about database architecture and why the code he gave me was unique. It subsequently led to quite a bit of re-architecting work in the interpreter, type system, algorithms and storage, and to scientific results, of which the cache-conscious query processing algorithms are the best known. I interrupted my PhD job in 1999 to work at Data Distilleries, which used MonetDB. In 2002, I completed my PhD thesis on MonetDB (Monet: a next-Generation DBMS Kernel For Query-Intensive Applications). Since then, under the enthusiastic lead of Martin Kersten, the whole database group of CWI has become involved and carries the MonetDB open-source project forwards. After closing the XQuery work, my current contributions are in its nascent support of RDF and SPARQL.
- Graph and RDF data management (2009-today): while graph data management usually is very different from RDF data management, there is an overlap, as RDF is in essence a graph data model. While traversing large graphs using many (self-)join steps over the edge table, intermediate results can easily explode. We are interested in finding indexing methods and query optimization strategies that can make executing complex large-scale graph queries affordable and manageable. Additionally, there is interest in the performance and benchmarking of RDF systems (including the nascent RDF support in MonetDB).
- SCIBORQ (2010-today): the SCIBORQ project (Scientific data management with Bounds On Runtime and Quality) aims to trade accuracy in query processing for extra speed. While query processing on samples has been studied in the past, such work is typically not acceptable for scientific purposes, as it fails to provide guarantees on the accuracy of results. We are developing new weighted sampling methods that help focus sampling on those data areas relevant for a workload, making it possible to provide high-quality answers while operating on a minority of the data.
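To make the "vectorized processing" idea mentioned under VectorWise above a bit more concrete, here is a minimal sketch in Python. It is not code from MonetDB/X100 or VectorWise; the function names and the vector size of 1024 are assumptions chosen for illustration. It only shows the general shape: operators are pipelined, but each call passes a small vector of values rather than a single tuple.

```python
from typing import Iterable, Iterator, List

VECTOR_SIZE = 1024  # assumed vector length, chosen for illustration only


def scan(column: List[int], vector_size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Produce the column as a stream of small vectors (slices)."""
    for start in range(0, len(column), vector_size):
        yield column[start:start + vector_size]


def select_less_than(vectors: Iterable[List[int]], bound: int) -> Iterator[List[int]]:
    """Filter one whole vector per call instead of one tuple per call."""
    for vec in vectors:
        selected = [v for v in vec if v < bound]
        if selected:
            yield selected


def sum_aggregate(vectors: Iterable[List[int]]) -> int:
    """Consume the pipeline and add up all remaining values."""
    total = 0
    for vec in vectors:
        total += sum(vec)
    return total


def run_query(column: List[int], bound: int, vector_size: int = VECTOR_SIZE) -> int:
    """Roughly: SELECT SUM(x) FROM t WHERE x < bound, evaluated vector-at-a-time."""
    return sum_aggregate(select_less_than(scan(column, vector_size), bound))
```

The operators form a pipeline, so no full intermediate column is materialized, while the per-call interpretation overhead is spread over all values in a vector. In a real system the inner loops would be tight compiled loops over arrays rather than Python list comprehensions.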
Closed projects:
- MonetDB/XQuery (2004-2009): related to the above, but with the help of Teggy Grust's team (Konstanz, Munich, Tuebingen), who developed the Pathfinder XQuery-RelationalAlgebra compiler, we created the fastest system so far for processing huge XML documents with complex queries. This system was used as the backbone of the XIRAF system for digital forensics at NFI (the Dutch Forensic Institute). The secrets of its speed were the high-quality compilation techniques applied, as well as the extremely fast (linear) staircase join algorithms.
- XRPC (2006-2010): with Jennie Zhang I worked on P2P and distributed XML query processing. The scope of this work started broader as Ambient using P2P query processing. The work focused on querying XML data on the web by automatically splitting XQuery queries into single SOAP message exchanges that retain the semantics of the original XQuery (this is hard, due to node identity and document order). The results here are really nice, producing minimal message cost, both in terms of number of interactions (latency) and data volume (bandwidth), thanks to a new dynamic version of XML projection that makes sure only the needed parts of an XML document are sent over the wire.
- AmbientDB (2003-2006): a vision of P2P distributed query processing that was developed jointly with Willem Fontijn of Philips Research, who envisioned using this as middleware for ambient intelligence consumer electronics applications. During two years, Caspar Treijtel worked on this topic as a PhD student, but he did not finish.
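As an illustration of why a staircase join, as mentioned under MonetDB/XQuery above, can run in linear time, here is a simplified sketch of the descendant step in Python. It is not the MonetDB/XQuery implementation. It assumes the document is stored as a table sorted on preorder rank, where `post[v]` is the postorder rank of the node with preorder rank `v`; the function names are hypothetical. It shows two of the ideas involved: pruning context nodes that are covered by other context nodes, and scanning each subtree as one contiguous range.

```python
from typing import List


def prune_context(post: List[int], context: List[int]) -> List[int]:
    """Drop context nodes that are descendants of another context node.

    Walking the context in preorder, a node lies inside the subtree of an
    earlier kept node exactly when its postorder rank is smaller.
    """
    kept: List[int] = []
    max_post = -1
    for c in sorted(set(context)):
        if post[c] > max_post:
            kept.append(c)
            max_post = post[c]
    return kept


def descendants(post: List[int], context: List[int]) -> List[int]:
    """Return all descendants of the context nodes, in document order."""
    result: List[int] = []
    for c in prune_context(post, context):
        v = c + 1
        # In preorder, descendants of c directly follow c; the first node
        # with a larger postorder rank marks the end of c's subtree.
        while v < len(post) and post[v] < post[c]:
            result.append(v)
            v += 1
    return result


# Example tree: a(b(c, d), e(f)), preorder ranks a=0, b=1, c=2, d=3, e=4, f=5
example_post = [5, 2, 0, 1, 4, 3]
print(descendants(example_post, [1, 2, 4]))  # → [2, 3, 5]
```

Because the pruned context nodes have disjoint subtrees, every document node is appended at most once, and the result needs no sorting or duplicate elimination afterwards.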
Current research interests:
- database architecture: the art of crafting DBMS software, the core stuff. A special interest in column stores.
- computer architecture: the interaction between a DBMS and modern hardware. Also interested in large-scale hardware, be it many-core, TB memories or MapReduce clusters.
- adaptive storage and query processing: algorithms and storage that adapt automatically to a workload, and are able to exploit value- and structure-correlations in the data.
- graph databases and RDF: managing and querying huge graph-shaped data. This may include topics as diverse as data integration, complex traversals, top-K over graph aggregates and inferencing.
Alumni
Post-Docs:
- Nan Tang (post-doc in XIRAF 2008-2010, moved to Edinburgh University)
- Renzo Angles (post-doc in LDBC 2013-)
PhD Students:
MSc Students:
- Menzo Windhouwer, Universiteit van Amsterdam, Distributed Query Execution on Monet (1997).
- Marcin Zukowski, MIMUW+Vrije Universiteit Amsterdam (@CWI), Parallel Query Execution in Monet on SMP machines (2002).
- Brahmananda Sapkota, Technische Universiteit Twente(@CWI), Design of Peer-to-Peer Protocol for AmbientDB (2003).
- Anna Jancarikova, Vrije Universiteit Amsterdam (@CWI), Distributed query processing over a peer-to-peer network (2003).
- Jan Rittinger, Konstanz University (@CWI), Pathfinder/MonetDB: A Relational Runtime for XQuery (2004).
- Arjan Scherpenisse, Universiteit van Amsterdam (@CWI), Giving Music more Brains: A study in music-metadata management (2005).
- Wouter Alink, Technische Universiteit Twente (@CWI+NFI), XIRAF: an XML-IR Approach to Digital Forensics (2005).
- Marco Antonelli, University Roma Tre (@CWI), A SPARQL front-end for MonetDB (2008).
- Fabian Nagel, Tuebingen University (@CWI+VectorWise), Recycling Intermediate Results in Pipelined Query Evaluation (2010).
- Kamil Anikiej, MIMUW+Vrije Universiteit Amsterdam (@VectorWise), Multi-core parallelization of vectorized query execution (2010).
- Alicja Luszczak, MIMUW+Vrije Universiteit Amsterdam (@VectorWise), Simple Solutions for Compressed Execution in Vectorized Database System (2011).
- Juliusz Sompolski, MIMUW+Vrije Universiteit Amsterdam (@VectorWise), Just-in-time Compilation in Vectorized Query Execution (2011).
- Michal Switakowski, MIMUW+Vrije Universiteit Amsterdam (@VectorWise), Integrating Cooperative Scans in a column-oriented DBMS (2011).
- Andrei Costea + Adrian Ionescu, Vrije Universiteit Amsterdam (@VectorWise), Query Optimization and Execution in Vectorwise MPP (2012).
- Bogdan Raducanu, Universitatea Politehnica Bucuresti+Vrije Universiteit Amsterdam (@VectorWise), Micro Adaptivity in a Vectorized Database System (2012).
Many of these students I co-advised. In the case of Marcin and Menzo, this involved Martin Kersten. For the other students, it varies (check the thesis contents).
Other Students:
Organizational Activities in the Research Community
Journal Editorships:
- associate editor of IEEE Data Engineering Bulletin (2010-2012).
- associate editor of The VLDB Journal (2011-).
PC memberships of major database conferences:
- SIGMOD 2004, SIGMOD 2008 (Demo), SIGMOD 2010, SIGMOD 2013
- VLDB 2001, 2005, 2007, 2008, 2009, 2010, 2011, 2013, 2014
- ICDE 2005, 2008, 2009, 2010, 2011, 2012, 2013, 2014
- EDBT 2002, 2008, 2013, 2014 (Industrial)
- WWW 2009
- CIKM 2008 (area chair)
Other PC memberships:
- ESWC 2013 (track chair)
- SMDB 2010, 2012, 2013
- W3C Workshop on RDF Access to Relational Databases
- XIME-P 2007
- MDM 2007
- ExpDB 2006
- DBISP2P 2004, 2005, 2006, 2007
- VLDB PhD 2005, 2006, 2011
- DAMON 2007, 2008, 2011
- PDMST 2006, 2007
- WOD 2013
Additionally, I have served as referee for journal publications:
- ACM TODS
- The VLDB Journal
- IEEE TKDE
- IEEE Internet Computing
Organization:
- Founder and co-organizer of the first international
GRADES
workshop on Graph Data Management Experiences and Systems,
co-located with and sponsored by SIGMOD 2013.
- Co-organizer of the sixth
DaMoN,
the international workshop on Data Management on New Hardware,
co-located with and sponsored by ACM SIGMOD 2010.
- Co-organizer of the fifth
DaMoN,
the international workshop on Data Management on New Hardware,
co-located with and sponsored by ACM SIGMOD 2009.
- Co-organizer of the workshop on
XQuery Implementation Paradigms,
held at Schloss Dagstuhl, 19-22 November 2006.
- Co-organizer of the second
DaMoN,
the international workshop on Data Management on New Hardware,
co-located with and sponsored by ACM SIGMOD 2006.
- Co-organizer of the
DBDBD 2005,
the Dutch-Belgian Database Day.
- Founder and co-organizer of the first
DaMoN,
the international workshop on Data Management on New Hardware,
co-located with and sponsored by SIGMOD 2005.