Homepage of Peter Boncz
Notable (recent) activities:
- Won the VLDB 10-Year Best Paper Award, together with Stefan Manegold and Martin Kersten, for our work on architecture-conscious query processing in column stores! (VLDB keynote)
- At VLDB 2009, I presented a tutorial on Column-Oriented Database Systems (PDF) together with Daniel Abadi (Yale) and Stavros Harizopoulos (HP Labs).
- Together with prof. Divy Agrawal (UCSB), I taught at the VLDB Summer School, held this year Aug 10-14 in Shanghai. Speaking for 2.5 days non-stop to 100 Chinese DB PhD students turned out to be... quite intense (slides: Wed, Thu 1, Thu 2, Thu 3, Fri).
- Co-founded a new company, VectorWise, around the X100 high-performance database system. News: we are collaborating with the open-source database company Ingres.
- With Ken Ross (Columbia University), I am back as organizer of DaMoN 2009, a workshop I co-founded in 2005.
- My NWO Open project proposal "Querying while Transforming Large Graph Databases" was awarded; the project will run 2010-2014.
- MonetDB BV was founded: a company that provides services around the MonetDB open-source database system (the topic of my PhD research).
- Published an invited "research highlight" article in the Communications of the ACM on the architecture-conscious research performed over the past decade in the context of the MonetDB project (co-authors: Martin Kersten and Stefan Manegold, both from CWI).
- On invitation from prof. Hector Garcia-Molina and dr. Meichun Hsu, I spoke at the 2008 InfoSeminar at Stanford University.
- I won the ICT Regie Award 2006 for my role in the CWI spin-off company Data Distilleries (see below).
- Member of the CWI works council (ondernemingsraad).
The INS1 research theme at CWI, where I work, aims at researching
database technology applied to query-intensive domains such as data mining,
XML databases and multimedia retrieval.
Professional Activities
My CV is publicly available via:
As a database architect, I tend to cluster my research around efforts to
build database systems:
- MonetDB/XQuery: a high-performance
XML database system.
It uses the Pathfinder XQuery-to-Relational
Algebra compiler and (loop-lifted) staircase join algorithms to turn the binary
relational MonetDB into a full-fledged XML DBMS.
- AmbientDB: a P2P database architecture supporting ad-hoc distributed
querying, schema integration and data synchronization. We see AmbientDB as middleware technology
enhanced with "data management", which eases the construction of intelligent applications on networks
of pervasive computing devices. The query processing core of AmbientDB is MonetDB/XQuery. It
uses SOAP for distributed querying and P2P data structures for node discovery and connectivity.
- X100: continues our research into the interaction between database
architecture and modern computer architecture (cache-conscious processing, CPU efficiency of
database algorithms, efficient exploitation of sequential I/O). As this is about performance,
we focus on the (few?) application domains where performance is critical. These include on-line data mining
and OLAP on large data warehouses, such as in TPC-H,
but also content-based information access in huge multimedia databases like
TREC Video.
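The staircase join mentioned in the MonetDB/XQuery item evaluates XPath axis steps on a pre/post-order encoding of the XML tree, where node v is a descendant of c exactly when pre(v) > pre(c) and post(v) < post(c). Below is a minimal Python sketch of the idea for the descendant axis only, assuming a document stored as a list of post ranks indexed by pre rank; the names (`encode`, `staircase_descendant`) and the toy tree are hypothetical, and this is an illustration of the technique, not the MonetDB/XQuery implementation.

```python
from typing import List, Tuple

# Hypothetical tree format: (tag, [children]).
Tree = Tuple[str, list]


def encode(tree: Tree) -> Tuple[List[str], List[int]]:
    """Return (tags, post), both indexed by pre-order rank."""
    tags: List[str] = []
    post: List[int] = []
    counter = 0

    def visit(node: Tree) -> None:
        nonlocal counter
        tag, children = node
        pre = len(tags)
        tags.append(tag)
        post.append(-1)  # placeholder until the subtree is finished
        for child in children:
            visit(child)
        post[pre] = counter  # post rank assigned after all children
        counter += 1

    visit(tree)
    return tags, post


def staircase_descendant(post: List[int], context: List[int]) -> List[int]:
    """Descendant step: pre ranks of all descendants of any context node,
    in document order and without duplicates."""
    # 1. Pruning: a context node inside the subtree of an already kept node
    #    adds nothing new; checking the last kept node is sufficient.
    kept: List[int] = []
    for c in sorted(set(context)):
        if not kept or post[c] > post[kept[-1]]:
            kept.append(c)
    # 2. Scan: the descendants of c form a contiguous pre-order range right
    #    after c, ending at the first node whose post rank exceeds post[c].
    result: List[int] = []
    n = len(post)
    for c in kept:
        v = c + 1
        while v < n and post[v] < post[c]:
            result.append(v)
            v += 1
    return result


tree: Tree = ("a", [("b", [("c", []), ("d", [])]), ("e", [("f", [])])])
tags, post = encode(tree)
hits = staircase_descendant(post, [1, 2, 4])  # context: b, c, e
print([tags[v] for v in hits])  # ['c', 'd', 'f']
```

Because the pruned context nodes have disjoint subtrees, each document node is visited at most once and the output comes out in document order without a separate sort or duplicate elimination. Loop-lifting, which evaluates one step for many XQuery loop iterations at once, is not shown.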
All these research activities are funded by the Dutch Government (BSIK) as part of the 6-year project
MultimediaN, in which I lead the
database sub-project, with involvement from
Philips Research
and Technical University Twente (CTIT).
In this context, CWI has research
collaborations with
NFI (MonetDB/XQuery),
SPSS (X100) and
TextKernel BV (X100).
Ph.D. Students
These students worked in MultimediaN (2004-2009), in which
I led one of the projects (14 FTE at CWI, Philips Research and U Twente).
I am also advising:
Both positions are in the XIRAF project (2007-2009), run jointly with the Dutch Forensic Institute (NFI).
PC Memberships
Major conferences:
- SIGMOD 2004, SIGMOD 2008 (Demo), SIGMOD 2010
- VLDB 2001, 2005, 2007, 2008, 2009, 2010
- ICDE 2005, 2008, 2009, 2010
- EDBT 2002, 2008
- WWW 2009
- CIKM 2008 (area chair)
Other PCs:
- SMDB 2010
- W3C Workshop on RDF Access to Relational Databases
- XIME-P 2007
- MDM 2007
- ExpDB 2006
- DBISP2P 2004, 2005, 2006, 2007
- VLDB PhD 2005, 2006
- DaMoN 2007, 2008
- PDMST 2006, 2007
I have also served as a referee for major journals:
- ACM TODS
- IEEE TKDE
- VLDB Journal
Organization
- Co-organizer of the fifth
DaMoN,
the international workshop on Data Management on New Hardware,
co-located with and sponsored by ACM SIGMOD 2009.
- Co-organizer of the workshop on
XQuery Implementation Paradigms,
to be held at Schloss Dagstuhl, 19-22 November 2006.
- Co-organizer of the second
DaMoN,
the international workshop on Data Management on New Hardware,
co-located with and sponsored by ACM SIGMOD 2006.
- Co-organizer of the
DBDBD 2005,
the Dutch-Belgian Database Day.
- Co-organizer of the first
DaMoN,
the international workshop on Data Management on New Hardware,
co-located with and sponsored by SIGMOD 2005.
Past Work
In May 2002, I completed and defended my PhD thesis on MonetDB.
MonetDB is an experiment in database architecture (gone out of hand) that proposes full vertical fragmentation
to better accommodate query-intensive access patterns, both by optimizing I/O
and by improving access to the memory caches.
MonetDB also uses a column-wise query-processing algebra with zero degrees of freedom, which makes it
possible to use a generic but pre-compiled query engine, as opposed to the interpretive
techniques used in other DBMSs. Query processing primitives fixed at compile time are crucial for modern CPUs
like the Pentium 4, which need highly predictable code to avoid branch mispredictions,
as well as an ever-present pool of independent instructions (to fill their parallel execution units
and obtain a good Instructions-Per-Cycle ratio). Note that compile time here means DBMS build time,
not query-compilation time, which is a run-time activity. A final MonetDB design issue
was extensibility: MonetDB is constructed as a back-end on top of which multiple front-end systems
can work, interacting with storage and query processing at a lower level than, say, SQL allows.
This makes it possible to re-use the same system in multiple application domains, which
was one of MonetDB's design goals. MonetDB has been applied successfully to
OLAP, data mining, GIS, k-NN search, and XML, image and video databases.
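The paragraph above describes full vertical fragmentation and a column-wise algebra of pre-compiled primitives. A minimal Python sketch of that idea follows, under our own assumptions: a toy three-column table, hypothetical names (`fragment`, `select_lt`, `fetch`), and our reading of "zero degrees of freedom" as operators whose behaviour is fully fixed by their arguments (a column and a constant), so no per-tuple expression is interpreted. It is not MonetDB's actual interface. In a real engine these loops would be compiled C; the branch-free style in `select_lt` only imitates that loop shape and gives no speed-up in Python.

```python
from typing import Dict, List


def fragment(rows: List[tuple], names: List[str]) -> Dict[str, list]:
    """Full vertical fragmentation: one list per attribute, aligned by row id."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def select_lt(col: List[int], bound: int) -> List[int]:
    """Fixed-signature primitive: row ids i where col[i] < bound.

    Branch-free formulation: always write the candidate row id, then advance
    the output cursor by the comparison result (False/True count as 0/1).
    """
    out = [0] * len(col)
    j = 0
    for i, value in enumerate(col):
        out[j] = i           # safe: j never exceeds i
        j += value < bound   # no if-statement on the data
    return out[:j]


def fetch(col: list, rowids: List[int]) -> list:
    """Fixed-signature primitive: positional join, values of col at rowids."""
    return [col[i] for i in rowids]


# Hypothetical query: SELECT SUM(price) FROM sales WHERE qty < 10
rows = [("apple", 5, 30), ("pear", 12, 20), ("plum", 3, 15), ("fig", 20, 50)]
cols = fragment(rows, ["item", "qty", "price"])
ids = select_lt(cols["qty"], 10)        # only the qty column is touched
print(sum(fetch(cols["price"], ids)))   # 45
```

The query plan is a fixed sequence of whole-column operator calls, so the only per-tuple work happens inside the tight primitive loops, and each step reads just the columns it needs.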
During my time as a PhD candidate at the
University of Amsterdam,
our research group founded Data Distilleries (DD, acquired by SPSS in 2004),
and I was with it almost from the beginning. Its
decision management tools
are powered by MonetDB, so I got deeply involved, spending more than two years full time
as overall product architect at the company. In that role I also architected and designed large parts of the new
real-time suite,
getting acquainted with J2EE application server technology in the area of real-time marketing,
and also with real-time heterogeneous databases.
As with any start-up job, this was not about software architecture alone: I learned a lot
about software engineering methodology in practice, and also about project and product management.
However, the most fun part was always the site visits to DD's clients, mostly big
companies (like ABN Amro,
ING Postbank,
Aegon,
Vodafone,
Center Parcs,
OHRA,
IMP): getting into the highly protected
machine room where, among many huge mainframes, a Unix or NT server would be (and still is) running MonetDB!
Research Interests
So, since the start of 2002 I'm back at CWI doing research into:
- database architecture: the art of crafting DBMS software.
- data mining, OLAP: the so-called "query-intensive" DBMS application areas.
- (XML) query processing: algorithms and data (index) structures for high query performance.
- non-standard application domains: applying DBMS technology to multimedia, GIS and biology.
- computer architecture: the interaction between a DBMS and modern hardware.
- P2P technologies: architectures that allow for ad-hoc cooperation.