X100

Beyond cache-conscious query processing...

Development Wiki (restricted access)

Motivation

Database architecture involves finding good trade-offs among interacting design decisions in many dimensions (e.g. query language, optimization, query processing algorithms and data structures), while accounting for their interaction with the hardware environment. As hardware keeps evolving, database architecture should be subject to constant re-evaluation.

A good example of such balancing is Jim Gray's "5-minute rule" and its various re-formulations in later years, which adjust it to more recent hardware developments. As a more concrete example, our own work on MonetDB has shown that since the late 1990s it has become important to optimize data structures and algorithms (e.g. joins) to make good use of the caches found in modern CPUs, a complication that was nonexistent when most current RDBMS products were architected.
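As a rough illustration of the kind of trade-off the 5-minute rule expresses, the sketch below computes the break-even interval at which keeping a page in RAM costs the same as re-reading it from disk. The function name and the input figures are assumptions chosen for illustration (roughly late-1990s hardware); they are not taken from this project.

```python
def break_even_interval_seconds(pages_per_mb_ram: float,
                                accesses_per_sec_per_disk: float,
                                price_per_disk: float,
                                price_per_mb_ram: float) -> float:
    """Return the reference interval (in seconds) below which caching a
    page in RAM is cheaper than fetching it from disk on every access."""
    # Technology ratio: how many pages fit in 1 MB of RAM versus how many
    # random accesses per second one disk can serve.
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    # Economic ratio: price of one disk drive versus price of 1 MB of RAM.
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


# Illustrative, assumed inputs: 8 KB pages (128 per MB), 64 random
# accesses/s per disk, $2000 per disk drive, $15 per MB of RAM.
interval = break_even_interval_seconds(128, 64, 2000.0, 15.0)
print(f"break-even interval: {interval:.0f} s ({interval / 60:.1f} min)")
```

Because both ratios depend on hardware characteristics and prices, the break-even point shifts whenever either changes, which is why the rule has had to be re-formulated over the years.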

Goal

The X100 project aims at investigating opportunities for improvement in database architecture provided by the evolution of computer architecture. We target so-called query-intensive domains; our prime domains are OLAP, (multi-media) Information Retrieval, and XML processing. Hence, an additional requirement for this new high-performance database kernel is that it can be usefully adapted to non-standard database application domains.

Research Questions

We will be evaluating new data structures, indexing structures, query execution and query optimization techniques that try to exploit efficiency opportunities in modern computer hardware regarding:

About The Name 'Times-Hundred'...

We want to provide a DBMS kernel that is truly efficient in the sense that:

Now, when we compare this with the state of the art in DBMS kernels, we see that:

In all, X100 will need to fight bottlenecks on all levels, from sequential I/O to good use of the caches in the memory hierarchy, up to the level of tuning the instruction mix in the CPU.

So, if we eventually do that successfully and reach our goal of CPU-boundedness on a minimal number of instructions with a high IPC (instructions per cycle), then we might indeed arrive at the situation where X100 achieves performance two orders of magnitude better than that of commercial RDBMS products on the benchmarks of our choice (primarily OLAP, such as TPC-H, but also content-based information access in huge multimedia databases, such as TREC Video).

Prototyping

X100 will be used initially as an extension module inside MonetDB.

Project Members

Publications

Our previous work on cache- and CPU-efficiency inside MonetDB is highly related: