Research Interests

Our research group is mainly interested in techniques for the efficient processing of queries on large databases. For the last 20 years, our focus has been on the development of techniques for object-relational databases, with a strong emphasis on spatial, temporal and spatio-temporal index structures. Recent improvements in network technology make it possible to query massive remote data sources. In response to these developments, we have broadened the focus of our work in two directions. On the one hand, we investigate the management of data streams created by large numbers of small sensors. On the other hand, we consider geospatial databases in the context of scientific applications and explore techniques for querying and analyzing big spatial databases.

PIPES - stream processing

Over the coming years, a tremendous number of sensors will be installed in our environment. More and more data is delivered continuously from these devices as streams. In general, a large number of streams is required to provide the desired information, and each stream outputs a large number of data items. Ideally, users pose ad-hoc queries on streams, just as they would in a traditional DBMS. There are, however, fundamental differences: a query runs until the user explicitly stops it, and streaming data items are generally valid for a short period of time only. This leads to a substantial change in query processing. Systems for streams are therefore primarily designed for the management of queries, of which there might be millions running simultaneously, whereas the data items of the streams are kept in the system only temporarily.
Our research group addresses the following issues in stream processing:

Based on temporal interval semantics, we are concerned with the implementation of data-driven query operators, i.e., whenever a data item reaches an operator, it triggers the operator's processing step.
The optimization of queries on streams is a real challenge when we consider millions of rather complex queries running simultaneously. For this scenario, scalable multi-query optimization has become a central issue.
Even though there are heuristics for constructing good query plans, the performance of a plan may drop at runtime. This can be caused by fundamental changes in the system, e.g., changes in the quantity and quality of arriving data items. We therefore need a mechanism to adapt the query plan dynamically at runtime. In general, it is sufficient to redistribute resources among the operators, but in the worst case a rearrangement of the entire query plan might be necessary.
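
The data-driven, interval-based processing described in the first item can be illustrated with a minimal sketch. It is not the PIPES API: the names Element, Operator, Source, TimeWindow, Filter and Sink are hypothetical, and we assume integer timestamps and half-open validity intervals. Each operator processes an element the moment it arrives and pushes its result to its subscribers.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Element:
    """A stream element that is valid during the half-open interval [start, end)."""
    value: Any
    start: int
    end: int


class Operator:
    """Base class (hypothetical): results are pushed to all subscribed operators."""

    def __init__(self) -> None:
        self.subscribers: List["Operator"] = []

    def subscribe(self, op: "Operator") -> "Operator":
        self.subscribers.append(op)
        return op  # return the subscriber so that pipelines can be chained

    def emit(self, element: Element) -> None:
        for op in self.subscribers:
            op.process(element)

    def process(self, element: Element) -> None:
        raise NotImplementedError


class Source(Operator):
    """Entry point: wraps a raw value into an element valid for one time unit."""

    def push(self, value: Any, timestamp: int) -> None:
        self.emit(Element(value, timestamp, timestamp + 1))


class TimeWindow(Operator):
    """Extends the validity of each element to `size` time units."""

    def __init__(self, size: int) -> None:
        super().__init__()
        self.size = size

    def process(self, element: Element) -> None:
        self.emit(Element(element.value, element.start, element.start + self.size))


class Filter(Operator):
    """Forwards only the elements whose value satisfies the predicate."""

    def __init__(self, predicate: Callable[[Any], bool]) -> None:
        super().__init__()
        self.predicate = predicate

    def process(self, element: Element) -> None:
        if self.predicate(element.value):
            self.emit(element)


class Sink(Operator):
    """Collects every element that reaches the end of the pipeline."""

    def __init__(self) -> None:
        super().__init__()
        self.received: List[Element] = []

    def process(self, element: Element) -> None:
        self.received.append(element)


# Each push triggers the whole pipeline immediately; nothing is polled.
source = Source()
sink = source.subscribe(TimeWindow(10)).subscribe(Filter(lambda v: v > 20)).subscribe(Sink())
for timestamp, reading in enumerate([18, 25, 30]):
    source.push(reading, timestamp)
print(sink.received)
```

In this sketch the arrival of a data item drives all computation, which is the opposite of the demand-driven style in which a consumer asks its input for the next result.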

We have addressed these research issues in a project called PIPES and apply the results in challenging new applications for monitoring mission-critical infrastructures, for example in our ACCEPT project, which is currently supported by the Bundesministerium für Bildung und Forschung (BMBF).

Spatial databases in scientific applications

Earth observation systems collect large amounts of geospatial data, but there has been little research on the management of large raster databases. This project addresses essential questions in this field regarding compression, retrieval and analytical queries on large raster data sets. Our goal is to support expensive operations on powerful computing infrastructures, including cloud infrastructures and GPGPUs.

Advanced query processing and optimization

Though database systems are considered a mature technology, new demanding applications still pose research challenges. The efficient processing of joins has long been an important issue, but surprisingly little work has been done on supporting complex join predicates such as similarity. Similarity joins are important when users want to integrate different data sources. Another application of similarity joins arises in data mining, where they are used to detect similar patterns. We are particularly interested in efficiently supporting such unusual joins when the input consists of more than two relations and the output is produced progressively.
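
To make the notion of a progressive similarity join concrete, here is a deliberately naive sketch for two inputs. It is not one of our algorithms: it assumes strings as join values, Jaccard similarity over word tokens as the predicate, and a plain nested loop. The function names jaccard and similarity_join are hypothetical. The point is only that every qualifying pair is reported as soon as it is found, before the left input has been fully read.

```python
from typing import Iterable, Iterator, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lower-cased word sets of two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def similarity_join(
    left: Iterable[str], right: Iterable[str], threshold: float
) -> Iterator[Tuple[str, str, float]]:
    """Yield (l, r, similarity) for every pair whose similarity reaches the threshold.

    The right input is materialized once; the left input is consumed lazily,
    so results are produced progressively.
    """
    right_items: List[str] = list(right)
    for l in left:
        for r in right_items:
            score = jaccard(l, r)
            if score >= threshold:
                yield (l, r, score)


# Integrating two sources that spell the same entities differently.
source_a = ["database systems group", "spatial index structures"]
source_b = ["Database Systems Research Group", "stream processing"]
for match in similarity_join(source_a, source_b, threshold=0.5):
    print(match)
```

A nested loop compares every pair and therefore does not scale; the sketch only fixes the semantics against which more efficient techniques can be compared.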

Index structures

One of the areas we are best known for is index structures. Our R*-tree and MVBT are index structures that are already available in commercial systems such as Oracle. For many years, we have been studying the design and evaluation of heuristics for improving the R*-tree. Moreover, bulk operations such as loading a tree from a given set of objects have been an important topic for our research group. We also work on indexing and storing XML data, with a major focus on native storage structures for XML and on bulk loading into our XML storage. Recently, we have extended our studies to new fields of application such as preference databases and to demanding new technologies, for example location-based services and peer-to-peer systems.
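
As an illustration of what bulk loading means, the following sketch builds an R-tree-like structure bottom-up from a given set of rectangles. It is a simple sort-and-pack scheme (sort by x-center, fill nodes to capacity), not the R*-tree heuristics and not one of our bulk-loading algorithms. The names Node, bulk_load and search, the rectangle layout (xmin, ymin, xmax, ymax) and the default capacity are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    mbr: Rect      # minimum bounding rectangle of all entries
    entries: list  # rectangles in a leaf, child nodes otherwise
    leaf: bool


def bounding_box(rects: Sequence[Rect]) -> Rect:
    """Smallest rectangle that encloses all given rectangles."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def chunks(items: list, size: int) -> List[list]:
    return [items[i:i + size] for i in range(0, len(items), size)]


def bulk_load(rects: Sequence[Rect], capacity: int = 4) -> Node:
    """Build the tree level by level instead of inserting objects one at a time."""
    if not rects:
        raise ValueError("cannot bulk-load an empty set of rectangles")
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    # Sort by x-center so that rectangles packed together are close to each other.
    ordered = sorted(rects, key=lambda r: (r[0] + r[2]) / 2)
    level = [Node(bounding_box(c), c, True) for c in chunks(ordered, capacity)]
    while len(level) > 1:
        level = [
            Node(bounding_box([n.mbr for n in c]), c, False)
            for c in chunks(level, capacity)
        ]
    return level[0]


def search(node: Node, query: Rect) -> List[Rect]:
    """Return all stored rectangles that intersect the query rectangle."""
    if not intersects(node.mbr, query):
        return []
    if node.leaf:
        return [r for r in node.entries if intersects(r, query)]
    hits: List[Rect] = []
    for child in node.entries:
        hits.extend(search(child, query))
    return hits


data = [(float(i), float(i), i + 1.0, i + 1.0) for i in range(20)]
root = bulk_load(data, capacity=4)
print(search(root, (5.5, 5.5, 7.5, 7.5)))
```

Sorting on a single dimension is the crudest possible packing order. It is used here only because it keeps the example short.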

XXL (eXtensible and fleXible Library)

Though researchers in the database area are often interested in developing a prototype database system, we followed a different approach and developed a library called XXL, which can of course serve as a platform for building database systems. XXL provides the query processing functionality required for a database system, such as a set of demand-driven operators, a rich collection of index structures, and a rule-based optimizer. It supports the processing of both relational and XML data. All packages of XXL come with full documentation, so people outside our group can quickly familiarize themselves with its functionality. It is very important to us that we generally use XXL to implement the new techniques presented in our research papers. As a result, reference implementations are available in XXL that allow for quick experimental comparisons. We have found that XXL improves the quality and speed of our coding when implementing new ideas, since it provides a rich infrastructure of low- and high-level components. XXL is a living library to which new functionality is continuously added. The library is publicly available under the GNU LGPL.
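
The idea behind demand-driven operators can be sketched in a few lines. The example below does not use XXL or reproduce its API. It mimics the principle with Python generators, and the operator names scan, select and project are hypothetical. Each operator computes a result only when its consumer asks for the next one.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Leaf operator: hands out the input rows one at a time."""
    for row in rows:
        yield row


def select(child: Iterator[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Passes on only the rows that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row


def project(child: Iterator[Row], columns: List[str]) -> Iterator[Row]:
    """Keeps only the requested columns of each row."""
    for row in child:
        yield {column: row[column] for column in columns}


# Building the plan does no work; rows flow only when the consumer pulls.
sensors = [
    {"id": 1, "room": "A", "temp": 19.5},
    {"id": 2, "room": "B", "temp": 23.0},
    {"id": 3, "room": "A", "temp": 24.5},
]
plan = project(select(scan(sensors), lambda row: row["temp"] > 20.0), ["id", "room"])
for result in plan:
    print(result)
```

Because every operator consumes and produces the same kind of iterator, operators can be combined freely into larger query plans.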