GSoC Project Ideas

Below, you can find some ideas on the directions in which we could jointly push Polypheny forward. Please consider them as starting points for your proposal. Of course, if you have other ideas, we would be very happy to hear them. Feel free to contact us beforehand to get feedback on what you plan to do.

Simply copying and pasting one of the ideas will not work. On the other hand, creating a completely new idea without first consulting the mentors might be difficult as well.

Query the Blockchain

A blockchain can be seen as a distributed append-only database. The aim of this project is to build an adapter for executing read-only queries against different blockchains, such as the Bitcoin or the Ethereum blockchain.

Because Polypheny can join and combine data from different adapters in one query, this project will make it possible to integrate the latest data from a blockchain into arbitrary queries.

Difficulty: medium-hard

Quality Check and Assurance

A major problem in developing any kind of software is ensuring that a change does not introduce new bugs in a completely different subsystem. For a database system, this also includes ensuring the completeness and correctness of query results.

Continuously and automatically checking that a system behaves and works as expected is therefore important to ensure consistent software quality and to avoid regressions. Usually, this is done using unit tests and integration tests. While unit tests check that individual parts (units) of the code (typically individual methods) work as expected, integration tests check that the whole application works correctly.

Currently, Polypheny has only a few integration tests, which makes it hard to avoid regressions and unintended side effects. The aim of this project idea is to systematically add tests that cover as many features as possible. In particular, this means writing checks for the SQL query interface.
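As a rough illustration of what such a check could look like, the following sketch runs a query and compares the full result set against the expected rows. It uses Python's built-in sqlite3 module purely as a stand-in for a database connection; the table emps and the helper run_query are hypothetical and not part of Polypheny.

```python
import sqlite3
import unittest


def run_query(conn, sql, params=()):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql, params).fetchall()


class SqlInterfaceTest(unittest.TestCase):
    def setUp(self):
        # In-memory SQLite database as a stand-in for the system under test.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE emps (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
        )
        self.conn.executemany(
            "INSERT INTO emps VALUES (?, ?, ?)",
            [(1, "Ann", 5000), (2, "Bob", 4000), (3, "Eve", 6000)],
        )

    def tearDown(self):
        self.conn.close()

    def test_filter_returns_complete_and_correct_rows(self):
        rows = run_query(
            self.conn,
            "SELECT name FROM emps WHERE salary >= ? ORDER BY name",
            (5000,),
        )
        # Completeness: no expected row is missing.
        # Correctness: no unexpected row is present.
        self.assertEqual(rows, [("Ann",), ("Eve",)])
```

Saved to a file, such a test can be run with python -m unittest. A test against Polypheny would replace the SQLite connection with a connection to a running instance.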

Difficulty: easy

Visualize It

In debugging mode, Polypheny creates a lot of log output on all decisions and optimizations taken while processing a query. This output is hard to read but contains a lot of useful and interesting information for developing and optimizing Polypheny.

In the query optimization process, several candidate plans are generated by applying optimization rules. Every candidate plan has a certain cost assigned. While Polypheny-UI contains support for visualizing query plans, this is currently only done for the selected plan and not for all candidate plans.

The idea of this project is to visualize this debugging information in the UI. Furthermore, the existing query plan visualization should be extended to allow browsing through the candidate plans including a list of the applied rules and the associated costs.
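To make the data behind such a view more concrete, here is a small sketch of a candidate plan list with costs and applied rules. The class CandidatePlan, the rule name, and the plan descriptions are made up for illustration, and the sketch assumes that the selected plan is the one with the lowest cost.

```python
from dataclasses import dataclass, field


@dataclass
class CandidatePlan:
    """One plan produced during optimization (hypothetical structure)."""
    description: str
    cost: float
    applied_rules: list = field(default_factory=list)


def rank_plans(plans):
    """Return the plans ordered from cheapest to most expensive."""
    return sorted(plans, key=lambda p: p.cost)


def format_overview(plans):
    """Build one text line per plan; '*' marks the cheapest plan."""
    lines = []
    for i, plan in enumerate(rank_plans(plans)):
        marker = "*" if i == 0 else " "
        rules = ", ".join(plan.applied_rules) or "(none)"
        lines.append(
            f"{marker} cost={plan.cost:.1f} rules=[{rules}] {plan.description}"
        )
    return lines


# Example data; the rule name is invented for this sketch.
plans = [
    CandidatePlan("Scan -> Filter -> Project", 120.0),
    CandidatePlan("FilteredScan -> Project", 45.5, ["FilterIntoScanRule"]),
]
overview = format_overview(plans)
```

A UI would render the same information as a browsable list or tree instead of text lines.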

Difficulty: medium

Support for Contextual Query Language

The Contextual Query Language (CQL) is a formal language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information. The idea of this project is to add a read-only CQL query interface to Polypheny-DB.
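A CQL query consists of search clauses of the form index relation term, which can be combined with boolean operators such as and and or. The following sketch translates a very small subset of CQL into a parameterized SQL query. It makes several simplifying assumptions: only the "=" relation with quoted terms is handled, "=" is mapped to SQL equality, each CQL index is assumed to correspond to a column of the same name, and the function cql_to_sql and the table books are hypothetical.

```python
import re

# A search clause of the form: index = "term"
CLAUSE = re.compile(r'\s*(\w+)\s*=\s*"([^"]*)"\s*')
BOOLEAN = re.compile(r'(and|or)\b', re.IGNORECASE)


def cql_to_sql(cql, table, allowed_columns):
    """Translate a tiny CQL subset into (sql, params).

    The table name is inserted as-is and must come from trusted code.
    """

    def clause(pos):
        m = CLAUSE.match(cql, pos)
        if m is None:
            raise ValueError(f"expected a search clause at position {pos}")
        column, term = m.groups()
        if column not in allowed_columns:
            raise ValueError(f"unknown index: {column}")
        return f"{column} = ?", term, m.end()

    expr, term, pos = clause(0)
    params = [term]
    while pos < len(cql):
        b = BOOLEAN.match(cql, pos)
        if b is None:
            raise ValueError(f"expected 'and' or 'or' at position {pos}")
        cond, term, pos = clause(b.end())
        # Assumption: booleans are applied left to right, so the expression
        # is parenthesized explicitly to avoid SQL's AND-before-OR precedence.
        expr = f"({expr} {b.group(1).upper()} {cond})"
        params.append(term)
    return f"SELECT * FROM {table} WHERE {expr}", params


sql, params = cql_to_sql(
    'title = "fish" and author = "smith" or year = "1999"',
    "books",
    {"title", "author", "year"},
)
```

A real interface would need a complete parser covering the remaining relations, modifiers, and operators of the language.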

Difficulty: medium

Keep Them All

A multi-version database system allows storing multiple versions of the same entry. In a human resources database, for example, this makes it possible to query not only the current salary of an employee but also the salary they had two years ago.

The goal of this project is to extend Polypheny-DB to transparently support storing multiple versions of an element, including full referential integrity for past revisions.

The project includes

  • extending Polypheny’s SQL dialect to support specifying a version,
  • rewriting the queries to retrieve the right version and to add a new version if the query aims to modify an existing entry, and
  • evaluating the correctness of the system, especially for complex queries.
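One common way to realize the rewriting step is to keep a validity interval per version. The sketch below shows this idea using Python's built-in sqlite3 module as a stand-in database. The table salary_versions, its columns, and both helper functions are assumptions made for illustration; they do not describe how Polypheny would implement versioning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salary_versions ("
    " emp_id INTEGER, salary INTEGER,"
    " valid_from TEXT NOT NULL, valid_to TEXT)"  # valid_to IS NULL: current version
)


def update_salary(conn, emp_id, salary, when):
    """Rewritten update: close the current version and append a new one."""
    conn.execute(
        "UPDATE salary_versions SET valid_to = ?"
        " WHERE emp_id = ? AND valid_to IS NULL",
        (when, emp_id),
    )
    conn.execute(
        "INSERT INTO salary_versions VALUES (?, ?, ?, NULL)",
        (emp_id, salary, when),
    )


def salary_as_of(conn, emp_id, when):
    """Rewritten read: pick the version whose validity interval contains `when`."""
    row = conn.execute(
        "SELECT salary FROM salary_versions"
        " WHERE emp_id = ? AND valid_from <= ?"
        " AND (valid_to IS NULL OR valid_to > ?)",
        (emp_id, when, when),
    ).fetchone()
    return row[0] if row else None


# ISO-8601 date strings compare correctly as plain text.
update_salary(conn, 1, 4000, "2019-01-01")
update_salary(conn, 1, 5000, "2021-01-01")
```

Here, salary_as_of(conn, 1, "2020-06-01") returns the old salary, while a date after the second update returns the new one. Referential integrity across past revisions is not covered by this sketch.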

Difficulty: hard

A Question of Freshness

The goal of this project is to build a freshness-aware routing component for Polypheny that is capable of handling stores with different levels of data freshness. Depending on the freshness required by the queries in the workload, the system should autonomously decide whether it makes sense to postpone refreshing some of the data stores until there is less load on the system.

The required freshness is specified by the client as part of the query. The project includes the definition of an appropriate SQL syntax and the extension of the parser to support this syntax.
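The routing decision itself can be sketched in a few lines. The example below assumes that freshness is expressed as a maximum tolerated staleness in seconds and that each store reports its current staleness and load. The class Store, the function route, and the store names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Store:
    name: str
    staleness_s: float  # seconds the store lags behind the latest update
    load: float         # current load, from 0.0 (idle) to 1.0 (saturated)


def route(stores, max_staleness_s):
    """Pick the least-loaded store that is fresh enough for the query."""
    fresh_enough = [s for s in stores if s.staleness_s <= max_staleness_s]
    if not fresh_enough:
        raise LookupError("no store satisfies the requested freshness")
    return min(fresh_enough, key=lambda s: s.load)


stores = [
    Store("primary", staleness_s=0.0, load=0.9),
    Store("replica", staleness_s=30.0, load=0.2),
]
tolerant = route(stores, max_staleness_s=60.0)  # stale data is acceptable
strict = route(stores, max_staleness_s=5.0)     # needs nearly current data
```

With these values, the tolerant query is routed to the lightly loaded replica, while the strict query has to use the primary store. As long as most queries are tolerant, refreshing the replica could be postponed.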

Difficulty: medium

Container-based Store Management

Data stores supporting an embedded mode are very handy. They can easily be deployed and removed using DDL statements or via the Polypheny-UI. Data stores not supporting an embedded mode (e.g., PostgreSQL) are more complex to set up and require manual steps.

Introducing a container-based deployment (e.g., based on Docker) of data stores would make it possible to provide an embedded mode for every supported data store.
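As a minimal sketch of what deploying and removing a store could boil down to, the following functions build the corresponding Docker CLI commands without executing them. The function names and the container name are hypothetical; the image and the POSTGRES_PASSWORD variable refer to the official PostgreSQL image.

```python
import shlex


def docker_run_command(name, image, host_port, container_port, env=None):
    """Build the argv for starting a data store container in the background."""
    cmd = [
        "docker", "run", "--detach",
        "--name", name,
        "--publish", f"{host_port}:{container_port}",
    ]
    for key, value in (env or {}).items():
        cmd += ["--env", f"{key}={value}"]
    cmd.append(image)
    return cmd


def docker_remove_command(name):
    """Build the argv for stopping and removing the container again."""
    return ["docker", "rm", "--force", name]


deploy = docker_run_command(
    "polypheny-postgres", "postgres:16", 5432, 5432,
    env={"POSTGRES_PASSWORD": "secret"},
)
remove = docker_remove_command("polypheny-postgres")
deploy_as_text = shlex.join(deploy)  # shell-quoted form, e.g. for logging
```

The commands could then be executed with subprocess.run(deploy, check=True). An actual implementation might instead talk to the Docker Engine API directly.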

Difficulty: medium-easy

Physical Query Plan Builder

The Polypheny-UI already comes with an integrated logical query plan builder. The aim of this project is to implement a graphical build tool for physical query plans similar to the one for logical query plans. In the polystore context, the physical query plan differs from the logical one by using operators specific to the involved data store adapters.

A physical query plan builder would be extremely helpful for development. Because there is already an implementation for logical plans, parts of the code can be reused.

To get an impression of what a physical query plan looks like in Polypheny, you can simply execute a query in the UI and have a look at the plan by selecting “Physical Query Plan” in the left menu.

Difficulty: medium

Polypheny Goes Semantics

Triple stores, such as RDF stores, manage data together with semantic relationships. The aim of this project is to build an adapter for integrating RDF stores into Polypheny and to support semantic queries, i.e., queries that leverage these semantic relationships.

An optional extension of this project is the integration of SPARQL into Polypheny, for instance by mapping SPARQL to SQL.
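To give an idea of what such a mapping involves, the sketch below translates a list of triple patterns (the core of a SPARQL query) into a self-join over a single table triples(s, p, o). The table layout, the function patterns_to_sql, and the sample data are assumptions for illustration; Python's built-in sqlite3 module serves as a stand-in database.

```python
import sqlite3


def patterns_to_sql(patterns, select_vars):
    """Map triple patterns to SQL; terms starting with '?' are variables."""
    where, params, bindings = [], [], {}
    for i, pattern in enumerate(patterns):
        for col, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{col}"
            if not term.startswith("?"):
                where.append(f"{ref} = ?")  # constant: compare by value
                params.append(term)
            elif term in bindings:
                where.append(f"{ref} = {bindings[term]}")  # shared variable: join
            else:
                bindings[term] = ref  # first occurrence binds the variable
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    columns = ", ".join(bindings[v] for v in select_vars)
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [("alice", "knows", "bob"), ("bob", "knows", "carol"), ("alice", "name", "Alice")],
)

# Roughly: SELECT ?a ?c WHERE { ?a knows ?b . ?b knows ?c }
sql, params = patterns_to_sql(
    [("?a", "knows", "?b"), ("?b", "knows", "?c")], ["?a", "?c"]
)
rows = conn.execute(sql, params).fetchall()
```

Each triple pattern becomes one alias of the triples table, and every variable shared between patterns becomes a join condition. Features such as OPTIONAL, FILTER, or property paths would need additional handling.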

Difficulty: medium-hard