The Smart Data Analytics group is always looking for good students to write theses. The topics can be in one of the following broad areas:

Please note that the list below is only a small sample of possible topics and ideas. Please contact us to discuss further, to find new topics, or to suggest a topic of your own.


Topic Level Contact Person
Distributed Spatiotemporal Prediction Bachelor; Master Dr. Alexander Kister
Distributed Anomaly Detection in RDF
Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. Numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points. With graph data becoming ubiquitous, techniques for structured graph data have recently come into focus. Because objects in graphs have long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data.
Bachelor; Master Dr. Hajira Jabeen
Rule/Concept Learning using Swarm or E.C.
In the Semantic Web context, OWL ontologies play the key role of domain conceptualizations, while the corresponding assertional knowledge is given by the heterogeneous Web resources referring to them. However, being strongly decoupled, ontologies and assertional bases can be out of sync. In particular, an ontology may be incomplete, noisy, and sometimes inconsistent with the actual usage of its conceptual vocabulary in the assertions. Despite such problematic situations, we aim to discover hidden knowledge patterns in ontological knowledge bases, in the form of multi-relational association rules, by exploiting the evidence coming from the (evolving) assertional data. The final goal is to use such patterns for (semi-)automatically enriching and completing existing ontologies.
Bachelor; Master Dr. Hajira Jabeen
Swarm Optimization Clustering Bachelor; Master Dr. Hajira Jabeen
Knowledge Base Completion Bachelor; Master Dr. Hajira Jabeen
Entity Resolution
Entity resolution is the task of identifying all mentions that represent the same real-world entity within a knowledge base or across multiple knowledge bases. We address the problem of performing entity resolution on RDF graphs containing multiple types of nodes, using the links between instances of different types to improve the accuracy.
Bachelor; Master Dr. Hajira Jabeen
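To illustrate how links between instances of different types could improve matching, here is a minimal sketch in Python. It is not the group's method: the scoring function, the equal weighting, and the sample author and paper identifiers are all assumptions made for illustration. Each candidate entity carries a name and the set of nodes it links to, and the score blends token overlap of the names with overlap of the linked neighbours.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def match_score(e1, e2, weight=0.5):
    """Blend name similarity with similarity of linked neighbours.

    `weight` is a hypothetical tuning parameter: 1.0 uses names only,
    0.0 uses links only.
    """
    name_sim = jaccard(e1["name"].lower().split(), e2["name"].lower().split())
    link_sim = jaccard(e1["links"], e2["links"])
    return weight * name_sim + (1 - weight) * link_sim

# Hypothetical author nodes; "links" holds the papers each author node points to.
a = {"name": "John Smith", "links": {"ex:paper1", "ex:paper2"}}
b = {"name": "J. Smith", "links": {"ex:paper1", "ex:paper2"}}
c = {"name": "John Smith", "links": {"ex:paper9"}}

print(match_score(a, b))  # abbreviated name, identical papers
print(match_score(a, c))  # identical name, disjoint papers
```

In this toy data the pair with the abbreviated name but identical papers scores higher than the pair with identical names and no shared papers, which is the kind of evidence that name matching alone would miss.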
Intelligent Semantic Creativity: Culinarian
Computational creativity is an emerging branch of artificial intelligence that places computers in the center of the creative process. We aim to create a computational system that creates flavorful, novel, and perhaps healthy culinary recipes by drawing on big data techniques. It brings analytics algorithms together with disparate data sources from culinary science.
In its most ambitious form, the system would employ human-computer interaction for rating different recipes and model the human cognitive ability for the cooking process.
The end result will be an ingredient list, proportions, and a directed acyclic graph representing a partial ordering of the recipe steps.
Bachelor; Master Dr. Hajira Jabeen
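The directed acyclic graph of recipe steps mentioned for the Culinarian topic can be pictured with a small Python sketch. The steps and their dependencies below are invented for illustration and are not output of any existing system. The standard-library `graphlib.TopologicalSorter` (Python 3.9+) turns the partial ordering into one valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical recipe: each step maps to the set of steps that must finish first.
recipe_steps = {
    "chop vegetables": set(),
    "boil water": set(),
    "cook pasta": {"boil water"},
    "saute vegetables": {"chop vegetables"},
    "combine and serve": {"cook pasta", "saute vegetables"},
}

def linearize(steps):
    """Return one valid execution order for a partially ordered set of steps."""
    return list(TopologicalSorter(steps).static_order())

def respects_order(order, steps):
    """Check that every step appears after all of its prerequisites."""
    position = {step: i for i, step in enumerate(order)}
    return all(
        position[pre] < position[step]
        for step, prerequisites in steps.items()
        for pre in prerequisites
    )

order = linearize(recipe_steps)
print(order)
print(respects_order(order, recipe_steps))
```

Because the ordering is only partial, several linearizations are valid (for example, chopping and boiling can happen in either order or in parallel); `TopologicalSorter` raises `CycleError` if the steps contain a circular dependency.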
IoT Data Catalogues
While platforms and tools such as Hadoop and Apache Spark allow for efficient processing of Big Data sets, it becomes increasingly challenging to organize and structure these data sets. Data sets have various forms, ranging from unstructured data in files to structured data in databases. Often the data sets reside in different storage systems, ranging from traditional file systems, over Big Data file systems (HDFS), to heterogeneous storage systems (S3, RDBMS, MongoDB, Elasticsearch, …). At AGT International, we are dealing primarily with IoT data sets, i.e. data sets that have been collected from sensors and that are processed using Machine Learning-based (ML) analytic pipelines. The number of these data sets is growing rapidly, which increases the importance of generating metadata that captures both technical metadata (e.g. storage location, size) and domain metadata, and that correlates the data sets with each other, e.g. by storing provenance (data set x is a processed version of data set y) and domain relationships.
Master Dr. Martin Strohbach, Prof. Dr. Jens Lehmann

(Work at AGT International in Darmstadt)
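As a rough illustration of the kind of metadata described for the IoT Data Catalogues topic, the following Python sketch stores technical metadata (location, size), a domain label, and a provenance link per data set, and walks the provenance chain. The class, field names, storage locations, and sample entries are all hypothetical and do not describe any AGT International system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DatasetEntry:
    name: str
    location: str                        # technical metadata: where the data lives
    size_bytes: int                      # technical metadata: size
    domain: str                          # domain metadata, e.g. the kind of sensor
    derived_from: Optional[str] = None   # provenance: name of the source data set

def lineage(catalogue, name):
    """Follow provenance links from `name` back to its original source."""
    chain, seen = [], set()
    current = name
    while current is not None and current not in seen:  # `seen` guards against cycles
        seen.add(current)
        chain.append(current)
        entry = catalogue.get(current)
        current = entry.derived_from if entry else None
    return chain

# Hypothetical catalogue entries; locations are placeholders.
entries = [
    DatasetEntry("raw", "hdfs://example/raw", 10_000_000, "temperature sensors"),
    DatasetEntry("cleaned", "s3://example/cleaned", 8_000_000, "temperature sensors", "raw"),
    DatasetEntry("features", "s3://example/features", 500_000, "temperature sensors", "cleaned"),
]
catalogue = {entry.name: entry for entry in entries}

print(lineage(catalogue, "features"))
```

A real catalogue would likely need richer relationships than a single `derived_from` pointer (for example, data sets derived from several sources), which is one reason a graph-based representation may be attractive.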

Big Data Quality Assessment
Data quality is a multidimensional concept that covers different aspects of quality such as accuracy, completeness, and timeliness. With the advent of Big Data, traditional quality assessment techniques face new challenges, so they need to be adapted to Big Data technologies. The goal of this thesis is to re-implement the assessment techniques in the SANSA framework.
Bachelor; Master Dr. Anisa Rula
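To make one of the quality dimensions named above concrete, here is a small single-machine Python sketch of a property-completeness measure over RDF-like triples. It is only an illustration under assumed definitions: it does not use the SANSA API, and the metric definition, function name, and sample triples are assumptions rather than the techniques the thesis would re-implement.

```python
def property_completeness(triples, required_property):
    """Fraction of distinct subjects that have at least one value for a property."""
    subjects = {s for s, _, _ in triples}
    if not subjects:
        return 0.0
    covered = {s for s, p, _ in triples if p == required_property}
    return len(covered) / len(subjects)

# Hypothetical sensor descriptions as (subject, predicate, object) tuples.
triples = [
    ("ex:s1", "ex:label", '"Sensor 1"'),
    ("ex:s1", "ex:unit", '"C"'),
    ("ex:s2", "ex:label", '"Sensor 2"'),
    ("ex:s3", "ex:unit", '"C"'),
    ("ex:s4", "ex:type", "ex:Sensor"),
]

print(property_completeness(triples, "ex:label"))  # 2 of 4 subjects have a label
print(property_completeness(triples, "ex:type"))   # 1 of 4 subjects has a type
```

On a distributed engine the same measure would be expressed as two distinct-count aggregations over the triple set, which is the kind of adaptation the topic description refers to.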
Architecture for Multilingual Fact Validation Algorithms
Fact finders are a state-of-the-art class of algorithms that compute the trustworthiness of an information source. However, they often ignore a great deal of relevant background and contextual information. The goal of this thesis is to re-design Deep Fact Validation (DeFacto) in order to create an architecture that generalizes the fact-finding process and allows such knowledge to be incorporated elegantly.
Bachelor; Master Diego Esteves
Multilingual Fact Validation Algorithms
DeFacto (Deep Fact Validation) is an algorithm able to validate facts by finding trustworthy sources for them on the Web. Currently, it supports three main languages (en, de and fr). The goal of this thesis is to explore and implement alternative information retrieval (IR) methods to minimize the dependency on external tools for verbalizing natural language patterns. As a result, we expect to enhance the algorithm's performance by expanding its coverage.
Bachelor; Master Diego Esteves
Experimental Analysis of Class CS Problems
In this thesis, we explore unsolved problems of theoretical computer science with machine learning methods, especially reinforcement learning.
Bachelor; Master Diego Esteves
Generating Property Graphs from RDF using a semantics-preserving conversion approach
Graph databases have been on the rise over the last decade due to their dominance in the mining and analysis of complex networks. Property Graphs (PGs), one of the graph data models used by graph databases, are suitable for representing many real-life application scenarios. They allow complex networks (e.g. social networks, e-commerce) and interactions to be represented efficiently. In order to leverage this advantage of graph databases, the conversion of other data models to property graphs is a current area of research. The aim of this thesis is to (i) propose a novel systematic conversion approach for generating PGs from RDF (another graph data model) and (ii) carry out exhaustive experiments on both RDF and PG datasets with respect to their native storage databases (i.e. graph databases vs. triplestores). This will make it possible to identify the types of queries for which graph databases offer performance advantages and, ideally, to adapt the storage mechanism accordingly. The outcome of this work will be integrated into the LITMUS framework, an open, extensible framework for benchmarking diverse data management solutions.
Master Harsh Thakkar
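For orientation, the following Python sketch shows one naive way to map RDF triples to a property graph: triples with literal objects become node properties, and triples with resource objects become labelled edges. This is a deliberately simple baseline, not the conversion approach the thesis would propose; the `PropertyGraph` class, the quoting convention for literals, and the sample triples are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)  # node id -> {property: [values]}
    edges: list = field(default_factory=list)  # (source id, label, target id)

def is_literal(term):
    # Assumption for this sketch: literals are written with surrounding double quotes.
    return len(term) >= 2 and term.startswith('"') and term.endswith('"')

def rdf_to_pg(triples):
    """Naive mapping: literal objects become properties, resources become edges."""
    pg = PropertyGraph()
    for s, p, o in triples:
        pg.nodes.setdefault(s, {})
        if is_literal(o):
            # Properties are multi-valued because RDF allows repeated predicates.
            pg.nodes[s].setdefault(p, []).append(o[1:-1])
        else:
            pg.nodes.setdefault(o, {})
            pg.edges.append((s, p, o))
    return pg

# Hypothetical input triples.
triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
]
pg = rdf_to_pg(triples)
print(pg.nodes)
print(pg.edges)
```

A mapping this simple discards information such as datatypes, language tags, and statements about predicates, which hints at why preserving semantics during conversion is treated as a research question.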
Understanding Short-Text: a Named Entity Recognition perspective
Named Entity Recognition (NER) models play an important role in the Information Extraction (IE) pipeline. However, despite the decent performance of NER models on newswire datasets, conventional approaches have so far been unable to reliably identify classical named-entity types in short or noisy texts. This thesis will thoroughly investigate NER in microblogs and propose new algorithms that outperform current state-of-the-art models in this research area.
Bachelor; Master Diego Esteves