Papers and workshops accepted at The Web Conference (formerly WWW) 2018

We are very pleased to announce that our group got 2 papers accepted for presentation at the 2018 edition of The Web Conference (27th edition of the former WWW conference), which will be held on April 23-27, 2018 in Lyon, France.
The 2018 edition of The Web Conference will offer many opportunities to present and discuss the latest advances in academia and industry. Its first joint call for contributions covered the following tracks and events: research tracks, workshops, tutorials, exhibition, posters, demos, developers’ track, W3C track, industry track, PhD symposium, challenges, minute of madness, international project track, W4A, hackathon, the BIG web and journal track.

Here are the accepted papers with their abstracts:

  • “DL-Learner – A Framework for Inductive Learning on the Semantic Web” by Lorenz Bühmann, Patrick Westphal, Jens Lehmann and Simon Bin (Journal paper track).

    Abstract: In this system paper, we describe the DL-Learner framework, which supports supervised machine learning using OWL and RDF for background knowledge representation. It can be beneficial in various data and schema analysis tasks with applications in different standard machine learning scenarios, e.g. in the life sciences, as well as Semantic Web specific applications such as ontology learning and enrichment. Since its creation in 2007, it has become the main OWL and RDF-based software framework for supervised structured machine learning and includes several algorithm implementations, usage examples and has applications building on top of the framework. The article gives an overview of the framework with a focus on algorithms and use cases.


  • “Why Reinvent the Wheel – Let’s Build Question Answering Systems Together” by Kuldeep Singh, Arun Sethupat Radhakrishna, Andreas Both, Saeedeh Shekarpour, Ioanna Lytra, Ricardo Usbeck, Akhilesh Vyas, Akmal Khikmatullaev, Dharmen Punjani, Christoph Lange, Maria-Esther Vidal, Jens Lehmann and Sören Auer (Research track).

    Abstract: Modern question answering (QA) systems need to flexibly integrate a number of components specialised to fulfil specific tasks in a QA pipeline. Key QA tasks include Named Entity Recognition and Disambiguation, Relation Extraction, and Query Building. Since a number of different software components exist, implementing different strategies for each of these tasks, a major challenge when building QA systems is how to select and combine the most suitable components into a QA system, given the characteristics of a question. We study this optimisation problem and train classifiers, which take features of a question as input and have the goal of optimising the selection of QA components based on those features. We then devise a greedy algorithm to identify the pipelines that include the suitable components and can effectively answer the given question. We implement this model within Frankenstein, a QA framework able to select QA components and compose QA pipelines. We evaluate the effectiveness of the pipelines generated by Frankenstein using the QALD and LC-QuAD benchmarks. These results suggest not only that Frankenstein precisely solves the QA optimisation problem, but also that it enables the automatic composition of optimised QA pipelines, which outperform the static baseline QA pipeline. Thanks to this flexible and fully automated pipeline generation process, new QA components can be easily included in Frankenstein, thus improving the performance of the generated pipelines.
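The Frankenstein abstract describes two steps: per-task classifiers score candidate QA components for a question, and a greedy algorithm then assembles a pipeline from those scores. The snippet below is a minimal sketch of only the greedy step and is not Frankenstein's implementation. It assumes the classifier scores already exist as plain numbers. The task labels (NED, RE, QB), the component names and the scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input: for each QA task, predicted scores per candidate component
# (in Frankenstein these would come from trained classifiers; here they are given).
Scores = Dict[str, Dict[str, float]]


def greedy_pipeline(task_order: List[str], scores: Scores) -> List[Tuple[str, str]]:
    """For each task in order, pick the component with the highest predicted score."""
    pipeline: List[Tuple[str, str]] = []
    for task in task_order:
        candidates = scores.get(task, {})
        if not candidates:
            raise ValueError(f"no candidate components for task {task!r}")
        # Greedy choice: the locally best component for this task.
        best = max(candidates, key=candidates.get)
        pipeline.append((task, best))
    return pipeline


if __name__ == "__main__":
    example_scores = {
        "NED": {"ned_a": 0.71, "ned_b": 0.64},
        "RE": {"re_a": 0.42, "re_b": 0.58},
        "QB": {"qb_a": 0.66},
    }
    print(greedy_pipeline(["NED", "RE", "QB"], example_scores))
```

A greedy per-task choice keeps the search linear in the number of components. The trade-off is that it ignores interactions between components, such as one relation extractor working better with a particular entity linker.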

This work was supported by grants from the EU FP7 Programme for the project GeoKnow (GA no. 318159) as well as for the German Research Foundation project GOLD and the German Ministry for Economic Affairs and Energy project SAKE (GA no. 01MD15006E), the European Union’s Horizon 2020 research and innovation programme for the project SLIPO (GA no. 731581), the EU Horizon 2020 R&I programme for the Marie Sklodowska Curie action WDAqua (GA No 642795), Eurostars project QAMEL (E!9725) as well as the European Union’s H2020 research and innovation action HOBBIT (GA 688227) and the CSA BigDataEurope (GA No 644564).

Furthermore, we are pleased to announce the acceptance of the following workshops, which will be co-located with The Web Conference 2018.

Here are the accepted workshops with their short descriptions:

  • Linked Data on the Web (LDOW2018) by Tim Berners-Lee (W3C/MIT, USA), Sarven Capadisli (University of Bonn, Germany), Stefan Dietze (Leibniz Universität Hannover, Germany), Aidan Hogan (Universidad de Chile, Chile), Krzysztof Janowicz (University of California, Santa Barbara, US) and Jens Lehmann (University of Bonn, Germany)
    The Web is developing from a medium for publishing textual documents into a medium for sharing structured data. This trend is fueled on the one hand by the adoption of the Linked Data principles by a growing number of data providers. On the other hand, large numbers of websites have started to semantically mark up the content of their HTML pages and thus also contribute to the wealth of structured data available on the Web. The 11th Workshop on Linked Data on the Web (LDOW2018) aims to stimulate discussion and further research into the challenges of publishing, consuming, and integrating structured data from the Web as well as mining knowledge from the global Web of Data.
    Topics of interest for the workshop include, but are not limited to, the following:

    • Web Data Quality Assessment
    • Web Data Cleansing
    • Integrating Web Data from Large Numbers of Data Sources
    • Mining the Web of Data
    • Linked Data Applications

Social media hashtag: #LDOW2018

  • Semantics, Analytics, Visualisation: Enhancing Scholarly Dissemination (SAVE-SD 2018) by Alejandra Gonzalez-Beltran (Oxford e-Research Centre, University of Oxford, Oxford, UK), Francesco Osborne (Knowledge Media Institute, Open University, Milton Keynes, UK), Silvio Peroni (Department of Computer Science and Engineering, University of Bologna, Bologna, Italy) and Sahar Vahdati (Smart Data Analytics, University of Bonn, Bonn, Germany)
    After the great success of the past three editions, we are pleased to announce SAVE-SD 2018, which aims to bring together publishers, companies and researchers from different fields (including Document and Knowledge Engineering, Semantic Web, Natural Language Processing, Scholarly Communication, Bibliometrics, Human-Computer Interaction, Information Visualisation, Bioinformatics, and Life Sciences) in order to bridge the gap between the theoretical/academic and practical/industrial aspects of scholarly data.
    The following topics will be addressed:

    • semantics of scholarly data, i.e. how to semantically represent, categorise, connect and integrate scholarly data, in order to foster reusability and knowledge sharing;
    • analytics on scholarly data, i.e. designing and implementing novel and scalable algorithms for knowledge extraction with the aim of understanding research dynamics, forecasting research trends, fostering connections between groups of researchers, informing research policies, analysing and interlinking experiments and deriving new knowledge;
    • visualisation of and interaction with scholarly data, i.e. providing novel user interfaces and applications for navigating and making sense of scholarly data and highlighting their patterns and peculiarities.

Looking forward to seeing you at The Web Conference (ex WWW) 2018

Christmas Time at SDA – Time to look back at 2017

We are looking back at a busy and successful year 2017 full of new members, inspirational discussions, exciting conferences, a lot of accepted papers and awards as well as new software releases.

Below is a short summary of the main milestones of 2017:


The growth of the group in 2017

SDA is a new group, but not new in the field :). As a group, it was founded by Prof. Dr. Jens Lehmann at the beginning of 2016. The group has members at the University of Bonn with associated researchers at the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) and the Institute for Applied Computer Science Leipzig. During 2017, the group grew from 20 members to around 55 members (1 professor, 1 Akademischer Rat / assistant professor, 11 postdocs, 31 PhD students and 11 master students), as you can see on our team page.

An interesting future for AI and knowledge graphs

Artificial intelligence / machine learning and semantic technologies / knowledge graphs are central topics for SDA. Throughout the year, we obtained a range of interesting research results, from internationally leading performance in question answering over knowledge graphs, to scalable distributed querying, inference and analysis of large RDF datasets, to new perspectives on industrial data spaces and data integration. Amid the race for ever-improving AI capabilities, which go far beyond what many could have imagined 10 years ago, our researchers delivered important contributions and continue to shape different sub-areas of the growing AI research landscape.

Papers accepted

We had 46 papers accepted at well-known conferences (e.g. The Web Conference 2018, WWW 2017, AAAI 2017, ISWC 2017, ESWC 2017, DEXA 2017, SEMANTiCS 2017, K-CAP 2017, WI 2017, KESW 2017, IEEE BigData 2017, NIPS 2017, TPDL 2017, ICSC 2018, ICEGOV 2018 and more). Based on Google Scholar profiles, we estimate that our articles are cited more than 3,000 times per year.


We received several awards in 2017.


Software releases

SANSA, an open source data flow processing engine for performing distributed computation over large-scale RDF datasets, had two successful releases in 2017 (SANSA 0.2 and SANSA 0.3).

Within our funded projects, we were happy to launch the final release of the Big Data Europe platform, an open source Big Data processing platform that allows users to install numerous big data processing tools and frameworks and create working data flow applications.

There were several other releases:

  • SML-Bench – A Structured Machine Learning benchmark framework 0.2 has been released.
  • WebVOWL – A Web-based Visualization of Ontologies had several releases in 2017.
  • – A Crowd-Sourcing platform for collaborative management of scholarly metadata reached coverage of more than 5K computer science conferences in 2017.

Furthermore, SDA deeply values team bonding activities. :-) We often try to introduce fun activities that involve teamwork and team building. At our X-mas party, we enjoyed a very international and lovely dinner together, and we played Secret Santa and a pantomime game.


Long-term team building through deeper discussions, genuine connections and healthy communication helps us to connect within the group!

Many thanks to all of you who have accompanied and supported us on this way!

Jens Lehmann on behalf of the SDA Research Team

SDA at NIPS 2017

We are very pleased to announce that our group got a paper accepted for presentation at the workshop on Optimal Transport and Machine Learning at NIPS 2017: The Thirty-first Annual Conference on Neural Information Processing Systems, which was held on December 4 – 9, 2017 in Long Beach, California.

The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS) is a single-track machine learning and computational neuroscience conference that includes invited talks, demonstrations and oral and poster presentations of refereed papers.  NIPS has a responsibility to provide an inclusive and welcoming environment for everyone in the fields of AI and machine learning. Unfortunately, several events held at (or in conjunction with) this year’s conference fell short of these standards.

Here is the accepted paper with its abstract:

“On the regularization of Wasserstein GANs” by Henning Petzka, Asja Fischer and Denis Lukovnikov.

Abstract: Since their invention, generative adversarial networks (GANs) have become a popular approach for learning to model a distribution of real (unlabeled) data. Convergence problems during training are overcome by Wasserstein GANs which minimize the distance between the model and the empirical distribution in terms of a different metric, but thereby introduce a Lipschitz constraint into the optimization problem. A simple way to enforce the Lipschitz constraint on the class of functions, which can be modeled by the neural network, is weight clipping. Augmenting the loss by a regularization term that penalizes the deviation of the gradient norm of the critic (as a function of the network’s input) from one, was proposed as an alternative that improves training. We present theoretical arguments why using a weaker regularization term enforcing the Lipschitz constraint is preferable. These arguments are supported by experimental results on several data sets.
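The abstract contrasts two regularization terms. The first is a penalty on any deviation of the critic's gradient norm from one. The second is a weaker term that only needs to enforce the Lipschitz constraint. The NumPy sketch below computes both from precomputed gradient norms so the difference is concrete. It assumes that the weaker term is a one-sided penalty, which only punishes norms above one. It also assumes a penalty weight `lam = 10.0`. The function names are hypothetical, and no network or autograd code is included.

```python
import numpy as np


def two_sided_penalty(grad_norms: np.ndarray, lam: float = 10.0) -> float:
    """Penalise any deviation of the critic's gradient norm from 1."""
    return float(lam * np.mean((grad_norms - 1.0) ** 2))


def one_sided_penalty(grad_norms: np.ndarray, lam: float = 10.0) -> float:
    """Weaker term: penalise only norms above 1, i.e. actual Lipschitz violations."""
    return float(lam * np.mean(np.maximum(0.0, grad_norms - 1.0) ** 2))


if __name__ == "__main__":
    # Gradient norms of the critic at three sampled points (illustrative values).
    norms = np.array([0.5, 1.0, 1.5])
    print(two_sided_penalty(norms))  # about 1.667: both 0.5 and 1.5 are penalised
    print(one_sided_penalty(norms))  # about 0.833: only the norm 1.5 is penalised
```

Under the one-sided term, a critic with gradient norm below one everywhere incurs no penalty at all. Such a critic is still 1-Lipschitz.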

This work was supported by WDAqua: Marie Skłodowska-Curie Innovative Training Network (GA no. 642795).


Paper accepted at ICEGOV 2018


We are very pleased to announce that our group got a paper accepted for presentation at the 11th International Conference on Theory and Practice of Electronic Governance (ICEGOV) 2018, which will be held on April 4 – 6, 2018 in Galway, Ireland.

The conference focuses on the use of technology to transform the working of government and its relationships with citizens, businesses, and other non-state actors in order to improve public governance and its contribution to public policy and development (EGOV). It also promotes the interaction and cooperation between universities, research centres, governments, industries, and non-governmental organizations needed to develop the EGOV community. It is supported by a rich program of keynote lectures, plenary sessions, paper presentations within the thematic sessions, invited sessions, and networking sessions.

Here is the accepted paper with its abstract:

“Classifying Data Heterogeneity within Budget and Spending Open Data” by Fathoni A. Musyaffa, Fabrizio Orlandi, Hajira Jabeen, and Maria-Esther Vidal.

Abstract: Heterogeneity problems within open budgets and spending datasets hinder effective analysis and consumption of these datasets. To understand detailed types of heterogeneities available within open budgets and spending datasets, we analyzed more than 75 datasets from different levels of public administrations. We classified and enumerated these heterogeneities, and examined whether the heterogeneities found can be represented using state-of-the-art data models designed for representing open budgets and spending data. In the end, lessons learned are provided for public administrators, technical and scientific communities.

This work was supported by DAAD and partially by EU H2020 project no. 645833.

Looking forward to seeing you at ICEGOV 2018.

SANSA 0.3 (Semantic Analytics Stack) Released

We are happy to announce SANSA 0.3 – the third release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples, Turtle, RDF/XML and N-Quads formats
  • Reading OWL files in various standard formats
  • Support for multiple data partitioning techniques
  • SPARQL querying via Sparqlify (with some known limitations until the next Spark 2.3.* release)
  • SPARQL querying via conversion to Gremlin path traversals (experimental)
  • RDFS, RDFS Simple, OWL-Horst (all in beta status), EL (experimental) forward chaining inference
  • Automatic inference plan creation (experimental)
  • RDF graph clustering with different algorithms
  • Rule mining from RDF graphs based on AMIE+
  • Terminological decision trees (experimental)
  • Anomaly detection (beta)
  • Distributed knowledge graph embedding approaches: TransE (beta), DistMult (beta), several further algorithms planned
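The feature list mentions support for multiple data partitioning techniques without describing them. One widely used technique for RDF is vertical partitioning, which stores one partition of (subject, object) pairs per predicate. The sketch below assumes vertical partitioning as an illustrative example of such a technique. It is written in plain Python rather than Spark or Flink and is not SANSA's API. The parser is a hypothetical helper that only handles simple, well-formed N-Triples statements, and the IRIs in the example are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_ntriples_line(line: str) -> Tuple[str, str, str]:
    """Naively split one N-Triples statement into (subject, predicate, object)."""
    # Subject and predicate contain no spaces; the object (e.g. a literal) may.
    subject, predicate, rest = line.strip().split(maxsplit=2)
    obj = rest.rstrip()
    if not obj.endswith("."):
        raise ValueError(f"statement not terminated by '.': {line!r}")
    return subject, predicate, obj[:-1].rstrip()


def vertical_partition(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (subject, object) pairs by predicate: one partition per predicate."""
    partitions: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        s, p, o = parse_ntriples_line(line)
        partitions[p].append((s, o))
    return dict(partitions)


if __name__ == "__main__":
    data = [
        '<http://ex.org/a> <http://ex.org/knows> <http://ex.org/b> .',
        '<http://ex.org/a> <http://ex.org/name> "Alice Smith"@en .',
        '<http://ex.org/b> <http://ex.org/knows> <http://ex.org/c> .',
    ]
    for predicate, pairs in vertical_partition(data).items():
        print(predicate, pairs)
```

Splitting by predicate means that a SPARQL triple pattern with a bound predicate only has to scan one partition. This is the main appeal of the scheme for distributed querying.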

Deployment and getting started:

  • There are template projects for SBT and Maven for Apache Spark as well as for Apache Flink available to get started.
  • The SANSA jar files are available in Maven Central, i.e. in most IDEs you can simply search for “sansa” to add the dependencies to Maven projects.
  • There is example code for various tasks available.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE, Big Data Ocean, SLIPO, QROWD and BETTER.

Greetings from the SANSA Development Team



Papers accepted at ICSC 2018

We are very pleased to announce that we got 3 papers accepted at ICSC 2018 for presentation at the main conference, which will be held on Jan 31 – Feb 2, 2018 in California, United States.

The 12th IEEE International Conference on Semantic Computing (ICSC 2018) focuses on Semantic Computing (SC). SC addresses the derivation, description, integration, and use of semantics (“meaning”, “context”, “intention”) for all types of resources, including data, documents, tools, devices, processes and people. The scope of SC includes, but is not limited to, analytics, semantics description languages and integration (of data and services), interfaces, and applications including biomed, IoT, cloud computing, software-defined networks, wearable computing, context awareness, mobile computing, search engines, question answering, big data, multimedia, and services.

Here is the list of accepted papers with their abstracts:

“SAANSET: Semi-Automated Acquisition of Scholarly Metadata using Platform” by Rebaz Omar, Sahar Vahdati, Christoph Lange, Maria-Esther Vidal and Andreas Behrend

Abstract: Researchers spend a lot of time in finding information about people, events, journals, and research areas related to topics of their interest. Digital libraries and digital scholarly repositories usually offer services to assist researchers in this task. However, every research community has its own way of distributing scholarly metadata.
Mailing lists provide an instantaneous channel and are often used for discussing topics of interest to a community of researchers, or to announce important information — albeit in an unstructured way. To bring structure specifically into the announcements of events and thus to enable researchers to, e.g., filter them by relevance, we present a semi-automatic crowd-sourcing workflow that captures metadata of events from call-for-papers emails into the semantic wiki. Evaluations confirm that our approach reduces a high number of actions that researchers should do manually to trace the call for papers received via mailing lists.

“Semantic Enrichment of IoT Stream Data On-Demand” by Farah Karim, Ola Al Naameh, Ioanna Lytra, Christian Mader, Maria-Esther Vidal, and Sören Auer

Abstract: Connecting the physical world to the Internet of Things (IoT) allows for the development of a wide variety of applications. Things can be searched, managed, analyzed, and even included in collaborative games.
Industries, health care, and cities are exploiting IoT data-driven frameworks to make these organizations more efficient, thus, improving the lives of citizens. For making IoT a reality, data produced by sensors, smart phones, watches, and other wearables need to be integrated; moreover, the meaning of IoT data should be explicitly represented. However, the Big Data nature of IoT data imposes challenges that need to be addressed in order to provide scalable and efficient IoT data-driven infrastructures. We tackle these issues and focus on the problems of describing the meaning of IoT streaming data using ontologies and integrating this data in a knowledge graph.
We devise DESERT, a SPARQL query engine able to on-Demand factorizE and Semantically Enrich stReam daTa in a knowledge graph.
Resulting knowledge graphs model the semantics or meaning of merged data in terms of entities that satisfy the SPARQL queries and relationships among those entities; thus, only data required for query answering is included in the knowledge graph.
We empirically evaluate the results of DESERT on SRBench, a benchmark of Streaming RDF data.
The experimental results suggest that DESERT allows for speeding up query execution while the size of the knowledge graphs remains relatively low.


“Shipping Knowledge Graph Management Capabilities to Data Providers and Consumers” by Omar Al-Safi, Christian Mader, Ioanna Lytra, Mikhail Galkin, Kemele Endris, Maria-Esther Vidal, and Sören Auer

Abstract: The amount of Linked Data both open, made available on the Web, and private, exchanged across companies and organizations, has been increasing in recent years. This data can be distributed in form of Knowledge Graphs (KGs), but maintaining these KGs is mainly the responsibility of data owners or providers. Moreover, building applications on top of KGs in order to provide, for instance, analytics, data access control, and privacy is left to the end user or data consumers. However, many resources in terms of development costs and equipment are required by both data providers and consumers, thus impeding the development of real-world applications over KGs. We propose to encapsulate KGs as well as data processing functionalities in a client-side system called Knowledge Graph Container, intended to be used by data providers or data consumers. Knowledge Graph Containers can be tailored to the target environments, ranging from Big Data to light-weight platforms. We empirically evaluate the performance and scalability of Knowledge Graph Containers with respect to state-of-the-art Linked Data management approaches. Observed results suggest that Knowledge Graph Containers increase the availability of Linked Data, as well as efficiency and scalability of various Knowledge Graph management tasks.


This work was supported by the European Union’s H2020 research and innovation program BigDataEurope (GA no. 644564), WDAqua: Marie Skłodowska-Curie Innovative Training Network (GA no. 642795), InDaSpace: a German Ministry for Finances and Energy research grant, a DAAD Scholarship, the European Commission with a grant for the H2020 project OpenAIRE2020 (GA no. 643410), (GA no. 645833) and by the European Union’s Horizon 2020 IoT European Platform Initiative (IoT-EPI) BioTope (GA No 688203).

Looking forward to seeing you at ICSC 2018.

Wil van der Aalst visits SDA

Wil van der Aalst from Technische Universiteit Eindhoven (TU/e) visited the SDA group on the 29th of November 2017. Wil van der Aalst is a distinguished university professor at the Technische Universiteit Eindhoven (TU/e), where he is also the scientific director of the Data Science Center Eindhoven (DSC/e). Since 2003 he has held a part-time position at Queensland University of Technology (QUT). Currently, he is also a visiting researcher at Fondazione Bruno Kessler (FBK) in Trento and a member of the Board of Governors of Tilburg University. His personal research interests include process mining, Petri nets, business process management, workflow management, process modeling, and process analysis. Wil van der Aalst has published over 200 journal papers, 20 books (as author or editor), 450 refereed conference/workshop publications, and 65 book chapters. Many of his papers are highly cited (he is one of the most cited computer scientists in the world; according to Google Scholar, he has an H-index of 135 and has been cited 80,000 times), and his ideas have influenced researchers, software developers, and standardization committees working on process support. Besides serving on the editorial boards of over 10 scientific journals, he also plays an advisory role for several companies, including Fluxicon, Celonis, and ProcessGold. Van der Aalst received honorary degrees from the Moscow Higher School of Economics (Prof. h.c.), Tsinghua University, and Hasselt University (Dr. h.c.). He is also an elected member of the Royal Netherlands Academy of Arts and Sciences, the Royal Holland Society of Sciences and Humanities, and the Academy of Europe. Recently, he was awarded a Humboldt Professorship, Germany’s most valuable research award (five million euros), and he will move to RWTH Aachen University at the beginning of 2018.

Prof. Jens Lehmann invited the speaker to the bi-weekly “SDA colloquium presentations”, which 40 to 50 researchers and students from SDA attended. The goal of his visit was to exchange experience and ideas on semantic web techniques specialized for process mining, including process modeling, classification algorithms and more. Apart from presenting various use cases where process mining has helped scientists gain useful insights from raw data, Wil van der Aalst shared with our group future research problems and challenges in this area and gave a talk on “Learning Hybrid Process Models from Events: Process Mining for the Real World”.

Abstract: Process mining provides new ways to utilize the abundance of event data in our society. This emerging scientific discipline can be viewed as a bridge between data science and process science: It is both data-driven and process-centric. Process mining provides a novel set of techniques to discover the real processes. These discovery techniques return process models that are either formal (precisely describing the possible behaviors) or informal (merely a “picture” not allowing for any form of formal reasoning). Formal models are able to classify traces (i.e., sequences of events) as fitting or non-fitting. Most process mining approaches described in the literature produce such models. This is in stark contrast with the over 25 available commercial process mining tools that only discover informal process models that remain deliberately vague on the precise set of possible traces. There are two main reasons why vendors resort to such models: scalability and simplicity. 

In this talk, Prof. Van der Aalst will propose to combine the best of both worlds: discovering hybrid process models that have formal and informal elements. The discovered models allow for formal reasoning, but also reveal information that cannot be captured in mainstream formal models. A novel discovery algorithm returning hybrid Petri nets has been implemented in ProM and will serve as an example for the next wave of commercial process mining tools. Prof. Van der Aalst will also elaborate on his collaboration with industry. His research group at TU/e applied process mining in over 150 organizations, developed the open-source tool ProM, and influenced the 20+ commercial process mining tools available today.
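The talk abstract notes that formal process models can classify traces as fitting or non-fitting. A tiny Petri net with token replay makes this concrete. The Python sketch below rests on simplifying assumptions and is not ProM's replay algorithm. It uses a hypothetical three-step process with one transition per activity label, no silent transitions and arc weights of one. A trace counts as fitting only if every event can fire in order and the replay ends exactly in the expected final marking.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each activity label maps to one transition: (input places, output places).
Net = Dict[str, Tuple[List[str], List[str]]]


def fits(net: Net, initial: Counter, final: Counter, trace: List[str]) -> bool:
    """Replay a trace; True if every event fires and the final marking is reached."""
    marking = Counter(initial)  # copy, so the caller's marking is not modified
    for activity in trace:
        if activity not in net:
            return False  # the activity is not part of the model
        inputs, outputs = net[activity]
        # A transition is enabled only if each input place holds a token.
        if any(marking[p] < 1 for p in inputs):
            return False
        for p in inputs:
            marking[p] -= 1  # consume tokens
        for p in outputs:
            marking[p] += 1  # produce tokens
    # Unary + drops places with zero tokens before comparing markings.
    return +marking == +final


if __name__ == "__main__":
    net = {
        "register": (["start"], ["p1"]),
        "check": (["p1"], ["p2"]),
        "decide": (["p2"], ["end"]),
    }
    init, done = Counter({"start": 1}), Counter({"end": 1})
    print(fits(net, init, done, ["register", "check", "decide"]))  # True
    print(fits(net, init, done, ["register", "decide"]))  # False: "decide" not enabled
```

An informal model, such as a plain picture of frequent activity sequences, has no such replay semantics. This is why the abstract says that informal models cannot support this kind of formal reasoning.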

During the meeting, SDA’s core research topics and main research projects were presented, and we tried to identify common ground for future collaborations with Prof. Van der Aalst and his research group.

As an outcome of this visit, we expect to strengthen our research collaboration networks with TU/e and in the future with RWTH Aachen University, mainly on combining semantic knowledge and distributed computing and analytics.

Paper accepted at IEEE BigData 2017

We are very pleased to announce that our group got a paper accepted for presentation at IEEE BigData 2017, which will be held on December 11 – 14, 2017 in Boston, MA, United States.

In recent years, “Big Data” has become a new ubiquitous term. Big Data is transforming science, engineering, medicine, healthcare, finance, business, and ultimately our society itself. The IEEE Big Data conference series started in 2013 has established itself as the top tier research conference in Big Data.
The 2017 IEEE International Conference on Big Data (IEEE Big Data 2017) will provide a leading forum for disseminating the latest results in Big Data Research, Development, and Applications.

“Implementing Scalable Structured Machine Learning for Big Data in the SAKE Project” by Simon Bin, Patrick Westphal, Jens Lehmann, and Axel-Cyrille Ngonga Ngomo.

Abstract: Exploration and analysis of large amounts of machine generated data requires innovative approaches. We propose a combination of Semantic Web and Machine Learning to facilitate the analysis. First, data is collected and converted to RDF according to a schema in the Web Ontology Language OWL. Several components can continue working with the data, to interlink, label, augment, or classify. The size of the data poses new challenges to existing solutions, which we solve in this contribution by transitioning from in-memory to database.

This work was supported in part by a research grant from the German Ministry for Finances and Energy under the SAKE project (Grant agreement No. 01MD15006E) and by a research grant from the European Union’s Horizon 2020 research and innovation programme under the SLIPO project (Grant agreement No. 731581).

Looking forward to seeing you at IEEE BigData 2017.

“A Corpus for Complex Question Answering over Knowledge Graphs” elected as Paper of the Month at Fraunhofer IAIS

We are very pleased to announce that our paper “A Corpus for Complex Question Answering over Knowledge Graphs” by Priyansh Trivedi, Gaurav Maheshwari, Mohnish Dubey and Jens Lehmann has been elected as the Paper of the Month at Fraunhofer IAIS. After a committee evaluation, this award is given to publications that have a high innovation impact in their research field.

This research paper was accepted at the ISWC 2017 main conference. It presents a large gold-standard question answering dataset over DBpedia, together with the framework used to create it. With 5000 questions and their corresponding SPARQL queries, it is the largest QA dataset of its kind. The paper was nominated for the “Best Student Paper Award” in the resource track.

Abstract: Being able to access knowledge bases in an intuitive way has been an active area of research over the past years. In particular, several question answering (QA) approaches which allow to query RDF datasets in natural language have been developed as they allow end users to access knowledge without needing to learn the schema of a knowledge base and learn a formal query language. To foster this research area, several training datasets have been created, e.g. in the QALD (Question Answering over Linked Data) initiative. However, existing datasets are insufficient in terms of size, variety or complexity to apply and evaluate a range of machine learning based QA approaches for learning complex SPARQL queries. With the provision of the Large-Scale Complex Question Answering Dataset (LC-QuAD), we close this gap by providing a dataset with 5000 questions and their corresponding SPARQL queries over the DBpedia dataset. In this article, we describe the dataset creation process and how we ensure a high variety of questions, which should enable to assess the robustness and accuracy of the next generation of QA systems for knowledge graphs.

The paper and authors were honored for this publication in a special event at Fraunhofer Schloss Birlinghoven, Sankt Augustin, Germany.


SDA at ISWC 2017 – A Ten-Year Best Paper and a Demo Award


The International Semantic Web Conference (ISWC) is the premier international forum where Semantic Web / Linked Data researchers, practitioners, and industry specialists come together to discuss, advance, and shape the future of semantic technologies on the web, within enterprises and in the context of the public institution.

We are very pleased to announce that we got 6 papers accepted at ISWC 2017 for presentation at the main conference. Additionally, we had 6 poster/demo papers accepted.

Furthermore, we are happy to have won the SWSA Ten-Year Best Paper Award, which recognizes the highest-impact papers from the 6th International Semantic Web Conference in Busan, Korea in 2007:
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives. DBpedia: A Nucleus for a Web of Open Data


In addition to this award, we are very happy to announce that we won the Best Demo Award for the SANSA Notebooks:
“The Tale of Sansa Spark” by Ivan Ermilov, Jens Lehmann, Gezim Sejdiu, Lorenz Bühmann, Patrick Westphal, Claus Stadler, Simon Bin, Nilesh Chakraborty, Henning Petzka, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo and Hajira Jabeen.


The audience displayed enthusiasm during the demonstration, appreciating the work and asking questions about the future of SANSA, technical details and possible synergies with industrial partners and projects. Gezim Sejdiu and Jens Lehmann, who presented the demo, talked for more than three hours non-stop (without even time to eat 😉 ).

Our colleagues also gave several other presentations at the conference.


ISWC17 was a great venue to meet the community, create new connections, talk about current research challenges, share ideas and establish new collaborations. We look forward to the next ISWC conference.

Until then, meet us at SDA !