We are very pleased to announce that our group got a paper accepted at the Oxford Bioinformatics Journal.
Oxford Bioinformatics Journal is a bi-weekly peer-reviewed scientific journal that focuses on genome bioinformatics and computational biology. The journal is leading its field, and publishes scientific papers that are relevant to academic and industrial researchers.
Here is the pre-print of the accepted paper with its abstract:
- BioKEEN: A library for learning and evaluating biological knowledge graph embeddings by Mehdi Ali, Charles Tapley Hoyt, Daniel Domingo-Fernandez, Jens Lehmann, and Hajira Jabeen.
Abstract: Knowledge graph embeddings (KGEs) have received significant attention in other domains due to their ability to predict links and create dense representations for graphs’ nodes and edges. However, the software ecosystem for their application to bioinformatics remains limited and inaccessible for users without expertise in programming and machine learning. Therefore, we developed BioKEEN (Biological KnowlEdge EmbeddiNgs) and PyKEEN (Python KnowlEdge EmbeddiNgs) to facilitate their easy use through an interactive command line interface. Finally, we present a case study in which we used a novel biological pathway mapping resource to predict links that represent pathway crosstalks and hierarchies. Availability: BioKEEN and PyKEEN are open source Python packages publicly available under the MIT License at https://github.com/SmartDataAnalytics/BioKEEN and https://github.com/SmartDataAnalytics/PyKEEN as well as through PyPI.
We thank our partners from the Bio2Vec, MLwin, and SimpleML projects for their assistance. This research was supported by the Bio2Vec project (http://bio2vec.net/, CRG6 grant 3454) with funding from King Abdullah University of Science and Technology (KAUST).
We are very pleased to announce that our group got two papers accepted for presentation at the workshops of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) (ComplexQA 2019 and RecNLP 2019), which will be held January 27 – February 1, 2019 at the Hilton Hawaiian Village, Honolulu, Hawaii, USA.
The purpose of the Association for the Advancement of Artificial Intelligence (AAAI) conference series is to promote research in artificial intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers in AI and its affiliated disciplines.
The Reasoning for Complex Question Answering Workshop (ComplexQA) is a new series of workshops on reasoning for complex question answering (CQA). QA has become a crucial application problem in evaluating the progress of AI systems in the realm of natural language processing and understanding, and in measuring the progress of machine intelligence in general. The computational linguistics communities (ACL, NAACL, EMNLP et al.) have devoted significant attention to the general problem of machine reading and question answering, as evidenced by the emergence of strong technical contributions and challenge datasets such as SQuAD. However, most of these advances have focused on “shallow” QA tasks that can be tackled very effectively by existing retrieval-based techniques. Instead of measuring the comprehension and understanding of the QA systems in question, these tasks test merely the capability of a technique to “attend” or focus attention on specific words and pieces of text. The main aim of this workshop is to bring together experts from the computational linguistics (CL) and AI communities to: (1) catalyze progress on the CQA problem, and create a vibrant test-bed of problems for various AI sub-fields; and (2) present a generalized task that can act as a harbinger of progress in AI.
Recommender Systems Meet Natural Language Processing (RecNLP) is an interdisciplinary workshop covering the intersection between Recommender Systems (RecSys) and Natural Language Processing (NLP). The primary goals of RecNLP are to identify common ideas and techniques that are being developed in both disciplines, to further explore the synergy between the two, and to bring together researchers from both domains to encourage and facilitate future collaborations.
Here are the pre-prints of the accepted papers with their abstracts:
- Translating Natural Language to SQL using Pointer-Generator Networks and How Decoding Order Matters by Denis Lukovnikov, Nilesh Chakraborty, Jens Lehmann and Asja Fischer
Abstract: Translating natural language to SQL queries for table-based question answering is a challenging problem and has received significant attention from the research community. In this work, we extend a pointer-generator network and investigate how query decoding order matters in semantic parsing for SQL. Even though our model is a straightforward extension of a general-purpose pointer-generator, it outperforms early work for WikiSQL and remains competitive to concurrently introduced, more complex models. Moreover, we provide a deeper investigation of the potential “order-matters” problem due to having multiple correct decoding paths, and investigate the use of REINFORCE as well as a non-deterministic oracle in this context.
- Metaresearch Recommendations using Knowledge Graph Embeddings by Veronika Henk, Sahar Vahdati, Mojtaba Nayyeri, Mehdi Ali, Hamed Shariat Yazdi and Jens Lehmann
Abstract: Discovering relevant research collaborations is crucial for performing extraordinary research and promoting the careers of scholars. Therefore, building recommender systems capable of suggesting relevant collaboration opportunities is of huge interest. Most of the existing approaches for collaboration and co-author recommendation focus on semantic similarities using bibliographic metadata such as publication counts, and citation network analysis. These approaches neglect relevant and important metadata information such as author affiliation and conferences attended, affecting the quality of the recommendations. To overcome these drawbacks, we formulate the task of scholarly recommendation as a link prediction task based on knowledge graph embeddings. A knowledge graph containing scholarly metadata is created and enriched with textual descriptions. We tested the quality of the recommendations based on the TransE, TransH and DistMult models that consider only triples in the knowledge graph and DKRL which in addition incorporates natural language descriptions of entities during training.
Looking forward to seeing you at AAAI-19.
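To make the idea of recommendation as link prediction more concrete, here is a minimal sketch of how a TransE-style model ranks candidate collaborators. It is not the code from the paper: the entity names, the `coauthor_of` relation, and the hand-picked 2-dimensional vectors are all assumptions for illustration, and a real model would learn the embeddings from the knowledge graph.

```python
import math

# Toy 2-dimensional embeddings, hand-picked for illustration (not learned).
# All entity and relation names here are hypothetical.
entity_emb = {
    "alice": [0.0, 0.0],
    "bob": [1.0, 0.0],
    "carol": [0.0, 1.0],
    "dave": [1.1, 0.1],
}
relation_emb = {"coauthor_of": [1.0, 0.0]}


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between h + r and t."""
    h = entity_emb[head]
    r = relation_emb[relation]
    t = entity_emb[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def recommend(head, relation, k=2):
    """Rank all other entities as candidate tails and return the top k."""
    candidates = [e for e in entity_emb if e != head]
    ranked = sorted(
        candidates,
        key=lambda e: transe_score(head, relation, e),
        reverse=True,
    )
    return ranked[:k]


print(recommend("alice", "coauthor_of"))
```

With these toy vectors, the candidates closest to `alice + coauthor_of` are ranked first. The same ranking step applies to any model that assigns a score to a triple, which is why TransH or DistMult could be swapped in by changing only the scoring function.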
2019 has just started and we want to take a moment to look back at a very busy and successful year 2018, full of new members, inspirational discussions, exciting conferences, accepted research papers, new software releases and a lot of highlights we had throughout the year.
Below is a short summary of the main cornerstones for 2018:
An interesting future for AI and knowledge graphs
Artificial intelligence/machine learning and semantic technologies/knowledge graphs are central topics for SDA. Throughout the year, we have been able to accomplish a range of interesting research achievements. One particularly active area was question answering and dialogue systems (with and without knowledge graphs). We acquired new projects for more than a million euros this year and were able to transfer our expertise to industry via successful projects at Fraunhofer. External interest in our results has been remarkably high. Furthermore, we extended our already established position in scalable distributed querying, inference, and analysis of large RDF datasets. Amid the race for ever-improving achievements in AI, which has gone far beyond what many could have imagined 10 years ago, our researchers were able to deliver important contributions and continued to shape different sub-areas of the growing AI research landscape.
We had 41 papers accepted at well-known conferences and journals (e.g., the AAAI 2019 workshops, ISWC 2018, ESWC 2018, Nature Scientific Data Journal, Journal of Web Semantics, Semantic Web Journal, WWW 2018 workshops, EMNLP 2018 workshops, ECML 2018 workshops, CoNLL 2018, SIGMOD 2018 workshops, SIGIR 2018, ICLR 2018, EKAW 2018, SEMANTiCS 2018, ICWE 2018, ICSC 2018, TPDL 2018, JURIX 2018, and more). We estimate that SDA members had more than 2,500 citations per year (based on Google Scholar profiles).
SANSA – an open source data flow processing engine for performing distributed computation over large-scale RDF datasets – had two successful releases during 2018 (SANSA 0.4 and SANSA 0.5).
Among the funded projects, we were happy to launch the first major release of the Big Data Ocean platform – a platform for Exploiting Oceans of Data for Maritime Applications.
There were several other releases:
- SML-Bench – A Structured Machine Learning benchmark framework 0.2 has been released.
- WebVOWL – A web-based visualization for ontologies had several releases in 2018. A major new feature is the integration of the WebVOWL Editor, a device-independent visual ontology modeling tool.
- AskNowQA – A Suite of Natural Language interaction technologies that behave intelligently through domain knowledge. The 0.1 version has been released.
Further highlights of the year:
- Move to the brand new Computer Science Campus: After many delays, we finally moved into our new campus where we have modern rooms and equipment.
- A Best Demo Award at ISWC 2018
- Two PhD defenses: Mikhail Galkin and Lavdim Halilaj both successfully defended their PhD theses. Congratulations to them again! Four more theses have been submitted, with defenses scheduled for January and February.
- Many invited speakers (Prof. Dr. John Domingue, Prof. Dr. Khalid Saeed, Dr. Anastasia Dimou, Svitlana Vakulenko and Dr. Katherine Thornton).
- We held an off-site meeting together with the EIS department of Fraunhofer IAIS at their premises.
Likewise, SDA deeply values team bonding activities. We often try to introduce fun activities that involve teamwork and team building. At our X-mas party, we enjoyed a very international and lovely dinner together while exchanging “Secret Santa” gifts and playing some ad-hoc games.
Long-term team building through deeper discussions, genuine connections and healthy communication helps us to connect within the group!
Many thanks to all who have accompanied and supported us along the way! From all of us at SDA, we wish you a wonderful new year!
Jens Lehmann on behalf of The SDA Research Team
Katherine Thornton is an information scientist at the Yale University Library working on creating metadata as linked open data. Katherine earned a PhD in Information Science from the University of Washington in 2016 and works on the Scaling Emulation as a Service Infrastructure (EaaSI) project describing the software and configured environments in Wikidata. Katherine has been a volunteer contributor to the Wikidata project since 2012.
Dr. Thornton was invited to give talks on “Sharing RDF data models and validating RDF graphs with ShEx” and “Documenting and preserving programming languages and software in Wikidata” at the SWIB conference (Semantic Web in Libraries). SWIB is an annual conference, being held for the 10th time, focusing on Linked Open Data (LOD) in libraries and related organizations. It is well established as an event where IT staff, developers, librarians, and researchers from all over the world meet, mingle, and learn from each other. The topics of talks and workshops at SWIB revolve around opening data, linking data and creating tools and software for LOD production scenarios. These areas of focus are supplemented by presentations of research projects in applied sciences, industry applications, and LOD activities in other areas.
At the bi-weekly “SDA colloquium presentations” she gave a talk on “Wikidata for Digital Preservation” and described the workflow of creating metadata for resources in the domain of computing using the Wikidata platform, and of reusing the resulting URIs in metadata that describes pre-configured emulated computing environments in which users can interact with legacy software. She introduced this project in the context of current work at Yale University Library to provide Emulation as a Service. Afterwards, she discussed her data curation work in Wikidata as well as the Wikidata for Digital Preservation portal (WikiDP), a streamlined interface for the digital preservation community to interact with Wikidata. The system is available online at http://wikidp.org.
The goal of Dr. Thornton’s visit was to exchange experience and ideas on digital preservation using RDF technologies. In addition to presenting various use-cases where these technologies have been applied, Dr. Thornton shared with our group future research problems and challenges related to this research area. During the meeting, SDA core research topics and main research projects were presented and we investigated suitable topics for future collaborations with Dr. Thornton and her research group.
We are happy to announce SANSA 0.5 – the fifth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.
- Website: http://sansa-stack.net
- GitHub: https://github.com/SANSA-Stack
- Download: http://sansa-stack.net/downloads-usage/
- ChangeLog: https://github.com/SANSA-Stack/SANSA-Stack/releases
You can find the FAQ and usage examples at http://sansa-stack.net/faq/.
The following features are currently supported by SANSA:
- Reading and writing RDF files in N-Triples, Turtle, RDF/XML, and N-Quads formats
- Reading OWL files in various standard formats
- Querying heterogeneous sources (Data Lake) using SPARQL – CSV, Parquet, MongoDB, Cassandra, JDBC (MySQL, SQL Server, etc.) are supported
- Support for multiple data partitioning techniques
- SPARQL querying via Sparqlify and Ontop
- Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
- RDFS, RDFS Simple and OWL-Horst forward chaining inference
- RDF graph clustering with different algorithms
- Terminological decision trees (experimental)
- Knowledge graph embedding approaches: TransE (beta), DistMult (beta)
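As an illustration of what forward chaining inference means in the list above, the following single-machine sketch applies two of the RDFS entailment rules (rdfs9 and rdfs11) until no new triples appear. It does not use SANSA or its API; the `forward_chain` function and the `ex:` terms are assumptions made for this example, and SANSA performs this kind of reasoning in a distributed fashion on Spark or Flink.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def forward_chain(triples):
    """Apply rdfs9 and rdfs11 repeatedly until a fixpoint is reached."""
    inferred = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        new = set()
        for c, d in subclass_pairs:
            for s, p, o in inferred:
                # rdfs11: (c subClassOf d), (d subClassOf e) => (c subClassOf e)
                if p == SUBCLASS_OF and s == d:
                    new.add((c, SUBCLASS_OF, o))
                # rdfs9: (x type c), (c subClassOf d) => (x type d)
                if p == RDF_TYPE and o == c:
                    new.add((s, RDF_TYPE, d))
        if new <= inferred:  # nothing new was derived, so we are done
            return inferred
        inferred |= new


# Hypothetical example data.
graph = {
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),
    ("ex:anna", RDF_TYPE, "ex:Student"),
}

for triple in sorted(forward_chain(graph) - graph):
    print(triple)
```

The loop materializes every derivable triple up front, which is the defining trait of forward chaining: queries afterwards can be answered by simple lookups at the cost of a larger stored graph.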
Noteworthy changes or updates since the previous release are:
- A data lake concept for querying heterogeneous data sources has been integrated into SANSA
- New clustering algorithms have been added and the interface for clustering has been unified
- Ontop RDB2RDF engine support has been added
- RDF data quality assessment methods have been substantially improved
- Dataset statistics calculation has been substantially improved
- Improved unit test coverage
Deployment and getting started:
- There are template projects for SBT and Maven for Apache Spark as well as for Apache Flink available to get started.
- The SANSA jar files are in Maven Central, i.e., in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
- Example code is available for various tasks.
- We provide interactive notebooks for running and testing code via Docker.
Greetings from the SANSA Development Team
We are very pleased to announce that our group got a paper accepted at the Scientific Data journal – an open access publication from Nature Research for the descriptions of scientifically valuable datasets.
Nature is a weekly international journal publishing the finest peer-reviewed research in all fields of science and technology on the basis of its originality, importance, interdisciplinary interest, timeliness, accessibility, elegance and surprising conclusions. Nature also provides rapid, authoritative, insightful and arresting news and interpretation of topical and coming trends affecting science, scientists and the wider public. Scientific Data is a peer-reviewed, open-access journal for descriptions of scientifically valuable datasets, and research that advances the sharing and reuse of scientific data. It covers a broad range of research disciplines, including descriptions of big or small datasets, from major consortiums to single research groups. Scientific Data primarily publishes Data Descriptors, a new type of publication that focuses on helping others reuse data, and crediting those who share.
Here is the accepted paper with its abstract:
- “A linked open data representation of patents registered in the US from 2005-2017” by Mofeed Hassan, Amrapali Zaveri, Jens Lehmann
Abstract: Patents are widely used to protect intellectual property and a measure of innovation output. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world. In fact, there were more than 280,000 patent grants issued in the US in 2015. However, accessing, searching and analyzing those patents is often still cumbersome and inefficient. To overcome those problems, Google indexes patents and converts them to Extensible Markup Language (XML) files using Optical Character Recognition (OCR) techniques. In this article, we take this idea one step further and provide semantically rich, machine-readable patents using the Linked Data principles. We have converted the data spanning 12 years – i.e. 2005 – 2017 from XML to Resource Description Framework (RDF) format, conforming to the Linked Data principles and made them publicly available for re-use. This data can be integrated with other data sources in order to further simplify use cases such as trend analysis, structured patent search & exploration and societal progress measurements. We describe the conversion, publishing, interlinking process along with several use cases for the USPTO Linked Patent data.
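The XML-to-RDF conversion described in the abstract can be pictured with a small sketch. The XML layout (`patents`, `patent`, `title`, `granted`), the `http://example.org/patent/` namespace, and the choice of Dublin Core properties are all assumptions for illustration; the actual USPTO schema and the vocabulary used in the paper are not given here and are likely different.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/patent/"            # hypothetical namespace
DCT_TITLE = "http://purl.org/dc/terms/title"   # Dublin Core title property
DCT_DATE = "http://purl.org/dc/terms/date"     # Dublin Core date property
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def escape_literal(text):
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def patents_to_ntriples(xml_text):
    """Turn each <patent> element into N-Triples lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for patent in root.iter("patent"):
        subject = f"<{BASE}{patent.get('id')}>"
        title = patent.findtext("title")
        granted = patent.findtext("granted")
        if title:
            lines.append(f'{subject} <{DCT_TITLE}> "{escape_literal(title)}" .')
        if granted:
            lines.append(f'{subject} <{DCT_DATE}> "{granted}"^^<{XSD_DATE}> .')
    return lines


# Hypothetical input document.
sample = """<patents>
  <patent id="US0000001">
    <title>Foldable widget</title>
    <granted>2015-01-06</granted>
  </patent>
</patents>"""

for line in patents_to_ntriples(sample):
    print(line)
```

Giving every patent a stable URI is what makes the interlinking step possible: other datasets can then refer to the same patent by that identifier.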
John Domingue is a full Professor at the Open University and Director of the Knowledge Media Institute in Milton Keynes, focusing on research in the Semantic Web, Linked Data, Services, Blockchain, and Education. He also serves as the President of STI International, a semantics focused networking organization which runs the ESWC conference series.
His current work focuses on how a combination of blockchain and Linked Data technologies can be used to process personal data in a decentralized trusted manner and how this can be applied in the educational domain (see http://blockchain.open.ac.uk/). This work is funded by a number of projects. The Institute of Coding is a £20M funded UK initiative which aims to increase the graduate computing skills base in the UK. As leader of the first of five project themes, John Domingue is focusing on the use of blockchain micro-accreditation to support the seamless transition of learners between UK universities and UK industry. From January 2019, he will play a leading role in the EU funded QualiChain project, which aims to revolutionize public education and its relationship to the labor market and policy-making by disrupting the way accredited educational titles and other qualifications are archived, managed, shared and verified, taking advantage of blockchain, semantics, data analytics and gamification technologies.
From January 2015 to January 2018 he served as the Project Coordinator for the European Data Science Academy, which aimed to address the skills gap in data science across Europe. The project was a success, leading to a number of outcomes including a combined data science skills and courses portal enabling learners to find jobs across Europe which match their qualifications.
Prof. Domingue was invited to give a talk on “Towards the Decentralisation of Personal Data through Blockchains and Linked Data” at the Computer Science Colloquium at the University of Bonn, co-organized by SDA.
At the bi-weekly “SDA colloquium presentations” he presented KMi and the main research topics of the institute. The goal of Prof. Domingue’s visit was to exchange experience and ideas on decentralized applications using blockchain technologies in combination with Linked Data. In addition to presenting various use-cases where blockchain and Linked Data technologies have helped communities to get useful insights, Prof. Domingue shared with our group future research problems and challenges related to this research area. During the meeting, SDA core research topics and main research projects were presented and we investigated suitable topics for future collaborations with Prof. Domingue and his research group.
We are very pleased to announce that our group got one paper accepted for presentation at the 31st International Conference on Legal Knowledge and Information Systems (JURIX 2018), which will be held on December 12–14, 2018 in Groningen, the Netherlands.
JURIX organizes yearly conferences on the topic of Legal Knowledge and Information Systems. The proceedings of the conferences are published in the Frontiers of Artificial Intelligence and Applications series of IOS Press.
The JURIX conference attracts a wide variety of participants, coming from government, academia, and business. It is accompanied by workshops on topics such as eGovernment, legal ontologies, legal XML, alternative dispute resolution (ADR), argumentation, and deontic logic.
Here is the accepted paper with its abstract:
- “A Question Answering System on Regulatory Documents” by Diego Collarana, Timm Heuss, Jens Lehmann, Ioanna Lytra, Gaurav Maheshwari, Rostislav Nedelchev, Thorsten Schmidt, Priyansh Trivedi.
Abstract: In this work, we outline an approach for question answering over regulatory documents. In contrast to traditional means to access information in the domain, the proposed system attempts to deliver an accurate and precise answer to user queries. This is accomplished by a two-step approach which first selects relevant paragraphs given a question; and then compares the selected paragraph with user query to predict a span in the paragraph as the answer. We employ neural network-based solutions for each step and compare them with existing, and alternate baselines. We perform our evaluations with a gold-standard benchmark comprising over 600 questions on the MaRisk regulatory document. In our experiments, we observe that our proposed system outperforms other baselines.
This research was partially supported by an EU H2020 grant provided for the WDAqua project (GA no. 642795).
Looking forward to seeing you at JURIX 2018.
Khalid Saeed is a full Professor of Computer Science in the Faculty of Computer Science at Bialystok University of Technology and Faculty of Mathematics and Information Science at Warsaw University of Technology, Poland. He was with AGH Krakow in 2008-2014.
Khalid Saeed received the BSc Degree in Electrical and Electronics Engineering from Baghdad University in 1976, the MSc and PhD Degrees from the Wroclaw University of Technology in Poland in 1978 and 1981, respectively. He was nominated by the President of Poland for the title of Professor in 2014. He received his DSc Degree (Habilitation) in Computer Science from the Polish Academy of Sciences in Warsaw in 2007. He has published more than 200 publications – 23 edited books and 8 text and reference books. He supervised more than 110 MSc and 12 PhD theses. He received more than 20 academic awards. His areas of interest are Image Analysis and Processing, Biometrics and Computer Information Systems.
Prof. Jens Lehmann invited the speaker to the bi-weekly “SDA colloquium presentations”. 20-30 researchers and students from SDA attended. The goal of his visit was to exchange experience and ideas on biometrics applications in daily life, including face recognition, fingerprints, privacy and many more. Apart from presenting various use-cases where biometrics has helped scientists to get useful insights from image analysis and processing and from raw data, Prof. Dr. Saeed shared with our group future research problems and challenges related to this research area and gave a talk on “Biometrics in everyday life”.
As part of a national BMBF-funded project, Prof. Saeed (BUT) is currently cooperating with Fraunhofer IAIS in the field of cognitive engineering. As an outcome of this visit, we expect to strengthen our research collaboration networks with WUT and BUT, mainly on combining semantic knowledge and ubiquitous computing and its applications, as well as on emotion detection and Kansei engineering.
At the same time, this talk was part of the ongoing networking within the EU H2020 LAMBDA project (Learning, Applying, Multiplying Big Data Analytics). As part of this event, Dr. Valentina Janev from the Institute “Mihajlo Pupin” (PUPIN) attended the SDA meeting to investigate further networking with potential partners from Poland as well. Among other points, co-organizing upcoming conferences and writing joint research papers were discussed.
The International Semantic Web Conference (ISWC) is the premier international forum where Semantic Web / Linked Data researchers, practitioners, and industry specialists come together to discuss, advance, and shape the future of semantic technologies on the web, within enterprises and in the context of public institutions.
We are very pleased to announce that we got 3 papers accepted at ISWC 2018 for presentation at the main conference. Additionally, we also had 5 poster/demo papers accepted.
Furthermore, we are very happy to announce that we won the Best Demo Award for the WebVOWL Editor: “WebVOWL Editor: Device-Independent Visual Ontology Modeling” by Vitalis Wiens, Steffen Lohmann, and Sören Auer.
Here are some further pointers in case you want to know more about WebVOWL Editor:
- GitHub: https://github.com/VisualDataWeb/WebVOWL/tree/vowl_editor
- Demo: https://www.youtube.com/watch?v=XWXhpEr9LPY
Among others, our colleagues gave the following presentations:
- “EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs” by Mohnish Dubey, Debayan Banerjee, Debanjan Chaudhuri and Jens Lehmann
Mohnish Dubey presented EARL, a joint relation and entity linking approach for question answering over DBpedia on LC-QuAD, built on Elasticsearch, fastText embeddings and an LSTM. It proposes two approaches: one using a GTSP solver and one using a connection density classifier (3 features) for adaptive re-ranking.
- “DistLODStats: Distributed Computation of RDF Dataset Statistics” by Gezim Sejdiu, Ivan Ermilov, Jens Lehmann and Mohamed Nadjib Mami
Gezim Sejdiu presented DistLODStats, a novel software component for distributed in-memory computation of RDF Datasets statistics implemented using the Spark framework. The tool is maintained and has an active community due to its integration into the larger framework, SANSA.
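To give a feel for the kind of statistics such a tool computes, here is a small single-machine sketch that counts triples, distinct subjects, and property usage over N-Triples lines. It is not DistLODStats or SANSA code: the function name, the chosen statistics, and the naive whitespace-based parsing are assumptions for illustration, and DistLODStats computes its statistics in a distributed, in-memory fashion on Spark.

```python
from collections import Counter


def dataset_statistics(ntriples_lines):
    """Count triples, distinct subjects, and how often each property is used."""
    property_usage = Counter()
    subjects = set()
    triples = 0
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Naive parsing: subject and predicate never contain whitespace,
        # so the first two whitespace-separated tokens are enough here.
        subject, predicate, _rest = line.split(None, 2)
        triples += 1
        subjects.add(subject)
        property_usage[predicate] += 1
    return {
        "triples": triples,
        "distinct_subjects": len(subjects),
        "property_usage": dict(property_usage),
    }


# Hypothetical example data.
example = [
    '<http://example.org/a> <http://example.org/name> "A" .',
    '<http://example.org/a> <http://example.org/knows> <http://example.org/b> .',
    '<http://example.org/b> <http://example.org/name> "B" .',
]
print(dataset_statistics(example))
```

Each statistic here is a simple filter followed by an aggregation over the triples, which is the shape of computation that maps naturally onto a framework such as Spark.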
- “Synthesizing Knowledge Graphs from web sources with the MINTE+ framework” by Diego Collarana, Mikhail Galkin, Christoph Lange, Simon Scerri, Sören Auer and Maria-Esther Vidal
Diego Collarana presented how knowledge graphs can be synthesized from different web sources using MINTE+, an RDF Molecule-Based Integration Framework, in three domain-specific applications.
- Visualization and Interaction for Ontologies and Linked Data (VOILA 2018)
Steffen Lohmann co-organized the International Workshop on Visualization and Interaction for Ontologies and Linked Data (VOILA 2018) for the third time at ISWC. Overall, more than 40 researchers and practitioners took part in this full-day event featuring talks, discussions, and tool demonstrations, including an interactive demo session. The workshop proceedings are published as CEUR-WS vol. 2187.
ISWC18 was a great venue to meet the community, create new connections, talk about current research challenges, share ideas and settle new collaborations. We look forward to the next ISWC conference.
Until then, meet us at SDA!