PhD Student
Computer Science Institute
University of Bonn

Profiles: LinkedIn, Google Scholar, DBLP

Room A120
Römerstr. 164, 53117 Bonn
University of Bonn, Computer Science
esteves@cs.uni-bonn.de

Short CV


Diego Esteves is a PhD student and research associate at the University of Bonn. His research interests are in the areas of Fact Checking on the Web and Machine Learning Metadata.

Research Interests


  • Fact Checking
  • Natural Language Processing (NLP)
  • Machine Learning (ML)
  • Reproducible Research (RR)
  • Machine Learning Metadata

Projects


Teaching


  • Natural Language Processing – Lab
  • Natural Language Processing – Seminars

Publications


2017

  1. “Named Entity Recognition in Twitter using Images and Text” by D. Esteves, R. Peres, J. Lehmann and G. Napolitano, in The 3rd International Workshop on Natural Language Processing for Informal Text (NLPIT 2017), co-located with the International Conference on Web Engineering (ICWE 2017).
  2. “LOG4MEX: A Library to Export Machine Learning Experiments” by D. Esteves, D. Moussallem, T. Soru, C. B. Neto, J. Lehmann, A. N. Ngomo and J. C. Duarte, in 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). IEEE, 2017.
  3. “An Interoperable Service for the Provenance of Machine Learning Experiments” by J. C. Duarte, M. C. R. Cavalcanti, I. S. Costa and D. Esteves, in 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). IEEE, 2017.
  4. “IDOL: Comprehensive & Complete LOD Insights” by C. Baron Neto, D. Kontokostas, G. Publio, D. Esteves, A. Kirschenbaum and S. Hellmann, in 13th International Conference on Semantic Systems (SEMANTiCS 2017), 11-14 September 2017, Amsterdam, the Netherlands.

2016

  • C. B. Neto, D. Esteves, T. Soru, D. Moussallem, A. Valdestilhas, and E. Marx, “WASOTA: What Are the States of the Art?,” in 12th International Conference on Semantic Systems (SEMANTiCS 2016), 12-15 September 2016, Leipzig, Germany (Posters & Demos), 2016.
    [BibTeX] [Abstract] [Download PDF]
    Presently, a growing number of publications in Machine Learning and Data Mining are contributing to the improvement of algorithms and methods in their respective fields. However, with regard to the publication and sharing of scientific experiment achievements, we still face problems in searching and ranking these methods. Scouring the Internet for state-of-the-art information about specific contexts, such as Named Entity Recognition (NER), is often a time-consuming task. Besides, this process can lead to an incomplete investigation, either because search engines may return incomplete information or because keywords may not be properly defined. To bridge this gap, we present WASOTA, a web portal specifically designed to share and readily present metadata about the state of the art in specific domains, making the process of searching for this information easier.

    @InProceedings{wasota2016,
    Title = {WASOTA: What are the states of the art?},
    Author = {Ciro Baron Neto and Diego Esteves and Tommaso Soru and Diego Moussallem and Andre Valdestilhas and Edgard Marx},
    Booktitle = {12th International Conference on Semantic Systems (SEMANTiCS 2016), 12-15 September 2016, Leipzig, Germany (Posters \& Demos)},
    Year = {2016},
    Abstract = {Presently, an amount of publications in Machine Learning and Data Mining contexts are contributing to the improvement of algorithms and methods in their respective fields. However, with regard to publication and sharing of scientific experiment achievements, we still face problems on searching and ranking these methods. Scouring the Internet to search state-of-the-art information about specific contexts, such as Named Entity Recognition (NER), is often a time-consuming task. Besides, this process can lead to an incomplete investigation, either because search engines may return incomplete information or keywords may not be properly defined. To bridge this gap, we present WASOTA, a web portal specifically designed to share and readily present metadata about the state of the art on a specific domains, making the process of searching this information easier.},
    Keywords = {2016 mex baron ciro esteves moussallem soru marx valdestilhas aksw mole simba group_aksw},
    Url = {http://wasota.aksw.org/#/home}
    }

  • A. Lawrynowicz, D. Esteves, P. Panov, T. Soru, S. Dzeroski, and J. Vanschoren, “The Algorithm-Implementation-Execution Ontology Design Pattern,” in Workshop on Ontology and Semantic Web Patterns (7th edition) – WOP2016, 2016.
    [BibTeX] [Download PDF]
    @InProceedings{AIEDP2016,
    Title = {The Algorithm-Implementation-Execution Ontology Design Pattern},
    Author = {Agnieszka Lawrynowicz and Diego Esteves and Pance Panov and Tommaso Soru and Saso Dzeroski and Joaquin Vanschoren},
    Booktitle = {Workshop on Ontology and Semantic Web Patterns (7th edition) - WOP2016},
    Year = {2016},
    Series = {WOP2016},
    Url = {http://ontologydesignpatterns.org/wiki/images/4/41/WOP2016_paper_07.pdf},
    Keywords = {group_aksw esteves soru 2016}
    }

  • D. Esteves, P. N. Mendes, D. Moussallem, J. C. Duarte, A. Zaveri, J. Lehmann, C. B. Neto, I. Costa, and M. C. Cavalcanti, “MEX Interfaces: Automating Machine Learning Metadata Generation,” in 12th International Conference on Semantic Systems (SEMANTiCS 2016), 12-15 September 2016, Leipzig, Germany, 2016.
    [BibTeX] [Abstract] [Download PDF]
    Despite recent efforts to achieve a high level of interoperability of Machine Learning (ML) experiments, positively contributing to the Reproducible Research context, we still run into the problems created by the existence of different ML platforms: each of them has a specific conceptualization or schema for representing data and metadata. This scenario leads to extra coding effort to achieve both the desired interoperability and a better provenance level, as well as a more automated environment for obtaining the generated results. Hence, when using ML libraries, it is a common task to re-design specific data models (schemata) and develop wrappers to manage the produced outputs. In this article, we discuss this gap, focusing on the solution to the question: “What is the cleanest and lowest-impact solution to achieve both higher interoperability and provenance metadata levels in the Integrated Development Environment (IDE) context, and how can the inherent data querying task be facilitated?”. We introduce a novel, low-impact methodology specifically designed for code built in that context, combining Semantic Web concepts and reflection in order to minimize the gap for exporting ML metadata in a structured manner, allowing embedded code annotations that are converted at run time into one of the state-of-the-art ML schemas for the Semantic Web: the MEX Vocabulary. A schematic sketch of this annotation-and-reflection idea follows the BibTeX entry below.

    @InProceedings{estevesMEX2016,
    Title = {{MEX} {I}nterfaces: {A}utomating {M}achine {L}earning {M}etadata {G}eneration},
    Author = {Diego Esteves and Pablo N. Mendes and Diego Moussallem and Julio Cesar Duarte and Amrapali Zaveri and Jens Lehmann and Ciro Baron Neto and Igor Costa and Maria Claudia Cavalcanti},
    Booktitle = {12th International Conference on Semantic Systems (SEMANTiCS 2016), 12-15 September 2016, Leipzig, Germany},
    Year = {2016},
    Abstract = {Despite recent efforts to achieve a high level of interoperability of Machine Learning (ML) experiments, positively collaborating with the Reproducible Research context, we still run into the problems created due to the existence of different ML platforms: each of those have a specific conceptualization or schema for representing data and metadata. This scenario leads to an extra coding-effort to achieve both the desired interoperability and a better provenance level as well as a more automatized environment for obtaining the generated results. Hence, when using ML libraries, it is a common task to re-design specific data models (schemata) and develop wrappers to manage the produced outputs. In this article, we discuss this gap focusing on the solution for the question: ``What is the cleanest and lowest-impact solution to achieve both higher interoperability and provenance metadata levels in the Integrated Development Environments (IDE) context and how to facilitate the inherent data querying task?''. We introduce a novel and low impact methodology specifically designed for code built in that context, combining semantic web concepts and reflection in order to minimize the gap for exporting ML metadata in a structured manner, allowing embedded code annotations that are, in run-time, converted in one of the state-of-the-art ML schemas for the Semantic Web: the MEX Vocabulary.},
    Bdsk-url-1 = {https://www.researchgate.net/publication/305143958_MEX_InterfacesAutomating_Machine_Learning_Metadata_Generation},
    Keywords = {mex 2016 sys:relevantFor:infai sys:relevantFor:bis hobbit projecthobbit esteves baron group_aksw lehmann sda mole moussallem MOLE},
    Url = {https://www.researchgate.net/publication/305143958_MEX_InterfacesAutomating_Machine_Learning_Metadata_Generation}
    }
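
    The core idea described in the abstract (annotate code, then harvest the annotations via reflection and export them as structured metadata) can be sketched in a few lines of Java. This is a minimal sketch under assumed names: the @MexMeasure annotation and the classes below are invented for illustration, not the actual MEX Interfaces or LOG4MEX API.

    // Hypothetical sketch of annotation-driven metadata export via reflection.
    // All names here are invented for illustration; this is NOT the real API.
    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;
    import java.lang.reflect.Field;

    public class MexSketch {

        // Field-level marker for values to be exported as experiment metadata.
        @Retention(RetentionPolicy.RUNTIME)
        @Target(ElementType.FIELD)
        @interface MexMeasure {
            String name();
        }

        // An experiment object whose annotated fields are harvested at run time.
        static class Experiment {
            @MexMeasure(name = "accuracy") double accuracy = 0.91;
            @MexMeasure(name = "f1")       double f1 = 0.87;
            String notes = "not exported"; // unannotated: ignored by the exporter
        }

        // Reflection pass: keep only the annotated fields and print simple
        // subject-predicate-object statements, stand-ins for the RDF triples
        // a real exporter would serialize with the MEX vocabulary.
        public static void main(String[] args) throws IllegalAccessException {
            Experiment exp = new Experiment();
            for (Field f : Experiment.class.getDeclaredFields()) {
                MexMeasure m = f.getAnnotation(MexMeasure.class);
                if (m != null) {
                    f.setAccessible(true);
                    System.out.printf(":run1 :%s %s .%n", m.name(), f.get(exp));
                }
            }
        }
    }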

2015

  • R. Speck, D. Esteves, J. Lehmann, and A. Ngonga Ngomo, “DeFacto – A Multilingual Fact Validation Interface,” in 14th International Semantic Web Conference (ISWC 2015), 11-15 October 2015, Bethlehem, Pennsylvania, USA (Semantic Web Challenge Proceedings), 2015.
    [BibTeX] [Abstract] [Download PDF]
    The curation of a knowledge base is a key task for ensuring the correctness and traceability of the knowledge provided in the said knowledge base. This task is often carried out manually by human curators, who attempt to provide reliable facts and their respective sources in a three-step process: issuing appropriate keyword queries for the fact to check using standard search engines, retrieving potentially relevant documents, and screening those documents for relevant content. However, this process is very time-consuming, mainly because the human curators have to scrutinize the web pages retrieved by search engines. This demo paper presents the RESTful implementation of DeFacto (Deep Fact Validation) – an approach able to validate facts in RDF by finding trustworthy sources for them on the Web. DeFacto aims to support the validation of facts by supplying the user with (1) relevant excerpts of web pages as well as (2) useful additional information, including (3) a score for the confidence DeFacto has in the correctness of the input fact. To achieve this goal, DeFacto collects and combines evidence from web pages written in several languages. We also provide an extension for finding similar resources in Linked Data, using the sameas.org service as backend. In addition, DeFacto provides support for facts with a temporal scope, i.e., it can estimate the time frame within which a fact was valid. A sketch of the kind of RDF input such a validator consumes follows the BibTeX entry below.

    @InProceedings{defactorest,
    Title = {DeFacto - A Multilingual Fact Validation Interface},
    Author = {Ren{\'e} Speck and Diego Esteves and Jens Lehmann and Axel-Cyrille {Ngonga Ngomo}},
    Booktitle = {14th International Semantic Web Conference (ISWC 2015), 11-15 October 2015, Bethlehem, Pennsylvania, USA (Semantic Web Challenge Proceedings)},
    Year = {2015},
    Editor = {Sean Bechhofer and Kostis Kyzirakos},
    Note = {Semantic Web Challenge, International Semantic Web Conference 2015},
    Abstract = {The curation of a knowledge base is a key task for ensuring the correctness and traceability of the knowledge provided in the said knowledge. This task is often carried out manually by human curators, who attempt to provide reliable facts and their respective sources in a three-step process: issuing appropriate keyword queries for the fact to check using standard search engines, retrieving potentially relevant documents and screening those documents for relevant content. However, this process is very time-consuming, mainly due to the human curators having to scrutinize the web pages retrieved by search engines. This demo paper demonstrate the RESTful implementation for DeFacto (Deep Fact Validation) - an approach able to validate facts in RDF by finding trustworthy sources for them on the Web. DeFacto aims to support the validation of facts by supplying the user with (1) relevant excerpts of web pages as well as (2) useful additional information including (3) a score for the confidence DeFacto has in the correctness of the input fact. To achieve this goal, DeFacto collects and combines evidence from web pages written in several languages. We also provide an extension for finding similar resources obtained from the Linked Data, using the sameas.org service as backend. In addition, DeFacto provides support for facts with a temporal scope, i.e., it can estimate the time frame within which a fact was valid.},
    Bdsk-url-1 = {http://jens-lehmann.org/files/2015/swc_defacto.pdf},
    Keywords = {defacto ngonga esteves aksw 2015 lehmann speck rene},
    Url = {http://jens-lehmann.org/files/2015/swc_defacto.pdf}
    }
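
    As a companion to the abstract, the following minimal Java sketch uses Apache Jena to build the kind of RDF triple a DeFacto-style validator takes as input. The example fact and namespaces are illustrative; the demo's actual REST endpoint and request format are not reproduced here.

    // Building a candidate fact as an RDF triple with Apache Jena.
    // The fact below is an illustrative example, not taken from the paper.
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Property;
    import org.apache.jena.rdf.model.Resource;

    public class FactToValidate {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            model.setNsPrefix("dbr", "http://dbpedia.org/resource/");
            model.setNsPrefix("dbo", "http://dbpedia.org/ontology/");

            // Candidate fact: "Albert Einstein's alma mater is the University of Zurich."
            Resource einstein = model.createResource("http://dbpedia.org/resource/Albert_Einstein");
            Property almaMater = model.createProperty("http://dbpedia.org/ontology/almaMater");
            Resource uzh = model.createResource("http://dbpedia.org/resource/University_of_Zurich");
            einstein.addProperty(almaMater, uzh);

            // Serialize as Turtle; a fact validator would take such a triple,
            // search the Web for supporting evidence, and return a confidence score.
            model.write(System.out, "TURTLE");
        }
    }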

  • E. Marx, T. Soru, D. Esteves, A. Ngonga Ngomo, and J. Lehmann, “An Open Question Answering Framework,” in The 14th International Semantic Web Conference, Posters & Demonstrations Track, 2015.
    [BibTeX]
    @InProceedings{openqa2015,
    Title = {An {O}pen {Q}uestion {A}nswering {F}ramework},
    Author = {Edgard Marx and Tommaso Soru and Diego Esteves and Axel-Cyrille {Ngonga Ngomo} and Jens Lehmann},
    Booktitle = {The 14th International Semantic Web Conference, Posters \& Demonstrations Track},
    Year = {2015},
    Keywords = {SIMBA group_aksw marx ngonga smart lehmann openqa esteves mole soru 2015},
    Owner = {marx}
    }

  • D. Gerber, D. Esteves, J. Lehmann, L. Bühmann, R. Usbeck, A. Ngonga Ngomo, and R. Speck, “DeFacto – Temporal and Multilingual Deep Fact Validation,” Web Semantics: Science, Services and Agents on the World Wide Web, 2015.
    [BibTeX] [Abstract] [Download PDF]
    One of the main tasks when creating and maintaining knowledge bases is to validate facts and provide sources for them in order to ensure correctness and traceability of the provided knowledge. So far, this task is often addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming, as the experts have to carry out several search processes and must often read several documents. In this article, we present DeFacto (Deep Fact Validation) – an algorithm able to validate facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of web pages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact. To achieve this goal, DeFacto collects and combines evidence from web pages written in several languages. In addition, DeFacto provides support for facts with a temporal scope, i.e., it can estimate in which time frame a fact was valid. Given that the automatic evaluation of facts has not received much attention so far, generic benchmarks for evaluating these frameworks were not previously available. We thus also present a generic evaluation framework for fact checking and make it publicly available. An illustrative sketch of combining per-source evidence into a single confidence score follows the BibTeX entry below.

    @Article{gerber2015,
    Title = {DeFacto - Temporal and Multilingual Deep Fact Validation},
    Author = {Daniel Gerber and Diego Esteves and Jens Lehmann and Lorenz B{\"u}hmann and Ricardo Usbeck and Axel-Cyrille {Ngonga Ngomo} and Ren{\'e} Speck},
    Journal = {Web Semantics: Science, Services and Agents on the World Wide Web},
    Year = {2015},
    Abstract = {One of the main tasks when creating and maintaining knowledge bases is to validate facts and provide sources for them in order to ensure correctness and traceability of the provided knowledge. So far, this task is often addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming as the experts have to carry out several search processes and must often read several documents. In this article, we present DeFacto (Deep Fact Validation) - an algorithm able to validate facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of web pages as well as useful additional information including a score for the confidence DeFacto has in the correctness of the input fact. To achieve this goal, DeFacto collects and combines evidence from web pages written in several languages. In addition, DeFacto provides support for facts with a temporal scope, i.e., it can estimate in which time frame a fact was valid. Given that the automatic evaluation of facts has not been paid much attention to so far, generic benchmarks for evaluating these frameworks were not previously available. We thus also present a generic evaluation framework for fact checking and make it publicly available.},
    Bdsk-url-1 = {http://svn.aksw.org/papers/2015/JWS_DeFacto/public.pdf},
    Keywords = {2015 group_aksw simba diesel defacto MOLE sys:relevantFor:infai sys:relevantFor:bis sys:relevantFor:geoknow lehmann esteves gerber usbeck speck ngonga geoknow buehmann},
    Url = {http://svn.aksw.org/papers/2015/JWS_DeFacto/public.pdf}
    }
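
    To make the abstract's notion of "collecting and combining evidence" concrete, here is a deliberately simplified Java sketch that aggregates per-source evidence into one confidence value. The weighting scheme (a trustworthiness-weighted average of relevance scores) is a made-up stand-in, not DeFacto's actual scoring model.

    // Toy evidence aggregation: NOT DeFacto's scoring model, just the general idea.
    public class EvidenceCombiner {

        // One piece of web evidence: how relevant the page is to the fact and
        // how trustworthy the source is judged to be (both assumed in [0,1]).
        record Evidence(double relevance, double trustworthiness) {}

        // Trustworthiness-weighted average of relevance scores.
        static double confidence(Evidence[] evidence) {
            double weightedSum = 0, weightTotal = 0;
            for (Evidence e : evidence) {
                weightedSum += e.relevance() * e.trustworthiness();
                weightTotal += e.trustworthiness();
            }
            return weightTotal == 0 ? 0 : weightedSum / weightTotal;
        }

        public static void main(String[] args) {
            Evidence[] pages = {
                new Evidence(0.9, 0.8), // highly relevant page from a trusted site
                new Evidence(0.4, 0.3), // weakly relevant page from a dubious site
            };
            System.out.printf("confidence = %.3f%n", confidence(pages));
        }
    }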

  • D. Esteves, D. Moussallem, C. B. Neto, T. Soru, R. Usbeck, M. Ackermann, and J. Lehmann, “MEX Vocabulary: A Lightweight Interchange Format for Machine Learning Experiments,” in 11th International Conference on Semantic Systems (SEMANTiCS 2015), 15-17 September 2015, Vienna, Austria, 2015.
    [BibTeX] [Abstract] [Download PDF]
    Over the last decades, many machine learning experiments have been published, benefiting scientific progress. In order to compare machine learning experiment results with each other and collaborate positively, the experiments need to be performed thoroughly on the same computing environment, using the same sample datasets and algorithm configurations. Besides this, practical experience shows that scientists and engineers tend to produce large output data in their experiments, which is both difficult to analyze and to archive properly without provenance metadata. However, the Linked Data community still lacks a lightweight specification for interchanging machine learning metadata across different architectures to achieve a higher level of interoperability. In this paper, we address this gap by presenting a novel vocabulary dubbed MEX. We show that MEX provides a prompt method to describe experiments, with a special focus on data provenance, and fulfills the requirements for long-term maintenance. A rough sketch of such an export follows the BibTeX entry below.

    @InProceedings{estevesMEX2015,
    Title = {MEX Vocabulary: A Lightweight Interchange Format for Machine Learning Experiments},
    Author = {Diego Esteves and Diego Moussallem and Ciro Baron Neto and Tommaso Soru and Ricardo Usbeck and Markus Ackermann and Jens Lehmann},
    Booktitle = {11th International Conference on Semantic Systems (SEMANTiCS 2015), 15-17 September 2015, Vienna, Austria},
    Year = {2015},
    Abstract = {Over the last decades many machine learning experiments have been published, giving benefit to the scientific progress. In order to compare machine-learning experiment results with each other and collaborate positively, they need to be performed thoroughly on the same computing environment, using the same sample datasets and algorithm configurations. Besides this, practical experience shows that scientists and engineers tend to have large output data in their experiments, which is both difficult to analyze and archive properly without provenance metadata. However, the Linked Data community still misses a light-weight specification for interchanging machine-learning metadata over different architectures to achieve a higher level of interoperability. In this paper, we address this gap by presenting a novel vocabulary dubbed MEX. We show that MEX provides a prompt method to describe experiments with a special focus on data provenance and fulfills the requirements for a long-term maintenance.},
    Bdsk-url-1 = {http://svn.aksw.org/papers/2015/SEMANTICS_MEX/public.pdf},
    Keywords = {mex simba 2015 sys:relevantFor:infai sys:relevantFor:bis aligned esteves baron usbeck group_aksw lehmann mole soru neto ackermann mack moussallem MOLE aligned-project},
    Url = {http://svn.aksw.org/papers/2015/SEMANTICS_MEX/public.pdf}
    }
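
    As a rough illustration of what a lightweight interchange format buys in practice, the Java snippet below uses Apache Jena to emit a handful of RDF triples describing one experiment run. The namespace and property names are placeholders, not verified MEX vocabulary terms.

    // Exporting minimal experiment metadata as RDF with Apache Jena.
    // The "mex" namespace below is a placeholder, not the real MEX namespace.
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    public class MexExport {
        public static void main(String[] args) {
            String mex = "http://example.org/mex#"; // illustrative placeholder
            Model model = ModelFactory.createDefaultModel();
            model.setNsPrefix("mex", mex);

            // One execution of one algorithm on one dataset, with one measure.
            model.createResource(mex + "execution1")
                 .addProperty(model.createProperty(mex + "algorithm"), "SVM")
                 .addProperty(model.createProperty(mex + "dataset"), "iris")
                 .addProperty(model.createProperty(mex + "accuracy"),
                              model.createTypedLiteral(0.95));

            // Serialize as Turtle for interchange between tools.
            model.write(System.out, "TURTLE");
        }
    }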

  • D. Esteves, D. Moussallem, C. B. Neto, J. Lehmann, M. C. Cavalcanti, and J. C. Duarte, “Interoperable Machine Learning Metadata Using MEX,” in 14th International Semantic Web Conference (ISWC 2015), 11-15 October 2015, Bethlehem, Pennsylvania, USA (Posters & Demos), 2015.
    [BibTeX] [Download PDF]
    @InProceedings{estevesMNLCD15,
    Title = {Interoperable Machine Learning Metadata using MEX.},
    Author = {Diego Esteves and Diego Moussallem and Ciro Baron Neto and Jens Lehmann and Maria Claudia Cavalcanti and Julio Cesar Duarte},
    Booktitle = {14th International Semantic Web Conference (ISWC 2015), 11-15 October 2015, Bethlehem, Pennsylvania, USA (Posters \& Demos)},
    Year = {2015},
    Editor = {Serena Villata and Jeff Z. Pan and Mauro Dragoni},
    Publisher = {CEUR-WS.org},
    Series = {CEUR Workshop Proceedings},
    Volume = {1486},
    Bdsk-url-1 = {http://ceur-ws.org/Vol-1486/paper_102.pdf},
    Biburl = {http://www.bibsonomy.org/bibtex/291927b04e3cd969e894a6c93fd05af57/dblp},
    Crossref = {conf/semweb/2015p},
    Keywords = {mex esteves aksw dblp 2015 baron neto lehmann moussallem},
    Timestamp = {2015-12-24T12:18:02.000+0100},
    Url = {http://dblp.uni-trier.de/db/conf/semweb/iswc2015p.html#EstevesMNLCD15}
    }

2014

  • D. Esteves, “Prediction of Asset Trends in Financial Series Using Machine Learning Algorithms,” Master’s thesis, Military Institute of Engineering, Rio de Janeiro, Brazil, 2014.
    [BibTeX] [Download PDF]
    @MastersThesis{esteves2014,
    Title = {Prediction of Asset Trends in Financial Series Using Machine Learning Algorithms},
    Author = {Diego Esteves},
    School = {Military Institute of Engineering / Brazilian Army},
    Year = {2014},
    Address = {Pra\c{c}a General Tib\'{u}rcio, 80 - Praia Vermelha, Rio de Janeiro - RJ, Brazil},
    Month = {6},
    Bdsk-url-1 = {http://www.comp.ime.eb.br/pos/images/repositorio-dissertacoes/2014-Diego_Esteves.pdf},
    Keywords = {2014 esteves stock market assets time series},
    Url = {http://www.comp.ime.eb.br/pos/images/repositorio-dissertacoes/2014-Diego_Esteves.pdf}
    }