T1: Cross-lingual Entity Discovery and Linking


Speaker: Heng Ji (Rensselaer Polytechnic Institute)


Abstract: Cross-lingual Entity Discovery and Linking (EDL) (Ji et al., 2014) is the task of extracting entity mentions from foreign-language texts and linking them to an external English knowledge base (KB). Beyond the motivation that drives the monolingual English EDL task (knowledge acquisition and information extraction), in the cross-lingual case, and especially when dealing with low-resource languages, the hope is to provide improved natural language understanding capabilities for the many languages for which we have few linguistic resources and annotations and no machine translation technology. The LoReHLT 2016-2018 evaluations and recent NIST TAC-KBP EDL tasks target truly low-resource languages such as Northern Sotho or Kikuyu, which have only hundreds of Wikipedia pages. The primary goals of this tutorial are to review the framework of cross-lingual EDL and to motivate it as a broad paradigm for the Information Extraction task. We will start by discussing traditional EDL techniques and metrics and address questions about their adequacy across domains and languages. We will then present more recent approaches such as neural EDL and discuss the basic building blocks of a state-of-the-art neural EDL system. In particular, we will discuss and compare multiple methods that make use of multilingual common semantic space construction and cross-lingual transfer learning. The tutorial will be useful for both senior and junior researchers (in academia and industry) with interests in cross-source information extraction and linking, knowledge acquisition, and the use of acquired knowledge in natural language processing and information extraction. We will try to provide a concise road map of recent approaches, perspectives, and results, and point to some of our state-of-the-art EDL data sets, resources, and systems that are available to the research community.


Brief introduction: Heng Ji is the Edward P. Hamilton Chair Professor in the Computer Science Department of Rensselaer Polytechnic Institute. She received her B.A. and M.A. in Computational Linguistics from Tsinghua University, and her M.S. and Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially Information Extraction and Knowledge Base Population. She was selected as a "Young Scientist" and a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017. She received the "AI's 10 to Watch" Award from IEEE Intelligent Systems in 2013, an NSF CAREER award in 2009, Google Research Awards in 2009 and 2014, a Sloan Junior Faculty Award in 2012, IBM Watson Faculty Awards in 2012 and 2014, and Bosch Research Awards in 2015, 2016 and 2017. She has coordinated the NIST TAC Knowledge Base Population task since 2010, and has served as the Program Committee Chair of NAACL2018, NLP-NABD2018, NLPCC2015 and CSCKG2016, ACL2017 Demo Co-Chair, the Information Extraction area chair for NAACL2012, ACL2013, EMNLP2013, NLPCC2014, EMNLP2015, NAACL2016 and ACL2016, the vice Program Committee Chair for IEEE/WIC/ACM WI2013 and CCL2015, Content Analysis Track Chair of WWW2015, and the Financial Chair of IJCAI2016.