NSF CAREER: Utilizing Fine-grained Knowledge Annotations in Text Understanding and Retrieval

Abstract

All members of our information society constantly need quick access to knowledge for work, education, and personal interests. Often the necessary knowledge cannot be found on a single web page or article but calls for a longer, query-specific, complex answer. Examples are questions about the causes of political events, the nutritional benefits of chocolate, or the economic viability of wind energy. However, today’s search engines merely provide a list of sources and leave users on their own to synthesize these sources into knowledge. This project develops novel algorithms that distill relevant key concepts, text passages, and relational information to provide users with a single, comprehensive summary. The summary is structured into sections, each covering a different facet of a complex topic. The focus of this project is to identify the relevant facts and connections that will enable users to form their own opinions and make strategic decisions. Embedded in a self-directed learning environment, the system allows users to learn about new topics at their own pace. Integrated educational activities will use the software to inspire interest in STEM among middle school students and undergraduates in other disciplines.

Today’s algorithms fall short in identifying relevant resources for complex topics, as these require knowledge about the world. Utilizing knowledge graphs for information retrieval has led to several important advances, such as the Entity Query Feature Expansion model (EQFE). This project takes a novel, holistic approach to developing knowledge representations in a knowledge graph, corresponding text annotation algorithms, and retrieval algorithms that work hand in hand. The project focuses on three thrusts: (1) Entity aspect linking, which determines the topical context of entity mentions to facilitate high-precision paragraph ranking. (2) Utilizing extracted relations for open-domain information retrieval, where the presence of many non-relevant relations is explicitly addressed. (3) Selecting the query-specific subgraph of the knowledge graph that is suitable for identifying relevant entities through long-range dependencies while avoiding concept drift. All three thrusts lead to a fine-grained machine understanding of relevant connections and aspects of entities, aligned through supporting passages that provide provenance for their relevance. The impact of all three thrusts extends beyond information retrieval: many applications that build on entity links will gain topical precision from entity aspect links; most technology that uses relations as-is will improve when relational information is considered in context; and any method that extracts information from the knowledge graph structure will perform better when spurious edges are eliminated. Overall, this research effort will lay the foundation for identifying query-relevant complex information in large natural-language text collections.

Project Goals

… todo …

Broader Impacts

… todo …

Acknowledgement

This material is based upon work supported by the National Science Foundation under Grant No. 1846017.

Disclaimer

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Point of contact: Laura Dietz

This page was last updated December 2018.