A Large Test Collection for Entity Aspect Linking

Authors: Jordan Ramsdell and Laura Dietz.

The test collection and all associated data are released under a Creative Commons Attribution-ShareAlike 4.0 International License.

We provide a large-scale dataset for training and evaluating methods for entity aspect linking (EAL), a fine-grained variation on entity linking that discerns which particular aspect of an entity is mentioned in the context.

Entity Aspect Linking (EAL) Task.

Building on the definition of Nanni et al. (2018), we formalize the task as a refinement of entity linking as follows:

A catalog of candidate aspects is assumed to be available for each entity. Like Nanni et al., we construct the aspect catalog from the top-level sections of an entity’s Wikipedia page, where each section represents one aspect. Administrative sections without topical content, such as "References" or "See Also", are excluded from the aspect catalog.
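The catalog construction described above can be illustrated with a minimal sketch. It assumes a page is already available as a mapping from top-level section heading to section text. The function name, the field names `aspect_name` and `aspect_content`, and the set of administrative headings are hypothetical choices for illustration and are not the schema or the filter list used in the released dataset.

```python
from typing import Dict, List

# Assumption: an illustrative, non-exhaustive set of administrative headings.
# The dataset description names only "References" and "See Also" as examples.
ADMINISTRATIVE_HEADINGS = {"references", "see also"}


def build_aspect_catalog(top_level_sections: Dict[str, str]) -> List[dict]:
    """Return one aspect per top-level section, skipping administrative ones."""
    catalog = []
    for heading, content in top_level_sections.items():
        # Compare case-insensitively so "See Also" and "See also" both match.
        if heading.strip().lower() in ADMINISTRATIVE_HEADINGS:
            continue
        # Hypothetical field names, not the dataset's actual JSON keys.
        catalog.append({"aspect_name": heading, "aspect_content": content})
    return catalog


# Made-up page content, used only to exercise the function.
page = {
    "Ecology": "Text about habitat.",
    "As food": "Text about cuisine.",
    "See also": "List of related pages.",
    "References": "Citations.",
}
aspects = build_aspect_catalog(page)
print([a["aspect_name"] for a in aspects])  # prints ['Ecology', 'As food']
```

Matching on normalized heading text keeps the filter simple. The released catalog may apply further criteria, so README.mkd remains the authoritative description.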

Example

See EAL Instances for target entity Oyster.

Results

See Results for reference results. (Please send us your results too!)

Download Entity-aspect-linking-2020

entity-aspect-linking-2020.tar.xz
5.5G; contains the 2020 dataset, a re-release of Nanni’s 201 dataset, and a baseline implementation with features, run files, and evaluation results.

See README.mkd for detailed dataset description.

See data model of test collection for questions about the JSON-L format.
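Before unpacking the full 5.5G archive, it can be useful to look at its contents first. The following is a minimal sketch using Python's standard `tarfile` module. The helper names are hypothetical, and member paths inside the archive are not assumed here; they should be taken from the listing itself.

```python
import tarfile
from typing import List


def list_archive_members(archive_path: str) -> List[str]:
    """Return the names of all members in an xz-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:xz") as archive:
        return archive.getnames()


def read_member_text(archive_path: str, member_name: str) -> str:
    """Read a single regular file from the archive without extracting the rest."""
    with tarfile.open(archive_path, mode="r:xz") as archive:
        handle = archive.extractfile(member_name)
        if handle is None:
            # extractfile returns None for directories and other non-file members.
            raise ValueError(f"{member_name} is not a regular file")
        with handle:
            return handle.read().decode("utf-8")
```

For example, `list_archive_members("entity-aspect-linking-2020.tar.xz")` returns the member paths, one of which can then be passed to `read_member_text` to read the README without unpacking everything. Because xz archives are not seekable, each call scans the archive from the start, which can take a while at this size.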

wiki2020-aspect-catalog.jsonl.gz
Aspect catalog for all wiki2020 entities (using the same filter criteria as for the entity-aspect link examples).
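The gzipped JSON-L files can be streamed one record at a time, which avoids loading the whole catalog into memory. The sketch below assumes only that each non-empty line is one JSON object. It makes no assumption about field names, and `iter_jsonl_gz` and `peek_keys` are hypothetical helper names; the data model description is the reference for the actual schema.

```python
import gzip
import json
from typing import Any, Dict, Iterator, List


def iter_jsonl_gz(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a gzipped JSON-L file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def peek_keys(path: str) -> List[str]:
    """Return the sorted top-level keys of the first record, to inspect the schema."""
    for record in iter_jsonl_gz(path):
        return sorted(record.keys())
    return []  # The file contains no records.
```

Calling `peek_keys("wiki2020-aspect-catalog.jsonl.gz")` is a quick way to check which top-level fields a record carries before writing any processing code against them.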

Our dataset is derived from an English Wikipedia dump from 01/01/2020, offered as part of the v2.4 release of the TREC Complex Answer Retrieval track, which exposes section and hyperlink information in a machine-readable format.

We also provide a converted re-release of the original dataset of Nanni et al. (2018) in our data schema, as nanni-201.

Acknowledgement

This material is based upon work supported by the National Science Foundation under Grant No. 1846017. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

License

Entity-aspect-linking-2020 by Jordan Ramsdell, Laura Dietz is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://trec-car.cs.unh.edu/datareleases/v2.4-release.html, on a work at www.wikipedia.org, and on a work at https://federiconanni.com/entity-aspect-linking/.