Evaluation Results on Entity-aspect-linking-2020

See Main Page for details and downloads.

Experimental Setup

Small/Test

Train on Train-Small and predict on Test.

Small/Nanni-Test

Train on Train-Small and predict on Nanni-Test.

Small/Nanni’s 201

Train on Train-Small and predict on Nanni’s 201.

Nanni’s 201-CV

5-fold cross-validation on Nanni’s 201. (Original evaluation protocol of Nanni et al. (2018))
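
The fold mechanics can be sketched as follows. This is a minimal stdlib sketch, not the exact protocol of Nanni et al. (2018); partitioning by shuffling query ids with a fixed seed is an assumption.

```python
import random

def five_fold_splits(query_ids, seed=0):
    """Partition query ids into 5 folds; each fold serves once as the
    test set while the remaining four folds form the training set."""
    ids = sorted(query_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::5] for i in range(5)]
    for k in range(5):
        test = folds[k]
        train = [q for j, f in enumerate(folds) if j != k for q in f]
        yield train, test
```

Each query appears in exactly one test fold, so the five test sets together cover the whole collection.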

Remaining/Test

Train on Train-Remaining and predict on Test.

Remaining/Nanni-Test

Train on Train-Remaining and predict on Nanni-Test.

Remaining/Nanni’s 201

Train on Train-Remaining and predict on Nanni’s 201.

Results

We report results separately for features derived from sentence and paragraph contexts.

Evaluation results for the experimental setups described above. Significance is analyzed with a standard-error overlap test: two results are considered significantly different only when their standard-error intervals do not overlap.

|                  | Paragraph Context |             |             | Sentence Context |             |             |
|------------------|-------------------|-------------|-------------|------------------|-------------|-------------|
|                  | P@1               | MAP         | ndcg@20     | P@1              | MAP         | ndcg@20     |
| **Small/Test**   |                   |             |             |                  |             |             |
| Rank-lips        | 0.582±0.007       | 0.746±0.004 | 0.810±0.003 | 0.623±0.007      | 0.771±0.004 | 0.828±0.003 |
| RankLib          | 0.576±0.007       | 0.740±0.004 | 0.804±0.003 | 0.614±0.007      | 0.765±0.004 | 0.824±0.003 |
| **Small/Nanni-Test** |               |             |             |                  |             |             |
| Rank-lips        | 0.601±0.004       | 0.755±0.002 | 0.816±0.002 | 0.664±0.003      | 0.802±0.002 | 0.851±0.002 |
| RankLib          | 0.594±0.004       | 0.751±0.002 | 0.813±0.002 | 0.668±0.003      | 0.806±0.002 | 0.855±0.002 |
| **Small/Nanni’s 201** |              |             |             |                  |             |             |
| Rank-lips        | 0.617±0.034       | 0.762±0.022 | 0.821±0.017 | 0.657±0.033      | 0.784±0.022 | 0.836±0.017 |
| RankLib          | 0.632±0.034       | 0.779±0.021 | 0.835±0.015 | 0.677±0.033      | 0.796±0.021 | 0.845±0.016 |
| **Nanni’s 201-CV** |                 |             |             |                  |             |             |
| Rank-lips        | 0.647±0.034       | 0.780±0.022 | 0.835±0.017 | 0.667±0.033      | 0.785±0.022 | 0.837±0.017 |
| RankLib          | 0.602±0.034       | 0.747±0.022 | 0.817±0.017 | 0.612±0.034      | 0.765±0.022 | 0.824±0.016 |
| Nanni et al.     | 0.637±0.034       | 0.777±0.021 | 0.833±0.016 | 0.667±0.034      | 0.790±0.022 | 0.842±0.016 |
| **Remaining/Test** |                 |             |             |                  |             |             |
| Rank-lips        | 0.587±0.006       | 0.751±0.004 | 0.813±0.003 | 0.628±0.006      | 0.774±0.004 | 0.831±0.003 |
| **Remaining/Nanni-Test** |           |             |             |                  |             |             |
| Rank-lips        | 0.604±0.004       | 0.758±0.002 | 0.818±0.002 | 0.697±0.003      | 0.822±0.002 | 0.867±0.002 |
| **Remaining/Nanni’s 201** |          |             |             |                  |             |             |
| Rank-lips        | 0.626±0.034       | 0.771±0.022 | 0.828±0.016 | 0.682±0.033      | 0.797±0.022 | 0.846±0.017 |

If you obtain new results on this dataset, we want to hear about them and would be honored to include them in the table above.

Baseline

The baseline uses list-wise learning to rank to combine the following features.

All features are based on word/entity similarities between the context and (parts of) an aspect.

The following similarities are used. We exclude Nanni’s RDF2Vec feature since it is difficult to produce and does not perform well.

BM25:

Using the context as the query and the aspect part as the document, we apply BM25 with default parameters as the ranking model.¹
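
A minimal sketch of this feature, assuming the corpus statistics (per-term document frequency, number of aspect documents, average document length) come from the statistics shipped with the dataset; the defaults k1=1.2, b=0.75 are the common Lucene-style defaults, an assumption here rather than values confirmed by the source.

```python
import math

def bm25_score(query_terms, doc_terms, df, n_docs, avg_len, k1=1.2, b=0.75):
    """BM25 score of one aspect part (doc_terms) for a context (query_terms).
    df maps term -> document frequency across the aspect corpus."""
    tf = {}
    for t in doc_terms:
        tf[t] = tf.get(t, 0) + 1
    score = 0.0
    for t in set(query_terms):  # unique query terms
        if t not in tf:
            continue
        # Lucene-style non-negative idf
        idf = math.log(1 + (n_docs - df.get(t, 0) + 0.5) / (df.get(t, 0) + 0.5))
        norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc_terms) / avg_len))
        score += idf * norm
    return score
```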

TFIDF:

Cosine similarity between tf-idf vectors of the context and the aspect part. We use the tf-idf variant with log term-frequency normalization and smoothed inverse document frequency.
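
A sketch of this feature; the exact smoothing is not specified above, so the sklearn-style smooth idf, log((1+N)/(1+df)) + 1, used here is an assumption.

```python
import math
from collections import Counter

def tfidf_vec(terms, df, n_docs):
    """Sparse tf-idf vector: log-normalized tf (1 + log tf) times
    smoothed idf (log((1 + N) / (1 + df)) + 1)."""
    tf = Counter(terms)
    return {t: (1 + math.log(c)) * (math.log((1 + n_docs) / (1 + df.get(t, 0))) + 1)
            for t, c in tf.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```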

OVERLAP:

number of unique words/entities shared between context and aspect part (no normalization).
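
This feature is essentially a one-liner; `context_items` and `aspect_items` stand for whichever word or entity lists the combination table below prescribes.

```python
def overlap(context_items, aspect_items):
    """Unnormalized count of unique words/entities shared by both sides."""
    return len(set(context_items) & set(aspect_items))
```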

W2VEC:

Word-embedding similarity between context and aspect part. Word vectors are weighted by their tf-idf weight. The pretrained word embeddings were taken from word2vec-slim, a reduced version of the Google News word2vec model.²
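
One plausible reading of this feature, as a sketch: represent each side as a tf-idf-weighted centroid of its word vectors, then take cosine similarity. The exact aggregation is an assumption; `weights` and `vectors` stand for the tf-idf weights and the word2vec-slim embeddings.

```python
import math

def weighted_embedding(terms, weights, vectors):
    """tf-idf-weighted centroid of the word vectors of terms that have
    both a weight and an embedding (out-of-vocabulary terms are skipped)."""
    acc = None
    for t in terms:
        if t in weights and t in vectors:
            v = [weights[t] * x for x in vectors[t]]
            acc = v if acc is None else [a + b for a, b in zip(acc, v)]
    return acc  # None if no term was covered

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```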

Feature combinations:

| Context            | Aspect part      | BM25 | TFIDF | Overlap | W2Vec |
|--------------------|------------------|:----:|:-----:|:-------:|:-----:|
| sentence words     | name words       |  X   |   X   |    X    |       |
| paragraph words    | name words       |  X   |   X   |    X    |       |
| sentence words     | content words    |  X   |   X   |    X    |   X   |
| paragraph words    | content words    |  X   |   X   |    X    |   X   |
| sentence entities  | content entities |  X   |   X   |    X    |       |
| paragraph entities | content entities |  X   |   X   |    X    |       |
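
Assembling the full feature vector for one (context, candidate aspect) pair can be sketched by iterating over the combination table. The dict keys and the `scorers` interface are hypothetical; the sketch also assumes the three-feature rows omit W2Vec, which is plausible since W2Vec requires word embeddings.

```python
def feature_vector(context, aspect, scorers):
    """Cross context representations with aspect parts as in the table above.
    context/aspect map hypothetical keys to token lists; scorers maps a
    feature name to a scoring function of (context_tokens, aspect_tokens)."""
    table = [
        ("sent_words", "name_words",    ("bm25", "tfidf", "overlap")),
        ("para_words", "name_words",    ("bm25", "tfidf", "overlap")),
        ("sent_words", "content_words", ("bm25", "tfidf", "overlap", "w2vec")),
        ("para_words", "content_words", ("bm25", "tfidf", "overlap", "w2vec")),
        ("sent_ents",  "content_ents",  ("bm25", "tfidf", "overlap")),
        ("para_ents",  "content_ents",  ("bm25", "tfidf", "overlap")),
    ]
    return [scorers[f](context[c], aspect[a]) for c, a, feats in table for f in feats]
```

The resulting 20-dimensional vectors are what the list-wise learning-to-rank model combines.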

Creative Commons License
Entity-aspect-linking-2020 by Jordan Ramsdell, Laura Dietz is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://trec-car.cs.unh.edu/datareleases/v2.4-release.html, work at www.wikipedia.org, and on a work at https://federiconanni.com/entity-aspect-linking/.


  1. We provide corpus statistics in our dataset.

  2. Available at https://github.com/eyaler/word2vec-slim