Instructions for Reproducing/Running ENT Rank

Installation

Get the ENT-rank code and follow the installation instructions given there.

Check out the tag sigir19-ent-rank for the code version used in the SIGIR paper, or the latest version from master for potential bug fixes.
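A minimal sketch of getting the code (the repository URL is a placeholder; use the one given in the installation instructions):

git clone <ent-rank-repository-url> ent-rank
cd ent-rank
git checkout sigir19-ent-rank   # version used in the SIGIR paper
# or stay on master for the latest version with potential bug fixes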

Download dataset

From the TREC CAR data release at http://trec-car.cs.unh.edu/datareleases/v2.1, download and unpack the following files.

Get the DBpedia V2 extension.
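As a sketch, downloading and unpacking one of the archives might look like this (the archive URL is a placeholder; use the links from the release page above, which are typically xz-compressed tar files):

wget <archive-url>
tar -xJf <archive>.tar.xz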

Prerequisites for Running ENT Rank

Download input runs

The ENT Rank code creates features from (unsupervised) input rankings produced with a wide range of different retrieval models.

The ENT Rank code reads any input runs in TREC Run file format – feel free to use your own!
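For reference, each line of a TREC run file has six whitespace-separated fields: query id, the literal Q0, document or entity id, rank, score, and run name. An illustrative line (score and run name are made up):

enwiki:Zika%20fever Q0 enwiki:South%20America 1 17.4 my-input-run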

Download the TREC Run files used to produce the experiments of the SIGIR paper here.

Alternatively, you can reproduce the input runs yourself; see the instructions in Alternative: Reproduce input runs.

Extract Edge Contexts

Extract paragraph/page/section contexts for edges from the $paragraphCbor and $allButBenchmarkCbor.

Alternatively, download the edgeContexts archive from the main page.

Running ENT Rank

The ENT Rank code proceeds in three phases: training, testing, and visualization (graphviz), each described in detail below.

Each of the ENT Rank commands creates features from input ranking files. These are passed in on the command line through multiple "--grid-run" arguments, each of which specifies the semantics of the run together with the $runFileName (containing the corresponding ranking).

--grid-run "${queryModel} ${retrievalModel} ${expansionModel} ${indexType} ${contextType} ${runFileName}"

Options for

With ${runFileList} we refer to the full list of "--grid-run" options.
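In other words, wherever ${runFileList} appears below it stands for one "--grid-run" argument per input run; a sketch with two runs (all values remain placeholders):

--grid-run "${queryModel1} ${retrievalModel1} ${expansionModel1} ${indexType1} ${contextType1} ${runFileName1}" \
--grid-run "${queryModel2} ${retrievalModel2} ${expansionModel2} ${indexType2} ${contextType2} ${runFileName2}"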

Training ENT Rank

To train an ENT Rank model use this command:

Description of different parts of the command line:

$confExtraParams configures individual experiments as follows:

The options "-N20 -A64M -qn6" instruct the runtime system to use 20 threads, a 64-megabyte initial allocation area per thread, and 6 garbage collection threads. We found this configuration to work well on a 50-CPU machine with 1 TB of memory. Consult the documentation of the Haskell runtime system for details.
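If the ENT Rank binary was built with GHC's -rtsopts, these flags are wrapped in +RTS ... -RTS on the command line; a sketch only, where $entRankTrainCommand stands for the train invocation above:

$entRankTrainCommand +RTS -N20 -A64M -qn6 -RTS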

Testing ENT Rank (predicting a ranking)

To predict a ranking using a pre-trained model use this command:

The valid command line options for the test command are a subset of those for the train command; see the description above.

Output Files from training and testing

Training without cross-validation will produce these files:

Testing on a pretrained model will produce these files:

  • ${outputFilePattern}-model-predict.run – the test ranking (we chose a different name to avoid confusing it with k-fold CV experiments)

Training with cross-validation will produce a set of files like these (one for each $fold in 0-4):

All experiments in the paper that were conducted on benchmarkY1train and dbpediaV2 use the cross-validation test ranking.

All other experiments in the paper were conducted by training on benchmarkY1train (without cross-validation) and predicting on the respective test benchmark.

Graphviz visualization

To produce the graphviz visualization, use this command:

The valid command line options for the graphviz command are a subset of those for the train command; see the description above.

$graphvisConf is required to select the particular example to visualize, i.e., the candidate graph between two entities:

--query $queryId --graphviz-target-entity $targetEntityId --graphviz-source-entity $sourceEntityId --graphviz-path-restriction $numEdges

where

In the paper we used:

--query enwiki:Zika%20fever --graphviz-target-entity enwiki:South%20America --graphviz-source-entity enwiki:Zika%20fever --graphviz-path-restriction 2
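Assuming the command emits a Graphviz dot file (file names below are placeholders), it can be rendered with the standard dot tool:

dot -Tpdf <candidate-graph>.dot -o <candidate-graph>.pdf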

Evaluation

We evaluate the runs with trec_eval using the qrels provided in the data releases.

We use minir-plots to compute mean results with standard errors, produce paired-t-test results and create plots.

In summary, the overall reproduction workflow is:

  1. Extract Edge contexts (or download the “edgeContexts” archive)
  2. Create or download input runs
  3. Create train data
    • Run train command with --exp AllExp --do-write-train-data True --do-train-model False
  4. For each feature subset, train models with 5-fold cv
    • Run train command with --exp $featureSet --exp AllExp --do-write-train-data False --include-cv True
  5. To train on one benchmark and test on another benchmark (or to evaluate a pre-trained model)
    • Run train command with --exp $featureSet --exp AllExp --do-write-train-data False --include-cv False (or with --include-cv True)
    • Identify the trained model file which is indicated by the file suffix model-train.json
    • Run test command with --test-model ${trainedModelFile}
  6. Evaluate rankings
    • Identify the run files to evaluate: those ending in .*-test.run for cross-validation results, or the corresponding -model-predict.run files when testing on a different benchmark
    • Identify the right qrels file for this run (matching benchmark, page-level vs section-level, passage vs entity, auto vs manual)
    • Run trec_eval -q -c -m map -m Rprec -m ndcg -m ndcg_cut.10 ${qrels} ${run} and store the result in a file (referred to below as ${evalFile})
    • Important: use -c, or your evaluation results will be wrong!
  7. Plot and analyze the results with minir-plots
    • For each metric in map, Rprec, ndcg, ndcg_cut_10 (see the sketch after this list):
    • Plot and create table with standard errors: nix run -f $minirPlotsDirectory -c minir-column ${evalFile} --metric ${metric} --format trec_eval --out ${pdfFileName} >| ${tableFileName}
    • Paired-t-test: nix run -f $minirPlotsDirectory -c minir-pairttest ${baselinerun} ${evalFile} --metric ${metric} --format trec_eval >| ${pairedTtestFileName}
    • Hurts/helps analysis: nix run -f $minirPlotsDirectory -c minir-hurtshelps ${baselinerun} ${evalFile} --metric ${metric} --format trec_eval --delta 0.1 >| ${hurtshelpsFileName}
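A minimal bash sketch of steps 6 and 7, combining the commands above (output file names are illustrative):

trec_eval -q -c -m map -m Rprec -m ndcg -m ndcg_cut.10 ${qrels} ${run} > ${evalFile}
for metric in map Rprec ndcg ndcg_cut_10; do
  nix run -f $minirPlotsDirectory -c minir-column ${evalFile} --metric ${metric} --format trec_eval --out plot-${metric}.pdf >| table-${metric}.txt
  nix run -f $minirPlotsDirectory -c minir-pairttest ${baselinerun} ${evalFile} --metric ${metric} --format trec_eval >| ttest-${metric}.txt
  nix run -f $minirPlotsDirectory -c minir-hurtshelps ${baselinerun} ${evalFile} --metric ${metric} --format trec_eval --delta 0.1 >| hurtshelps-${metric}.txt
done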