Reproducing Input Run Files for “ENT Rank”

Alternative: Reproduce input runs

For this paper, runs were produced with a custom add-on to Lucene found in the “trec-car-methods” repository (using the tag “sigir19-ent-rank”; also available as a source release).

URL: https://github.com/laura-dietz/trec-car-methods/tree/sigir19-ent-rank

git clone git@github.com:laura-dietz/trec-car-methods.git
cd trec-car-methods
git checkout sigir19-ent-rank

Compile with Maven 3 and create a single jar with all dependencies:

mvn compile assembly:single

You will find the jar file at ./target/trec-car-methods-${version}-jar-with-dependencies.jar
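
For the indexing and retrieval commands below it is convenient to store this path in a shell variable $jar, for example (a minimal sketch, assuming exactly one matching jar was built):

jar=$(ls ./target/trec-car-methods-*-jar-with-dependencies.jar)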

The trec-car-methods code runs in two phases: indexing and retrieving

Call the two programs with

java -cp $jar -Xmx25g edu.unh.cs.lucene.TrecCarLuceneIndexer [...]
java -cp $jar -Xmx25g edu.unh.cs.lucene.TrecCarLuceneQuery [...]

Indexing

Create an index each for Paragraph, Page, Entity, and Aspect by running the indexer program above with the respective $indexerArguments (use the same $indexFile for all of them; it is a directory in which each of the different indexes will be placed).
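
A hypothetical sketch of such an indexing loop follows; the concrete $indexerArguments for each index type are defined in the trec-car-methods documentation and are kept as a placeholder here:

for indexType in paragraph page entity aspect; do
    # substitute the $indexerArguments appropriate for $indexType,
    # pointing all of them at the same $indexFile directory
    java -cp $jar -Xmx25g edu.unh.cs.lucene.TrecCarLuceneIndexer $indexerArguments
done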

Retrieving

To create rankings with different indexes, retrieval models, expansion models, and query models, use a command like this:

java -cp $jar -Xmx25g edu.unh.cs.lucene.TrecCarLuceneQuery paragraph section run $queryCbor $indexDir $outputRunFileName $queryModel $retrievalModel $expansionModel $tokenizer $numResults $expansionRmDocuments $expansionTerms $expansionEcmDocuments Text

Each call will produce one run file which is used as input for ENT Rank.
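
One hypothetical way to script several such calls, varying only the retrieval model (the output file name pattern and the list $retrievalModels are placeholders, not values verified against the repository):

for retrievalModel in $retrievalModels; do
    # one run file per retrieval model; all other parameters stay fixed
    java -cp $jar -Xmx25g edu.unh.cs.lucene.TrecCarLuceneQuery paragraph section run \
        $queryCbor $indexDir run-$retrievalModel.run \
        $queryModel $retrievalModel $expansionModel $tokenizer \
        $numResults $expansionRmDocuments $expansionTerms $expansionEcmDocuments Text
done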