Prof. Dr.-Ing. Laura Dietz

I am an assistant professor in the Department of Computer Science at UNH, with a focus on text-based machine learning and information retrieval as well as data science on watersheds.

Previously, I was a Post-doctoral Research Scientist at the Data and Web Science Group of Mannheim University, working with Prof. Simone Paolo Ponzetto. Before that, I was a Research Scientist at the Center for Intelligent Information Retrieval (CIIR), working with Bruce Croft at University of Massachusetts. Before that, I did a post-doc with Andrew McCallum. I graduated from the Max Planck Institute for Informatics in Saarbruecken, Germany in January 2011.


In Fall 2020 I will be teaching a new course: Machine Learning for Sequences and Text.

The course covers machine learning models that learn from sequential
data (think: text, genomes, etc.), including Conditional Random Fields
as well as neural networks such as Bi-LSTMs and transformers.

The course is designed to be orthogonal to Marek Petrik's "Machine Learning"
class (which is NOT a prereq). The only overlap is logistic regression,
which the new course will cover at a different level.

The course is temporarily listed as CS780 (03) and CS880 (03). It will
serve as an alternative prereq for the implementation-intensive data
science seminar CS953 in Spring 2021.

Catalog entry for CS780 (03):

The course catalog lists a recitation (F), which is a mistake: There is
no separate recitation in this course.

See teaching page for more information about my courses offered at UNH

Computational Social Science Summer School Lecture 2018 on “What is in my documents?”.

Conference tutorial on Utilizing Knowledge Graphs for Text-centric Retrieval at ICTIR 2016, WSDM 2017, SIGIR 2017. (Literature Overview).

Research Grants


Since 2020, I have served on the Steering Committee of the Northeast Big Data Innovation Hub.

Since 2020, I am a member of the NIST TAC-KBP Scientific Advisory Board

Since 2019, I am Associate Editor for the ACM Transactions on Information Systems (TOIS)

I am a founding co-chair of the Women in IR network of women in information retrieval. (also see Women in IR)

From 2016 to 2019 I was the ACM SIGIR Student Affairs Chair.

I gave a keynote at ECIR in 2017 on Retrieving Knowledge from the Webs.

I am coordinating the TREC Complex Answer Retrieval track at the Text REtrieval Conference.

I am organizing the KG4IR workshop series at SIGIR 2017 and 2018, and I am a presenter of the conference tutorial on Utilizing Knowledge Graphs for Text-centric Retrieval. I am guest-editing the KG4IR Special Issue of the Information Retrieval Journal.

I received a best paper award at JCDL 2018 for our work on Entity Aspect Linking.

Together with Adam Wymore, I am a co-PI on the UNH CoRE Pilot Research Partnership on “Watershed Informatics: Integrating Big Data to Understand Watersheds in a Changing World”.

I review for a range of venues spanning information retrieval (SIGIR, CIKM, ICTIR), natural language processing (ACL, KDD, EMNLP, NAACL), machine learning (ICML, NIPS, UAI), and data mining (KDD, CIKM).

Research Interests

I am interested in Text Retrieval, Extraction, Machine Learning, and Analysis (TREMA).

I also cover the research areas of Biomedical NLP/IR, Question Answering and Answer-Passage Retrieval, and Topic Models for Graph Structured Data.

My research sits at the intersection of Information Retrieval and Information Extraction, where I am striving towards a deep integration rather than a pipelined combination. My tools of choice are graphical models, often generative probabilistic models. This pattern underlies all the different facets of my research, some of which are detailed in the following:

Complex Answer Retrieval

From 2017 to 2019, I coordinated the Complex Answer Retrieval track at the Text Retrieval Conference (TREC). It is an international evaluation track on how to retrieve the best passages and entities on topics about popular science and society. For more information about the data, task and evaluation, please see the official TREC Complex Answer Retrieval site.

Track overview papers:

Automatic Wikipedia Construction

Together with my students, I am working on methods to automatically, and in a query-driven manner, retrieve materials from the Web and compose Wikipedia-like articles. Especially for information needs about which the user has very little prior expert knowledge, the web search paradigm of 10 blue hyperlinks is not sufficient. Instead, we envision providing a synthesis of the Web materials that strives to mimic the comprehensiveness of Wikipedia articles. We limit ourselves to a content-only setting where query-log, click, or session information is not available. Consequently, we aim to maximize the utility of information retrieval models in combination with methods from natural language processing. A particular emphasis is on utilizing information from structured knowledge resources such as Wikipedia, Freebase, or DBpedia together with text-based reasoning on general document and Web corpora.

An early feasibility study was presented at AKBC 2014, and a later demo was presented at the ESAIR workshop at CIKM 2015 (demo). The method paper for the demo is under submission (information available on request).

Closely related work on reranking entities for web queries was presented at CIKM 2015 (appendix), as well as work on using relation extraction in information retrieval presented at ECIR 2016 (supervised relations) and SIGIR 2017 (OpenIE).

The project was awarded an Amazon AWS in Education research grant and a stipend by the Eliteprogramm for Postdoktorandinnen und Postdoktoranden of the Baden-Württemberg Stiftung.

Entity-based Enrichment for Document Retrieval

Together with Jeff Dalton, I am studying how to effectively leverage Knowledge Bases such as Wikipedia and Freebase in ad hoc document retrieval. In a first step, documents and queries are enriched with links to the knowledge base. During the retrieval stage, these links can be used as an additional vocabulary as well as in feedback-based query expansions. For instance, entities that are linked from the query are expected to also be linked in relevant documents. However, we may compensate for errors in the entity linking stage by also considering terms from the entities’ article text, as well as name variants. A further option is feedback methods, where documents retrieved in a preliminary pass are inspected for entity links to update the belief on which entities are relevant for the query. We also use the feedback documents to build an entity-context model to understand how each entity is related to the query.

This work was presented at SIGIR 2014.
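To make the feedback idea in the paragraph above concrete, here is a minimal sketch. It is not the model from the SIGIR 2014 paper: the toy corpus, the term-overlap scorer, and the names `entity_feedback`, `expanded_score`, `k`, and `lam` are all assumptions made for illustration. A first pass ranks documents by query terms, entity links in the top-k documents are turned into entity weights, and a second pass mixes term evidence with entity evidence.

```python
from collections import Counter

# Hypothetical toy corpus: each document has text terms and entity links.
DOCS = {
    "d1": {"terms": ["obama", "visits", "berlin"],
           "entities": ["Barack_Obama", "Berlin"]},
    "d2": {"terms": ["president", "speech", "berlin"],
           "entities": ["Barack_Obama", "Berlin"]},
    "d3": {"terms": ["berlin", "marathon", "results"],
           "entities": ["Berlin", "Berlin_Marathon"]},
    "d4": {"terms": ["white", "house", "statement"],
           "entities": ["Barack_Obama", "White_House"]},
}


def term_score(query_terms, doc):
    """Count query terms found in the document (stand-in for a real retrieval model)."""
    return sum(1.0 for t in query_terms if t in doc["terms"])


def rank(scores):
    """Sort document ids by descending score, breaking ties by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


def entity_feedback(query_terms, docs, k=2):
    """Estimate entity relevance from entity links in the top-k first-pass documents."""
    first_pass = {d: term_score(query_terms, doc) for d, doc in docs.items()}
    weights = Counter()
    for d in rank(first_pass)[:k]:
        for entity in docs[d]["entities"]:
            # Each linked entity inherits the retrieval score of its document.
            weights[entity] += first_pass[d]
    total = sum(weights.values())
    return {e: w / total for e, w in weights.items()} if total else {}


def expanded_score(query_terms, entity_weights, doc, lam=0.5):
    """Interpolate term evidence with evidence from the document's entity links."""
    entity_part = sum(entity_weights.get(e, 0.0) for e in doc["entities"])
    return (1 - lam) * term_score(query_terms, doc) + lam * entity_part


query = ["obama", "berlin"]
entity_weights = entity_feedback(query, DOCS)
final = {d: expanded_score(query, entity_weights, doc) for d, doc in DOCS.items()}
ranking = rank(final)
print(ranking)
```

In this toy setting, d2 and d3 each match one query term, but d2 ends up ranked higher because both of its entity links received feedback weight. Document d4 shares no term with the query and still receives a nonzero score through its entity link, which is the kind of vocabulary mismatch the entity links are meant to bridge.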

With Federico Nanni, I am working on building document collections for events. We found that entity links are too unspecific, as the same entity can be mentioned in different contexts (we call them entity aspects). In our JCDL 2018 paper on entity aspect linking, we demonstrated that such aspects can be harvested from section headings of the entity’s Wikipedia article. To post-process entity links, we propose a method for entity-aspect linking to refine the entity link with aspect information. When applied to retrieval problems, aspect linking improved the accuracy of rankings and classifications. This work received a best paper award at JCDL 2018.

Entity Linking

Entity linking refers to a problem setting where the algorithm is given a string in a document and has to predict which Wikipedia entity it refers to. Our solution involved a retrieval model that incorporates the string itself, and surrounding entity mentions to predict entity candidates as a ranking. We show that this model is an approximation to state-of-the-art models which optimize a joint assignment of mentions to entities. This solution can be further refined with supervised re-rankers but also provides reasonable performance “out-of-the-box”.

We participated with this solution in TAC KBP 2012 and TAC KBP 2013 (talk, poster). Also see our publication at OAIR 2012 (general talk, tech talk).

The code is available as part of the KB-Bridge project.
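The idea of casting entity linking as a ranking problem can be illustrated with a small sketch. This is not the KB-Bridge code or its API: the toy knowledge base, the scoring functions, and the mixing weight `alpha` are assumptions chosen for illustration only. Each candidate entity is scored by how well its names match the mention string and by how many of the surrounding entity mentions appear in its context.

```python
# Hypothetical toy knowledge base: known names and related context per entity.
KB = {
    "Michael_Jordan_(basketball)": {
        "names": {"michael jordan", "air jordan", "mj"},
        "context": {"chicago bulls", "nba", "basketball"},
    },
    "Michael_I._Jordan": {
        "names": {"michael jordan", "michael i. jordan"},
        "context": {"machine learning", "berkeley", "statistics"},
    },
    "Jordan": {
        "names": {"jordan"},
        "context": {"amman", "middle east"},
    },
}


def name_score(mention, entity):
    """1.0 for an exact name match, 0.5 if any token is shared, else 0.0."""
    m = mention.lower()
    if m in entity["names"]:
        return 1.0
    tokens = set(m.split())
    shares_token = any(tokens & set(name.split()) for name in entity["names"])
    return 0.5 if shares_token else 0.0


def context_score(neighbors, entity):
    """Fraction of surrounding mentions that occur in the entity's context set."""
    neighbors = {n.lower() for n in neighbors}
    if not neighbors:
        return 0.0
    return len(neighbors & entity["context"]) / len(neighbors)


def link(mention, neighbors, kb, alpha=0.5):
    """Return (entity, score) pairs ranked by combined name and context evidence."""
    scored = [
        (name, alpha * name_score(mention, e) + (1 - alpha) * context_score(neighbors, e))
        for name, e in kb.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


ranking = link("Michael Jordan", ["Berkeley", "machine learning"], KB)
no_context = dict(link("Michael Jordan", [], KB))
for entity, score in ranking:
    print(f"{entity}\t{score:.2f}")
```

With the surrounding mentions supplied, the researcher is ranked above the basketball player. Without them, the two candidates tie on the name alone, which shows why the surrounding entity mentions are part of the query. A supervised re-ranker, as mentioned above, would replace the hand-set `alpha` with learned weights.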

Entity Tracking and Retrieval

The goal is to monitor a stream of news and social documents for stories involving one or more target entities. We exploit the symmetric relationships in our Entity Linking approach to both retrieve relevant documents (KB to text) and entity-link them (text to KB) with the same underlying model. This requires integrating low-level NLP algorithms into a retrieval framework.

We participated with this solution in TREC KBA 2012 and TREC KBA 2013. A paper on time-aware IR-based evaluation was published at TAIA 2013 (streameval/index.html). The time-aware evaluation methods are used to analyze our KBA 2013 results, with results presented in our 2013 talk at TREC.

… and more …

I further work on “senti-PRF”, a pseudo relevance feedback approach to optimize retrieval for opinionated questions. Published at CIKM 2013.

I am still interested in unsupervised algorithms for identifying shared aspects and quantifying influence in social networks. Work on symmetric networks is published at ICWSM 2012 (Code & Supplement) and on asymmetric networks at ICML 2007 (talk, supplement).

Other work revolved around localizing bugs in software, published at NIPS 2009 (supplement, project page).

Further, I am working on a scalable MCMC inference framework “bayes-stack”, available on GitHub.

My PhD thesis was mainly focused on topic models and other generative models for data with link structure.


PhD Students

Master Students

Undergrad Students

Serving on PhD Committees

Organized Mini-Conferences

The vision of HIPstIR is that early stage information retrieval (IR) researchers get together to develop a future for non-mainstream ideas and research agendas in IR. Important prior research can be discussed in the form of reading groups. A future vision of what IR can (or should) be, and how to get there, must be developed. It is like SWIRL (Moffat et al., 2005, Allan et al., 2012, Culpepper et al., 2018) in spirit but focusing on topics that may otherwise be considered “niche”, “alternative”, “indie”, or “left field”. An explicit goal of this workshop is to foment collaboration and cross-group fertilization. The hope is that participation will give rise to conference workshop topics and joint paper projects. Primary focus is on early stage researchers that are anywhere between defending their PhD within one year to one year into being a tenured professor or a senior scientist, but a few senior people may also be invited.

We hope more folks will branch off and organize HIPstIR’s all over the place. HIPstIR is public domain / CC0.

Organized Shared Tasks

Laura Dietz, Ben Gamari, Jeff Dalton, Manisha Verma, Prasenjit Mitra, Nick Craswell. TREC Complex Answer Retrieval at the Text REtrieval Conference. 2016–2018. - www - dataset - Mailinglist - TREC homepage

TREC CAR concluded in 2019. Thanks to all the participants! – Dear Reviewers: Please keep in mind that TREC CAR offered multiple tasks whose numbers are not comparable.

Organized Workshops, Keynotes, and Tutorials

Women in IR

Women in IR Mailinglist

Invited Talks


Recent Positions

Since August 2016: Assistant Professor (tenure-track) in the Computer Science Department at University of New Hampshire. Head of the TREMA lab.

March 2015 - August 2016: Post-doctoral Research Scientist at Data and Web Science Group (DWS), Mannheim University (DWS, Simone Paolo Ponzetto)

August 2012 - March 2015: Research Scientist at Center for Intelligent Information Retrieval (CIIR), University of Massachusetts (CIIR, Bruce Croft)

October 2010 - August 2012: Post-doctoral researcher at University of Massachusetts (IESL, Andrew McCallum).

January 2008 - January 2011: PhD Student at Max-Planck-Institute for Informatics (Databases and Information Systems, Prof Gerhard Weikum), Saarbruecken

January 2007 - December 2008: PhD Student at Max-Planck-Institute for Informatics (Machine Learning, Prof. Tobias Scheffer), Saarbruecken

October 2006 - December 2006: PhD Scholarship at Knowledge Management Group (Prof. Tobias Scheffer), Humboldt University, Berlin

December 2002 - September 2006: Research Associate at Concert Division and I-Info Division, Fraunhofer Institute for Publication and Information Systems (IPSI), Darmstadt

Open Source Releases