I am an assistant professor in the Department of Computer Science at UNH with a focus on text-based machine learning and information retrieval.
Previously I was a Post-doctoral Research Scientist at the Data and Web Science Group of Mannheim University, working with Prof. Simone Paolo Ponzetto. Before that I was a Research Scientist at the Center for Intelligent Information Retrieval (CIIR), working with Bruce Croft at the University of Massachusetts. Before that I did a post-doc with Andrew McCallum. I graduated from the Max Planck Institute for Informatics in Saarbruecken, Germany in January 2011.
Fall 2016: “CS 780/880 Topics/Information Retrieval”
I review for venues in information retrieval (SIGIR, CIKM, ICTIR), natural language processing (ACL, KDD, EMNLP, NAACL), machine learning (ICML, NIPS, UAI), and data mining (KDD, CIKM).
Since 2015, I have been an organizer of the SIGIR Student Party.
I am interested in Text Retrieval, Extraction, Machine Learning, and Analysis (TREMA).
I also cover the research areas Biomedical NLP/IR, Question Answering and Answer-Passage Retrieval, and Topic Models for Graph-Structured Data.
My research sits at the intersection of Information Retrieval and Information Extraction, where I am striving towards a deep integration rather than a pipelined combination. My tools of choice are graphical models, often generative probabilistic models. This pattern underlies all the different facets of my research, some of which are detailed in the following:
Together with Simone Paolo Ponzetto and Michael Schuhmacher, I am working on methods that automatically, and in a query-driven manner, retrieve materials from the Web and compose Wikipedia-like articles. The web search paradigm of ten blue hyperlinks is not sufficient, especially for information needs about which the user has very little prior expert knowledge. Instead, we envision providing a synthesis of the Web materials that strives to mimic the comprehensiveness of Wikipedia articles. We limit ourselves to a content-only setting where query-log, click, or session information is not available. Consequently, we aim to maximize the utility of information retrieval models in combination with methods from natural language processing. A particular emphasis is on using information from structured knowledge resources such as Wikipedia, Freebase, or DBpedia together with text-based reasoning on general document and Web corpora.
An early feasibility study was presented at AKBC 2014, and a later demo was presented at the ESAIR workshop at CIKM 2015 (demo). The method paper for the demo is under submission (information available on request).
The project was awarded an Amazon AWS in Education research grant and a stipend from the Eliteprogramm for Postdoktorandinnen und Postdoktoranden of the Baden-Württemberg Stiftung.
Together with Jeff Dalton, I am studying how to effectively leverage knowledge bases such as Wikipedia and Freebase in ad hoc document retrieval. In a first step, documents and queries are enriched with links to the knowledge base. During the retrieval stage, these links can be used as an additional vocabulary as well as in feedback-based query expansion. For instance, entities that are linked from the query are expected to also be linked in relevant documents. To compensate for errors in the entity linking stage, we also consider terms from the entities’ article text, as well as name variants. A further option is feedback: documents retrieved in a preliminary pass are inspected for entity links to update the belief about which entities are relevant for the query. We also use the feedback documents to build an entity-context model to understand how each entity is related to the query.
This work was presented at SIGIR 2014.
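The feedback step described above can be illustrated with a minimal sketch. It is not the published method: the function name `expand_with_entities`, the `query_weight` parameter, and the toy scores are all assumptions made for illustration. Entities linked in first-pass documents receive weight in proportion to the retrieval score of the documents that mention them, and this feedback mass is interpolated with the entities linked directly from the query.

```python
from collections import defaultdict


def expand_with_entities(query_entities, feedback_docs, top_k=3, query_weight=0.5):
    """Return the top_k (entity, weight) pairs for an expanded query.

    query_entities: entity ids linked from the query text.
    feedback_docs: list of (score, entity_ids) pairs from a preliminary
                   retrieval pass; score is a non-negative retrieval score.
    Hypothetical helper, for illustration only.
    """
    total = sum(score for score, _ in feedback_docs)
    feedback = defaultdict(float)
    if total > 0:
        for score, entity_ids in feedback_docs:
            # Each document votes once per distinct entity it links to.
            for entity in set(entity_ids):
                feedback[entity] += score / total
    norm = sum(feedback.values())

    weights = defaultdict(float)
    # Entities linked from the query share the query_weight mass evenly.
    for entity in query_entities:
        weights[entity] += query_weight / len(query_entities)
    # Feedback entities share the remaining mass by normalized feedback weight.
    for entity, mass in feedback.items():
        weights[entity] += (1 - query_weight) * mass / norm
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy data: scores and entity links are made up.
docs = [
    (2.0, ["Obama", "White_House"]),
    (1.0, ["Obama", "Chicago"]),
    (1.0, ["Chicago"]),
]
ranked = expand_with_entities(["Obama"], docs)
for entity, weight in ranked:
    print(entity, round(weight, 3))
```

The returned entities could then serve as additional query vocabulary. A fuller system would also draw on article text and name variants, which this sketch leaves out.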
Assuming the existence of a large corpus and a large general-purpose knowledge base, we want to support a user in exploring a question in terms of three facets: entities, pertinent relationships, and relevant text passages. We devise a solution that reasons about distributions over entities, relations, and documents in a unified manner. For instance, we can arrive at a prior distribution over entities by issuing a query against the knowledge base. The distribution over entities helps to identify relevant document passages. Applying Bayes' rule, we can update the distribution over entities, given the retrieved document passages. This is formalized in a generative model, which includes factors comprising probabilistic retrieval models.
This work was presented at AKBC 2013.
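The Bayes' rule update mentioned above can be sketched in a few lines. This is a simplified illustration, not the generative model from the paper: it assumes passages are conditionally independent given the entity, the function name `update_entity_belief` and the smoothing floor are hypothetical, and the likelihood values are made up where a real system would take them from a retrieval model.

```python
def update_entity_belief(prior, passage_likelihoods, floor=1e-6):
    """Posterior over entities given retrieved passages.

    prior: dict mapping entity -> P(entity), e.g. from a knowledge base query.
    passage_likelihoods: list of dicts, one per passage, mapping
                         entity -> P(passage | entity).
    floor: assumed smoothing value for entities a passage gives no score to.
    Hypothetical helper, for illustration only.
    """
    posterior = dict(prior)
    for likelihood in passage_likelihoods:
        for entity in posterior:
            # Bayes' rule numerator: multiply the prior by each passage likelihood.
            posterior[entity] *= likelihood.get(entity, floor)
    evidence = sum(posterior.values())
    if evidence == 0:
        raise ValueError("all entities have zero probability")
    # Normalize so the posterior sums to one.
    return {entity: p / evidence for entity, p in posterior.items()}


# Toy data: two candidate entities and one retrieved passage.
prior = {"A": 0.5, "B": 0.5}
posterior = update_entity_belief(prior, [{"A": 0.8, "B": 0.2}])
print(posterior)
```

The updated distribution can in turn be used to retrieve further passages, which is the kind of back-and-forth reasoning the unified model is meant to capture.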
Entity linking refers to a problem setting where the algorithm is given a string in a document and has to predict which Wikipedia entity it refers to. Our solution uses a retrieval model that incorporates the string itself and the surrounding entity mentions to predict a ranking of entity candidates. We show that this model is an approximation to state-of-the-art models which optimize a joint assignment of mentions to entities. This solution can be further refined with supervised re-rankers but also provides reasonable performance “out-of-the-box”.
The code is available as part of the KB-Bridge project.
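To make the idea of ranking entity candidates from the mention string and its neighboring mentions concrete, here is a toy sketch. It is not taken from KB-Bridge and does not reproduce the published retrieval model: the function `rank_candidates`, the `context_weight` parameter, the exact-match name score, and the candidate data are all assumptions for illustration.

```python
def rank_candidates(mention, neighbor_mentions, candidates, context_weight=0.5):
    """Rank candidate entities for a mention string.

    candidates: dict mapping entity id -> {"names": set of known name variants,
                                           "related": set of related names}.
    Hypothetical helper, for illustration only.
    """
    neighbors = {m.lower() for m in neighbor_mentions}
    scored = []
    for entity_id, info in candidates.items():
        names = {n.lower() for n in info["names"]}
        related = {r.lower() for r in info["related"]}
        # Evidence from the mention string itself (exact name-variant match).
        name_score = 1.0 if mention.lower() in names else 0.0
        # Evidence from surrounding mentions: fraction that the entity is related to.
        context_score = len(neighbors & related) / len(neighbors) if neighbors else 0.0
        score = (1 - context_weight) * name_score + context_weight * context_score
        scored.append((entity_id, score))
    # Sort by descending score, breaking ties by entity id.
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


# Toy candidate set, made up for this example.
candidates = {
    "Paris_(city)": {"names": {"Paris"}, "related": {"France", "Seine", "Eiffel Tower"}},
    "Paris_Hilton": {"names": {"Paris", "Paris Hilton"}, "related": {"Hilton Hotels", "Hollywood"}},
}
ranking = rank_candidates("Paris", ["France", "Seine"], candidates)
print(ranking)
```

Both candidates match the mention string, so the neighboring mentions decide the order. A supervised re-ranker could take such a ranking as its input.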
To monitor a stream of news and social media documents for stories involving one or more target entities, we exploit the symmetry in our entity linking approach to both retrieve relevant documents (KB to text) and entity-link them (text to KB) with the same underlying model. This requires integrating low-level NLP algorithms into a retrieval framework.
We participated with this solution in TREC KBA 2012 and TREC KBA 2013. A paper on time-aware IR-based evaluation was published at TAIA 2013 (streameval/index.html). The time-aware evaluation methods are used to analyze our KBA 2013 results, which we presented in our 2013 talk at TREC.
I further work on “senti-PRF”, a pseudo-relevance feedback approach to optimize retrieval for opinionated questions. This work was published at CIKM 2013.
Relatedly, I am interested in “vague” Question Answering, such as questions asking for opinions or advice, or research questions. Here I work both with general-purpose data sets and bio-medical question answering.
I am still interested in unsupervised algorithms for identifying shared aspects and quantifying influence in social networks. Work on symmetric networks is published at ICWSM 2012 (Code & Supplement) and on asymmetric networks at ICML 2007 (talk – Supplement).
My PhD thesis was mainly focused on topic models and other generative models for data with link structure.
(Bryan) Hang Zhang. Bryan is working on topic models and retrieval models for article construction.
Federico Nanni. Federico is working on consolidation, tracking, and summarization of historical events in text.
Lydia Weiland. Lydia is working on understanding the message of iconic images we often find in news articles.
… your name here? …
Amina Kadry (Masters Thesis): Using ClausIE, an unsupervised relation extraction system, to find good explanations of entity relevance.
Thomas Stach (Masters Thesis): Wikipedia Reconstruction - How to compose Wikipedia articles out of text passages?
Team project (multiple students): Supporting large-scale meeting facilitation - automated clustering of participant-contributed ideas.
Shiri Dori-Hacohen, ongoing.
Jeffrey Dalton, Ph.D. 2014, now on the Google Knowledge Discovery team.
Weiland, Lydia; Hulpus, Ioana; Ponzetto, Simone Paolo; Dietz, Laura. Using Object Detection, NLP, and Knowledge Bases to Understand the Message of Images. Proceedings of the International Conference on Multimedia Modeling, 405-418, 2017.
Dietz, Laura; Kotov, Alexander; Meij, Edgar. Utilizing Knowledge Bases in Text-centric Information Retrieval. Proceedings of the 2016 ACM on International Conference on the Theory of Information Retrieval. 5-5. 2016.
Weiland, Lydia; Hulpus, Ioana; Ponzetto, Simone Paolo; Dietz, Laura. Understanding the message of images with knowledge base traversals. Proceedings of the 2016 ACM on International Conference on the Theory of Information Retrieval. 199-208. 2016.
Nanni, Federico; Dietz, Laura; Faralli, Stefano; Glavaš, Goran; Ponzetto, Simone Paolo. Capturing Interdisciplinarity in Academic Abstracts. D-Lib Magazine, 22, 9/10, Corporation for National Research Initiatives. 2016.
Nanni, Federico; Dietz, Laura; Faralli, Stefano; Glavaš, Goran; Ponzetto, Simone Paolo. Capturing Interdisciplinarity in Academic Abstracts. Workshop on Mining Scientific Publications at JCDL, 2016.
Kling, Christoph Carl; Posch, Lisa; Bleier, Arnim; Dietz, Laura. Topic model tutorial: A basic introduction on latent dirichlet allocation and extensions for web scientists. Proceedings of the 8th ACM Conference on Web Science. 10-10. 2016.
Schuhmacher, Michael; Roth, Benjamin; Ponzetto, Simone Paolo; Dietz, Laura. Finding Relevant Relations in Relevant Documents. In Proceedings of European Conference on Information Retrieval (ECIR), 2016. .pdf
Dietz, Laura. Query-specific Wikipedia Construction. Invited talk at ILPS at Amsterdam University, The Netherlands. November 5, 2015.
Dietz, Laura. Query-specific Wikipedia Construction and Network Topic-Models. Invited talk at GESIS Leibniz-Institut fuer Sozialwissenschaften (Computational Social Science Seminar), Cologne, Germany. September 17, 2015.
Dietz, Laura; Schuhmacher, Michael. An Interface Sketch for Queripidia: Query-driven Knowledge Portfolios from the Web. Proceedings of the Workshop for Exploiting Semantic Annotations in IR (ESAIR) at CIKM, 2015. .pdf – demo
Schuhmacher, Michael; Dietz, Laura; Ponzetto, Simone Paolo. Ranking Entities for Web Queries Through Text and Knowledge. Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), 2015. .pdf – appendix
Dietz, Laura. Network Topic Models. Invited talk at Heidelberg University (Stat NLP Colloquium). May 22, 2015.
Dietz, Laura; Schuhmacher, Michael; Ponzetto, Simone Paolo. Queripidia: Query-specific Wikipedia Construction. In Proceedings of the Automatic Knowledge Base Construction Workshop (AKBC) at NIPS 2014. .pdf
Dietz, Laura. Entity Linking with Document Retrieval and Vice Versa. Invited talk at University of Illinois, Urbana-Champaign, USA. October 22, 2014.
Dalton, Jeffrey; Dietz, Laura; Allan, James: Entity Query Feature Expansion using Knowledge Base Links. In Proceedings of the 37th Annual International ACM SIGIR conference, Gold Coast, Queensland, Australia, July 6-11, 2014. .pdf – appendix
Dalton, Jeffrey; Dietz, Laura: UMass CIIR at TAC KBP 2013 Entity Linking: Query Expansion using Urban Dictionary. Text Analysis Conference (TAC), Gaithersburg, MD, USA, November 19-20, 2013. .pdf
Dietz, Laura; Dalton, Jeffrey: UMass at TREC 2013 Knowledge Base Acceleration Track: Bi-directional Entity Linking and Time-aware Evaluation. Text Retrieval Conference (TREC), Gaithersburg, MD, USA, November 20-22, 2013. .pdf
Dietz, Laura; Dalton, Jeffrey: Query-specific Knowledge Sketches: A Joint Retrieval Model for Text, Entities, and Relations. CIIR Technical Report, 2013.
Dietz, Laura; Wang, Ziqi; Huston, Samuel; Croft, W. Bruce: Retrieving Opinions from Discussion Forums. Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), 2013. .pdf
Dietz, Laura; Dalton, Jeffrey; Balog, Krisztian: Time-aware Evaluation of Cumulative Citation Recommendation Systems. Proceedings of SIGIR 2013 Workshop on Time-aware Information Access, TAIA, 2013 .pdf. code & supplement
Dalton, Jeffrey; Dietz, Laura: Constructing Query-Specific Knowledge Bases. Proceedings on the CIKM Workshop on Automated Knowledge Base Construction, 2013. .pdf.
Dalton, Jeffrey; Dietz, Laura: A Neighborhood Relevance Model for Entity Linking. Proceedings of the 10th International Conference in the RIAO series (OAIR), 2013. .pdf
Dietz, Laura: A Neighborhood-Relevance Model for Entity Linking. Mt Holyoke College, South Hadley, MA, USA, 20th of February, 2013. talk view in Web browser!
Dietz, Laura: A Neighborhood-Relevance Model for Entity Linking. Machine Learning and Friends Lunch Talk, University of Massachusetts, MA, USA, 14th of February, 2013. talk view in Web browser!
Dalton, Jeffrey; Dietz, Laura: Bi-directional Linkability From Wikipedia to Documents and Back Again: UMass at TREC 2012 Knowledge Base Acceleration Track. In: Proceedings of Text REtrieval Conference (TREC), 2012. .pdf
Dietz, Laura; Gamari, Benjamin; Guiver, John; Snelson, Edward; Herbrich, Ralf: De-Layering Social Networks with Shared Tastes of Friendships. In: Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media (ICWSM), 2012 .pdf – Code & Supplement
Konietzny, Sebastian; Dietz, Laura; McHardy, Alice: Inferring functional modules of protein families with probabilistic topic models. In: BMC Bioinformatics, vol. 12, no. 1, 141+, 2011 .html
Dietz, Laura: Exploiting Graph-Structured Data in Generative Probabilistic Models. PhD Thesis, January 2011. Max Planck Institute for Informatics and Saarland University, 2011 .pdf
Dietz, Laura: Inferring Shared Interests from Social Networks. In: NIPS Workshop on Computational Social Science and the Wisdom of Crowd: Text and Beyond, 2010 .pdf
Dietz, Laura: Directed Factor Graph Notation for Generative Models. Technical Report, 2010 .pdf – TIKZ macros and algorithms module: .zip - Thanks to Jaakko Luttinen for creating an improved version at github.com/jluttine/tikz-bayesnet!
Dietz, Laura: Modeling Shared Tastes in Online Communities. In: NIPS Workshop on Applications for Topic Models: Text and Beyond, 2009 .pdf
Dietz, Laura; Dallmeier, Valentin; Zeller, Andreas; Scheffer, Tobias: Localizing Bugs in Program Executions with Graphical Models. In: Advances in Neural Information Processing Systems, 2009 .pdf – supplement – project
Dietz, Laura; Bickel, Steffen; Scheffer, Tobias: Unsupervised Prediction of Citation Influences. In: Proceedings of the 24th International Conference on Machine Learning. Corvallis, Oregon, USA, June 2007 .pdf – Watch the Talk – project
March 2015 - present: Post-doctoral Research Scientist at Data and Web Science Group (DWS), Mannheim University (DWS, Simone Paolo Ponzetto)
August 2012 - March 2015: Research Scientist at Center for Intelligent Information Retrieval (CIIR), University of Massachusetts (CIIR, Bruce Croft)
October 2010 - August 2012: Post-doctoral researcher at University of Massachusetts (IESL, Andrew McCallum).
January 2008 - January 2011: PhD Student at Max-Planck-Institute for Informatics (Databases and Information Systems, Prof. Gerhard Weikum), Saarbruecken
January 2007 - December 2008: PhD Student at Max-Planck-Institute for Informatics (Machine Learning, Prof. Tobias Scheffer), Saarbruecken
October 2006 - December 2006: PhD Scholarship at Knowledge Management Group (Prof. Tobias Scheffer), Humboldt University, Berlin
December 2002 - September 2006: Research Associate at Concert Division and I-Info Division, Fraunhofer Institute for Publication and Information Systems (IPSI), Darmstadt
Strepsirrhini, a modular, composable toolkit in Scala for retrieval, reranking, and expansion with and without entity annotations. Laura Dietz, 2014.
Riffle, open hardware and software for a water-quality sensor with data analysis software. Benjamin Gamari, Don Blair, Laura Dietz, 2014.
Stream-Eval, an evaluation framework for time-aware evaluation of cumulative citation recommendation systems. Laura Dietz, Jeffrey Dalton, Krisztian Balog, 2013.
KB-Bridge, a framework for entity linking. Jeffrey Dalton, Laura Dietz, 2013.
Hphoton and photon-tools (overview, walkthrough), open source hardware and software for single-molecule fluorescence analysis. Benjamin Gamari, Laura Dietz, Lori Goldner, 2013. (Received OSSI Award 2013 from UMass ICB3)
Bayes-Stack, a framework for inference on probabilistic graphical models. Laura Dietz, Benjamin Gamari, 2012.
Tikz-Bayesnet, an open source LaTeX add-on / TIKZ package for graphical model diagrams. Laura Dietz, 2010. (Forked and continued by Jaakko Luttinen, 2012).