( provisional )
08:50-09:00
09:00-10:30
Question Answering with Knowledge Bases
Scott Yih
invited speaker, 45 min
Using Sparse Coding for Answer Summarization in Non-Factoid Community Question-Answering
Zhaochun Ren, Hongya Song, Piji Li, Shangsong Liang, Jun Ma and Maarten de Rijke
workshop paper, 15 min
CQADupStack: Gold or Silver?
Doris Hoogeveen, Karin Verspoor and Timothy Baldwin
workshop paper, 15 min
A Proposal for Evaluating Answer Distillation from Web Data
Bhaskar Mitra, Grady Simon, Jianfeng Gao, Nick Craswell and Li Deng
workshop paper, 15 min
10:30-11:00
11:00-12:30
Getting Rid of the Ten Blue Links
Mark Sanderson
invited speaker, 45 min
Introduction to cQA SemEval Challenge
Alessandro Moschitti
introduction, 15 min
Addressing Community Question Answering in English and Arabic
Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Alessandro Moschitti, Shafiq Joty, Fahad A. Al Obaidli, Kateryna Tymoshenko, Antonio Uva and Daniele Bonadiman
external contribution, 15 min
Structural Models for Ranking Tasks of Community Question Answering
Simone Filice, Danilo Croce, Alessandro Moschitti and Roberto Basili
external contribution, 15 min
12:30-14:00
14:00-15:30
Cura Te Ipsum: answering symptom queries with question intent
Evgeniy Gabrilovich
invited speaker, 45 min
Beyond Factoid QA: Real Questions from Real Users in Real Time
Eric Nyberg
invited speaker, 45 min
15:30-16:00
16:00-17:15
Web Question Answering with CQA data
David Carmel
invited speaker, 45 min
LiveQA: system and experiments
Denis Savenkov and Eugene Agichtein
external contribution, 15 min
Complex questions: Let me Google it for you
Alexandra Vtyurina and Charles Clarke
workshop paper, 15 min
17:15-17:30
In this talk, I will describe the work we are doing at RMIT to change one of the most common web pages we all look at: the Search Result Page (SERP). In our work we are looking to replace the SERP with a set of answer passages that address the user’s query. In the context of general web search, the problem of finding answer passages has not been explored extensively. Previous studies have found that many informational queries can be answered by a passage of text extracted from a retrieved document, relieving the user from having to read the actual document. However, current passage retrieval methods that focus on topical relevance have been shown to be ineffective at finding answers, which suggests that more knowledge is required to identify answers in a document.
We have been formulating the answer passage extraction problem as a summarization task. We initially used term distributions extracted from a Community Question Answering (CQA) service to generate more effective summaries of retrieved web pages. We conducted an experiment to see the benefit of using the CQA data in finding answer passages. We analyzed the fraction of answers covering a set of queries, the quality of the corresponding results from the answering service, and their impact on the generated summaries. I will also talk about recent work where we re-rank retrieved passages according to summary quality and incorporate document summarizability into the ranking function.
Prof. Mark Sanderson works in Computer Science at the School of Science at RMIT University in Melbourne, Australia, where he is head of the RMIT Information Retrieval (IR) group, which is regarded as the leading IR group in Australia. He is co-editor of Foundations and Trends in Information Retrieval, which is currently the highest impact rated IR journal. He is also an associate editor of IEEE TKDE and of ACM TWeb. Prof. Sanderson was co-PC chair of ACM SIGIR in 2009 and 2012, and general chair of the conference in 2004. Prof. Sanderson is also a visiting professor at NII in Tokyo.
In 2011, the first IBM Watson system showed that a question-answering system could be: a) trained to be accurate and confident, and b) scaled to be fast, enough to win at a question-answer game like Jeopardy! against human opponents in real time. Since 2011, a variety of research projects at Carnegie Mellon have focused on advancing the state of the art beyond the Jeopardy! challenge, along specific dimensions: a) automatic domain adaptation and system optimization for complex information systems (e.g., ECD, CSE); b) application of advanced QA approaches to practical domains (e.g., QUADS, BioASQ); c) generalization of QA technology to support live question-answering on the web (e.g., LiveQA). In this talk, I will present an overview of these recent results, and some ideas for future research along these dimensions.
Eric Nyberg is a Professor in the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. He is Director for the Master's Program in Computational Data Science (formerly known as the M.S. Program in Very Large Information Systems). Nyberg has made significant research contributions to the fields of automatic text translation, information retrieval, and automatic question answering. He received his Ph.D. from Carnegie Mellon University (1992), and his BA from Boston University (1983). He has pioneered the Open Advancement of Question Answering, an architecture and methodology for accelerating collaborative research in automatic question answering. In 2012, Nyberg received the Allen Newell Award for Research Excellence for his scientific contributions to the field of question answering and his work on the Watson project. He received the BU Computer Science Distinguished Alumna/Alumnus Award on September 27, 2013.
In this talk I’ll cover some of our recent work on WebQA with CQA data. I’ll describe a general framework for enriching general Web search with related questions and answers extracted from CQA sites. I’ll get into the details of several components of the platform, including identifying Web queries with question intent, high-precision retrieval from CQA data, and supporting human answers for advice-seeking questions. Finally, I’ll give an overview of the TREC LiveQA track, which we are organizing for the second year in a row. This track challenges participants with real-time questions extracted from the live stream of questions submitted to the Yahoo Answers site.
David is a Principal Research Scientist at Yahoo Lab in Haifa. David’s research is focused on search and content quality analysis in Web and Email, query performance prediction, vertical search, and text mining. David has published more than 100 papers in IR and Web journals and conferences, and serves on the editorial board of the IR journal and as a senior PC member or as Area Chair of many ACM conferences (SIGIR, WWW, WSDM, CIKM). He has organized a number of workshops and taught several tutorials at SIGIR and WWW. David is co-author of the book “Estimating the Query Difficulty for Information Retrieval”, published by Morgan & Claypool in 2010, and the co-author of the paper “Learning to estimate query difficulty”, which won the Best Paper Award at SIGIR 2005. David earned his PhD in Computer Science from the Technion, Israel Institute of Technology in 1997.
Building a question answering system to automatically answer natural-language questions is a long-standing research problem. While unstructured text collections have traditionally been the main information source for answering questions, the development of large-scale knowledge bases provides new opportunities for open-domain factoid question answering. In this talk, I will present our recent work on semantic parsing, which maps natural language questions to structured queries that can be executed on a graph knowledge base to answer the questions. Our approach defines a query graph that resembles subgraphs of the knowledge base and can be directly mapped to a logical form. With this design, semantic parsing is reduced to query graph generation, formulated as a staged search problem. Compared to existing methods, our solution is conceptually simple and yet outperforms previous state-of-the-art results substantially.
Scott Wen-tau Yih is a Senior Researcher at Microsoft Research. His research interests include natural language processing, machine learning and information retrieval. Yih received his Ph.D. in computer science at the University of Illinois at Urbana-Champaign, where he developed the joint inference framework using integer linear programming (ILP). The approach has been widely adopted in the NLP community since then. Since joining MSR in 2005, he has worked on email spam filtering, keyword extraction and search & ad relevance. His recent work focuses on continuous semantic representations using neural networks and matrix/tensor decomposition methods, with applications in lexical semantics, knowledge base embedding and question answering. Yih received the best paper award from CoNLL-2011 and an outstanding paper award from ACL-2015, and has served as area chair (HLT-NAACL-12, ACL-14, EMNLP-16), program co-chair (CEAS-09, CoNLL-14) and action editor (Transactions of ACL) in recent years.
About 1 percent of searches on Google are symptom-related. Those queries are rarely formulated as questions yet the question intent is often clear as users are searching for pertinent medical conditions. In this talk we will discuss the recently launched symptom search experience on Google. We use machine learning methods to identify queries with condition-seeking intent. We extract relevant health conditions by analyzing the web search results as well as by consulting the Knowledge Graph. Finally, we learn a ranker for ordering the list of relevant conditions, and evaluate the system performance with medical doctors.
Dr. Evgeniy Gabrilovich is a senior staff research scientist at Google, where he works on improving healthcare. Prior to joining Google in 2012, he was a director of research and head of the natural language processing and information retrieval group at Yahoo! Research. Evgeniy is an ACM Distinguished Scientist, and is a recipient of the 2014 IJCAI-JAIR Best Paper Prize. He is also a recipient of the 2010 Karen Sparck Jones Award for his contributions to natural language processing and information retrieval. Evgeniy is currently serving as a program chair for WWW 2017, and has served as a program chair for WSDM 2015. He earned his PhD in computer science from the Technion - Israel Institute of Technology. Recently, he graduated (with extra credit) from the Executive MD training program at Harvard Medical School.
Web search engines have made great progress at answering factoid queries. However, they are not well-tailored for managing more complex questions, especially when they require explanation and/or description. The WebQA workshop aims at exploring diverse approaches to answering questions on the Web. This year, particular emphasis will be given to Community Question Answering (CQA), where comments by the users engaged in the community can be used to answer new questions. Questions posted on the Web can be short and ambiguous (similarly to Web queries to a search engine). These issues make the WebQA task more challenging than traditional QA, and finding the most effective approaches for it remains an open problem.
Unlike the more formal conference format, this workshop aims to bring together researchers in diverse areas working on this problem, including those from the NLP, IR, social media and recommender systems communities. The workshop is specifically designed for the SIGIR audience. However, in contrast to the main conference, its goal is to conduct a more focused and open discussion, encouraging the presentation of work in progress and late-breaking initial results in Web Question Answering. Both academic and industrial participation will be solicited, including keynotes and invited speakers.
Workshop Tumblr page: http://webqa2016.tumblr.com/
We encourage submissions describing ongoing research, late-breaking preliminary results, and position papers on all topics related to Web question answering, including:
We solicit papers on preliminary or late-breaking research (from 4 to 6 pages), as well as short “opinion” or “position” papers (from 2 to 4 pages). All submissions should be in the ACM conference paper style.
Papers presented at the workshop will be uploaded to arXiv.org and will be considered non-archival. They may be submitted elsewhere (modified or not). This makes the workshop suitable for presenting and discussing current work, without preventing future publications.
Submission link:
https://easychair.org/conferences/?conf=webqa2016
Charlie Clarke is a Professor in the School of Computer Science at the University of Waterloo, Canada. His research interests include search, information retrieval, and text data mining. He has published on a wide range of topics, including papers related to question answering, filesystem search, search security, search interfaces, user modeling, statistical natural language processing, and the evaluation of search systems. He is a co-author of the book Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010. In addition, he has held a number of software development and consulting positions across industry.
Preslav Nakov is a Senior Scientist at the Qatar Computing Research Institute. His research interests include computational linguistics, interactive question answering, lexical semantics, machine translation, Web as a corpus, and biomedical text processing. Preslav Nakov co-authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and over 40 research papers in top-tier conferences and journals. He is an Associate Editor of the AI Communications journal and an elected member of the SIGLEX board. He has served on the program committees of the major conferences and workshops in computational linguistics, including as a co-organizer and an area/publication/tutorial chair; he co-chaired SemEval 2014-2016 and was an area co-chair of *SEM and EMNLP.
Lluís Màrquez is a Principal Scientist at the Qatar Computing Research Institute, and was previously an associate professor at the Technical University of Catalonia (UPC). His research focuses on machine learning methods for natural language structure prediction problems, including syntactic and semantic parsing. He works on applications in statistical machine translation and its evaluation, and question answering in community forums. He has 120+ papers in computational linguistics and machine learning journals and conferences. He has been general and program chair of major conferences in the area, and held several roles in offices of ACL, EACL and the ACL special interest groups (SIGNLL and SIGDAT).
Alessandro Moschitti is a Principal Scientist at the Qatar Computing Research Institute and an associate professor at the CS Department of Trento University, Italy. He participated in the Jeopardy! Grand Challenge with IBM Watson and has been working on QA since 2002. His expertise concerns theoretical and applied machine learning in the areas of NLP, IR and Data Mining. His research on kernels for advanced syntactic/semantic processing is documented in more than 220 scientific articles published in major venues. He has been General Chair of EMNLP 2014 and PC co-Chair of CoNLL 2015. He has received four IBM Faculty awards, one Google Faculty award and five best paper awards.
Eugene Agichtein is an Associate Professor in the Math and Computer Science department at Emory University. Eugene’s research interests and publications are in web search and information retrieval (question answering, CQA, search behavior, social media), text and data mining, and human-computer interaction. He has coauthored over 100 publications and has received best paper awards at SIGMOD, SIGIR, and WSDM, as well as the 2013 Karen Sparck Jones Award.
Idan Szpektor is a senior research scientist at Yahoo Research in Haifa. Idan’s research is focused on Natural Language Processing and its applications in Web technologies. He has published over 25 papers in leading conferences and journals related to textual entailment, knowledge acquisition and text generation in NLP, and to information retrieval, question answering, recommender systems and quality analysis in the fields of search and community question answering. Idan was a co-organizer of the first WebQA workshop in 2015.