33rd Annual ACM SIGIR Conference


All workshops will run full-day on Friday July 23rd, 2010

Workshop Co-Chairs :

IMPORTANT: The Workshop on Next-Generation Test Collections has been cancelled

Workshop on Crowdsourcing for Search Evaluation

  • Vitor Carvalho (Microsoft)
  • Matthew Lease (University of Texas at Austin)
  • Emine Yilmaz (Microsoft)

The SIGIR 2010 Workshop on Crowdsourcing for Search Evaluation (CSE2010) solicits submissions on topics including but not limited to the following areas:

  • Novel applications of crowdsourcing for evaluating search systems (see examples below)
  • Novel theoretical, experimental, and/or methodological developments advancing state-of-the-art knowledge of crowdsourcing for search evaluation
  • Tutorials on how the different forms of crowdsourcing might be best suited to or best executed in evaluating different search tasks
  • New software packages which simplify or otherwise improve general support for crowdsourcing, or particular support for crowdsourced search evaluation
  • Reflective or forward-looking vision on use of crowdsourcing in search evaluation as informed by prior and/or ongoing studies
  • How crowdsourcing technology or process can be adapted to encourage and facilitate more participation from outside the USA

The workshop especially calls for innovative solutions in the area of search evaluation involving significant use of a crowdsourcing platform such as Amazon's Mechanical Turk, Crowdflower, LiveWork, etc. Novel applications of crowdsourcing are of particular interest. This includes but is not restricted to the following tasks:

  • cross-vertical search (video, image, blog, etc.) evaluation
  • local search evaluation
  • mobile search evaluation
  • realtime/news search evaluation
  • entity search evaluation
  • discovering representative groups of rare queries, documents, and events in the long-tail of search
  • detecting/evaluating query alterations

For example, does the inherent geographic dispersal of crowdsourcing enable better assessment of a query's local intent, its local-specific facets, or diversity of returned results? Could crowdsourcing be employed in near real-time to better assess query intent for breaking news and relevant information?

Most Innovative Awards Sponsored by Microsoft Bing

As a further incentive to participation, authors of the most novel and innovative crowdsourcing-based search evaluation techniques (e.g. using Amazon's Mechanical Turk, LiveWork, Crowdflower, etc.) will be recognized with “Most Innovative Awards” as judged by the workshop organizers. Selection will be based on the creativity, originality, and potential impact of the described proposal, and we expect the winners to describe risky, ground-breaking, and unexpected ideas. The provision of awards is thanks to generous support from Microsoft Bing, and the number and nature of the awards will depend on the quality of the submissions and overall availability of funds. All valid submissions to the workshop will be considered for the awards.

Submission Instructions

Submissions should report new (unpublished) research results or ongoing research. Long paper submissions (up to 8 pages) will primarily target oral presentations. Short paper submissions can be up to 4 pages long, and will primarily target poster presentations. Papers should be formatted in double-column ACM SIG proceedings format (http://www.acm.org/sigs/publications/proceedings-templates). Papers must be submitted as PDF files. Submissions should not be anonymized.

Important Dates
  • Submissions due: June 7, 2010 (11:59PM US Eastern Standard Time)
  • Notification of acceptance: June 21, 2010
  • Camera-ready submission: June 28, 2010
  • Workshop date: July 23, 2010

Email the organizers at cse2010@ischool.utexas.edu

Program Committee
  • Eugene Agichtein (Emory University)
  • Ben Carterette (University of Delaware)
  • Charlie Clarke (University of Waterloo)
  • Gareth Jones (Dublin City University)
  • Jaap Kamps (University of Amsterdam)
  • Gabriella Kazai (Microsoft)
  • Winter Mason (Yahoo! Research)
  • Stefano Mizzaro (University of Udine)
  • Gheorghe Muresan (Microsoft Bing)
  • Iadh Ounis (University of Glasgow)
  • Mark Sanderson (University of Sheffield)
  • Mark Smucker (University of Waterloo)
  • Siddharth Suri (Yahoo! Research)
  • Fang Xu (Saarland University)

Workshop on Accessible Search Systems

  • Pavel Serdyukov (Technical University of Delft, The Netherlands)
  • Djoerd Hiemstra (University of Twente, The Netherlands)
  • Ian Ruthven (University of Strathclyde, UK)

Current search systems are not adequate for individuals with specific needs: children, older adults, people with visual or motor impairments, and people with intellectual disabilities or low literacy. Search services are typically created for average users (young or middle-aged adults without physical or mental disabilities) and information retrieval methods are based on their perception of relevance as well. The workshop will be the first to raise the discussion on how to make search engines accessible for different types of users, including those with problems in reading, writing or comprehension of complex content. Search accessibility means that people whose abilities differ considerably from those of average users are able to use search systems successfully.

The objective of the workshop is to provide a forum and initiate collaborations between academics and industrial practitioners interested in making search more usable for users in general and for users with specific needs in particular. We encourage presentation and participation from researchers working at the intersection of information retrieval, natural language processing, human-computer interaction, ambient intelligence and related areas.

The workshop will be a mix of oral presentations for long papers (maximum of 8 pages), a session for posters (maximum of 2 pages) and a panel discussion. All submissions will be reviewed by at least two PC members. Workshop proceedings will be available at the workshop.

  • David Elsweiler – University of Erlangen, Germany
  • Gareth J. F. Jones – Dublin City University, Ireland
  • Liadh Kelly – Dublin City University, Ireland
  • Jaime Teevan – Microsoft Research Redmond, USA

Desktop search refers to the process of searching within one’s personal space of information. The information searched during a desktop search can include content that resides on one's personal computer (e.g., documents, emails, visited Web pages, and multimedia files), and may extend to content on other personal devices, such as music players and mobile phones. Despite recent research interest, desktop search is under-explored compared to other search domains such as the web, semi-structured data, or flat text.

Problems with existing desktop search tools include performance issues, an over-reliance on good query formulation, and a failure to fit within the user’s work flow or the user’s mental model. Evaluation of desktop search tools is difficult. There are no established or standardized baselines or evaluation metrics, and no commonly available test collections. Privacy concerns, the challenges of working with personal collections, and the individual differences in behaviour between users all must be addressed to advance research in this domain.

This workshop will bring together academics and industrial practitioners interested in desktop search with the goal of fostering collaborations and addressing the challenges faced in this area. The workshop will be structured to encourage group discussion and active collaboration among attendees. We encourage participation from people in the fields of information retrieval, personal information management, natural language processing, human-computer interaction, and related areas.

Workshop on Simulation of Interaction: Automated Evaluation of Interactive IR

  • Leif Azzopardi
  • Kalervo Jarvelin
  • Jaap Kamps
  • Mark D Smucker

This workshop aims to explore the use of Simulation of Interactions to enable automated evaluation of Interactive Information Retrieval Systems and Applications.

Standard test collections only enable a very limited type of interaction to be evaluated (i.e. query - response). This is largely due to the high costs involved in going beyond this limited interaction and problems associated with replicability and repeatability of experiments.

Arguably, Simulation of Interaction provides a cost-effective way to construct and repeat evaluations of interactive systems and applications. This powerful automated evaluation technique provides a high degree of control and ensures that experiments can be replicated — but we need your help in developing “standardized” methodologies for simulations, techniques for simulations, models and methods for simulations, measures of performance given simulations, and more.

Sign up for this workshop if you are interested in interactive IR and the modeling of users, systems, interactions and behaviors, and in how they can be simulated (or not) within automated evaluation methodologies for IR. The workshop is going to be lively and very interactive (both online and offline), comprising discussions and debates all aimed at producing valuable community resources and references on simulation in IR.

Workshop on Query Representation and Understanding

  • Bruce Croft and Michael Bendersky, University of Massachusetts Amherst
  • Hang Li and Gu Xu, Microsoft Research Asia

Understanding the user's intent or information need that underlies a query has long been recognized as a crucial part of effective information retrieval. Despite this, retrieval models, in general, have not focused on explicitly representing intent, and query processing has been limited to simple transformations such as stemming or spelling correction. With the recent availability of large amounts of data about user behavior and queries in web search logs, there has been an upsurge in interest in new approaches to query understanding and representing intent.

This workshop has the goal of bringing together the different strands of research on query understanding, increasing the dialogue between researchers working in this relatively new area, and developing some common themes and directions, including definitions of tasks and evaluation methodology. We hope the workshop will bring together researchers from IR, ML, NLP, and other areas of computer and information science who are working on or interested in this area, and provide a forum for them to identify the issues and the challenges, to share their latest research results, to express a diverse range of opinions about this topic, and to discuss future directions.

Workshop on Large-Scale Distributed Information Retrieval

  • Roi Blanco, Yahoo! Research, Barcelona, Spain
  • B. Barla Cambazoglu, Yahoo! Research, Barcelona, Spain
  • Claudio Lucchese, ISTI-CNR, Pisa, Italy
  • Flavio Junqueira, Yahoo! Research, Barcelona, Spain
  • Fabrizio Silvestri, ISTI-CNR, Pisa, Italy

This workshop aims to bring together both experienced and young researchers from distributed IR, including work on P2P search and efficiency of distributed systems for information processing. This edition of the workshop will favor novel, perhaps even outrageous ideas as opposed to finished research work, thus strongly encouraging the submission of position papers in addition to research papers. Position papers are important to foster discussion of controversial and intriguing ideas on new ways of building distributed infrastructures for information processing.

Workshop on Next-Generation Test Collections

This workshop has been cancelled

  • Ian Soboroff, NIST
  • Ben Carterette, University of Delaware
  • Virgil Pavlu, Northeastern University

Over the last 15 years, Information Retrieval research corpora have experienced more than a thousand-fold increase in size: from the 1990s TIPSTER collections of hundreds of thousands of full-text articles to the 2009 ClueWeb collection of over a billion web pages, researchers are now working with a nearly unimaginable amount of text. The standard evaluation methodology—the Cranfield paradigm of calculating evaluation measures using test collections—has struggled to keep up, as research shows that even test collections for terabyte-sized corpora suffer from unforeseen judgment bias and reusability challenges. This workshop invites cutting-edge research on tackling the problem of building test collections at the multi-terabyte scale that are realistic, fair, and reusable. The goal of the workshop is to map out the critical research questions that need to be asked and the types of collections we need to consider building in order to answer them.

Workshop on Feature Generation and Selection for Information Retrieval

  • Evgeniy Gabrilovich, Yahoo! Research, USA
  • Alex Smola, Australian National University and Yahoo! Research, USA
  • Naftali Tishby, Hebrew University of Jerusalem, Israel

Modern information retrieval systems facilitate information access at unprecedented scale and level of sophistication. However, in many cases the underlying representation of text remains quite simple, often limited to using a weighted bag of words. Over the years, several approaches to automatic feature generation have been proposed (such as Latent Semantic Indexing, Explicit Semantic Analysis, Hashing, and Latent Dirichlet Allocation), yet their application in large-scale systems still remains the exception rather than the rule. On the other hand, numerous studies in NLP and IR resort to manually crafting features, which is a laborious and expensive process. Such studies often focus on one specific problem, so many of the features they define are task- or domain-dependent. As a result, little knowledge transfer is possible to other problem domains. This limits our understanding of how to reliably construct informative features for new tasks.

An area of machine learning concerned with feature generation (or constructive induction) studies methods that endow computers with the ability to modify or enhance the representation language. Feature generation techniques search for new features that describe the target concepts better than the attributes supplied with the training instances. Complementary to feature generation, the issue of feature selection arises. It aims to retain only the most informative features, e.g., in order to reduce noise and to avoid overfitting, and is essential when numerous features are automatically constructed.
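As a rough illustration of the feature selection step described above, the following Python sketch scores bag-of-words features by their mutual information with a binary label and keeps the top k. The toy documents, the labels, and the function names are hypothetical, and mutual information is only one common selection criterion; nothing here is prescribed by the workshop.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the label."""
    n = len(docs)
    # Joint counts over (term present?, label) pairs.
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    mi = 0.0
    for (present, label), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (p, _), c in joint.items() if p == present) / n
        p_y = sum(c for (_, l), c in joint.items() if l == label) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def select_features(docs, labels, k):
    """Return the k terms with the highest mutual information (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-mutual_information(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy data: each document is a set of tokens, label 1 = travel query.
docs = [
    {"cheap", "flights", "paris"},
    {"cheap", "hotel", "rome"},
    {"python", "list", "sort"},
    {"python", "dict", "cheap"},
]
labels = [1, 1, 0, 0]

top_terms = select_features(docs, labels, k=1)
```

In this toy data the term "python" separates the two classes perfectly, so it receives the highest score; a term such as "cheap", which appears in both classes, scores lower.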

We believe that much can be done in the quest for automatic feature generation for text processing, for example, using large-scale knowledge bases as well as the sheer amounts of textual data easily accessible today. The purpose of this workshop is to bring together researchers from many related areas (including information retrieval, machine learning, statistics, and natural language processing) to address these issues and seek cross-pollination among the different fields.

Web N-gram Workshop

  • Chengxiang Zhai (University of Illinois at Urbana-Champaign)
  • David Yarowsky (Johns Hopkins University)
  • Evelyne Viegas (Microsoft Research)
  • Kuansan Wang (Microsoft Research)
  • Stephan Vogel (Carnegie Mellon University)

The aim of the workshop is to bring together a group of leaders in information retrieval and language modeling to discuss the challenges in information retrieval and how language modeling approaches may help address some of these challenges. At the workshop we will focus on the use of n-gram models to further research in areas such as document representation and content analysis (e.g., clustering, classification, information extraction); query analysis (e.g., query suggestion, query reformulation); retrieval models and ranking; spelling; and access to n-grams as an enabler of experimental design. The lack of large-scale datasets and benchmarks for running experiments is often discussed in the research community. This workshop will address this issue by bringing together the community of researchers who use the n-grams already made available by Yahoo and Google/LDC, along with a new Web N-gram service through which Microsoft Research, in partnership with Microsoft Bing, is providing the research community with access to petabytes of Web N-gram data via a cloud-based platform.

The Web N-gram services, currently in Beta at http://research.microsoft.com/web-ngram, will be made available to the participants of the workshop, with properties as follows:

  • Content types: Document Body, Document Title, Anchor Texts
  • Model types: smoothed models
  • Highest order N: 5 (N=5 in N-gram)
  • Training size (Body): over 1.3 trillion
  • Number of 1-grams (Body): 1 billion
  • Number of 5-grams (Body): 237 billion
  • Availability: Hosted Services by Microsoft
  • Refresh: Regular data updates (e.g. quarterly)
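For readers unfamiliar with smoothed n-gram models, the following Python sketch shows the general idea on a toy corpus. It is an illustration only: it uses bigrams (N=2) with add-k smoothing, whereas the service's actual smoothing method and interface are not described on this page, and the class and method names are hypothetical.

```python
import math
from collections import Counter


class BigramModel:
    """Toy bigram language model with add-k smoothing (an assumed, simple scheme)."""

    def __init__(self, sentences, k=1.0):
        self.k = k
        self.contexts = Counter()  # how often each token occurs as a context
        self.bigrams = Counter()   # how often each (previous, current) pair occurs
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Tokens that can be predicted: every word plus the end-of-sentence marker.
        self.vocab = {w for s in sentences for w in s.split()} | {"</s>"}

    def prob(self, prev, word):
        """Smoothed P(word | prev); unseen pairs receive a small non-zero probability."""
        numerator = self.bigrams[(prev, word)] + self.k
        denominator = self.contexts[prev] + self.k * len(self.vocab)
        return numerator / denominator

    def log_prob(self, sentence):
        """Natural-log probability of a whole sentence under the model."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(a, b)) for a, b in zip(tokens, tokens[1:]))


# Hypothetical two-sentence training corpus.
model = BigramModel(["web search engine", "web search query"])
seen_order = model.log_prob("web search engine")
scrambled = model.log_prob("engine search web")
```

Because of smoothing, the scrambled word order still gets a finite score, but a lower one than the order observed in training.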

In this workshop, we encourage researchers to use the Microsoft Web N-gram service to explore novel applications of language models (e.g. long tail effects) and use of these data for enhancing the search experience (e.g. use of anchor text as a proxy to queries). We will also consider other applications such as machine translation and speech.

We also encourage research and experiments that use or compare different n-gram data sets, with the ultimate aim of creating at the workshop a useful n-gram baseline for the research community, defined in terms of the n-gram attributes researchers need, such as size, access, content, and model types.

Workshop Planned Activities:
  • Experimental results presented via talks, posters, and a demo session
  • Panel on providing access to data: academia needs, challenges and opportunities for industries to provide such data
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported