33rd Annual ACM SIGIR Conference

Keynote Addresses



The SIGIR 2010 program includes keynote lectures from two highly respected scientists.


Is the Cranfield Paradigm Outdated?
Donna Harman, NIST, USA.

Slides (PDF)

The Cranfield paradigm was designed in the early 1960s, when information access was via Boolean queries against manually indexed documents and there was (virtually) no text online. Its early implementation, using abstracts, one-line queries, and complete relevance judgments for each query, has undergone extensive modification over the years in TREC and other evaluation forums as the data and tasks have grown more complex. It still stands as the model of choice, both for these (mostly) academic evaluations and, at least partially, for commercial evaluations. However, the world of information access has exploded in recent years to encompass online shopping, social networking, personal desktop organization, and more. Is it time for a new paradigm, and if so, how do we ensure that information retrieval evaluation remains scientifically valid?
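To make the evaluation style under discussion concrete, here is a minimal sketch of how a Cranfield-style test collection is typically scored: a fixed query set, a ranked run per query, and relevance judgments, summarized by mean average precision. The data and variable names are hypothetical, not from the talk.

```python
# Sketch of Cranfield-style scoring: ranked runs scored against
# relevance judgments (qrels). All data below is hypothetical.

def average_precision(ranked_doc_ids, relevant_doc_ids):
    """Average precision for one query, given its judged-relevant documents."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_doc_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_doc_ids) if relevant_doc_ids else 0.0

# Hypothetical ranked results and judgments for two queries.
run = {
    "q1": ["d3", "d7", "d1", "d9"],
    "q2": ["d2", "d5", "d8"],
}
qrels = {
    "q1": {"d3", "d9"},
    "q2": {"d5"},
}

# Mean average precision over the query set, a TREC-style summary number.
map_score = sum(average_precision(run[q], qrels[q]) for q in run) / len(run)
print(f"MAP = {map_score:.3f}")
```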

Donna Harman graduated from Cornell University as an electrical engineer and started her career working with Professor Gerard Salton on the design and building of several test collections, including the first MEDLARS one. Her later work was concerned with searching large volumes of data on relatively small computers, starting with the building of the IRX system at the National Library of Medicine in 1987 and then the Citator/PRISE system at the National Institute of Standards and Technology in 1988. In 1990 she was asked by DARPA to put together a realistic test collection on the order of 2 gigabytes of text, and this collection was used in the first Text REtrieval Conference (TREC). TREC is now in its 19th year and, along with its sister evaluations such as CLEF, NTCIR, INEX, and FIRE, serves as a major testing ground for information retrieval algorithms.


Refactoring the Search Problem to Exploit Differences Between Clients and Servers
Dr. Gary Flake, Microsoft Corporation, USA.

Slides (PDF)

The most common way of framing the search problem is as an exchange between a user and a database, where the user issues queries and the database replies with results that satisfy the constraints imposed by the query while also optimizing some notion of relevance. There are several variations on this basic model that augment the dialogue between humans and machines through query refinement, relevance feedback, and other mechanisms. Rarely, however, is this problem posed in a way that treats the properties of the client and server as fundamentally different, and in which exploiting those differences can yield substantially different experiences.

In this presentation, I propose a reframing of the basic search problem which presupposes that servers are scalable on most dimensions but suffer from high communication latencies, while clients have lower scalability but support vastly richer user interactions because of their lower communication latencies. Framed in this manner, there is clear utility in refactoring the search problem so that user interactions are processed fluidly by the client, while the server is relegated to pre-computing the properties of a result set that cannot be efficiently left to the client.
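As a rough illustration of this split (not the talk's actual system), the sketch below has a hypothetical server pre-compute a result set and its facet summaries in one round trip, while the client narrows the results locally with no further server contact. All function names and data are invented for illustration.

```python
# Sketch of the proposed refactoring: the server does the expensive,
# scalable work once; the client handles fluid, low-latency refinement.
# Everything here is hypothetical.

from collections import Counter

def server_precompute(query, index):
    """Server side: run retrieval once and attach facet metadata to the result set."""
    results = [doc for doc in index if query.lower() in doc["title"].lower()]
    facet_counts = Counter(doc["category"] for doc in results)
    return {"results": results, "facets": dict(facet_counts)}

def client_filter(result_set, selected_facets):
    """Client side: instant narrowing of the pre-fetched results, no round trip."""
    if not selected_facets:
        return result_set["results"]
    return [doc for doc in result_set["results"] if doc["category"] in selected_facets]

# Hypothetical toy index.
index = [
    {"title": "Zoomable interfaces", "category": "UI"},
    {"title": "Faceted search basics", "category": "IR"},
    {"title": "Faceted metadata in UI design", "category": "UI"},
]

result_set = server_precompute("faceted", index)  # one server round trip
print(result_set["facets"])                       # facet counts computed server-side
print(client_filter(result_set, {"UI"}))          # client-side refinement, zero latency
```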

I will conclude this presentation with an extensive demo of Pivot, an experimental client application that lets the user visually interact with thousands of results at once, using faceted exploration in a zoomable interface. I will argue that the evolving structure of the Web will tend to push all IR-based applications in a similar direction, in which algorithmic intelligence is increasingly split between clients and servers. Put another way, my claim is that future clients will be neither thin nor dumb.


Dr. Flake is a Technical Fellow at Microsoft, where he focuses on Internet products and technologies including search, advertising, content, portals, community, and application development. In this capacity, he helps define and evolve Microsoft’s product vision, technical architecture, and business strategy for online services. He is also the founder and director of Live Labs, a “skunk works” team that bridges research and development, and is widely recognized for inventing new best practices for catalyzing and managing innovation.

Prior to joining Microsoft, Dr. Flake founded Yahoo! Research Labs, ran Yahoo!'s corporate R&D activities and company-wide innovation effort, and was Overture's Chief Science Officer. Before joining Overture, he was a research scientist at NEC Research Institute and the leader of its Web data-mining program. His numerous publications, spanning more than 20 years, have focused on machine learning, data mining, and self-organization. His other research interests include Web measurements, efficient algorithms, models of adaptation inspired by nature, and time-series forecasting.

Dr. Flake earned his Ph.D. in computer science from the University of Maryland and has served on numerous academic conference and workshop organization committees. He also wrote the award-winning book, The Computational Beauty of Nature, which is used in college courses worldwide.
