The 19th ACM International Conference on Information and Knowledge Management

Keynote Speakers

Jamie Callan

Jamie Callan is a Professor of Computer Science at Carnegie Mellon's Language Technologies Institute and School of Information Systems & Management. Prior to joining CMU he was a Research Assistant Professor at the University of Massachusetts, where he also received his Ph.D. His research and teaching focus on text-based information retrieval, primarily search engine architectures, federated search of groups of search engines, adaptive information filtering, text mining, and information retrieval for educational applications. He has published more than 150 scientific papers. He is the Editor-in-Chief of ACM Transactions on Information Systems (TOIS) and was a founding Editor-in-Chief of Foundations and Trends in Information Retrieval (FnTIR). He has served as Chair of ACM SIGIR, and Program Chair of the SIGIR and CIKM conferences.

Speech Title: Search Engine Support for Software Applications

Abstract: Question-answering, computer-assisted language learning, text mining, and other software applications that use a full-text search engine to find information in a large text corpus are becoming common. A software application may use metadata and text annotations to reduce the mismatch between the concept-based representations convenient for inference and the word-based representations typically used for text retrieval. Software applications may also be able to specify detailed requirements that retrieved passages must satisfy. This use of text search is very different from the ad-hoc, interactive search that information retrieval research typically studies.

Search engine developers are beginning to respond by extending indexing and retrieval models developed for structured (e.g., XML) documents to support multiple representations of document content, text annotations, metadata, and relationships. These new requirements force developers to reconsider basic assumptions about index data structures and ranked retrieval models.

How best to use these new capabilities is an open problem. Straightforward transformation of a detailed information need into a complex structured query can produce a query that is effective for exact-match retrieval but difficult for the retrieval model to use effectively for best-match retrieval. Bag-of-words retrieval is often disparaged, but its advantage is that it is robust: it works well even when desired documents do not exactly meet expectations.

This talk discusses some of the problems encountered when extending a search engine to support queries posed by other software applications and structured documents with derived annotations.
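The contrast the abstract draws between exact-match and best-match retrieval can be illustrated with a minimal sketch. It is not taken from the talk: the tiny corpus, the query, and the function names `exact_match` and `best_match` are all hypothetical, and the scoring (a count of distinct query terms present) is a deliberately simple stand-in for a real ranked retrieval model.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def exact_match(query_terms, docs):
    """Boolean AND: return indices of documents containing every query term."""
    hits = []
    for i, doc in enumerate(docs):
        tokens = set(tokenize(doc))
        if all(term in tokens for term in query_terms):
            hits.append(i)
    return hits


def best_match(query_terms, docs):
    """Bag-of-words ranking: order documents by how many query terms they contain."""
    scored = []
    for i, doc in enumerate(docs):
        tokens = set(tokenize(doc))
        score = sum(1 for term in query_terms if term in tokens)
        if score > 0:
            scored.append((score, i))
    # Highest score first; ties broken by document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]


# Hypothetical corpus and query, for illustration only.
DOCS = [
    "federated search combines results from several search engines",
    "adaptive filtering of text streams",
    "search engines rank passages for question answering",
]
QUERY = ["federated", "search", "passages"]

print(exact_match(QUERY, DOCS))
print(best_match(QUERY, DOCS))
```

In this toy setting no document satisfies every requirement, so the exact-match query returns nothing, while the bag-of-words ranking still surfaces the two partially matching documents. That is the robustness the abstract refers to.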

Divesh Srivastava

Divesh Srivastava is the head of the Database Research Department at AT&T Labs-Research. He received his Ph.D. from the University of Wisconsin, Madison, and his B.Tech from the Indian Institute of Technology, Bombay. He is on the board of trustees of the VLDB Endowment, the associate Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering, and an associate editor of the ACM Transactions on Database Systems. He has served as the program committee co-chair of many conferences, including VLDB 2007. His research interests and publications span a variety of topics in data management.

Speech Title: Schema Extraction

Abstract: Understanding the schema of a complex database is a crucial step in exploratory data analysis. However, gaining such an understanding is challenging for new users for many reasons.

First, complex databases often have thousands of inter-linked tables, with little indication of the important tables or the main concepts in the database schema.

Second, schemas can be inaccurate, e.g., some foreign/primary key relationships are not known to designers but are inherent in the data, while others become invalid due to data inconsistencies. In this talk, we present an approach to effectively address these challenges and automatically extract an understandable schema from a complex database.

The first step in our approach is a robust algorithm to discover foreign/primary key relationships between tables. We present a general rule, termed Randomness, that subsumes a variety of other rules proposed in previous work, and develop efficient approximation algorithms for evaluating randomness, using only two passes over the data.

The second step is a principled approach to summarize the schema consisting of tables linked using foreign/primary keys, so that a user can easily identify the main concepts and important tables. We present an information theoretic approach to identify important tables, and an intuitive notion of table similarity that can be used to cluster tables into the main concepts of the schema. We validate our approach using real and synthetic datasets.

This is based on joint work with Marios Hadjieleftheriou, Beng Chin Ooi, Cecilia M. Procopiuc, Xiaoyan Yang and Meihui Zhang.
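To make the first step of the abstract more concrete, the following minimal sketch shows the most basic form of foreign/primary key discovery: a value-inclusion check between columns. It is not the Randomness rule or the two-pass approximation algorithm described in the talk, whose details the abstract does not give. The table layout, the sample data, and the function names are hypothetical.

```python
def is_candidate_key(values):
    """A column can act as a key only if it has no nulls and no duplicates."""
    return None not in values and len(set(values)) == len(values)


def is_candidate_foreign_key(child_values, parent_values):
    """True if every non-null child value appears in a key-like parent column."""
    if not is_candidate_key(parent_values):
        return False
    non_null = [v for v in child_values if v is not None]
    parent = set(parent_values)
    return bool(non_null) and all(v in parent for v in non_null)


def discover_candidates(tables):
    """Check every cross-table column pair for value inclusion.

    `tables` maps table name -> {column name -> list of values}.
    Returns sorted (child_table, child_col, parent_table, parent_col) tuples.
    """
    found = []
    for child_table, child_cols in tables.items():
        for child_col, child_values in child_cols.items():
            for parent_table, parent_cols in tables.items():
                if parent_table == child_table:
                    continue
                for parent_col, parent_values in parent_cols.items():
                    if is_candidate_foreign_key(child_values, parent_values):
                        found.append(
                            (child_table, child_col, parent_table, parent_col)
                        )
    return sorted(found)


# Hypothetical two-table database, for illustration only.
TABLES = {
    "customers": {"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 3, 2]},
}

print(discover_candidates(TABLES))
```

Inclusion alone tends to produce spurious candidates on real data (for example, any small integer column is contained in a dense integer key), and it rejects relationships broken by a few inconsistent values, which is presumably why a more robust rule is needed.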

Susan Dumais

Susan Dumais is a Principal Researcher and manager of the Context, Learning and User Experience for Search (CLUES) Group at Microsoft Research. Prior to joining Microsoft Research, she was at Bellcore and Bell Labs for many years, where she worked on Latent Semantic Indexing (a statistical method for concept-based retrieval), interfaces for combining search and navigation, and organizational impacts of new technology. Her current research focuses on user modeling and personalization, context and information retrieval, temporal dynamics of information, interactive retrieval, and novel evaluation methods. She has worked closely with several Microsoft groups (Bing, Windows Desktop Search, SharePoint Portal Server, and Office Online Help) on search-related innovations. Susan has published more than 200 articles in the fields of information science, human-computer interaction, and cognitive science, and holds several patents on novel retrieval algorithms and interfaces. Susan is also an adjunct professor in the Information School at the University of Washington. She is Past-Chair of ACM's Special Interest Group on Information Retrieval (SIGIR), and serves on several editorial boards, technical program committees, and government panels. She was elected to the CHI Academy in 2005, was named an ACM Fellow in 2006, and received the Gerard Salton Award from SIGIR for Lifetime Achievement in 2009.

Speech Title: Temporal Dynamics and Information Retrieval

Abstract: Many digital resources, like the Web, are dynamic and ever-changing collections of information. However, most of the information retrieval and management tools that have been developed for interacting with Web content, such as browsers and search engines, focus on a single static snapshot of the information. In this talk, I will present analyses of how Web content changes over time, how people re-visit Web pages over time, and how re-visitation patterns are influenced by changes in user intent and content. These results have implications for many aspects of information retrieval and management, including crawling, ranking and information extraction algorithms, result presentation, and evaluation. I will describe a prototype system that supports people in understanding how the information they interact with changes over time, and a new retrieval model that incorporates features about the temporal evolution of content to improve core ranking. Finally, I will conclude with an overview of some general challenges that need to be addressed to fully incorporate temporal dynamics in information retrieval and information management systems.

Gregory Grefenstette

Biographical sketch: Gregory Grefenstette (Stanford'78, University of Pittsburgh'93) is chief science officer of Exalead, part of Dassault Systemes. An expert in natural language processing, Grefenstette established the field of Cross Language Information Retrieval by organizing its first workshop at SIGIR'96. He is also one of the pioneers of distributional semantics, following his PhD work "Exploring Automatic Thesaurus Generation" (Kluwer, 1994). Involved in information retrieval since the early TREC days, he has always been keen on large-scale solutions to natural language processing problems, co-editing with Adam Kilgarriff a special issue of "Computational Linguistics" in 2003. Former chief scientist at the Xerox Research Centre Europe (93-01), at Clairvoyance Corporation (01-04), and with the French CEA (04-08), he has been active in transferring research into products as an inventor on 16 U.S. patents. In recent years, he has been working with Adrian Popescu on Geographical Indexing. With Dassault Systemes, he is interested in exploring multidimensional access to information.

Speech Title: Use of Semantics in Real Life Applications

Abstract: Much research in computer-exploitable semantics performed in the 1990s and early 2000s is finally finding its way into practical applications. This talk will explore these paths from research to industry, illustrated by current products on the market.