Managing and Mining Urban Spatio-Temporal Data

The widespread use of smartphones, sensors and other IoT devices in cities worldwide has given rise to a huge volume of urban spatio-temporal data, which often arrive as high-velocity continuous streams with considerable noise and uncertainty. These data record a vast amount of information about the movement of people, vehicles, and other objects, and serve as the backbone of a variety of applications, such as urban traffic management, road network planning, location-based services, and environmental monitoring. While governments, businesses and other organizations have recognized the tremendous value of urban spatio-temporal data, effectively tapping into this potential remains an elusive goal.

The unifying theme of the project is to address the challenges arising from managing and mining urban spatio-temporal data. Some of the questions we strive to answer are: How can the quality of such data be improved to provide a reliable basis for data analytics? How can continuous queries (such as k-nearest-neighbor queries) be processed efficiently, and patterns discovered, over spatio-temporal streams? How can a probabilistic model be constructed to capture the underlying intention of movement? How can this model be used to support advanced applications, such as traffic flow forecasting, dynamic navigation, and next-location prediction?
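To make the notion of a continuous k-nearest-neighbor query concrete, the following is a minimal sketch, not the project's method. It assumes a stream of (object id, x, y) location updates and simply recomputes the answer from scratch after every update; the names knn, continuous_knn and the taxi identifiers are hypothetical, and the efficient processing the project aims for would avoid this full recomputation.

```python
import heapq
import math


def knn(latest_positions, query_point, k):
    """Return the ids of the k objects closest to query_point."""
    qx, qy = query_point
    return heapq.nsmallest(
        k,
        latest_positions,
        key=lambda oid: math.hypot(
            latest_positions[oid][0] - qx, latest_positions[oid][1] - qy
        ),
    )


def continuous_knn(stream, query_point, k):
    """Re-evaluate the kNN answer after each location update (naive baseline)."""
    latest = {}   # object id -> most recent (x, y) position
    results = []  # one answer per update, in arrival order
    for obj_id, x, y in stream:
        latest[obj_id] = (x, y)
        results.append(knn(latest, query_point, k))
    return results


# Hypothetical stream of location updates: (object id, x, y).
updates = [
    ("taxi_a", 0.0, 5.0),
    ("taxi_b", 1.0, 1.0),
    ("taxi_c", 4.0, 4.0),
    ("taxi_a", 0.5, 0.5),  # taxi_a moves close to the query point
]
answers = continuous_knn(updates, (0.0, 0.0), k=2)
```

Each update here costs a scan over all tracked objects, which is the kind of overhead that motivates the efficiency question posed above.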

The novel models and methods developed in this project will help lay the data management and analytics foundation for a wide spectrum of applications, and provide a better understanding of human mobility patterns.

Supporting Keyword Search over Structured Data

Enabling users to access databases using simple keywords spares them the trouble of mastering a structured query language and understanding complex and possibly fast-evolving database schemas. Although keyword search technology has matured in the Web arena, supporting keyword search over structured data, such as the data stored in databases and data warehouses, presents unique challenges. When performing keyword search over structured data, the results are no longer existing Web pages, but "virtual documents" composed by assembling the keyword-matching tuples from (potentially) different tables. Correspondingly, the space that must be explored during search becomes much larger than that encountered in Web search. The long-term objective of this research is to enable keyword search as an efficient and effective means of database navigation and exploration. To this end, the project focuses on three topics: improving the quality of search results, incorporating domain knowledge into keyword query processing, and supporting keyword-driven data analytics. We expect to propose a series of novel models and algorithms that improve the functionality, effectiveness, and efficiency of keyword search beyond state-of-the-art methods. The results are expected to find applications in a wide spectrum of scenarios, such as business intelligence and e-health.
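The idea of a "virtual document" can be illustrated with a small sketch. It is a brute-force illustration under stated assumptions, not the project's algorithm: the two toy tables (authors, papers), the author_id foreign key, and the functions matches and keyword_search are all hypothetical. Each result is assembled by joining tuples from both tables, and a result qualifies only if the joined text covers every keyword.

```python
# Two toy tables linked by a foreign key (papers.author_id -> authors key).
authors = {
    1: {"name": "Alice Smith"},
    2: {"name": "Bob Jones"},
}
papers = [
    {"id": 10, "title": "Keyword search in databases", "author_id": 1},
    {"id": 11, "title": "Stream processing", "author_id": 1},
    {"id": 12, "title": "Keyword extraction", "author_id": 2},
]


def matches(text, keyword):
    """Case-insensitive whole-word match of keyword in text."""
    return keyword.lower() in text.lower().split()


def keyword_search(keywords):
    """Return joined (author, paper) results that together cover all keywords."""
    results = []
    for paper in papers:
        author = authors[paper["author_id"]]
        # The virtual document is the text of the joined tuples.
        virtual_doc = f'{author["name"]} {paper["title"]}'
        if all(matches(virtual_doc, kw) for kw in keywords):
            results.append((author["name"], paper["title"]))
    return results


hits = keyword_search(["smith", "keyword"])
```

Here "smith" matches only in the authors table and "keyword" only in the papers table, so neither tuple alone answers the query; the joined pair does. Enumerating every join as done here does not scale, because the number of candidate joins grows with the number of tables and the ways they can be connected.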