Towards a Large-Scale Video Processing System

We have witnessed an explosion of video data over recent decades. Video accounted for 75% of internet traffic in 2017, and is expected to make up 82% in 2022, with close to 1 million minutes of video crossing the Internet per second. According to the Wall Street Journal there will be a billion cameras on the streets by 2021. However, the way of querying video streams is still primitive, and requires lots of human intervention. In scenarios such as surveillance applications, humans often have to visually inspect a large amount of video to identify persons or objects of interest. It is therefore crucial to develop systems that could support the extraction of meaningful information from videos in real time, utilizing declarative queries, in a way akin to how people are interacting with database systems today.

On the other hand, recent breakthroughs in Deep Learning (DL) have made it possible to achieve highly accurate results in tasks such as image classification, object detection and object tracking, providing the building blocks to make large-scale video query processing a reality.

We are building a video processing system that is capable of performing declarative queries over large video repositories, providing support to a wide spectrum of applications such as crime investigation, content creation, and autonomous driving.

Managing and Mining Urban Spatio-Temporal Data

The wide-spread use of smart phones, sensors and other IoT devices in cities world-wide has given rise to a huge volume of urban spatio-temporal data, which often present themselves as high-velocity continuous streams with considerable noise and uncertainties. These data record a vast amount of movement information of people, vehicles, etc., and serve as the backbone of a variety of applications, such as urban traffic management, road network planning, location-based services, and environmental monitoring. While governments, businesses and other organizations have realized the tremendous value of urban spatio-temporal data, how to effectively tap into this potential is still an elusive goal.

The unifying theme of the project is to address the challenges arising from managing and mining urban spatio-temporal data. Some of the questions we strive to answer are: How to improve the quality of such data to provide a reliable basis for data analytics? How to efficiently process continuous queries (such as k nearest-neighbor queries) and discover patterns over spatio-temporal streams? How to construct a probabilistic model to capture the underlying intention of movement? How to use this model to support advanced applications, such as traffic flow forecasting, dynamic navigation, and next location prediction?

Novel models and methods developed from this project will help lay the data management and analytics foundation for a wide spectrum of applications, and provide a better understanding of human mobility patterns.

Supporting Keyword Search over Structured Data

Enabling users to access databases using simple keywords can relieve them from the trouble of mastering a structured query language and understanding complex and possibly fast evolving database schemas. Although keyword search technology has matured in the Web arena, supporting keyword search over structured data, such as the data stored in databases and data warehouses, presents unique challenges. When performing keyword search over structured data, the results are no longer existing Web pages, but "virtual documents" composed by assembling the keyword-matching tuples from (potentially) different tables. Correspondingly, the space that must be explored during search becomes much larger than that encountered in Web search. The long-term objective of this research is to enable keyword search as an efficient and effective means of database navigation and exploration. To this end, the project focuses on three topics: improving the quality of search results, incorporating domain knowledge into keyword query processing, and supporting keyword-driven data analytics. We expect to propose a series of novel models and algorithms that would improve the functionality, effectiveness, and efficiency of keyword search over the state-of-the-art methods. The results are expected to find applications in a wide spectrum of scenarios, such as business intelligence and e-health.