Our research spans the following areas.

Generative Models

While big data powers deep learning models, curating such data is costly and inevitably intrudes on privacy. Synthetically generated data not only reduces the cost of data collection but also overcomes privacy concerns and legislative boundaries. How can we generate synthetic data that fulfills the requirements of data similarity, analysis utility, privacy, and generalization?

We are exploring a wide range of generative models for synthesizing tabular data, including Generative Adversarial Networks (GANs), latent diffusion models, flow models, and large language models. We are also actively collaborating with industrial partners, such as major European energy companies and financial companies, to explore synthetic data as a privacy-preserving data-sharing solution.

Robust and Private Learning

Artificial intelligence (AI) and machine learning (ML) are ubiquitous in our daily lives in the form of search engines, machine translation, self-driving cars, and much more. Existing ML algorithms commonly assume that data is neutral and can be freely accessed without breaching privacy. As a result, they fall short in realistic scenarios, e.g., when facing adversarial examples, dirty data, and unreliable execution environments, all while data privacy must still be preserved. These issues are further exacerbated in large, distributed learning problems, where data is collected from multiple sources and must be processed on distributed nodes.

In this line of research, we are designing robust, privacy-preserving and fair learning algorithms. Topics include:

  • Robust Machine Learning: designing learning algorithms that are robust to dirty data inputs.
  • Adversarial Attacks and Defenses: designing adversarial attacks and defense mechanisms for deployed deep models.
  • Differentially Private (Deep) Learning: designing effective differentially private ML models with precise accuracy accounting.
  • Fair Information Maximization on Social Media: designing learning algorithms that can be debiased, for example with respect to gender or race, via data selection and modification of the learning objective.
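
To illustrate what differentially private training involves, here is a minimal sketch of one gradient-aggregation step in the style of DP-SGD (per-example clipping followed by Gaussian noise). It is not the group's method: the function names and parameters (`clip`, `dp_sgd_gradient`, `max_norm`, `noise_multiplier`) are hypothetical, and the privacy accounting that converts the noise level and number of steps into an (ε, δ) guarantee is deliberately left out.

```python
import math
import random
from typing import List


def clip(grad: List[float], max_norm: float) -> List[float]:
    """Rescale one per-example gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads: List[List[float]], max_norm: float,
                    noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip each example's gradient, sum, add Gaussian noise, and average over the batch."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    # Clipping bounds each example's influence on the sum by max_norm,
    # so the noise standard deviation is scaled to that bound.
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_example_grads) for n in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
    print(dp_sgd_gradient(batch, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

With `noise_multiplier=0.0` the step reduces to averaging clipped gradients, which makes the clipping effect easy to check; in practice the noise multiplier, batch sampling rate, and number of steps jointly determine the privacy budget.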


Federated Learning

Data is constantly generated and collected by edge devices to power today’s AI and ML analyses. With advances in algorithmic compression techniques and hardware technology, training neural networks and running inference on edge devices has gone from myth to reality. Federated learning (FL) is an emerging learning paradigm in which distributed edge nodes iteratively and collaboratively learn the weights of neural networks without directly sharing their data. It remains largely unexplored how existing deep learning algorithms can be realized within an FL framework while overcoming network communication bottlenecks and adversarial threats. Moreover, given the vast number of available trained models and highly heterogeneous mobile devices, identifying and deploying the right model for each edge device is no mean feat.
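
The collaborative, data-local training loop described above is commonly realized with federated averaging (FedAvg). The sketch below is a simplified illustration under stated assumptions, not the group's system: it uses a hypothetical one-dimensional linear model, full-batch gradient descent on each client, and a server that averages client weights in proportion to their local sample counts. Only weights travel between server and clients; the `(x, y)` pairs never leave their client.

```python
from typing import List, Tuple

Weights = Tuple[float, float]        # (w, b) of a 1-D linear model y = w * x + b
Dataset = List[Tuple[float, float]]  # (x, y) pairs that stay on the client


def local_update(weights: Weights, data: Dataset, lr: float, epochs: int) -> Weights:
    """Client side: full-batch gradient descent on mean squared error over local data only."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fed_avg(global_weights: Weights, clients: List[Dataset], rounds: int,
            lr: float = 0.05, local_epochs: int = 5) -> Weights:
    """Server side: broadcast weights, collect client updates, average them by sample count."""
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        updates = [local_update(global_weights, d, lr, local_epochs) for d in clients]
        w = sum(len(d) * u[0] for d, u in zip(clients, updates)) / total
        b = sum(len(d) * u[1] for d, u in zip(clients, updates)) / total
        global_weights = (w, b)
    return global_weights


if __name__ == "__main__":
    # Two hypothetical clients whose private data follow y = 2x + 1.
    clients = [[(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)], [(3.0, 7.0), (4.0, 9.0)]]
    print(fed_avg((0.0, 0.0), clients, rounds=200))  # approaches (2.0, 1.0)
```

Running several local epochs per round is what reduces communication compared with sending one gradient per step; the trade-off is that clients with very different data can pull the local models apart between averaging rounds.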

In this line of research, we are designing learning algorithms and prototyping system solutions for ML training and inference on distributed edge devices. Topics include:


Robust and Adversarial Machine Learning

Research questions: Can today’s deep neural networks handle noisy data sets, namely corrupted inputs and labels? How can we design novel learning algorithms that distill high-quality data and enhance the robustness of learning models when they encounter noisy and adversarial inputs?

We are working on noise-resilient learning frameworks that leverage adversarial examples, expert judgement, and robust loss functions.
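
As one example of a robust loss function, the sketch below implements the generalized cross entropy (GCE) loss, (1 − p_y^q)/q, where p_y is the predicted probability of the given label. This is a known loss from the literature chosen purely for illustration, not necessarily the one the group uses; the function names and the default `q = 0.7` are assumptions. Because GCE is bounded by 1/q, a confidently contradicted (possibly mislabeled) example contributes a limited loss, whereas standard cross entropy grows without bound.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    """Standard cross entropy: unbounded as the label's probability goes to zero."""
    return -math.log(softmax(logits)[label])


def generalized_cross_entropy(logits: List[float], label: int, q: float = 0.7) -> float:
    """GCE loss (1 - p_y^q) / q, bounded above by 1/q; q = 1 gives 1 - p_y (MAE-like)."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    p_y = softmax(logits)[label]
    return (1.0 - p_y ** q) / q


if __name__ == "__main__":
    # The model is confident in class 0, but the (possibly noisy) label says class 1.
    logits = [10.0, 0.0]
    print(cross_entropy(logits, 1), generalized_cross_entropy(logits, 1))
```

The parameter `q` trades off between the fast convergence of cross entropy (small `q`) and the noise tolerance of a mean-absolute-error-style loss (`q = 1`).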


Federated Learning Systems: Incentive and Backdoor

Research questions: The federated learning framework preserves privacy by design, as user data stays on devices. How can we provide incentives for users to participate in federated learning systems? How should the models contributed by other users be valued? Can we trust the models provided by other users?

We are designing incentive mechanisms and defense strategies against backdoor attacks for federated learning systems, ranging from Bayesian models to deep neural networks.
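
One widely used way to value contributed models is the Shapley value, which averages each client's marginal contribution over every order in which clients could join. The sketch below is an illustrative assumption, not the group's incentive mechanism: `utility` is a hypothetical callback (for example, the validation accuracy of a model aggregated from a coalition), and exact enumeration is exponential in the number of clients, so larger federations would need sampled permutations instead.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(clients: List[str],
                   utility: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value: each client's marginal utility gain averaged over all join orders."""
    values = {c: 0.0 for c in clients}
    for order in permutations(clients):
        coalition: FrozenSet[str] = frozenset()
        for c in order:
            with_c = coalition | {c}
            values[c] += utility(with_c) - utility(coalition)
            coalition = with_c
    n_orders = factorial(len(clients))
    return {c: v / n_orders for c, v in values.items()}


if __name__ == "__main__":
    # Hypothetical validation accuracies of models aggregated from each coalition.
    accuracy = {
        frozenset(): 0.0,
        frozenset({"A"}): 0.6,
        frozenset({"B"}): 0.5,
        frozenset({"A", "B"}): 0.8,
    }
    print(shapley_values(["A", "B"], accuracy.__getitem__))
```

In this toy table, client A receives about 0.45 and client B about 0.35; the values always sum to the utility of the full coalition, which makes them a natural basis for splitting a reward.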


Deep Model Inferences on Edge Devices

Research questions: How can we choose suitable trained models from the plethora of existing ones and deploy them on edge devices? How can we optimize the performance of deep models on edge devices? Can today’s edge devices efficiently execute multiple DNNs at the same time, e.g., to extract people’s age and gender from images?

We are working on scheduling and model selection algorithms that adaptively run multiple DNNs on resource-limited edge devices while fulfilling diverse user requirements.
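
A simple way to frame model selection under a resource limit is as a multiple-choice knapsack: for each task, pick one model variant so that total accuracy is maximized while total latency stays within a budget. The sketch below makes strong simplifying assumptions that are not from the original text: the DNNs run back-to-back so their latencies add up, latencies are integers in milliseconds, and all model names and numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Variant = Tuple[str, int, float]  # (model name, latency in ms, accuracy)


def select_models(tasks: Dict[str, List[Variant]], budget_ms: int) -> Optional[Dict[str, str]]:
    """Pick one variant per task maximizing total accuracy with total latency <= budget_ms.

    Dynamic programming over the latency used so far (multiple-choice knapsack).
    Returns None if no combination fits the budget.
    """
    # best[t] = (total accuracy, chosen variants) for tasks processed so far using t ms
    best: Dict[int, Tuple[float, Dict[str, str]]] = {0: (0.0, {})}
    for task, variants in tasks.items():
        nxt: Dict[int, Tuple[float, Dict[str, str]]] = {}
        for used, (acc, chosen) in best.items():
            for name, latency, accuracy in variants:
                t = used + latency
                if t > budget_ms:
                    continue  # this variant would exceed the latency budget
                cand = acc + accuracy
                if t not in nxt or cand > nxt[t][0]:
                    nxt[t] = (cand, {**chosen, task: name})
        best = nxt
        if not best:
            return None
    return max(best.values(), key=lambda entry: entry[0])[1]


if __name__ == "__main__":
    # Hypothetical variants for the age and gender extraction example.
    tasks = {
        "age": [("age-large", 40, 0.90), ("age-small", 15, 0.82)],
        "gender": [("gender-large", 35, 0.95), ("gender-small", 10, 0.91)],
    }
    print(select_models(tasks, budget_ms=60))
```

With a 60 ms budget the two large models no longer fit together, and the sketch keeps the large age model while downgrading the gender model, since that loses the least total accuracy.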


Optimization for Machine Learning Services and Clusters

Research questions: Training deep models consumes tremendous computing time and resources, and tuning their hyperparameters costs several times more. Can we design an efficient and accurate tuning framework for deep neural networks that finds the optimal hyperparameters with minimal computational resources?

We are working on acceleration strategies that process only the critical data and leverage workload similarities when tuning hyperparameters and training a wide range of ML models.
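
The idea of spending compute only where it matters can be illustrated with successive halving, a standard early-stopping scheme for hyperparameter search (used inside Hyperband). It is shown here as a swapped-in, well-known technique, not as the group's framework: every configuration is first evaluated on a small budget (for example, a few epochs or a small fraction of the data), only the best 1/`eta` survive, and the survivors get `eta` times more budget. The `evaluate` callback and its toy loss are hypothetical.

```python
from typing import Callable, Dict, List


def successive_halving(configs: List[Dict[str, float]],
                       evaluate: Callable[[Dict[str, float], int], float],
                       min_budget: int = 1, eta: int = 3) -> Dict[str, float]:
    """Repeatedly keep the best 1/eta of the configs and multiply their budget by eta.

    evaluate(config, budget) returns a validation loss (lower is better).
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # sorted() calls the key once per config, so each survivor is trained once per rung.
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        keep = max(1, len(survivors) // eta)
        survivors = ranked[:keep]
        budget *= eta
    return survivors[0]


if __name__ == "__main__":
    configs = [{"lr": lr} for lr in (0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 0.5, 1.0, 2.0)]
    # Toy loss: best learning rate is 0.1, and more budget lowers the loss for everyone.
    toy_loss = lambda cfg, budget: (cfg["lr"] - 0.1) ** 2 + 1.0 / budget
    print(successive_halving(configs, toy_loss))
```

With nine configurations and `eta = 3`, only three configurations ever receive the larger budget, which is where the savings over training every configuration to completion come from.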


Fair Information Maximization on Social Media

Research questions: Highly visible users on social media are often selected as seeds to spread information and drive its adoption in target groups. Even though female users are more active on social media than male users, males are regarded as more influential under various centrality measures.

We are studying how gender differences and similarities affect the information spreading process on social media, and we are developing disparity-aware seeding algorithms.
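
To make "disparity-aware seeding" concrete, the sketch below greedily selects seeds to maximize the coverage of the worst-off group, breaking ties by total coverage. This is a hypothetical simplification, not the group's algorithm: it assumes a one-hop spread model (a seed reaches itself and its direct followers) and a maximin fairness objective, whereas realistic work would use a stochastic diffusion model such as independent cascade. The graph and group labels are toy data.

```python
from typing import Dict, List, Set, Tuple


def group_coverage(covered: Set[str], groups: Dict[str, str]) -> Dict[str, float]:
    """Fraction of each group's members that have been reached."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for node, g in groups.items():
        totals[g] = totals.get(g, 0) + 1
        if node in covered:
            hits[g] = hits.get(g, 0) + 1
    return {g: hits.get(g, 0) / n for g, n in totals.items()}


def _score(node: str, covered: Set[str], followers: Dict[str, List[str]],
           groups: Dict[str, str]) -> Tuple[float, int]:
    """(coverage of the worst-off group, total reach) if node were added as a seed."""
    reach = covered | {node} | set(followers.get(node, []))
    return min(group_coverage(reach, groups).values()), len(reach)


def disparity_aware_seeds(followers: Dict[str, List[str]], groups: Dict[str, str],
                          k: int) -> List[str]:
    """Greedily pick up to k seeds, maximizing worst-group coverage, then total coverage."""
    seeds: List[str] = []
    covered: Set[str] = set()
    candidates = sorted(groups)  # fixed order gives deterministic tie-breaking
    for _ in range(min(k, len(candidates))):
        best = max((n for n in candidates if n not in seeds),
                   key=lambda n: _score(n, covered, followers, groups))
        seeds.append(best)
        covered |= {best} | set(followers.get(best, []))
    return seeds


if __name__ == "__main__":
    groups = {**{f"f{i}": "female" for i in range(1, 4)},
              **{f"m{i}": "male" for i in range(1, 9)}}
    # Two male hubs with many followers and one female user with a single follower.
    followers = {"m1": ["m2", "m3", "m4"], "m5": ["m6", "m7", "m8"], "f1": ["f2"]}
    print(disparity_aware_seeds(followers, groups, k=2))
```

On this toy graph, a purely coverage-maximizing greedy would pick the two male hubs `m1` and `m5`, while the maximin objective picks `m1` and then `f1`, trading some total reach for information reaching both groups.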