RGC Early CAREER Scheme website)
Weakly Supervised Representation Learning
Modern machine learning is moving into an era of complex models (e.g., deep neural networks), which place heavy emphasis on data representation. This learning paradigm is known as representation learning. Representation learning normally requires a large amount of well-annotated data. For startups or non-profit organizations, however, such data is hard to acquire because of the cost of labeling or the intrinsic scarcity of data in the given domain. These practical issues motivate our research on weakly supervised representation learning (WSRL), which does not require such a large amount of annotated data. Over the years, we have developed techniques for WSRL, such as label-noise representation learning and wildly transferable representation learning.
Security, Privacy and Robustness in Machine Learning
In this research thrust, I am interested in the following question: how can we preserve security, privacy and robustness when training complex models? We have investigated learning algorithms for handling large-scale sensitive data safely. One of the key ideas is to bridge private updates of the primal variable with gradual curriculum learning. We have also proposed one of the pioneering approaches for investigating the robustness of residual networks from the perspective of dynamical systems. Specifically, we exploited the step factor in the Euler method to control the robustness of ResNet in both training and generalization. More recently, we derived a series of adversarial learning algorithms, which mainly focus on empirical defense.
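To make the Euler-method view concrete, here is a minimal NumPy sketch, not the published method. It assumes a toy residual block of the form x_{k+1} = x_k + h * tanh(W_k x_k), where h plays the role of the step factor. The function names, the tanh nonlinearity, and the random weights are all illustrative assumptions. The script prints how strongly a small input perturbation is amplified at the output for a few values of h.

```python
import numpy as np

def residual_forward(x, weights, h):
    """Toy residual stack: each block is one explicit Euler step of size h."""
    for W in weights:
        x = x + h * np.tanh(W @ x)  # x_{k+1} = x_k + h * f(x_k)
    return x

def perturbation_gain(x, delta, weights, h):
    """Ratio of output change to input change for a perturbation delta."""
    clean = residual_forward(x, weights, h)
    noisy = residual_forward(x + delta, weights, h)
    return np.linalg.norm(noisy - clean) / np.linalg.norm(delta)

# Hypothetical setup: random weights and inputs, chosen only for illustration.
rng = np.random.default_rng(0)
dim, depth = 8, 10
weights = [rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(depth)]
x = rng.normal(size=dim)
delta = 1e-3 * rng.normal(size=dim)

for h in (1.0, 0.5, 0.1):
    print(f"h={h}: gain={perturbation_gain(x, delta, weights, h):.4f}")
```

Because tanh is 1-Lipschitz, each block in this toy model can amplify a perturbation by at most 1 + h * ||W_k||_2, so the step factor directly bounds the worst-case gain of the whole stack. This is one simple way to see why h can act as a robustness control in such a model.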
Automated, Federated and Graph Machine Learning
Motivated by the success of automated machine learning (AutoML), we are exploring how to leverage AutoML to address domain problems in trustworthy learning, such as searching for the small-loss percentage under noisy labels or for robust network structures under adversarial examples. At a high level, we have formulated the synergistic interaction between trustworthy learning and automated learning as a bi-level programming problem. Specifically, we designed a domain-specific search space based on domain knowledge in trustworthy learning, and we proposed a novel Newton algorithm to solve the bi-level optimization problem efficiently. Motivated by the success of federated learning (FL), we are also exploring how to leverage FL to address data privacy and governance issues while maintaining model robustness to noisy labels and adversarial attacks. In addition, in industrial-level FL environments, we are the first to study the collaboration between the device and the cloud, namely the device-cloud collaborative learning (DCCL) framework. More recently, we have been working on trustworthy graph neural networks and knowledge graphs.
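As a rough illustration of the "small-loss percentage" mentioned above, the sketch below keeps only the fraction of samples with the smallest per-sample loss, a common heuristic for training under noisy labels. The function name `select_small_loss` and the parameter `keep_ratio` are hypothetical. In an AutoML setting, `keep_ratio` is the kind of quantity an outer search loop would tune, while the inner loop trains the model on the selected samples. The search procedure itself is not shown.

```python
import numpy as np

def select_small_loss(losses, keep_ratio):
    """Return sorted indices of the keep_ratio fraction of samples with the smallest loss."""
    losses = np.asarray(losses, dtype=float)
    # Always keep at least one sample so a training step remains possible.
    n_keep = max(1, int(round(keep_ratio * losses.size)))
    smallest = np.argsort(losses, kind="stable")[:n_keep]
    return np.sort(smallest)

# Hypothetical per-sample losses; large values are treated as likely label noise.
losses = [0.1, 2.0, 0.3, 5.0, 0.2]
kept = select_small_loss(losses, keep_ratio=0.6)
print(kept)  # indices of the three smallest losses: [0 2 4]
```

A fixed `keep_ratio` is used here only for simplicity. In practice the ratio is often scheduled over training epochs, which is what makes it a natural target for automated search.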
Interdisciplinary Problems: Healthcare Analytics and Drug Discovery
Unlabeled data and data with noisy labels are commonly encountered in medical image analysis. To tackle these two difficult problems, this proposed project will use machine learning (ML) technologies to develop robust, efficient and automated diagnosis algorithms that can be applied to identify diverse diseases. We will verify our proposed methods on a series of public datasets, such as MICCAI BraTS, MICCAI iSeg2019, ChestX-ray14 and ISBI CHAOS. The aim of this project is to reduce the demand for annotated medical data, decrease the cost of manual screening, and promote the development of smart healthcare. We hope that our models can provide reasonable medical interpretations for doctors, helping them better understand how intelligent medical diagnosis works. More recently, we have been working on the synergy between machine learning and drug discovery.