RGC Young Collaborative Research Grant (PC: Prof. Bo Han, Department of Computer Science, Hong Kong Baptist University)
Project Award Information
Award Number: RGC YCRG C2005-24Y
Title: Towards Trustworthy Foundation Models under Imperfect Scenarios
Project Coordinator (PC): Prof. Bo Han, Department of Computer Science, Hong Kong Baptist University
Project Summary
We have entered a new age of artificial intelligence, as Foundation Models (FMs) such as ChatGPT and Sora have emerged as pivotal tools with strong capabilities across a broad range of domains and tasks. However, the deployment of FMs has surfaced critical concerns, particularly in robustness, safety, fairness, and reliability. In social science, while FMs offer advanced analysis of extensive qualitative datasets, they must also remain robust against data anomalies and fair in representation. In medical sciences, FMs promise a revolution through their ability to process large-scale medical datasets, yet they must do so with the utmost safety and reliability to prevent harmful outcomes. This project therefore addresses these issues by developing trustworthy FMs. Specifically, trustworthy FMs will address four grand challenges: robustness against noisy inputs, safety against adversarial prompts, fairness against biased training data, and reliability against insufficient knowledge. Moreover, by developing advanced and targeted solutions, this project aims to bolster the functionality and dependability of trustworthy FMs, particularly within the critical spheres of social and medical sciences, thereby facilitating their responsible and beneficial integration into these fields. In summary, this collaborative project is expected to address the four grand challenges and construct trustworthy FMs that can be further deployed in broader scientific and industrial applications.
Research Publications
The following papers focus on robustness against noisy inputs:
noisy test-time adaptation in vision-language models (ICLR'25)
active reasoning benchmark (ICML'25)
belief-driven multi-agent LLM reasoning (ICML'25)
learning to instruct for visual instruction tuning (NeurIPS'25)
multi-agent debate with memory masking (ICLR'26)
stable self-supervised RL for LLM reasoning (ICLR'26)
visualizing the reasoning process of large language models (ICLR'26)
The following papers focus on safety against adversarial prompts:
understanding and enhancing the transferability of jailbreaking attacks (ICLR'25)
effective evaluations and comparisons for LLM unlearning (ICLR'25)
exploring criteria of loss reweighting to enhance LLM unlearning (ICML'25)
ensuring jailbreak defense via answer-then-check (ICLR'26)
your downloaded LoRA from sharing platforms might be unsafe (ICLR'26)
LLM unlearning with LLM beliefs (ICLR'26)
The following papers focus on fairness against biased training data:
interpretability through the lens of semantic dependency (ICML'25)
conditional independence test (NeurIPS'25)
a robust method to discover causal or anticausal relation (ICLR'25)
on the thinking-language modeling gap in large language models (ICLR'26)
The following papers focus on reliability against insufficient knowledge:
advancing machine-generated text detection (NeurIPS'25)
detecting generated images by fitting natural image distributions (NeurIPS'25)
understanding valuable preference data for LLM alignment (ICLR'26)
task-aware data selection for LLM finetuning (ICLR'26)
markov-informed calibration for boosting machine-generated text detection (ICLR'26)
Software
noisy test-time adaptation in vision-language models, [code]
active reasoning benchmark, [code]
belief-driven multi-agent LLM reasoning, [code]
learning to instruct for visual instruction tuning, [code]
multi-agent debate with memory masking, [code]
stable self-supervised RL for LLM reasoning, [code]
visualizing the reasoning process of large language models, [code]
understanding and enhancing the transferability of jailbreaking attacks, [code]
effective evaluations and comparisons for LLM unlearning, [code]
exploring criteria of loss reweighting to enhance LLM unlearning, [code]
ensuring jailbreak defense via answer-then-check, [code]
your downloaded LoRA from sharing platforms might be unsafe, [code]
LLM unlearning with LLM beliefs, [code]
interpretability through the lens of semantic dependency, [code]
conditional independence test, [code]
a robust method to discover causal or anticausal relation, [code]
on the thinking-language modeling gap in large language models, [code]
advancing machine-generated text detection, [code]
detecting generated images by fitting natural image distributions, [code]
understanding valuable preference data for LLM alignment, [code]
task-aware data selection for LLM finetuning, [code]
markov-informed calibration for boosting machine-generated text detection, [code]
Education
UG Course: COMP3065 (2026 Spring)
PG Course: COMP7250 (2025 Spring, 2026 Spring)
Tutorial: AAAI'26 Trustworthy Machine Reasoning with Foundation Models, AAAI'26 When AI "Forgets" for Good: The Science and Practice of Machine Unlearning for AI Safety, AAAI'26 Handling Out-of-Distribution Data in the Open World: Principles and Practice for Reliable AI
Lecture: DeepLearn 2026 Trustworthy Machine Learning from Data to Models, ESSAI 2026 Trustworthy Machine Learning from Data to Models, UTokyo 2026 Guest Lecture Trustworthy Foundation Models
Collaborators
University: Stanford University, Carnegie Mellon University, The University of Texas at Austin, University of California San Diego, University of California Santa Cruz, Université de Montréal, HEC Montréal, The University of Sydney, The University of Melbourne, University of Technology Sydney, The University of Tokyo, Mohamed bin Zayed University of Artificial Intelligence, The Chinese University of Hong Kong, The Hong Kong University of Science and Technology, The Hong Kong Polytechnic University, The Hong Kong University of Science and Technology (Guangzhou), University of Macau, Shanghai Jiao Tong University, Fudan University, University of Science and Technology of China, South China University of Technology, Northeastern University, Hefei University of Technology
Institute: RIKEN Center for Advanced Intelligence Project, Mila Québec AI Institute
Industry: ByteDance Seed, Tencent WeChat, Microsoft Research, Intel AI Lab
Acknowledgement
This material is based upon work supported by the RGC under Grant No. C2005-24Y. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the RGC.